API

pybiomart.Dataset

class pybiomart.Dataset(name, display_name='', host=None, path=None, port=None, use_cache=True, virtual_schema='default')

Class representing a biomart dataset.

This class handles queries to biomart datasets. Queries can select a subset of attributes and can be filtered using any available filters. Valid attributes are listed in the attributes property; if no attributes are given, a set of default attributes is used. Valid filters are listed in the filters property. The type of value a filter accepts depends on the filter: some accept a single value, whereas others accept a list of values.
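
Because attribute and filter names must match what the dataset offers, it can help to check them before sending a query. The following is a minimal sketch, not part of pybiomart: `check_query_names` is a hypothetical helper, and it assumes only that the containers passed in (for example `dataset.attributes` and `dataset.filters`) support membership tests by name.

```python
def check_query_names(attributes, filters, valid_attributes, valid_filters):
    """Raise ValueError if a requested attribute or filter name is unknown.

    Hypothetical helper: `valid_attributes` and `valid_filters` can be any
    containers supporting `in` by name (assumed to be true of
    `dataset.attributes` and `dataset.filters`).
    """
    unknown_attrs = [name for name in (attributes or [])
                     if name not in valid_attributes]
    unknown_filters = [name for name in (filters or {})
                       if name not in valid_filters]
    if unknown_attrs or unknown_filters:
        raise ValueError(
            "Unknown attributes: {}; unknown filters: {}".format(
                unknown_attrs, unknown_filters))
```

Calling this before `dataset.query(...)` gives an early, local error message instead of relying on whatever the remote service reports.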

Parameters:
  • name (str) – ID of the dataset.
  • display_name (str) – Display name of the dataset.
  • host (str) – URL of the host to connect to.
  • path (str) – Path on the host used to access the biomart service.
  • port (int) – Port to use for the connection.
  • use_cache (bool) – Whether to cache requests.
  • virtual_schema (str) – The virtual schema of the dataset.

Examples

Directly connecting to a dataset:
>>> from pybiomart import Dataset
>>> dataset = Dataset(name='hsapiens_gene_ensembl',
...                   host='http://www.ensembl.org')
Querying the dataset:
>>> dataset.query(attributes=['ensembl_gene_id',
...                           'external_gene_name'],
...               filters={'chromosome_name': ['1', '2']})
Listing available attributes:
>>> dataset.attributes
>>> dataset.list_attributes()
Listing available filters:
>>> dataset.filters
>>> dataset.list_filters()

attributes

List of attributes available for the dataset (cached).

default_attributes

List of default attributes for the dataset.

display_name

Display name of the dataset.

filters

List of filters available for the dataset.

list_attributes()

Lists available attributes in a readable DataFrame format.

Returns: Frame listing available attributes.
Return type: pandas.DataFrame

list_filters()

Lists available filters in a readable DataFrame format.

Returns: Frame listing available filters.
Return type: pandas.DataFrame

name

Name of the dataset (used as dataset id).

query(attributes=None, filters=None, only_unique=True, use_attr_names=False)

Queries the dataset to retrieve the contained data.

Parameters:
  • attributes (list[str]) – Names of attributes to fetch in query. Attribute names must correspond to valid attributes. See the attributes property for a list of valid attributes.
  • filters (dict[str,any]) – Dictionary mapping filter names to the values to filter the dataset by. Filter names and values must correspond to valid filters and filter values. See the filters property for a list of valid filters.
  • only_unique (bool) – Whether to return only rows containing unique values (True) or to include duplicate rows (False).
  • use_attr_names (bool) – Whether to use the attribute names as column names in the result (True) or the attribute display names (False).
Returns: DataFrame containing the query results.
Return type: pandas.DataFrame
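
As an illustration of how the query parameters combine, here is a small sketch that wraps `query` for one common lookup. `genes_on_chromosomes` is a hypothetical helper, not part of pybiomart; the attribute and filter names are taken from the Dataset example above, and `dataset` is assumed to be a `pybiomart.Dataset` or any object with a compatible `query` method.

```python
def genes_on_chromosomes(dataset, chromosomes):
    """Fetch gene IDs and gene names for the given chromosomes.

    Hypothetical wrapper around `dataset.query`; `dataset` is assumed to
    expose the `query` signature documented above.
    """
    return dataset.query(
        attributes=['ensembl_gene_id', 'external_gene_name'],
        # This filter is passed a list of values, as in the Dataset example.
        filters={'chromosome_name': list(chromosomes)},
        # Request attribute names (not display names) as column names.
        use_attr_names=True,
    )
```

With `use_attr_names=True`, the result columns should be addressable by the same names used in `attributes`, which keeps downstream code independent of display-name wording.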

pybiomart.Server

class pybiomart.Server(host=None, path=None, port=None, use_cache=True)

Class representing a biomart server.

Typically used as main entry point to the biomart server. Provides functionality for listing and loading the marts that are available on the server.

Parameters:
  • host (str) – URL of the host to connect to.
  • path (str) – Path on the host used to access the biomart service.
  • port (int) – Port to use for the connection.
  • use_cache (bool) – Whether to cache requests.
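
To make the roles of host, path and port concrete, the sketch below shows one way the three values could be combined into a single service URL. This is an assumption for illustration only: `service_url` is a hypothetical function, the path used in the usage line is an assumed example value, and the library's own URL handling may differ.

```python
def service_url(host, path, port):
    """Combine host, path and port into one URL string (illustrative only).

    Hypothetical helper; not taken from the pybiomart source.
    """
    host = host.rstrip('/')          # avoid a slash before the port
    if not path.startswith('/'):     # make sure the path is absolute
        path = '/' + path
    return '{}:{}{}'.format(host, port, path)


# Example with an assumed path value:
url = service_url('http://www.ensembl.org/', 'biomart/martservice', 80)
```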

Examples

Connecting to a server and listing available marts:
>>> from pybiomart import Server
>>> server = Server(host='http://www.ensembl.org')
>>> server.list_marts()
Retrieving a mart:
>>> mart = server['ENSEMBL_MART_ENSEMBL']

list_marts()

Lists available marts in a readable DataFrame format.

Returns: Frame listing available marts.
Return type: pandas.DataFrame

marts

List of available marts.

pybiomart.Mart

class pybiomart.Mart(name, database_name, display_name, host=None, path=None, port=None, use_cache=True, virtual_schema='default', extra_params=None)

Class representing a biomart mart.

Used to represent specific mart instances on the server. Provides functionality for listing and loading the datasets that are available in the corresponding mart.

Parameters:
  • name (str) – Name of the mart.
  • database_name (str) – ID of the mart on the host.
  • display_name (str) – Display name of the mart.
  • host (str) – URL of the host to connect to.
  • path (str) – Path on the host used to access the biomart service.
  • port (int) – Port to use for the connection.
  • use_cache (bool) – Whether to cache requests.
  • virtual_schema (str) – The virtual schema of the dataset.

Examples

Listing datasets:
>>> from pybiomart import Server
>>> server = Server(host='http://www.ensembl.org')
>>> mart = server['ENSEMBL_MART_ENSEMBL']
>>> mart.list_datasets()
Selecting a dataset:
>>> dataset = mart['hsapiens_gene_ensembl']

database_name

Database name of the mart on the host.

datasets

List of datasets in this mart.

display_name

Display name of the mart.

list_datasets()

Lists available datasets in a readable DataFrame format.

Returns: Frame listing available datasets.
Return type: pandas.DataFrame

name

Name of the mart (used as id).
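
The Server, Mart and Dataset examples above each use name-based indexing, so the full path from a server to a dataset can be sketched as one small function. `load_dataset` is a hypothetical helper, not part of pybiomart; it assumes only that `server[mart_name]` and `mart[dataset_name]` behave as in the examples.

```python
def load_dataset(server, mart_name, dataset_name):
    """Look up a dataset by walking server -> mart -> dataset.

    Hypothetical helper relying only on name-based indexing.
    """
    mart = server[mart_name]
    return mart[dataset_name]


# Usage (requires network access, so left commented out):
# from pybiomart import Server
# server = Server(host='http://www.ensembl.org')
# dataset = load_dataset(server, 'ENSEMBL_MART_ENSEMBL',
#                        'hsapiens_gene_ensembl')
```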