
class pybiomart.Dataset(name, display_name='', host=None, path=None, port=None, use_cache=True, virtual_schema='default')

Class representing a biomart dataset.

This class handles queries to biomart datasets. A query can select a subset of attributes and can be restricted using any of the available filters. Valid attributes are listed in the attributes property; if no attributes are given, a set of default attributes is used. Valid filters are listed in the filters property. The type of value a filter accepts depends on the filter: some accept a single value, whilst others accept a list of values.

  • name (str) – ID of the dataset.
  • display_name (str) – Display name of the dataset.
  • host (str) – URL of the host to connect to.
  • path (str) – Path on the host used to access the biomart service.
  • port (int) – Port to use for the connection.
  • use_cache (bool) – Whether to cache requests.
  • virtual_schema (str) – The virtual schema of the dataset.


Directly connecting to a dataset:
>>> dataset = Dataset(name='hsapiens_gene_ensembl',
...                   host='http://www.ensembl.org')
Querying the dataset:
>>> dataset.query(attributes=['ensembl_gene_id',
...                           'external_gene_name'],
...               filters={'chromosome_name': ['1','2']})
Listing available attributes:
>>> dataset.attributes
>>> dataset.list_attributes()
Listing available filters:
>>> dataset.filters
>>> dataset.list_filters()
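
Because attribute and filter names must be valid for the dataset, it can help to check them before sending a query. The following is a minimal sketch and not part of pybiomart: `find_unknown_names` is a hypothetical helper, and the dictionary is a stand-in for `dataset.attributes`, on the assumption that iterating that property yields attribute names.

```python
def find_unknown_names(requested, valid):
    """Return the requested names that are missing from `valid`, in order."""
    valid_names = set(valid)
    return [name for name in requested if name not in valid_names]


# Stand-in for dataset.attributes (assumption: iterating it yields names).
valid_attributes = {'ensembl_gene_id': None, 'external_gene_name': None}

unknown = find_unknown_names(['ensembl_gene_id', 'gene_symbol'],
                             valid_attributes)
print(unknown)  # prints ['gene_symbol']
```

With a live dataset, the same helper could be called with `dataset.attributes` or `dataset.filters` as the second argument, provided the assumption above holds.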

attributes

List of attributes available for the dataset (cached).

default_attributes

List of default attributes for the dataset.

display_name

Display name of the dataset.

filters

List of filters available for the dataset.

list_attributes()

Lists available attributes in a readable DataFrame format.

Returns: Frame listing available attributes.
Return type: pd.DataFrame

list_filters()

Lists available filters in a readable DataFrame format.

Returns: Frame listing available filters.
Return type: pd.DataFrame

name

Name of the dataset (used as dataset id).

query(attributes=None, filters=None, only_unique=True, use_attr_names=False)

Queries the dataset to retrieve the contained data.

  • attributes (list[str]) – Names of attributes to fetch in query. Attribute names must correspond to valid attributes. See the attributes property for a list of valid attributes.
  • filters (dict[str,any]) – Dictionary mapping filter names to the values to filter the dataset by. Filter names and values must correspond to valid filters and filter values. See the filters property for a list of valid filters.
  • only_unique (bool) – Whether to return only rows containing unique values (True) or to include duplicate rows (False).
  • use_attr_names (bool) – Whether to use the attribute names as column names in the result (True) or the attribute display names (False).

Returns: DataFrame containing the query results.
Return type: pd.DataFrame
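
As an illustration of how the query() parameters fit together, the sketch below wraps a call that passes a list-valued filter and asks for attribute names as column names. `query_genes` is a hypothetical helper, and `EchoDataset` is a stand-in that simply returns the keyword arguments it receives so the example runs without a network connection; neither is part of pybiomart.

```python
def query_genes(dataset, chromosomes):
    """Request gene ids and names for the given chromosomes.

    `dataset` is any object exposing the query() method documented above.
    """
    return dataset.query(
        attributes=['ensembl_gene_id', 'external_gene_name'],
        # chromosome_name is used with a list of values, as in the
        # Dataset example above.
        filters={'chromosome_name': list(chromosomes)},
        only_unique=True,      # drop duplicate rows
        use_attr_names=True,   # columns named by attribute name
    )


class EchoDataset:
    """Stand-in for a Dataset: returns the arguments it was called with."""

    def query(self, **kwargs):
        return kwargs


request = query_genes(EchoDataset(), ('1', '2'))
print(request['filters'])  # prints {'chromosome_name': ['1', '2']}
```

Passing a real Dataset instance instead of the stand-in would send the query to the configured host and, per the documentation above, return a DataFrame.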



class pybiomart.Server(host=None, path=None, port=None, use_cache=True)

Class representing a biomart server.

Typically used as the main entry point to a biomart server. Provides functionality for listing and loading the marts that are available on the server.

  • host (str) – URL of the host to connect to.
  • path (str) – Path on the host used to access the biomart service.
  • port (int) – Port to use for the connection.
  • use_cache (bool) – Whether to cache requests.


Connecting to a server and listing available marts:
>>> server = Server(host='http://www.ensembl.org')
>>> server.list_marts()
Retrieving a mart:
>>> mart = server['ENSEMBL_MART_ENSEMBL']

list_marts()

Lists available marts in a readable DataFrame format.

Returns: Frame listing available marts.
Return type: pd.DataFrame

marts

List of available marts.


class pybiomart.Mart(name, database_name, display_name, host=None, path=None, port=None, use_cache=True, virtual_schema='default', extra_params=None)

Class representing a biomart mart.

Used to represent specific mart instances on the server. Provides functionality for listing and loading the datasets that are available in the corresponding mart.

  • name (str) – Name of the mart.
  • database_name (str) – ID of the mart on the host.
  • display_name (str) – Display name of the mart.
  • host (str) – URL of the host to connect to.
  • path (str) – Path on the host used to access the biomart service.
  • port (int) – Port to use for the connection.
  • use_cache (bool) – Whether to cache requests.
  • virtual_schema (str) – The virtual schema of the mart.


Listing datasets:
>>> server = Server(host='http://www.ensembl.org')
>>> mart = server['ENSEMBL_MART_ENSEMBL']
>>> mart.list_datasets()
Selecting a dataset:
>>> dataset = mart['hsapiens_gene_ensembl']
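
The examples above index a server by mart name and a mart by dataset name. A minimal sketch of chaining the two lookups follows; `open_dataset` is a hypothetical helper, and the nested dictionary is a stand-in for a real Server so that the example runs offline.

```python
def open_dataset(server, mart_name, dataset_name):
    """Look up a mart on the server, then a dataset within that mart."""
    mart = server[mart_name]
    return mart[dataset_name]


# Stand-in for a Server: maps mart names to marts, which in turn map
# dataset names to datasets. The string value is a placeholder.
fake_server = {
    'ENSEMBL_MART_ENSEMBL': {
        'hsapiens_gene_ensembl': 'placeholder dataset',
    },
}

dataset = open_dataset(fake_server,
                       'ENSEMBL_MART_ENSEMBL',
                       'hsapiens_gene_ensembl')
print(dataset)  # prints placeholder dataset
```

With pybiomart installed and a reachable host, `fake_server` would be replaced by something like `Server(host='http://www.ensembl.org')`.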

database_name

Database name of the mart on the host.

datasets

List of datasets in this mart.

display_name

Display name of the mart.

list_datasets()

Lists available datasets in a readable DataFrame format.

Returns: Frame listing available datasets.
Return type: pd.DataFrame

name

Name of the mart (used as id).