API
pybiomart.Dataset
class pybiomart.Dataset(name, display_name='', host=None, path=None, port=None, use_cache=True, virtual_schema='default')
Class representing a biomart dataset.
This class is responsible for handling queries to biomart datasets. Queries can select a subset of attributes and can be filtered using any available filters. A list of valid attributes is available in the attributes property; if no attributes are given, a set of default attributes is used. A list of valid filters is available in the filters property. The type of value a filter accepts depends on the filter: some filters accept single values, whilst others accept lists of values.
Parameters:
- name (str) – ID of the dataset.
- display_name (str) – Display name of the dataset.
- host (str) – URL of the host to connect to.
- path (str) – Path on the host used to access the biomart service.
- port (int) – Port to use for the connection.
- use_cache (bool) – Whether to cache requests.
- virtual_schema (str) – The virtual schema of the dataset.
Examples
- Directly connecting to a dataset:
>>> from pybiomart import Dataset
>>> dataset = Dataset(name='hsapiens_gene_ensembl',
...                   host='http://www.ensembl.org')
- Querying the dataset:
>>> dataset.query(attributes=['ensembl_gene_id',
...                           'external_gene_name'],
...               filters={'chromosome_name': ['1', '2']})
- Listing available attributes:
>>> dataset.attributes
>>> dataset.list_attributes()
- Listing available filters:
>>> dataset.filters
>>> dataset.list_filters()
attributes
List of attributes available for the dataset (cached).
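The "(cached)" note means the attribute list is downloaded once and then reused on later accesses. The sketch below shows that lazy-caching pattern with `functools.cached_property`; `CachedDatasetSketch` and `fake_fetch` are hypothetical names standing in for the dataset and its network request, and pybiomart's own implementation may differ:

```python
from functools import cached_property


class CachedDatasetSketch:
    """Hypothetical stand-in for a dataset whose attribute list is fetched once."""

    def __init__(self, name, fetch_configuration):
        self.name = name
        # fetch_configuration is a hypothetical callable standing in for the
        # network request that downloads the dataset configuration.
        self._fetch_configuration = fetch_configuration

    @cached_property
    def attributes(self):
        # Runs only on first access; later accesses reuse the stored value.
        return self._fetch_configuration(self.name)


calls = []


def fake_fetch(name):
    # Record each "request" so the caching behaviour can be observed.
    calls.append(name)
    return ["ensembl_gene_id", "external_gene_name"]


dataset = CachedDatasetSketch("hsapiens_gene_ensembl", fake_fetch)
dataset.attributes
dataset.attributes
print(len(calls))  # 1
```

Caching on the instance keeps repeated lookups cheap, at the cost of not noticing changes on the server until a new object is created.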
default_attributes
List of default attributes for the dataset.
display_name
Display name of the dataset.
filters
List of filters available for the dataset.
list_attributes()
Lists available attributes in a readable DataFrame format.
Returns: Frame listing available attributes.
Return type: pandas.DataFrame
list_filters()
Lists available filters in a readable DataFrame format.
Returns: Frame listing available filters.
Return type: pandas.DataFrame
name
Name of the dataset (used as dataset id).
query(attributes=None, filters=None, only_unique=True, use_attr_names=False)
Queries the dataset to retrieve the contained data.
Parameters:
- attributes (list[str]) – Names of the attributes to fetch in the query. Attribute names must correspond to valid attributes; see the attributes property for a list of valid attributes. If no attributes are given, the default attributes are used.
- filters (dict[str, any]) – Dictionary mapping filter names to the values to filter the dataset by. Filter names and values must correspond to valid filters and filter values; see the filters property for a list of valid filters.
- only_unique (bool) – Whether to return only rows containing unique values (True) or to include duplicate rows (False).
- use_attr_names (bool) – Whether to use the attribute names as column names in the result (True) or the attribute display names (False).
Returns: DataFrame containing the query results.
Return type: pandas.DataFrame
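To see what a query has to carry to the server, it helps to look at the request format. A minimal sketch, assuming the BioMart XML query format (a `Query` element wrapping a `Dataset` with `Filter` and `Attribute` children), showing how `only_unique` might map onto a unique-rows flag and how list-valued filters such as `chromosome_name` might be sent as one comma-separated value; `build_query_xml` is a hypothetical helper, not part of pybiomart:

```python
import xml.etree.ElementTree as ET


def build_query_xml(dataset_name, attributes, filters=None,
                    only_unique=True, virtual_schema="default"):
    """Build a BioMart-style XML query string (hypothetical helper)."""
    query = ET.Element("Query", {
        "virtualSchemaName": virtual_schema,
        "formatter": "TSV",  # ask for tab-separated output
        "header": "1",       # include a header row
        # only_unique=True asks the server to drop duplicate rows.
        "uniqueRows": "1" if only_unique else "0",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset",
                            {"name": dataset_name, "interface": "default"})

    for name, value in (filters or {}).items():
        if isinstance(value, (list, tuple)):
            # List-valued filters are sent as one comma-separated string.
            value = ",".join(str(v) for v in value)
        ET.SubElement(dataset, "Filter", {"name": name, "value": str(value)})

    for name in attributes:
        # Each requested attribute becomes one output column.
        ET.SubElement(dataset, "Attribute", {"name": name})

    return ET.tostring(query, encoding="unicode")


query_xml = build_query_xml(
    "hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": ["1", "2"]},
)
print(query_xml)
```

The helper only builds a string, so it runs offline; sending the XML to a live service is deliberately left out.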
pybiomart.Server
class pybiomart.Server(host=None, path=None, port=None, use_cache=True)
Class representing a biomart server.
Typically used as the main entry point to a biomart server. Provides functionality for listing and loading the marts available on the server.
Parameters:
- host (str) – URL of the host to connect to.
- path (str) – Path on the host used to access the biomart service.
- port (int) – Port to use for the connection.
- use_cache (bool) – Whether to cache requests.
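The host, path and port parameters together determine which URL requests are sent to. A minimal sketch of how they might be joined, assuming plain `http://` when no scheme is given and the `/biomart/martservice` path used by Ensembl as the default; `build_service_url` is a hypothetical helper and pybiomart's exact rules may differ:

```python
def build_service_url(host, port=None, path="/biomart/martservice"):
    """Join host, port and path into a service URL (hypothetical helper)."""
    # Assume plain http if no scheme is given.
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    # Drop a trailing slash so it does not double up with the path.
    host = host.rstrip("/")
    port_part = f":{port}" if port is not None else ""
    return f"{host}{port_part}{path}"


print(build_service_url("www.ensembl.org", port=80))
# http://www.ensembl.org:80/biomart/martservice
```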
Examples
- Connecting to a server and listing available marts:
>>> from pybiomart import Server
>>> server = Server(host='http://www.ensembl.org')
>>> server.list_marts()
- Retrieving a mart:
>>> mart = server['ENSEMBL_MART_ENSEMBL']
list_marts()
Lists available marts in a readable DataFrame format.
Returns: Frame listing available marts.
Return type: pandas.DataFrame
marts
List of available marts.
pybiomart.Mart
class pybiomart.Mart(name, database_name, display_name, host=None, path=None, port=None, use_cache=True, virtual_schema='default', extra_params=None)
Class representing a biomart mart.
Used to represent specific mart instances on the server. Provides functionality for listing and loading the datasets that are available in the corresponding mart.
Parameters:
- name (str) – Name of the mart.
- database_name (str) – ID of the mart on the host.
- display_name (str) – Display name of the mart.
- host (str) – URL of the host to connect to.
- path (str) – Path on the host used to access the biomart service.
- port (int) – Port to use for the connection.
- use_cache (bool) – Whether to cache requests.
- virtual_schema (str) – The virtual schema of the mart.
Examples
- Listing datasets:
>>> from pybiomart import Server
>>> server = Server(host='http://www.ensembl.org')
>>> mart = server['ENSEMBL_MART_ENSEMBL']
>>> mart.list_datasets()
- Selecting a dataset:
>>> dataset = mart['hsapiens_gene_ensembl']
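The `server[...]` and `mart[...]` lookups above follow Python's `__getitem__` protocol: each level maps a name to the next object down. A minimal sketch of that pattern with plain dictionaries; `Registry` and the placeholder dataset value are hypothetical, and the real classes load their contents from the server:

```python
class Registry:
    """Hypothetical container supporting name lookup with obj[name]."""

    def __init__(self, kind, children):
        self.kind = kind
        self._children = dict(children)

    def __getitem__(self, name):
        try:
            return self._children[name]
        except KeyError:
            # Name the container type so a typo is easy to spot.
            raise KeyError(f"unknown {self.kind}: {name!r}") from None

    def names(self):
        return sorted(self._children)


# Hypothetical server -> mart -> dataset hierarchy built from plain objects.
mart = Registry("dataset", {"hsapiens_gene_ensembl": "dataset placeholder"})
server = Registry("mart", {"ENSEMBL_MART_ENSEMBL": mart})

dataset = server["ENSEMBL_MART_ENSEMBL"]["hsapiens_gene_ensembl"]
print(dataset)  # dataset placeholder
```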
database_name
Database name of the mart on the host.
datasets
List of datasets in this mart.
display_name
Display name of the mart.
list_datasets()
Lists available datasets in a readable DataFrame format.
Returns: Frame listing available datasets.
Return type: pandas.DataFrame
name
Name of the mart (used as id).