Chemical Metadata (chemicals.identifiers)
This module contains a database of metadata on ~70000 chemicals from the PubChem database. It provides comprehensive features for searching the metadata. It also includes a small database of common mixture compositions.
For reporting bugs, adding feature requests, or submitting pull requests, please use the GitHub issue tracker.
Search Functions
- chemicals.identifiers.CAS_from_any(ID, autoload=False, cache=True)
Wrapper around search_chemical which returns the CAS number of the found chemical directly.
- Parameters
- Returns
- CASRN
str
A three-piece, dash-separated set of numbers
Notes
An exception is raised if the name cannot be identified. The PubChem database includes a wide variety of other synonyms, but these may not be present for all chemicals. See search_chemical for more details.
Examples
>>> CAS_from_any('water')
'7732-18-5'
>>> CAS_from_any('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')
'64-17-5'
>>> CAS_from_any('CCCCCCCCCC')
'124-18-5'
>>> CAS_from_any('InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N')
'64-17-5'
>>> CAS_from_any('pubchem=702')
'64-17-5'
>>> CAS_from_any('O') # only elements can be specified by symbol
'17778-80-2'
- chemicals.identifiers.MW(ID, autoload=False, cache=True)
Wrapper around search_chemical which returns the molecular weight of the found chemical directly.
- Parameters
- Returns
- MW
float
Molecular weight of chemical, [g/mol]
Notes
An exception is raised if the name cannot be identified. The PubChem database includes a wide variety of other synonyms, but these may not be present for all chemicals. See search_chemical for more details.
Examples
>>> MW('water')
18.01528
>>> MW('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3')
46.06844
>>> MW('CCCCCCCCCC')
142.28168
>>> MW('InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N')
46.06844
>>> MW('pubchem=702')
46.06844
>>> MW('O') # only elements can be specified by symbol
15.9994
- chemicals.identifiers.search_chemical(ID, autoload=False, cache=True)
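The molecular weights in the examples above are consistent with summing standard atomic weights over the formula. The following is a minimal sketch of that arithmetic, not the library's implementation; the helper name `molecular_weight` and the three atomic weights are assumptions chosen because they reproduce the values shown.

```python
import re

# Atomic weights in g/mol. These three values are assumptions chosen because
# they reproduce the MW results shown in the examples above.
ATOMIC_WEIGHTS = {"H": 1.00794, "C": 12.0107, "O": 15.9994}


def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C2H6O'.

    Hypothetical helper: handles only element symbols followed by an
    optional count (no brackets, charges, or isotopes).
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("H2O"), 5))     # 18.01528
print(round(molecular_weight("C2H6O"), 5))   # 46.06844
print(round(molecular_weight("C10H22"), 5))  # 142.28168
```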
Looks up metadata about a chemical by searching and testing for the input string being any of the following types of chemical identifiers:
Name, in IUPAC form or common form or a synonym registered in PubChem
InChI name, prefixed by ‘InChI=1S/’ or ‘InChI=1/’
InChI key, prefixed by ‘InChIKey=’
PubChem CID, prefixed by ‘PubChem=’
SMILES (prefix with ‘SMILES=’ to force SMILES parsing; for example, ‘C’ returns carbon because it is an element symbol, whereas the SMILES interpretation of ‘C’ is methane)
CAS number (obsolete numbers may point to the current number)
If the input represents an element, the following additional identifier types may be used:
Atomic symbol (ex ‘Na’)
Atomic number (as a string)
- Parameters
- Returns
- chemical_metadata
ChemicalMetadata
A class containing attributes which describe the chemical’s metadata, [-]
Notes
An exception is raised if the name cannot be identified. The PubChem database includes a wide variety of other synonyms, but these may not be present for all chemicals.
Examples
>>> print(search_chemical('water'))
<ChemicalMetadata, name=water, formula=H2O, smiles=O, MW=18.0153>
>>> print(search_chemical('InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3'))
<ChemicalMetadata, name=ethanol, formula=C2H6O, smiles=CCO, MW=46.0684>
>>> print(search_chemical('CCCCCCCCCC'))
<ChemicalMetadata, name=decane, formula=C10H22, smiles=CCCCCCCCCC, MW=142.282>
>>> print(search_chemical('InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N'))
<ChemicalMetadata, name=ethanol, formula=C2H6O, smiles=CCO, MW=46.0684>
>>> print(search_chemical('pubchem=702'))
<ChemicalMetadata, name=ethanol, formula=C2H6O, smiles=CCO, MW=46.0684>
>>> print(search_chemical('O')) # only elements can be specified by symbol
<ChemicalMetadata, name=atomic oxygen, formula=O, smiles=[O], MW=15.9994>
- chemicals.identifiers.IDs_to_CASs(IDs)
Find the CAS numbers for multiple chemical names at once. Also supports having a string input which is a common mixture name in the database. An error will be raised if any of the chemicals cannot be found.
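The prefix rules listed for search_chemical can be pictured as a small dispatch step that runs before any database lookup. The sketch below is illustrative only and is not how the library is implemented: `classify_identifier` is a hypothetical helper, case-insensitive prefix matching is an assumption (the examples use 'pubchem=' in lower case), and the CAS pattern only checks the dash layout.

```python
import re

# Prefixes taken from the identifier list above. Matching them without
# regard to case is an assumption made for this sketch.
PREFIXES = (
    ("inchi=1s/", "InChI"),
    ("inchi=1/", "InChI"),
    ("inchikey=", "InChI key"),
    ("pubchem=", "PubChem CID"),
    ("smiles=", "SMILES"),
)

# Layout check only: digits, two digits, one digit, separated by dashes.
CAS_LAYOUT = re.compile(r"^\d{2,7}-\d{2}-\d$")


def classify_identifier(ID):
    """Guess which kind of identifier a string is (hypothetical helper)."""
    text = ID.strip()
    lowered = text.lower()
    for prefix, kind in PREFIXES:
        if lowered.startswith(prefix):
            return kind
    if CAS_LAYOUT.match(text):
        return "CAS number"
    # Anything else needs a lookup to tell names, element symbols,
    # and unprefixed SMILES apart.
    return "name, element symbol, or unprefixed SMILES"


print(classify_identifier("InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))  # InChI key
print(classify_identifier("pubchem=702"))                           # PubChem CID
print(classify_identifier("7732-18-5"))                             # CAS number
```

Unprefixed strings such as 'C' remain ambiguous at this stage, which is why the 'SMILES=' prefix exists.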
- Parameters
- Returns
Notes
White space, ‘-’, and upper case letters are removed in the search.
Examples
>>> IDs_to_CASs('R512A')
['811-97-2', '75-37-6']
>>> IDs_to_CASs(['norflurane', '1,1-difluoroethane'])
['811-97-2', '75-37-6']
CAS Number Utilities
- chemicals.identifiers.check_CAS(CASRN)
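The note on mixture-name matching can be read as a normalization step applied before lookup. Here is a minimal sketch of one such normalization, assuming that "upper case letters are removed" means the name is case-folded; `normalize_mixture_name` is a hypothetical helper, not a function of the library.

```python
def normalize_mixture_name(name):
    """Drop whitespace and '-' and fold to lower case (assumed behavior)."""
    without_space = "".join(name.split())
    return without_space.replace("-", "").lower()


# Differently written spellings collapse to the same lookup key.
print(normalize_mixture_name("R512A"))     # r512a
print(normalize_mixture_name("R-512A"))    # r512a
print(normalize_mixture_name(" r 512 a"))  # r512a
```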
Checks if a CAS number is valid. Returns False if the parser cannot parse the given string.
- Parameters
- CASRN
str
A three-piece, dash-separated set of numbers
- Returns
- result
bool
True if CASRN is valid; False if it is invalid or cannot be parsed.
Notes
The check follows the checksum method defined by the Chemical Abstracts Service. No lookup against their service is performed, so a number that passes the check is not guaranteed to be assigned to any compound.
The function does not support separators other than ‘-’.
CAS numbers up to the series 1 XXX XXX-XX-X are now being issued.
A 32-bit signed integer (long) can hold CAS numbers up to 2 147 483-64-7.
Examples
>>> check_CAS('7732-18-5')
True
>>> check_CAS('77332-18-5')
False
- chemicals.identifiers.CAS_to_int(CASRN)
Converts the CAS number of a compound from a string to an int. This is helpful when storing large numbers of CAS numbers, as their strings take up more memory than their numerical representation. All CAS numbers fit into 64-bit ints.
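The CAS check digit is the last digit of a weighted sum: each digit of the first two parts is multiplied by its position counted from the right, and the sum modulo 10 must equal the final digit. The sketch below is a self-contained illustration of that rule; `cas_checksum_ok` is a hypothetical name and this is not the library's own code.

```python
def cas_checksum_ok(CASRN):
    """Return True if the string has CAS layout and a matching check digit."""
    parts = CASRN.split("-")
    if len(parts) != 3:
        return False
    if not all(p.isascii() and p.isdigit() for p in parts):
        return False
    if len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    body = parts[0] + parts[1]
    # Weight 1 for the rightmost body digit, 2 for the next, and so on.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


print(cas_checksum_ok("7732-18-5"))   # True  (weighted sum 105, last digit 5)
print(cas_checksum_ok("77332-18-5"))  # False (weighted sum 134, last digit 4)
```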
Notes
Accomplishes conversion by removing dashes only, and then converting to an int. An incorrect CAS number will be converted without raising an exception.
Examples
>>> CAS_to_int('7704-34-9')
7704349
- chemicals.identifiers.int_to_CAS(CASRN)
Converts the CAS number of a compound from an int to a string. This is helpful when dealing with int CAS numbers.
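The conversion described in the notes amounts to dropping the dashes and calling `int`. This minimal sketch uses a hypothetical stand-in named `cas_to_int` to show why a malformed number passes through silently.

```python
def cas_to_int(CASRN):
    """Remove dashes and convert to int; no validity check is made."""
    return int(CASRN.replace("-", ""))


print(cas_to_int("7704-34-9"))  # 7704349
# Misplaced dashes are not detected: the digits are the same, so the
# result is the same integer.
print(cas_to_int("77-0434-9"))  # 7704349
```

If validity matters, check the string before converting it.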
Notes
Handles CAS numbers with an unspecified number of digits. Does not work on floats.
Examples
>>> int_to_CAS(7704349)
'7704-34-9'
- chemicals.identifiers.sorted_CAS_key(CASs)
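Going back from an int only requires slicing from the right, because the last digit is the check digit and the two before it form the middle group. A minimal sketch follows, with `int_to_cas` as a hypothetical stand-in for the library function.

```python
def int_to_cas(number):
    """Format an integer as a dash-separated CAS string (sketch only)."""
    digits = str(number)
    # Everything but the last three digits, then two digits, then one.
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"


print(int_to_cas(7704349))  # 7704-34-9
print(int_to_cas(64175))    # 64-17-5
```

Slicing from the right is what lets the first group have any number of digits.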
Takes a list of CAS numbers as strings, and returns a tuple of the same CAS numbers, sorted from smallest to largest. This is very convenient for obtaining a unique hash of a set of compounds, so as to see if two groups of compounds are the same.
- Parameters
- Returns
Notes
Does not check CAS numbers for validity.
Examples
>>> sorted_CAS_key(['7732-18-5', '64-17-5', '108-88-3', '98-00-0'])
('64-17-5', '98-00-0', '108-88-3', '7732-18-5')
Database Objects
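The ordering in the example above matches a sort on the integer value of each CAS number rather than on the strings. The sketch below assumes that ordering and shows the resulting tuple used as a dictionary key; `cas_sort_key` is a hypothetical stand-in, not the library's code.

```python
def cas_sort_key(CASs):
    """Return the CAS strings as a tuple sorted by their integer value."""
    return tuple(sorted(CASs, key=lambda cas: int(cas.replace("-", ""))))


print(cas_sort_key(["7732-18-5", "64-17-5", "108-88-3", "98-00-0"]))
# ('64-17-5', '98-00-0', '108-88-3', '7732-18-5')

# The same components in a different order give the same hashable key.
mixture_a = ["7732-18-5", "64-17-5"]
mixture_b = ["64-17-5", "7732-18-5"]
cache = {cas_sort_key(mixture_a): "water + ethanol"}
print(cache[cas_sort_key(mixture_b)])  # water + ethanol
```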
There is an object used to represent a chemical’s metadata, an object used to represent a common mixture’s composition, and an object used to hold the mixture metadata.
- class chemicals.identifiers.ChemicalMetadata(pubchemid, CAS, formula, MW, smiles, InChI, InChI_key, iupac_name, common_name, synonyms)
Class for storing metadata on chemicals.
- Attributes
- pubchemid
int
Identification number on pubchem database; access their information online at https://pubchem.ncbi.nlm.nih.gov/compound/<pubchemid> [-]
- formula
str
Formula of the compound; in the same format as chemicals.elements.serialize_formula generates, [-]
- MW
float
Molecular weight of the compound as calculated with the standard atomic abundances; consistent with the element weights in chemicals.elements.periodic_table, [g/mol]
- smiles
str
SMILES identification string, [-]
- InChI
str
InChI identification string as given in pubchem (there can be multiple valid InChI strings for a compound), [-]
- InChI_key
str
InChI key identification string (meant to be unique to a compound), [-]
- iupac_name
str
IUPAC name as given in pubchem, [-]
- common_name
str
Common name as given in pubchem, [-]
- synonyms
list[str]
List of synonyms of the compound, [-]
- CAS
int
CAS number of the compound; stored as an int for memory efficiency, [-]
- class chemicals.identifiers.CommonMixtureMetadata(name, CASs, N, source, names, ws, zs, synonyms)
Class for storing metadata on predefined chemical mixtures.
- Attributes
- name
str
Name of the mixture, [-]
- source
str
Source of the mixture composition, [-]
- N
int
Number of chemicals in the mixture, [-]
- CASs
list[str]
CAS numbers of the mixture, [-]
- ws
list[float]
Mass fractions of chemicals in the mixture, [-]
- zs
list[float]
Mole fractions of chemicals in the mixture, [-]
- names
list[str]
List of names of the chemicals in the mixture, [-]
- synonyms
list[str]
List of synonyms of the mixture which can also be used to look it up, [-]
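The ws and zs attributes describe the same composition on a mass and a mole basis. As a reminder of how the two relate, here is a minimal sketch of the standard conversion z_i = (w_i / MW_i) / sum_j (w_j / MW_j). The function name `ws_to_zs` and the numbers are made up for illustration and are not taken from the database.

```python
def ws_to_zs(ws, MWs):
    """Convert mass fractions to mole fractions given molecular weights."""
    moles = [w / MW for w, MW in zip(ws, MWs)]  # relative moles per unit mass
    total = sum(moles)
    return [n / total for n in moles]


# Illustrative two-component mixture with made-up round numbers.
ws = [0.5, 0.5]      # mass fractions, [-]
MWs = [100.0, 50.0]  # molecular weights, [g/mol]
zs = ws_to_zs(ws, MWs)
print(zs)  # the lighter component has the larger mole fraction
```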
- class chemicals.identifiers.ChemicalMetadataDB(elements=True, main_db='/home/docs/checkouts/readthedocs.org/user_builds/chemicals/envs/release/lib/python3.11/site-packages/chemicals-1.3.2-py3.11.egg/chemicals/Identifiers/chemical identifiers pubchem large.tsv', user_dbs=['/home/docs/checkouts/readthedocs.org/user_builds/chemicals/envs/release/lib/python3.11/site-packages/chemicals-1.3.2-py3.11.egg/chemicals/Identifiers/chemical identifiers pubchem small.tsv', '/home/docs/checkouts/readthedocs.org/user_builds/chemicals/envs/release/lib/python3.11/site-packages/chemicals-1.3.2-py3.11.egg/chemicals/Identifiers/chemical identifiers example user db.tsv', '/home/docs/checkouts/readthedocs.org/user_builds/chemicals/envs/release/lib/python3.11/site-packages/chemicals-1.3.2-py3.11.egg/chemicals/Identifiers/Cation db.tsv', '/home/docs/checkouts/readthedocs.org/user_builds/chemicals/envs/release/lib/python3.11/site-packages/chemicals-1.3.2-py3.11.egg/chemicals/Identifiers/Anion db.tsv', '/home/docs/checkouts/readthedocs.org/user_builds/chemicals/envs/release/lib/python3.11/site-packages/chemicals-1.3.2-py3.11.egg/chemicals/Identifiers/Inorganic db.tsv'])
Object which holds the main database of chemical metadata.
Warning
To allow the chemicals library to grow and improve, the details of this class may change in the future without notice!
- Attributes
finished_loading
Whether or not the database has loaded the main database.
Methods
autoload_main_db(): Load the main database when needed.
finish_loading(): Complete loading the main database, if it has not been fully loaded.
load(file_name): Load a particular file into the indexes.
load_elements(): Load elements into the indexes.
search_CAS(CAS[, autoload]): Search for a chemical by its CAS number.
search_InChI(InChI[, autoload]): Search for a chemical by its InChI string.
search_InChI_key(InChI_key[, autoload]): Search for a chemical by its InChI key.
search_formula(formula[, autoload]): Search for a chemical by its serialized formula.
search_name(name[, autoload]): Search for a chemical by its name.
search_pubchem(pubchem[, autoload]): Search for a chemical by its pubchem number.
search_smiles(smiles[, autoload]): Search for a chemical by its smiles string.
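The method list suggests one index per identifier type, all pointing at the same records. The sketch below illustrates that idea with plain dictionaries. It is not the class's actual implementation, and `Record` and `TinyMetadataIndex` are hypothetical names. The two records reuse values from the search examples earlier on this page.

```python
from collections import namedtuple

# Hypothetical, trimmed-down record; the real class stores more fields.
Record = namedtuple("Record", "CAS name smiles")


class TinyMetadataIndex:
    """Toy multi-key index: one dict per identifier type."""

    def __init__(self):
        self.CAS_index = {}
        self.name_index = {}
        self.smiles_index = {}

    def load(self, records):
        """Register each record under every identifier it carries."""
        for record in records:
            self.CAS_index[record.CAS] = record
            self.name_index[record.name.lower()] = record
            self.smiles_index[record.smiles] = record

    def search_CAS(self, CAS):
        return self.CAS_index.get(CAS)

    def search_name(self, name):
        return self.name_index.get(name.lower())

    def search_smiles(self, smiles):
        return self.smiles_index.get(smiles)


index = TinyMetadataIndex()
index.load([
    Record(CAS=7732185, name="water", smiles="O"),
    Record(CAS=64175, name="ethanol", smiles="CCO"),
])
print(index.search_name("Ethanol").CAS)  # 64175
print(index.search_smiles("O").name)     # water
print(index.search_CAS(1))               # None
```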
Chemical Groups
It is convenient to tag some chemicals with labels such as “refrigerant”, or to record whether they belong to a certain database. The following chemical groups are available.
- chemicals.identifiers.cryogenics = {'132259-10-0': 'Air', '1333-74-0': 'hydrogen', '630-08-0': 'carbon monoxide', '74-82-8': 'methane', '7439-90-9': 'krypton', '7440-01-9': 'neon', '7440-37-1': 'Argon', '7440-59-7': 'helium', '7440-63-3': 'xenon', '7727-37-9': 'nitrogen', '7782-39-0': 'deuterium', '7782-41-4': 'fluorine', '7782-44-7': 'oxygen'}
- chemicals.identifiers.inerts = {'10043-92-2': 'radon', '10102-43-9': 'Nitric Oxide', '10102-44-0': 'Nitrogen Dioxide', '124-38-9': 'Carbon Dioxide', '132259-10-0': 'Air', '7439-90-9': 'krypton', '7440-01-9': 'Neon', '7440-37-1': 'Argon', '7440-59-7': 'Helium', '7440-63-3': 'Xenon', '7727-37-9': 'Nitrogen', '7732-18-5': 'water', '7782-41-4': 'fluorine', '7782-44-7': 'Oxygen', '7782-50-5': 'chlorine'}
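Because each group is a dictionary keyed by CAS string, tagging a chemical reduces to a membership test. The sketch below copies a few entries from the dictionaries above so that it runs without the library. `tags_for` is a hypothetical helper.

```python
# A few entries copied from the group dictionaries listed above.
cryogenics = {"7440-59-7": "helium", "7727-37-9": "nitrogen", "74-82-8": "methane"}
inerts = {"7440-59-7": "Helium", "7727-37-9": "Nitrogen", "124-38-9": "Carbon Dioxide"}


def tags_for(CAS):
    """List the group labels whose dictionary contains this CAS string."""
    groups = {"cryogenics": cryogenics, "inerts": inerts}
    return [label for label, members in groups.items() if CAS in members]


print(tags_for("7440-59-7"))  # ['cryogenics', 'inerts']
print(tags_for("74-82-8"))    # ['cryogenics']
# Dictionary key views support set operations for comparing groups.
print(sorted(cryogenics.keys() & inerts.keys()))  # ['7440-59-7', '7727-37-9']
```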