AAindex1
The AAindex1 section currently contains 566 amino acid indices representing various physicochemical, structural and biochemical properties of amino acids. Each entry consists of an accession number, a short description of the index, reference information, notes, the PMID (PubMed ID) and the numerical values of the property for the 20 amino acids. Each entry also carries neighbour information: cross-links to other entries whose correlation coefficient with it has an absolute value of 0.8 or larger, so users can identify entries that describe similar properties.
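The neighbour cross-links can be illustrated with a minimal sketch. It assumes the Pearson correlation coefficient over the 20 values, replaces missing values with zero (as the Notice in the record format states), and keeps entries whose correlation with the target has an absolute value of at least 0.8. The helper names `pearson` and `neighbours` and the `TOY*` indices are hypothetical; this is not the code used to build the database or the aaindex package.

```python
import math
from typing import Dict, List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector has no defined correlation; treat as unrelated
    return cov / (sx * sy)


def neighbours(target: str, indices: Dict[str, List[Optional[float]]],
               threshold: float = 0.8) -> Dict[str, float]:
    """Return {accession: r} for entries whose |r| with `target` is >= threshold."""
    def filled(vals: List[Optional[float]]) -> List[float]:
        # Missing values (None) become 0.0 before correlating.
        return [0.0 if v is None else v for v in vals]

    ref = filled(indices[target])
    hits = {}
    for code, vals in indices.items():
        if code == target:
            continue
        r = pearson(ref, filled(vals))
        if abs(r) >= threshold:
            hits[code] = round(r, 3)
    return hits


# Toy 20-value indices (synthetic, not real AAindex data).
toy = {
    "TOY1": [float(i) for i in range(20)],
    "TOY2": [2.0 * i + 1 for i in range(20)],    # linear in TOY1, so r = +1
    "TOY3": [-float(i) for i in range(20)],      # reversed, so r = -1
    "TOY4": [float(i % 2) for i in range(20)],   # alternating 0/1, weakly correlated
}
print(neighbours("TOY1", toy))  # {'TOY2': 1.0, 'TOY3': -1.0}
```

Negative correlations count as neighbours too, which is why the comparison uses the absolute value.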
Record format
************************************************************************
*                                                                      *
* H Accession number                                                   *
* D Data description                                                   *
* R PubMed article ID (PMID)                                           *
* A Author(s)                                                          *
* T Title of the article                                               *
* J Journal reference                                                  *
* * Comment or missing                                                 *
* C Accession numbers of similar entries with the correlation          *
*   coefficients of 0.8 (-0.8) or more (less).                         *
*   Notice: The correlation coefficient is calculated with zeros       *
*   filled for missing values.                                         *
* I Amino acid index data in the following order                       *
*   Ala   Arg   Asn   Asp   Cys   Gln   Glu   Gly   His   Ile          *
*   Leu   Lys   Met   Phe   Pro   Ser   Thr   Trp   Tyr   Val          *
* //                                                                   *
************************************************************************
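A minimal sketch of reading one record in this format: a field code in column 1 starts a field, indented lines continue the previous field, and // ends the record. It assumes the "I" block layout of the distributed aaindex1 file (a header row of residue pairs such as A/L, then two rows of ten values, with NA marking a missing value). `parse_record`, `ROW1`, `ROW2` and the `TOY000001` record are hypothetical, and this is not the aaindex package's own parser.

```python
from typing import Dict, List, Optional

# Assumed "I" block layout: a header row of residue pairs (A/L R/K ...), then one data
# row for A R N D C Q E G H I and one for L K M F P S T W Y V; "NA" marks a missing value.
ROW1 = "ARNDCQEGHI"
ROW2 = "LKMFPSTWYV"

SAMPLE = """\
H TOY000001
D Synthetic example index, not a real AAindex entry
C    TOY000002    0.912  TOY000003   -0.850
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
      1.0     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0
     11.0    12.0    13.0    14.0    15.0    16.0    17.0    18.0    19.0      NA
//
"""


def parse_record(text: str) -> Dict[str, object]:
    """Parse one record into its accession, description, neighbours and values."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("//"):
            break                       # end of record
        if not line.strip():
            continue
        if not line[0].isspace():       # a field code in column 1 starts a new field
            current = line[0]
            fields.setdefault(current, []).append(line[1:].strip())
        elif current is not None:       # an indented line continues the previous field
            fields[current].append(line.strip())

    def num(token: str) -> Optional[float]:
        return None if token == "NA" else float(token)

    row1, row2 = (row.split() for row in fields["I"][1:3])
    values = {aa: num(t) for aa, t in zip(ROW1, row1)}
    values.update({aa: num(t) for aa, t in zip(ROW2, row2)})

    # The C field alternates accession numbers and correlation coefficients.
    c_tokens = " ".join(fields.get("C", [])).split()
    similar = {c_tokens[i]: float(c_tokens[i + 1]) for i in range(0, len(c_tokens), 2)}

    return {
        "accession": fields["H"][0],
        "description": " ".join(fields.get("D", [])),
        "similar": similar,
        "values": values,
    }


record = parse_record(SAMPLE)
print(record["values"]["A"], record["values"]["V"])  # 1.0 None
print(record["similar"])  # {'TOY000002': 0.912, 'TOY000003': -0.85}
```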
- class aaindex.aaindex1.AAIndex1
Bases: object
Python parser for AAindex1: Amino Acid Index Database.
The AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids [1]. This class stores the AAindex1 section (http://www.genome.jp/aaindex/), in which each index is a set of 20 numerical values, one for each of the 20 amino acids.
- aaindex_module_path
Absolute path to the aaindex package directory.
- data_dir
Subdirectory name containing raw and cached data files.
- aaindex_filename
Base filename for this database (no extension).
- aaindex_json
Parsed database keyed by accession number.
- categories
Dict mapping each record code to its category.
- last_updated
Date string of the last published database update.
- __getitem__(record_code: str) -> Map
Return a record by accession number wrapped in a Map (dot-notation dict).
- Parameters:
record_code – AAindex accession number (case-insensitive, leading/trailing whitespace is stripped).
- Returns:
Record data as a Map, accessible via dict or dot notation.
- Raises:
TypeError – If record_code is not a string.
ValueError – If record_code is not found in the database.
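The lookup rules above (type check, whitespace stripping, case-insensitive match, Map result) can be sketched as follows. `Map` here is a minimal dot-notation dict and `ToyIndex` is a hypothetical stand-in class; the sketch assumes accession numbers are stored upper-case and is not the aaindex package's implementation.

```python
from typing import Any, Dict


class Map(dict):
    """A dict whose keys can also be read as attributes (dot notation)."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods win.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


class ToyIndex:
    """Hypothetical stand-in that applies the documented lookup rules."""

    def __init__(self, records: Dict[str, Dict[str, Any]]):
        self._records = records

    def __getitem__(self, record_code: str) -> Map:
        if not isinstance(record_code, str):
            raise TypeError(f"record_code must be a str, not {type(record_code).__name__}")
        code = record_code.strip().upper()  # assumes accession numbers are stored upper-case
        if code not in self._records:
            raise ValueError(f"Record {code!r} not found in the database")
        return Map(self._records[code])


db = ToyIndex({"TOY000001": {"description": "Synthetic index", "values": {"A": 1.0}}})
rec = db["  toy000001 "]
print(rec.description)       # Synthetic index
print(rec["values"]["A"])    # 1.0
```

In this sketch `rec.values` returns the bound `dict.values` method rather than the stored key, because `__getattr__` only runs when ordinary lookup fails; keys that share a name with a dict method have to be read with brackets.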
- amino_acids() -> List[str]
Return sorted list of amino acid single-letter codes.
Includes the '-' placeholder for absent/gap amino acids.
- Returns:
Sorted list of amino acid codes, including '-'.
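Under Python's default string ordering, '-' (code point 45) sorts before the upper-case letters (65 and up), so a plain `sorted` call puts the placeholder first. A minimal sketch, assuming the codes are sorted that way; `STANDARD_CODES` is a hypothetical name.

```python
from typing import List

# The 20 standard one-letter codes plus "-" for an absent/gap position.
STANDARD_CODES = "ACDEFGHIKLMNPQRSTVWY"


def amino_acids() -> List[str]:
    return sorted(list(STANDARD_CODES) + ["-"])


codes = amino_acids()
print(len(codes), codes[:3])  # 21 ['-', 'A', 'C']
```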
- get_all_categories(category_file: str = 'aaindex_categories.txt') -> Dict
Return dict mapping every record code to its category.
Reads from the parsed aaindex_categories.txt file produced by parse_categories(). If the file does not yet exist, it is generated first.
- Parameters:
category_file – Filename of the pre-parsed categories file inside the data directory.
- Returns:
Dict mapping each record code to its category string.
- Raises:
IOError – If the categories file cannot be opened.
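The "generate the file if it does not exist yet" behaviour is a lazy cache. A minimal sketch, assuming the cached mapping is stored as JSON (the real file format is not documented here); `build` stands in for parse_categories(), and `fake_build` is a hypothetical counter used only to show that the slow path runs once.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def get_all_categories(data_dir: str, build: Callable[[], Dict[str, str]],
                       category_file: str = "aaindex_categories.txt") -> Dict[str, str]:
    """Load the code -> category mapping, generating and caching it on first use."""
    path = os.path.join(data_dir, category_file)
    if not os.path.isfile(path):
        # Cache miss: build the mapping once and write it to disk.
        with open(path, "w") as f:
            json.dump(build(), f)
    with open(path) as f:          # raises OSError (IOError) if unreadable
        return json.load(f)


calls = []


def fake_build() -> Dict[str, str]:
    calls.append(1)                # count how often the slow path runs
    return {"TOY000001": "hydrophobicity"}


with tempfile.TemporaryDirectory() as tmp:
    first = get_all_categories(tmp, fake_build)
    second = get_all_categories(tmp, fake_build)

print(first == second, len(calls))  # True 1
```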
- get_record_by_category(category: str) -> Dict
Return all records belonging to a given category.
- Parameters:
category – Category name to filter records by (case-insensitive).
- Returns:
Dict of matching records keyed by accession number.
- Raises:
TypeError – If category is not a string.
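Filtering by category reduces to a dict comprehension over the records. A minimal sketch, assuming each record stores its category under a "category" key and that the match is exact apart from letter case; the category names below are placeholders, not the real eight categories.

```python
from typing import Any, Dict


def get_record_by_category(records: Dict[str, Dict[str, Any]],
                           category: str) -> Dict[str, Dict[str, Any]]:
    if not isinstance(category, str):
        raise TypeError("category must be a str")
    wanted = category.lower()
    # Assumes an exact, case-insensitive match on each record's "category" field.
    return {code: rec for code, rec in records.items()
            if str(rec.get("category", "")).lower() == wanted}


toy_records = {
    "TOY1": {"category": "hydrophobic"},
    "TOY2": {"category": "charge"},
    "TOY3": {"category": "Hydrophobic"},
}
print(list(get_record_by_category(toy_records, "HYDROPHOBIC")))  # ['TOY1', 'TOY3']
```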
- num_records() -> int
Return the total number of records in the database.
- Returns:
Number of records as int.
- parse_aaindex() -> Dict
Parse the raw AAindex1 database file into a nested dict and cache as JSON.
Each record is keyed by its accession number and stores metadata, amino acid values, and category. The result is written to a .json file in the data directory for fast subsequent loads.
- Returns:
Parsed database keyed by accession number.
- Raises:
IOError – If the raw database file cannot be opened.
ValueError – If a duplicate accession number is encountered.
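The parse-then-cache flow, including the duplicate check, can be sketched as below. For brevity it keeps only the first line of the H and D fields; `split_records`, the `toy_cache.json` filename and the `TOY*` records are hypothetical, and this is not the package's parser.

```python
import json
import os
import tempfile
from typing import Dict, List


def split_records(raw_text: str) -> List[List[str]]:
    """Group non-blank lines into records terminated by a line containing only //."""
    records: List[List[str]] = []
    current: List[str] = []
    for line in raw_text.splitlines():
        if line.strip() == "//":
            records.append(current)
            current = []
        elif line.strip():
            current.append(line)
    return records


def parse_aaindex(raw_text: str, json_path: str) -> Dict[str, Dict[str, str]]:
    """Key each record by accession number, reject duplicates, cache as JSON."""
    db: Dict[str, Dict[str, str]] = {}
    for lines in split_records(raw_text):
        # First line of each field only; continuation lines are ignored here.
        fields = {ln[0]: ln[1:].strip() for ln in lines if not ln[0].isspace()}
        code = fields["H"]
        if code in db:
            raise ValueError(f"Duplicate accession number: {code}")
        db[code] = {"description": fields.get("D", "")}
    with open(json_path, "w") as f:
        json.dump(db, f, indent=2)
    return db


RAW = "H TOY000001\nD First toy index\n//\nH TOY000002\nD Second toy index\n//\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_cache.json")
    db = parse_aaindex(RAW, path)
    with open(path) as f:
        cached = json.load(f)

print(cached == db)  # True
```

Splitting on whole `//` lines rather than on the substring avoids breaking records whose text happens to contain `//`, such as a URL.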
- parse_categories(aaindex_category_file: str = 'aaindex_to_category.txt') -> Dict
Parse the category file that maps each AAindex1 record to one of 8 categories.
Category file and parsing code inspired by https://github.com/harmslab/hops.
- Parameters:
aaindex_category_file – Filename or full path of the category mapping file. Defaults to aaindex_to_category.txt in the data directory.
- Returns:
Dict mapping each record code to its category string.
- Raises:
IOError – If the category file cannot be opened.
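Accepting either a bare filename or a full path can be handled with one resolution step before opening the file. A minimal sketch; the two-column "ACCESSION CATEGORY" line format, the `#` comment convention, `resolve` and `toy_categories.txt` are all assumptions for illustration, not the real category file layout.

```python
import os
import tempfile
from typing import Dict


def resolve(name_or_path: str, data_dir: str) -> str:
    # A bare filename is looked up in data_dir; a path with a directory part is used as is.
    return name_or_path if os.path.dirname(name_or_path) else os.path.join(data_dir, name_or_path)


def parse_categories(name_or_path: str, data_dir: str) -> Dict[str, str]:
    """Read 'ACCESSION CATEGORY' lines (assumed format), skipping blanks and # comments."""
    mapping: Dict[str, str] = {}
    with open(resolve(name_or_path, data_dir)) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            code, category = line.split(maxsplit=1)
            mapping[code] = category
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "toy_categories.txt"), "w") as f:
        f.write("# accession  category\nTOY000001 hydrophobic\nTOY000002 charge\n")
    by_name = parse_categories("toy_categories.txt", tmp)
    by_path = parse_categories(os.path.join(tmp, "toy_categories.txt"), "/unused")

print(by_name == by_path, by_name["TOY000002"])  # True charge
```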
- record_codes() -> List[str]
Return sorted list of all accession numbers in the database.
- Returns:
Sorted list of accession number strings.
- record_names() -> List[str]
Return a list of description strings for all records.
- Returns:
List of description strings in database insertion order.
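record_codes() is sorted while record_names() follows insertion order, so the two lists need not line up position by position. A minimal sketch over a hypothetical in-memory `toy_db` (with num_records() included for completeness) makes the difference visible.

```python
from typing import Dict, List

toy_db: Dict[str, dict] = {
    "TOY000002": {"description": "Second toy index"},
    "TOY000001": {"description": "First toy index"},
}


def record_codes(db: Dict[str, dict]) -> List[str]:
    return sorted(db)                                   # accession numbers, sorted


def record_names(db: Dict[str, dict]) -> List[str]:
    return [rec["description"] for rec in db.values()]  # insertion order (Python 3.7+)


def num_records(db: Dict[str, dict]) -> int:
    return len(db)


print(record_codes(toy_db))  # ['TOY000001', 'TOY000002']
print(record_names(toy_db))  # ['Second toy index', 'First toy index']
```

When code/description pairs are needed, building them from the records, e.g. `{code: toy_db[code]["description"] for code in record_codes(toy_db)}`, is safer than zipping the two lists.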
- search(description: str | List[str]) -> Dict
Search records by keyword(s) present in their description field.
- Parameters:
description – Keyword string or list of keyword strings. Matching is case-insensitive.
- Returns:
Dict of matching records keyed by accession number. Returns an empty dict if no records match.
- Raises:
TypeError – If description is not a str or list.
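Keyword search over descriptions can be sketched as a case-insensitive substring test. The documentation does not say whether a list of keywords must all match or only one; this sketch assumes any single keyword is enough. `search` and the `TOY*` records here are hypothetical stand-ins.

```python
from typing import Dict, List, Union


def search(db: Dict[str, dict], description: Union[str, List[str]]) -> Dict[str, dict]:
    if isinstance(description, str):
        keywords = [description]
    elif isinstance(description, list):
        keywords = description
    else:
        raise TypeError("description must be a str or a list of str")
    keywords = [k.lower() for k in keywords]
    # Assumption: a record matches if ANY keyword occurs in its description.
    return {code: rec for code, rec in db.items()
            if any(k in rec["description"].lower() for k in keywords)}


toy_db = {
    "TOY1": {"description": "Hydrophobicity scale (toy)"},
    "TOY2": {"description": "Residue volume (toy)"},
    "TOY3": {"description": "Net charge (toy)"},
}
print(list(search(toy_db, "HYDROPHOBICITY")))        # ['TOY1']
print(list(search(toy_db, ["volume", "charge"])))    # ['TOY2', 'TOY3']
print(search(toy_db, "helix"))                       # {}
```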
- values(record_code: str) -> Dict
Return the amino acid values dict for a given record.
Shortcut to avoid accessing the full record when only the values are needed.
- Parameters:
record_code – AAindex accession number.
- Returns:
Dict of amino acid values for the specified record.
- Raises:
ValueError – If record_code is not found in the database.
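A typical reason to fetch only the values is to score a sequence against one index. A minimal sketch: `values` mirrors the documented shortcut over a hypothetical in-memory `toy_db`, and `sequence_average` is a hypothetical helper that averages the index over a sequence, skipping residues whose value is missing.

```python
from statistics import mean
from typing import Dict, Optional


def values(db: Dict[str, dict], record_code: str) -> Dict[str, Optional[float]]:
    if record_code not in db:
        raise ValueError(f"Record {record_code!r} not found in the database")
    return db[record_code]["values"]


def sequence_average(db: Dict[str, dict], record_code: str, sequence: str) -> float:
    """Hypothetical helper: mean index value over a sequence, skipping missing values."""
    scale = values(db, record_code)
    scores = [scale[aa] for aa in sequence if scale.get(aa) is not None]
    return mean(scores)


toy_db = {"TOY1": {"values": {"A": 1.0, "G": 3.0, "V": None}}}
print(sequence_average(toy_db, "TOY1", "AGV"))  # 2.0
```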
References
[1] Kawashima, S. and Kanehisa, M.; AAindex: amino acid index database. Nucleic Acids Res. 28, 374 (2000).