AAindex1

The AAindex1 section currently contains 566 amino acid indices representing various physicochemical, structural and biochemical properties of amino acids. Each entry consists of an accession number, a short description of the index, reference information, notes, a PMID (PubMed ID) and the numerical values of the property for the 20 amino acids. It also contains neighbour information: cross-links to other entries whose correlation coefficient has an absolute value of 0.8 or larger, which lets users identify entries describing similar properties.
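As a rough illustration of the neighbour criterion, the sketch below computes a Pearson correlation between two indices with missing values replaced by zeros, then flags the pair as neighbours when the absolute value is at least 0.8. The function names and the toy vectors are hypothetical; they are not taken from the database or from the package, and the database's own calculation may differ in detail.

```python
import math
from typing import Optional, Sequence


def pearson_zero_filled(x: Sequence[Optional[float]],
                        y: Sequence[Optional[float]]) -> float:
    """Pearson correlation after replacing missing values (None) with 0.0."""
    if len(x) != len(y):
        raise ValueError("both indices must have the same number of values")
    xs = [0.0 if v is None else float(v) for v in x]
    ys = [0.0 if v is None else float(v) for v in y]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant index")
    return cov / math.sqrt(var_x * var_y)


def are_neighbours(x, y, threshold: float = 0.8) -> bool:
    """True if the absolute correlation reaches the threshold."""
    return abs(pearson_zero_filled(x, y)) >= threshold


# Toy vectors (illustrative only; real indices have 20 values).
toy_a = [1.0, 2.0, None, 4.0]
toy_b = [2.0, 4.0, 0.0, 8.0]      # perfectly correlated with toy_a
toy_c = [-1.0, -2.0, 0.0, -4.0]   # perfectly anti-correlated with toy_a
toy_d = [1.0, -1.0, -1.0, 1.0]    # only weakly related to toy_a

print(are_neighbours(toy_a, toy_b), are_neighbours(toy_a, toy_d))
```

Because the threshold is applied to the absolute value, strongly anti-correlated indices count as neighbours too.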

Record format

************************************************************************
*                                                                      *
* H Accession number                                                   *
* D Data description                                                   *
* R PubMed article ID (PMID)                                           *
* A Author(s)                                                          *
* T Title of the article                                               *
* J Journal reference                                                  *
* * Comment or missing                                                 *
* C Accession numbers of similar entries with the correlation          *
*   coefficients of 0.8 (-0.8) or more (less).                         *
*   Notice: The correlation coefficient is calculated with zeros       *
*   filled for missing values.                                         *
* I Amino acid index data in the following order                       *
*   Ala    Arg    Asn    Asp    Cys    Gln    Glu    Gly    His    Ile *
*   Leu    Lys    Met    Phe    Pro    Ser    Thr    Trp    Tyr    Val *
* //                                                                   *
************************************************************************
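To make the record layout concrete, here is a minimal parsing sketch for a single flat-file record. It assumes that each field starts with its one-letter code in the first column, that continuation lines start with a space, that the I field carries a column-header line followed by two rows of ten values, and that missing values are written as NA. The record itself (DEMO000001) is invented for illustration, and parse_record is a hypothetical helper, not the package's parser.

```python
from typing import Dict, List, Optional

# Order of the 20 values in the "I" field, as listed in the record format.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"

# Hypothetical record: accession, text and numbers are invented.
SAMPLE_RECORD = """\
H DEMO000001
D Toy index for illustration (not a real entry)
A Doe, J.
T A made-up title that wraps
  onto a second line
J J. Example 1, 1-2 (2000)
C DEMO000002    0.900
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.0     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0
    11.0    12.0      NA    14.0    15.0    16.0    17.0    18.0    19.0    20.0
//
"""


def parse_record(text: str) -> Dict[str, object]:
    """Parse one record into a plain dict (sketch, not the package's code)."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-record marker
            break
        if not line.strip():
            continue
        if line[0] != " ":             # new field: code in the first column
            current = line[0]
            fields.setdefault(current, []).append(line[2:].strip())
        elif current is not None:      # continuation of the previous field
            fields[current].append(line.strip())

    def joined(code: str) -> str:
        return " ".join(part for part in fields.get(code, []) if part)

    # Skip the column-header line, then read the 20 values row by row.
    tokens = " ".join(fields.get("I", [])[1:]).split()
    if len(tokens) != len(AA_ORDER):
        raise ValueError(f"expected 20 values, found {len(tokens)}")
    values: Dict[str, Optional[float]] = {
        aa: (None if tok == "NA" else float(tok))
        for aa, tok in zip(AA_ORDER, tokens)
    }

    # The C field alternates accession numbers and correlation coefficients.
    pairs = joined("C").split()
    similar = {pairs[i]: float(pairs[i + 1]) for i in range(0, len(pairs) - 1, 2)}

    return {
        "accession": joined("H"),
        "description": joined("D"),
        "pmid": joined("R"),
        "authors": joined("A"),
        "title": joined("T"),
        "journal": joined("J"),
        "similar": similar,
        "values": values,
    }


record = parse_record(SAMPLE_RECORD)
print(record["accession"], record["values"]["A"], record["similar"])
```
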
class aaindex.aaindex1.AAIndex1

Bases: object

Python parser for AAindex1: Amino Acid Index Database.

The AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids. This class handles AAindex1, the section in which each index is a set of 20 numerical values, one per amino acid (http://www.genome.jp/aaindex/).

aaindex_module_path

Absolute path to the aaindex package directory.

data_dir

Subdirectory name containing raw and cached data files.

aaindex_filename

Base filename for this database (no extension).

aaindex_json

Parsed database keyed by accession number.

categories

Dict mapping each record code to its category.

last_updated

Date string of the last published database update.

__contains__(record_code: object) -> bool

Return True if record_code exists in the database.

__getitem__(record_code: str) -> Map

Return a record by accession number wrapped in a Map (dot-notation dict).

Parameters:

record_code – AAindex accession number (case-insensitive, leading/trailing whitespace is stripped).

Returns:

Record data as a Map, accessible via dict or dot notation.

Raises:
  • TypeError – If record_code is not a string.

  • ValueError – If record_code is not found in the database.
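The lookup behaviour described above (normalised codes, a dot-notation wrapper, and the two exception types) can be sketched with a small stand-in. Both Map and ToyIndex below are hypothetical illustrations written for this example; they are not the package's classes, and the sketch assumes accession numbers are stored in upper case.

```python
from typing import Any, Dict, Iterator


class Map(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so keys that clash
        # with dict methods (e.g. "values") must be read with record["values"].
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


class ToyIndex:
    """Hypothetical container mimicking the documented lookup rules."""

    def __init__(self, records: Dict[str, Dict[str, Any]]) -> None:
        self._records = {code.upper(): rec for code, rec in records.items()}

    def __contains__(self, record_code: object) -> bool:
        return (isinstance(record_code, str)
                and record_code.strip().upper() in self._records)

    def __getitem__(self, record_code: str) -> Map:
        if not isinstance(record_code, str):
            raise TypeError("record code must be a string")
        key = record_code.strip().upper()   # case-insensitive, whitespace stripped
        if key not in self._records:
            raise ValueError(f"record code {key!r} not found")
        return Map(self._records[key])

    def __iter__(self) -> Iterator[str]:
        return iter(self._records)

    def __len__(self) -> int:
        return len(self._records)


# Invented record used only to exercise the sketch.
db = ToyIndex({"DEMO000001": {"description": "Toy index", "values": {"A": 1.0}}})
record = db["  demo000001 "]
print(record.description, record["values"]["A"])
```
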

__iter__() -> Iterator[str]

Iterate over all record codes in the database.

__len__() -> int

Return total number of records in the database.

__repr__() -> str

Return a canonical string representation of this instance.

__sizeof__() -> int

Return the on-disk size of the raw AAindex data file in bytes.

amino_acids() -> List[str]

Return sorted list of amino acid single-letter codes.

Includes the "-" placeholder for absent/gap amino acids.

Returns:

Sorted list of amino acid codes including "-".
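A small sketch of what such a list could look like, assuming it consists of the 20 standard single-letter codes plus the gap placeholder. The exact contents returned by the package are not confirmed here.

```python
# The 20 standard amino acids in AAindex order, plus the gap placeholder.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"
codes = sorted(list(AA_ORDER) + ["-"])

# "-" sorts before the letters because of its lower code point.
print(codes)
```
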

get_all_categories(category_file: str = 'aaindex_categories.txt') -> Dict

Return dict mapping every record code to its category.

Reads from the parsed aaindex_categories.txt file produced by parse_categories(). If the file does not yet exist, it is generated first.

Parameters:

category_file – Filename of the pre-parsed categories file inside the data directory.

Returns:

Dict mapping each record code to its category string.

Raises:

IOError – If the categories file cannot be opened.

get_record_by_category(category: str) -> Dict

Return all records belonging to a given category.

Parameters:

category – Category name to filter records by (case-insensitive).

Returns:

Dict of matching records keyed by accession number.

Raises:

TypeError – If category is not a string.
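The filtering step can be pictured as a case-insensitive match against a code-to-category mapping. The helper below is a hypothetical sketch; the accession numbers and category names are placeholders, not values from the real category file.

```python
from typing import Any, Dict


def records_by_category(records: Dict[str, Dict[str, Any]],
                        categories: Dict[str, str],
                        category: str) -> Dict[str, Dict[str, Any]]:
    """Return the records whose category matches, ignoring case (sketch)."""
    if not isinstance(category, str):
        raise TypeError("category must be a string")
    wanted = category.lower()
    return {
        code: rec
        for code, rec in records.items()
        if categories.get(code, "").lower() == wanted
    }


# Placeholder data for illustration only.
toy_records = {
    "DEMO000001": {"description": "Toy hydrophobicity scale"},
    "DEMO000002": {"description": "Toy charge scale"},
    "DEMO000003": {"description": "Another toy hydrophobicity scale"},
}
toy_categories = {
    "DEMO000001": "hydrophobic",
    "DEMO000002": "charge",
    "DEMO000003": "hydrophobic",
}

matches = records_by_category(toy_records, toy_categories, "Hydrophobic")
print(sorted(matches))
```
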

num_records() -> int

Return the total number of records in the database.

Returns:

Number of records as int.

parse_aaindex() -> Dict

Parse the raw AAindex1 database file into a nested dict and cache as JSON.

Each record is keyed by its accession number and stores metadata, amino acid values, and category. The result is written to a .json file in the data directory for fast subsequent loads.

Returns:

Parsed database keyed by accession number.

Raises:
  • IOError – If the raw database file cannot be opened.

  • ValueError – If a duplicate accession number is encountered.
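The parse-once, cache-as-JSON pattern described here can be sketched as follows. load_or_parse is a hypothetical helper, and the (code, record) pairs stand in for records already read from the raw file; the package's own file names and internal structure are not reproduced.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Tuple


def load_or_parse(raw_records: Iterable[Tuple[str, Dict[str, Any]]],
                  cache_path: str) -> Dict[str, Dict[str, Any]]:
    """Load the JSON cache if present; otherwise build it from raw_records."""
    if os.path.isfile(cache_path):
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)

    parsed: Dict[str, Dict[str, Any]] = {}
    for code, record in raw_records:
        if code in parsed:
            raise ValueError(f"duplicate accession number: {code}")
        parsed[code] = record

    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(parsed, fh, indent=2)
    return parsed


# Invented records used only to exercise the sketch.
toy_raw = [
    ("DEMO000001", {"description": "Toy index", "values": {"A": 1.0}}),
    ("DEMO000002", {"description": "Second toy index", "values": {"A": 2.0}}),
]

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "toy_index.json")
    first = load_or_parse(toy_raw, cache)   # parses and writes the cache
    second = load_or_parse([], cache)       # served from the JSON cache

print(first == second)
```

The second call passes no raw records at all, so an equal result shows that the data came from the cached file.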

parse_categories(aaindex_category_file: str = 'aaindex_to_category.txt') -> Dict

Parse the category file that maps each AAindex1 record to one of 8 categories.

The category file and parsing code were inspired by https://github.com/harmslab/hops.

Parameters:

aaindex_category_file – Filename or full path of the category mapping file. Defaults to aaindex_to_category.txt in the data directory.

Returns:

Dict mapping each record code to its category string.

Raises:

IOError – If the category file cannot be opened.

record_codes() -> List[str]

Return sorted list of all accession numbers in the database.

Returns:

Sorted list of accession number strings.

record_names() -> List[str]

Return a list of description strings for all records.

Returns:

List of description strings in database insertion order.

search(description: str | List[str]) -> Dict

Search records by keyword(s) present in their description field.

Parameters:

description – Keyword string or list of keyword strings. Matching is case-insensitive.

Returns:

Dict of matching records keyed by accession number. Returns an empty dict if no records match.

Raises:

TypeError – If description is not a str or list.
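A minimal sketch of a case-insensitive keyword search over description fields. It assumes a record matches when any of the given keywords occurs in its description; the package may combine multiple keywords differently. search_descriptions and the toy records are hypothetical.

```python
from typing import Any, Dict, List, Union


def search_descriptions(records: Dict[str, Dict[str, Any]],
                        description: Union[str, List[str]]
                        ) -> Dict[str, Dict[str, Any]]:
    """Return records whose description contains any keyword (sketch)."""
    if isinstance(description, str):
        keywords = [description]
    elif isinstance(description, list):
        keywords = description
    else:
        raise TypeError("description must be a str or a list of str")

    lowered = [keyword.lower() for keyword in keywords]
    return {
        code: rec
        for code, rec in records.items()
        if any(keyword in rec["description"].lower() for keyword in lowered)
    }


# Invented records used only to exercise the sketch.
toy_records = {
    "DEMO000001": {"description": "Toy hydrophobicity scale"},
    "DEMO000002": {"description": "Toy polarity index"},
    "DEMO000003": {"description": "Toy helix propensity"},
}

hits = search_descriptions(toy_records, ["HYDROPHOBICITY", "helix"])
print(sorted(hits))
```
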

values(record_code: str) -> Dict

Return the amino acid values dict for a given record.

Shortcut to avoid accessing the full record when only the values are needed.

Parameters:

record_code – AAindex accession number.

Returns:

Dict of amino acid values for the specified record.

Raises:

ValueError – If record_code is not found in the database.

References

[1] Kawashima, S. and Kanehisa, M.; AAindex: amino acid index database. Nucleic Acids Res. 28, 374 (2000).