Internal Base Class
- class aaindex._aaindex_matrix.Map(*args, **kwargs)
A dict subclass that enables attribute-style (dot notation) access to keys.
Works for nested dicts. Each AAindex record returned by __getitem__ is wrapped in this class, so fields can be read as record.description as well as record['description'].
References
https://stackoverflow.com/questions/2352181/how-to-use-a-dot-to-access-members-of-dictionary
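As a rough illustration of the attribute-style access described above, the following is a minimal sketch of a dict subclass with dot notation. The class name DotDict and the sample record are hypothetical; this is not the library's actual Map implementation, only one common way to get the same behaviour.

```python
class DotDict(dict):
    """Hypothetical stand-in for Map: a dict whose keys are also attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Wrap nested dicts so dot access works at every level.
        for key, value in list(self.items()):
            if isinstance(value, dict) and not isinstance(value, DotDict):
                self[key] = DotDict(value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


# Made-up record used purely for illustration.
record = DotDict({"description": "Example matrix", "meta": {"source": "demo"}})
same_value = record.description == record["description"]
nested_value = record.meta.source
```

Both access styles read the same underlying dict entry, so a value set through one is visible through the other.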
- class aaindex._aaindex_matrix._AAIndexMatrix(filename: str)
Base class for AAindex2 and AAindex3 matrix database parsers.
Provides shared parsing, lookup, search, and protocol methods for the lower-triangular 20x20 matrix databases. Subclasses call super().__init__(filename) with the appropriate base filename so the correct data file is loaded.
- aaindex_module_path
Absolute path to the aaindex package directory.
- data_dir
Subdirectory name containing raw and cached data files.
- aaindex_filename
Base filename for this database (no extension).
- aaindex_json
Parsed database keyed by accession number.
- last_updated
Date string of the last published database update.
- __getitem__(record_code: str) → Map
Return a record by accession number wrapped in a Map (dot-notation dict).
- Parameters:
record_code – AAindex accession number (case-insensitive, leading/trailing whitespace is stripped).
- Returns:
Record data as a Map, accessible via dict or dot notation.
- Raises:
TypeError – If record_code is not a string.
ValueError – If record_code is not found in the database.
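The lookup rules above (string check, whitespace stripping, case-insensitive match) can be sketched as follows. The function name lookup_record, the assumption that keys are stored in upper case, and the accession number DEMO000001 are all hypothetical and used only for illustration.

```python
def lookup_record(database: dict, record_code: str) -> dict:
    """Hypothetical helper mirroring the documented lookup behaviour."""
    if not isinstance(record_code, str):
        raise TypeError(
            f"record_code must be a string, got {type(record_code).__name__}"
        )
    # Assumption: accession numbers are stored as upper-case keys.
    key = record_code.strip().upper()
    if key not in database:
        raise ValueError(f"record {key!r} not found in database")
    return database[key]


# Made-up database with a single made-up record.
demo_db = {"DEMO000001": {"description": "Example matrix"}}
found = lookup_record(demo_db, "  demo000001 ")
```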
- amino_acids() → List[str]
Return sorted list of the 20 canonical amino acid single-letter codes.
Derived from the row_order field of the first record in the database.
- Returns:
Sorted list of single-letter amino acid codes.
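A minimal sketch of the derivation described above, assuming row_order is an iterable of single-letter codes (a string or a list). The function name and the sample database are hypothetical.

```python
from typing import Dict, List


def amino_acids_from(database: Dict[str, dict]) -> List[str]:
    """Hypothetical helper: sort the row_order of the first record."""
    first_record = next(iter(database.values()))
    # sorted() accepts either a string or a list of letters.
    return sorted(first_record["row_order"])


# Made-up record; the row_order value here is an assumption for illustration.
demo_db = {"DEMO000001": {"row_order": "ARNDCQEGHILKMFPSTWYV"}}
letters = amino_acids_from(demo_db)
```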
- get(record_code: str, aa1: str, aa2: str) → float | None
Return the pairwise matrix score for two amino acids from a given record.
The matrix is symmetric, so get(code, aa1, aa2) == get(code, aa2, aa1). Returns None when either amino acid carries an NA value in the source data or when either amino acid letter is not present in this record's matrix.
- Parameters:
record_code – AAindex accession number.
aa1 – Single-letter code for the first amino acid.
aa2 – Single-letter code for the second amino acid.
- Returns:
Pairwise score as float, or None if data is not available.
- Raises:
TypeError – If aa1 or aa2 is not a string.
ValueError – If record_code is not found in the database.
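The symmetric lookup with a None fallback might look like the sketch below. It assumes the matrix is stored as a full nested dict in which NA entries are None; the function name pair_score and the three-letter sample matrix are hypothetical.

```python
from typing import Dict, Optional

Matrix = Dict[str, Dict[str, Optional[float]]]


def pair_score(matrix: Matrix, aa1: str, aa2: str) -> Optional[float]:
    """Hypothetical helper: return the score for (aa1, aa2), or None."""
    if not isinstance(aa1, str) or not isinstance(aa2, str):
        raise TypeError("aa1 and aa2 must be strings")
    # A missing letter and a stored NA (None) both yield None.
    return matrix.get(aa1, {}).get(aa2)


# Made-up symmetric matrix over three letters, with one NA pair.
demo_matrix: Matrix = {
    "A": {"A": 2.0, "R": -1.0, "N": 0.0},
    "R": {"A": -1.0, "R": 5.0, "N": None},
    "N": {"A": 0.0, "R": None, "N": 3.0},
}
forward = pair_score(demo_matrix, "A", "R")
backward = pair_score(demo_matrix, "R", "A")
```

Because the nested dict stores both orientations of every pair, swapping the arguments returns the same score without any extra logic.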
- num_records() → int
Return the total number of records in the database.
- Returns:
Number of records as int.
- parse_aaindex() → Dict
Parse the raw AAindex database file into a nested dict and cache as JSON.
Each record is keyed by its accession number and stores metadata alongside the full symmetric 20x20 matrix reconstructed from the lower-triangular source data. The result is written to a .json file in the data directory for fast subsequent loads.
- Returns:
Parsed database keyed by accession number.
- Return type:
Dict
- Raises:
IOError – If the raw database file cannot be opened.
ValueError – If a duplicate accession number is encountered.
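To make the reconstruction step concrete, here is a minimal sketch of expanding lower-triangular rows into a full symmetric nested dict that survives a JSON round trip. The function name, the three-letter ordering, and the sample rows are hypothetical; the real parser's file handling and record format are not shown.

```python
import json
from typing import Dict, List, Optional


def expand_lower_triangle(
    order: str, rows: List[List[Optional[float]]]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Hypothetical helper: mirror lower-triangular rows into a full matrix."""
    if len(rows) != len(order):
        raise ValueError("expected one row per amino acid")
    matrix: Dict[str, Dict[str, Optional[float]]] = {aa: {} for aa in order}
    for i, row in enumerate(rows):
        # Row i of a lower triangle holds i + 1 values (columns 0..i).
        if len(row) != i + 1:
            raise ValueError(f"row {i} should contain {i + 1} values")
        for j, score in enumerate(row):
            matrix[order[i]][order[j]] = score
            matrix[order[j]][order[i]] = score  # mirror across the diagonal
    return matrix


# Made-up 3x3 example; None stands in for an NA entry.
full = expand_lower_triangle("ARN", [[2.0], [-1.0, 5.0], [0.0, None, 3.0]])
# String keys, floats, and None all survive JSON serialisation unchanged.
round_tripped = json.loads(json.dumps(full))
```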
- record_codes() → List[str]
Return sorted list of all accession numbers in the database.
- Returns:
Sorted list of accession number strings.
- record_names() → List[str]
Return a list of description strings for all records.
- Returns:
List of description strings in database insertion order.
- search(description: str | List[str]) → Dict
Search records by keyword(s) present in their description field.
- Parameters:
description – Keyword string or list of keyword strings. Matching is case-insensitive.
- Returns:
Dict of matching records keyed by accession number. Returns an empty dict if no records match.
- Raises:
TypeError – If description is not a str or list.
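A case-insensitive keyword search over description fields could be sketched as below. The documentation above does not say whether a list of keywords must all match or whether any one suffices; this sketch assumes any single match is enough. The function name and the sample records are hypothetical.

```python
from typing import Dict, List, Union


def search_descriptions(
    database: Dict[str, dict], description: Union[str, List[str]]
) -> Dict[str, dict]:
    """Hypothetical helper: return records whose description has a keyword."""
    if isinstance(description, str):
        keywords = [description]
    elif isinstance(description, list):
        keywords = description
    else:
        raise TypeError("description must be a str or a list of str")
    lowered = [keyword.lower() for keyword in keywords]
    # Assumption: a record matches if ANY keyword occurs in its description.
    return {
        code: record
        for code, record in database.items()
        if any(k in record["description"].lower() for k in lowered)
    }


# Made-up records used purely for illustration.
demo_db = {
    "DEMO000001": {"description": "Substitution matrix for demo proteins"},
    "DEMO000002": {"description": "Contact potential example"},
}
hits = search_descriptions(demo_db, "SUBSTITUTION")
no_hits = search_descriptions(demo_db, ["hydrophobicity"])
```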
- values(record_code: str) → Dict
Return the full 20x20 matrix dict for a given record.
Shortcut that avoids fetching the whole record when only the matrix is needed. This is consistent with AAIndex1.values(), which returns amino acid values.
- Parameters:
record_code – AAindex accession number.
- Returns:
Nested dict of pairwise scores keyed by single-letter amino acid codes.
- Raises:
ValueError – If record_code is not found in the database.