Internal Base Class

class aaindex._aaindex_matrix.Map(*args, **kwargs)

A dict subclass that enables attribute-style (dot notation) access to keys.

Works for nested dicts. Each AAindex record returned by __getitem__ is wrapped in this class so fields can be read as record.description as well as record['description'].

References

https://stackoverflow.com/questions/2352181/how-to-use-a-dot-to-access-members-of-dictionary
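The dot-notation behaviour described above can be illustrated with a minimal sketch. The class name DotDict is hypothetical and this is not the package's actual Map implementation; it only shows the general technique of routing attribute lookups to dict keys, including nested dicts.

```python
class DotDict(dict):
    """Hypothetical stand-in for Map: a dict whose keys are also attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Wrap nested plain dicts so dot access works at every level.
        for key, value in list(self.items()):
            if isinstance(value, dict) and not isinstance(value, DotDict):
                self[key] = DotDict(value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment writes straight into the dict.
        self[name] = value


# Toy record, not real AAindex data.
record = DotDict({"description": "toy record", "matrix": {"A": {"A": 1.0}}})
print(record.description, record["description"], record.matrix.A.A)
```

One consequence of this design is that keys sharing a name with a dict method (for example items or values) are still reachable only through bracket access, because normal attribute lookup finds the method first.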

class aaindex._aaindex_matrix._AAIndexMatrix(filename: str)

Base class for AAindex2 and AAindex3 matrix database parsers.

Provides shared parsing, lookup, search, and protocol methods for the lower-triangular 20x20 matrix databases. Subclasses call super().__init__(filename) with the appropriate base filename so the correct data file is loaded.

aaindex_module_path

Absolute path to the aaindex package directory.

data_dir

Subdirectory name containing raw and cached data files.

aaindex_filename

Base filename for this database (no extension).

aaindex_json

Parsed database keyed by accession number.

last_updated

Date string of the last published database update.

__contains__(record_code: object) → bool

Return True if record_code exists in the database.

__getitem__(record_code: str) → Map

Return a record by accession number wrapped in a Map (dot-notation dict).

Parameters:

record_code – AAindex accession number (case-insensitive, leading/trailing whitespace is stripped).

Returns:

Record data as a Map, accessible via dict or dot notation.

Raises:
  • TypeError – If record_code is not a string.

  • ValueError – If record_code is not found in the database.
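The lookup contract above (strip whitespace, ignore case, TypeError for non-strings, ValueError for unknown codes) could look roughly like the following sketch. The function name lookup, the toy accession number, and the assumption that keys are stored in upper case are all illustrative and not taken from the package source.

```python
def lookup(database, record_code):
    """Hypothetical helper mirroring the documented __getitem__ contract."""
    if not isinstance(record_code, str):
        raise TypeError(f"record_code must be a str, got {type(record_code).__name__}")
    # Assumption: accession numbers are stored as upper-case keys.
    code = record_code.strip().upper()
    if code not in database:
        raise ValueError(f"record {code!r} not found in database")
    return database[code]


# Toy database with a made-up accession number.
toy_db = {"TOYM000001": {"description": "toy matrix"}}
print(lookup(toy_db, "  toym000001 ")["description"])
```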

__iter__() → Iterator[str]

Iterate over all accession numbers in the database.

__len__() → int

Return total number of records in the database.

__repr__() → str

Return a canonical string representation of this instance.

__sizeof__() → int

Return the on-disk size of the raw AAindex data file in bytes.

property aaindex_filename: str

amino_acids() → List[str]

Return sorted list of the 20 canonical amino acid single-letter codes.

Derived from the row_order field of the first record in the database.

Returns:

Sorted list of single-letter amino acid codes.

property data_dir: str

get(record_code: str, aa1: str, aa2: str) → float | None

Return the pairwise matrix score for two amino acids from a given record.

The matrix is symmetric, so get(code, aa1, aa2) == get(code, aa2, aa1). Returns None when either amino acid carries an NA value in the source data or when either amino acid letter is not present in this record’s matrix.

Parameters:
  • record_code – AAindex accession number.

  • aa1 – Single-letter code for the first amino acid.

  • aa2 – Single-letter code for the second amino acid.

Returns:

Pairwise score as float, or None if data is not available.

Raises:
  • TypeError – If aa1 or aa2 is not a string.

  • ValueError – If record_code is not found in the database.
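As a rough illustration of the pairwise lookup described above, the sketch below reads a score from a nested dict and returns None for missing letters or NA entries. The name pair_score is hypothetical, and representing NA as None inside a fully expanded symmetric dict is an assumption about the storage format, not something the package confirms.

```python
from typing import Dict, Optional


def pair_score(matrix: Dict[str, Dict[str, Optional[float]]],
               aa1: str, aa2: str) -> Optional[float]:
    """Hypothetical helper: score for (aa1, aa2), or None if unavailable."""
    if not isinstance(aa1, str) or not isinstance(aa2, str):
        raise TypeError("aa1 and aa2 must be strings")
    row = matrix.get(aa1)
    if row is None:
        return None  # first letter is not part of this matrix
    value = row.get(aa2)  # None if the letter is missing or the entry is NA
    return None if value is None else float(value)


# Toy symmetric matrix over two letters; None stands in for an NA entry.
toy_matrix = {
    "A": {"A": 4.0, "R": -1.0},
    "R": {"A": -1.0, "R": None},
}
print(pair_score(toy_matrix, "A", "R"), pair_score(toy_matrix, "R", "A"))
```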

property last_updated: str

num_records() → int

Return the total number of records in the database.

Returns:

Number of records as int.

parse_aaindex() → Dict

Parse the raw AAindex database file into a nested dict and cache as JSON.

Each record is keyed by its accession number and stores metadata alongside the full symmetric 20x20 matrix reconstructed from the lower-triangular source data. The result is written to a .json file in the data directory for fast subsequent loads.

Returns:

Parsed database keyed by accession number.

Return type:

dict

Raises:
  • IOError – If the raw database file cannot be opened.

  • ValueError – If a duplicate accession number is encountered.
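Reconstructing a full symmetric matrix from lower-triangular rows can be sketched as follows. The function name expand_lower_triangle and the three-letter toy input are illustrative only; the actual parser's file handling, metadata fields, and JSON caching are not shown.

```python
def expand_lower_triangle(order, rows):
    """Hypothetical helper: mirror lower-triangular rows into a full matrix.

    order -- sequence of single-letter codes labelling rows and columns
    rows  -- row i holds i + 1 values (columns 0..i), diagonal included
    """
    full = {aa: {} for aa in order}
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            a, b = order[i], order[j]
            full[a][b] = value  # entry on or below the diagonal
            full[b][a] = value  # mirrored entry above the diagonal
    return full


# Toy 3x3 example; a real record would use all 20 amino acids.
order = "ARN"
rows = [[4.0], [-1.0, 5.0], [-2.0, 0.0, 6.0]]
full = expand_lower_triangle(order, rows)
print(full["A"]["N"], full["N"]["A"])
```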

record_codes() → List[str]

Return sorted list of all accession numbers in the database.

Returns:

Sorted list of accession number strings.

record_names() → List[str]

Return a list of description strings for all records.

Returns:

List of description strings in database insertion order.

search(description: str | List[str]) → Dict

Search records by keyword(s) present in their description field.

Parameters:

description – Keyword string or list of keyword strings. Matching is case-insensitive.

Returns:

Dict of matching records keyed by accession number. Returns an empty dict if no records match.

Raises:

TypeError – If description is not a str or list.
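A case-insensitive keyword search over description fields might look like the sketch below. The name search_descriptions and the toy records are hypothetical. The documentation does not state whether a list of keywords must all match or whether any one is enough; this sketch assumes any single keyword is sufficient.

```python
def search_descriptions(database, description):
    """Hypothetical helper: records whose description contains a keyword."""
    if isinstance(description, str):
        keywords = [description]
    elif isinstance(description, list):
        keywords = description
    else:
        raise TypeError("description must be a str or a list of str")
    lowered = [keyword.lower() for keyword in keywords]
    # Assumption: a record matches if ANY keyword occurs in its description.
    return {
        code: record
        for code, record in database.items()
        if any(keyword in record["description"].lower() for keyword in lowered)
    }


# Toy records with made-up accession numbers.
toy_records = {
    "TOYM000001": {"description": "Substitution matrix for toy proteins"},
    "TOYM000002": {"description": "Contact potential, toy data"},
}
print(sorted(search_descriptions(toy_records, "SUBSTITUTION")))
```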

values(record_code: str) → Dict

Return the full 20x20 matrix dict for a given record.

Shortcut to avoid accessing the whole record when only the matrix is needed. Consistent with AAIndex1.values(), which returns amino acid values.

Parameters:

record_code – AAindex accession number.

Returns:

Nested dict of pairwise scores keyed by single-letter amino acid codes.

Raises:

ValueError – If record_code is not found in the database.