AAindex2
The AAindex2 section currently contains 94 amino acid mutation matrices: 67 symmetric matrices and 27 non-symmetric matrices. The entry format is almost the same as that of AAindex1, except that a symmetric matrix contains 210 numerical values (20 diagonal and 20 × 19/2 = 190 off-diagonal elements) and a non-symmetric matrix contains 400 or more numerical values (some matrices include a gap or distinguish two states of cysteine).
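The value counts quoted above follow directly from the matrix dimensions. The short check below is only an arithmetic illustration; the variable names are chosen for this example.

```python
# Number of stored values for a matrix over the 20 standard amino acids.
n = 20
diagonal = n                      # AA, RR, ..., VV
off_diagonal = n * (n - 1) // 2   # each unordered pair stored once
symmetric_values = diagonal + off_diagonal
full_values = n * n               # a non-symmetric matrix stores every cell

print(symmetric_values, full_values)
```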
Record format
************************************************************************
* *
* Each entry has the following format. *
* *
* H Accession number *
* D Data description *
* R PMID *
* A Author(s) *
* T Title of the article *
* J Journal reference *
* * Comment or missing *
* M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV *
* AA *
* AR RR *
* AN RN NN *
* AD RD ND DD *
* AC RC NC DC CC *
* AQ RQ NQ DQ CQ QQ *
* AE RE NE DE CE QE EE *
* AG RG NG DG CG QG EG GG *
* AH RH NH DH CH QH EH GH HH *
* AI RI NI DI CI QI EI GI HI II *
* AL RL NL DL CL QL EL GL HL IL LL *
* AK RK NK DK CK QK EK GK HK IK LK KK *
* AM RM NM DM CM QM EM GM HM IM LM KM MM *
* AF RF NF DF CF QF EF GF HF IF LF KF MF FF *
* AP RP NP DP CP QP EP GP HP IP LP KP MP FP PP *
* AS RS NS DS CS QS ES GS HS IS LS KS MS FS PS SS *
* AT RT NT DT CT QT ET GT HT IT LT KT MT FT PT ST TT *
* AW RW NW DW CW QW EW GW HW IW LW KW MW FW PW SW TW WW *
* AY RY NY DY CY QY EY GY HY IY LY KY MY FY PY SY TY WY YY *
* AV RV NV DV CV QV EV GV HV IV LV KV MV FV PV SV TV WV YV VV *
* // *
************************************************************************
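The lower-triangular layout shown above can be expanded into a full symmetric lookup table by mirroring each value across the diagonal. The following is a minimal sketch under stated assumptions, not the package's parser: the function name expand_lower_triangle is hypothetical, the example uses a made-up 3-letter alphabet with made-up scores, and missing values are assumed to be written as NA.

```python
from typing import Dict, List, Optional


def expand_lower_triangle(
    order: str, rows: List[str]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Expand lower-triangular text rows into a symmetric nested dict.

    Row i is expected to hold i + 1 whitespace-separated values, matching
    the layout of the M block (AA / AR RR / AN RN NN / ...).
    """
    matrix: Dict[str, Dict[str, Optional[float]]] = {a: {} for a in order}
    for i, line in enumerate(rows):
        tokens = line.split()
        if len(tokens) != i + 1:
            raise ValueError(f"row {i} has {len(tokens)} values, expected {i + 1}")
        for j, token in enumerate(tokens):
            # Assumption: missing entries are written as "NA".
            value = None if token == "NA" else float(token)
            matrix[order[i]][order[j]] = value
            matrix[order[j]][order[i]] = value  # mirror across the diagonal
    return matrix


# Toy alphabet and invented scores, used only to show the mirroring.
toy = expand_lower_triangle("ARN", ["2.0", "-1.0 5.0", "0.0 NA 3.0"])
print(toy["A"]["R"], toy["R"]["A"])
```

For a real record the same idea would apply with the 20-letter order ARNDCQEGHILKMFPSTWYV and 20 rows.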
- class aaindex.aaindex2.AAIndex2
Bases: _AAIndexMatrix
Python parser for AAindex2: Amino Acid Substitution Matrix Database.
Inherits all parsing, search, and lookup functionality from _AAIndexMatrix. Stores the 94 known 20x20 substitution matrices from AAindex2 (http://www.genome.jp/aaindex/).
References
[1] Kawashima, S. and Kanehisa, M.; AAindex: amino acid index database. Nucleic Acids Res. 28, 374 (2000).
- __getitem__(record_code: str) → Map
Return a record by accession number wrapped in a Map (dot-notation dict).
- Parameters:
record_code – AAindex accession number (case-insensitive, leading/trailing whitespace is stripped).
- Returns:
Record data as a Map, accessible via dict or dot notation.
- Raises:
TypeError – If record_code is not a string.
ValueError – If record_code is not found in the database.
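The lookup behaviour described above (type check, whitespace stripping, case-insensitive match, ValueError on a miss) can be sketched with a small stand-in class. ToyDatabase and the accession TOYX000001 are invented for this illustration and are not part of the aaindex package; upper-casing the key is an assumption about how case-insensitivity might be achieved.

```python
class ToyDatabase:
    """Hypothetical stand-in that mimics the documented lookup rules."""

    def __init__(self, records):
        # Assumption: accession numbers are stored in upper case.
        self._records = records

    def __getitem__(self, record_code):
        if not isinstance(record_code, str):
            raise TypeError(
                f"record_code must be str, got {type(record_code).__name__}"
            )
        key = record_code.strip().upper()
        if key not in self._records:
            raise ValueError(f"record {key!r} not found")
        return self._records[key]


# Made-up record used only for demonstration.
db = ToyDatabase({"TOYX000001": {"description": "made-up toy matrix"}})
print(db["  toyx000001 "]["description"])
```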
- amino_acids() → List[str]
Return sorted list of the 20 canonical amino acid single-letter codes.
Derived from the row_order field of the first record in the database.
- Returns:
Sorted list of single-letter amino acid codes.
- get(record_code: str, aa1: str, aa2: str) → float | None
Return the pairwise matrix score for two amino acids from a given record.
The stored matrix is symmetric, so get(code, aa1, aa2) == get(code, aa2, aa1). Returns None when either amino acid has an NA value in the source data or when an amino acid letter is not present in this record’s matrix.
- Parameters:
record_code – AAindex accession number.
aa1 – Single-letter code for the first amino acid.
aa2 – Single-letter code for the second amino acid.
- Returns:
Pairwise score as float, or None if data is not available.
- Raises:
TypeError – If aa1 or aa2 is not a string.
ValueError – If record_code is not found in the database.
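The pairwise lookup described above can be illustrated on a plain nested dict. This is a sketch of the documented behaviour, not the package's code: pair_score is a hypothetical helper, the scores are invented, and None stands in for an NA entry.

```python
from typing import Dict, Optional

Matrix = Dict[str, Dict[str, Optional[float]]]


def pair_score(matrix: Matrix, aa1: str, aa2: str) -> Optional[float]:
    """Return the score for (aa1, aa2), or None if it is unavailable."""
    if not isinstance(aa1, str) or not isinstance(aa2, str):
        raise TypeError("aa1 and aa2 must be strings")
    row = matrix.get(aa1)
    if row is None:
        return None          # letter not present in this matrix
    return row.get(aa2)      # None for an NA entry or an unknown letter


# Invented symmetric toy matrix; None marks an NA value.
toy_matrix: Matrix = {
    "A": {"A": 2.0, "R": -1.0},
    "R": {"A": -1.0, "R": None},
}
print(pair_score(toy_matrix, "A", "R"), pair_score(toy_matrix, "R", "A"))
```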
- num_records() → int
Return the total number of records in the database.
- Returns:
Number of records as int.
- parse_aaindex() → Dict
Parse the raw AAindex database file into a nested dict and cache as JSON.
Each record is keyed by its accession number and stores metadata alongside the full symmetric 20x20 matrix reconstructed from the lower-triangular source data. The result is written to a .json file in the data directory for fast subsequent loads.
- Returns:
Parsed database keyed by accession number.
- Return type:
Dict
- Raises:
IOError – If the raw database file cannot be opened.
ValueError – If a duplicate accession number is encountered.
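The parse-once, cache-as-JSON pattern described above can be sketched as follows. This is an illustration of the general technique only: load_with_cache, fake_parse, and the file name toy.json are hypothetical, and the package's actual file locations and cache handling may differ.

```python
import json
import os
import tempfile


def load_with_cache(cache_path, parse):
    """Return cached JSON if present; otherwise parse and write the cache."""
    if os.path.isfile(cache_path):
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)
    data = parse()
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data


calls = []


def fake_parse():
    # Stand-in for an expensive parse of the raw database file.
    calls.append(1)
    return {"TOYX000001": {"description": "made-up record"}}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.json")
    first = load_with_cache(path, fake_parse)    # parses and writes the cache
    second = load_with_cache(path, fake_parse)   # served from the JSON file

print(len(calls))
```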
- record_codes() → List[str]
Return sorted list of all accession numbers in the database.
- Returns:
Sorted list of accession number strings.
- record_names() → List[str]
Return a list of description strings for all records.
- Returns:
List of description strings in database insertion order.
- search(description: str | List[str]) → Dict
Search records by keyword(s) present in their description field.
- Parameters:
description – Keyword string or list of keyword strings. Matching is case-insensitive.
- Returns:
Dict of matching records keyed by accession number. Returns an empty dict if no records match.
- Raises:
TypeError – If description is not a str or list.
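A case-insensitive keyword search over description fields can be sketched as below. The helper search_descriptions and the records are invented for illustration. The text above does not say whether a list of keywords must all match or whether any one suffices; this sketch assumes any one keyword is enough.

```python
from typing import Dict, List, Union


def search_descriptions(
    records: Dict[str, dict], description: Union[str, List[str]]
) -> Dict[str, dict]:
    """Return records whose description contains any of the keywords."""
    if isinstance(description, str):
        keywords = [description]
    elif isinstance(description, list):
        keywords = description
    else:
        raise TypeError("description must be a str or a list of str")
    lowered = [k.lower() for k in keywords]
    return {
        code: rec
        for code, rec in records.items()
        # Assumption: a record matches if ANY keyword occurs in it.
        if any(k in rec["description"].lower() for k in lowered)
    }


# Made-up records used only for demonstration.
toy_records = {
    "TOYX000001": {"description": "Toy substitution matrix"},
    "TOYX000002": {"description": "Toy contact potential"},
}
print(sorted(search_descriptions(toy_records, "SUBSTITUTION")))
```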
- values(record_code: str) → Dict
Return the full 20x20 matrix dict for a given record.
Shortcut that avoids loading the whole record when only the matrix is needed. Consistent with AAIndex1.values(), which returns amino acid values.
- Parameters:
record_code – AAindex accession number.
- Returns:
Nested dict of pairwise scores keyed by single-letter amino acid codes.
- Raises:
ValueError – If record_code is not found in the database.