in reply to Potential hash key ids limitations

>tr|Q8PV56|Q8PV56_METMA PEP-CTERM sorting domain-containing protein OS=Methanosarcina mazei (strain ATCC BAA-159 / DSM 3647 / Goe1 / Go1 / JCM 11833 / OCM 88) OX=192952 GN=MM_2118 PE=4 SV=1

You know the Q8PV56 part is already a unique UniProt identifier, right? UniProt calls it an 'Accession Number'. There would seem to be no need to use the whole line as a key.
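A minimal sketch of the idea, assuming the usual UniProt FASTA header layout `>db|Accession|EntryName Description...`. It pulls out the second `|`-separated field and uses that as the dictionary key instead of the whole line. The helper name `accession_from_header` is hypothetical.

```python
def accession_from_header(header: str) -> str:
    """Return the accession field from a header like '>tr|Q8PV56|Q8PV56_METMA ...'."""
    # Assumption: UniProt layout '>db|Accession|EntryName Description...'
    fields = header.lstrip(">").split("|")
    if len(fields) < 3:
        raise ValueError(f"not a UniProt-style header: {header!r}")
    return fields[1]


header = (">tr|Q8PV56|Q8PV56_METMA PEP-CTERM sorting domain-containing protein "
          "OS=Methanosarcina mazei (strain ATCC BAA-159 / DSM 3647 / Goe1 / Go1 / "
          "JCM 11833 / OCM 88) OX=192952 GN=MM_2118 PE=4 SV=1")

proteins = {}
# Short, unique key; the full header is kept as the value in case it is needed later
proteins[accession_from_header(header)] = header
print(list(proteins))  # ['Q8PV56']
```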

Here is UniProt's documentation on the accession number format (accession_numbers), in case you want to extract the accession number with a regular expression.
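A sketch of the regular-expression route. The pattern below is the 6- or 10-character accession pattern as I recall it from UniProt's accession-number help page, so please check it against the current page before relying on it. `ACCESSION_RE` and `find_accession` are hypothetical names. Word boundaries keep the search from matching inside a longer token such as `Q8PV56_METMA`.

```python
import re

# Assumed UniProt accession pattern (verify against UniProt's help page)
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Same pattern, wrapped so the word boundaries apply to both alternatives
_SEARCH_RE = re.compile(rf"\b(?:{ACCESSION_RE.pattern})\b")


def find_accession(line):
    """Return the first accession-shaped token in the line, or None."""
    m = _SEARCH_RE.search(line)
    return m.group(0) if m else None


line = ">tr|Q8PV56|Q8PV56_METMA PEP-CTERM sorting domain-containing protein"
print(find_accession(line))                      # Q8PV56
print(bool(ACCESSION_RE.fullmatch("A0A023GPI8")))  # True (10-character form)
```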

There can also be multiple isoforms, which share the base accession number but carry a dash-plus-integer suffix, like so: P68250-3
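If isoforms should be grouped under their canonical entry, one option is to split off the suffix. This is a minimal sketch assuming the suffix is always a dash followed by digits; `split_isoform` is a hypothetical helper.

```python
def split_isoform(accession):
    """Return (canonical_accession, isoform_number_or_None)."""
    # Assumption: isoform suffix is '-' followed by an integer, e.g. 'P68250-3'
    base, sep, suffix = accession.partition("-")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return accession, None


print(split_isoform("P68250-3"))  # ('P68250', 3)
print(split_isoform("Q8PV56"))    # ('Q8PV56', None)
```

If each isoform needs its own hash entry, keep the full string (`P68250-3`) as the key instead. It is already unique.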

Re^2: Potential hash key ids limitations
by karlgoethebier (Abbot) on Jan 19, 2023 at 17:44 UTC
    «… already a unique UniProt identifier…»

    No joke, in fact. You always experience real surprises here. Unbelievable.

    «The Crux of the Biscuit is the Apostrophe»

        I see and marvel. Thanks. Best regards, Karl
