
Hi, and thanks for your reply.

The data I posted wasn't quite representative, and that was perhaps a bit stupid of me. A more proper representation is below (3 random lines).
The data is sorted lexicographically on the first number.
106896752;384407;root;cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliophyta;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Fabales;Fabaceae;Papilionoideae;Genisteae;Lupinus;Lupinus magnistipulatus;
124405058;5888;root;cellular organisms;Eukaryota;Alveolata;Ciliophora;Intramacronucleata;Oligohymenophorea;Peniculida;Parameciidae;Paramecium;Paramecium tetraurelia;
134053560;349161;root;cellular organisms;Bacteria;Firmicutes;Clostridia;Clostridiales;Peptococcaceae;Desulfotomaculum;Desulfotomaculum reducens;Desulfotomaculum reducens MI-1;
In total there are about 160 million records.
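
For anyone who wants to experiment with these lines, here is a minimal sketch of splitting one record, assuming the fields are semicolon-delimited with the sort key first. The sub name `parse_record` and the hash keys `key`, `id` and `lineage` are hypothetical names chosen for illustration; the post only states that the first number is the sort key, so the meaning of the second number is not assumed here.

```perl
use strict;
use warnings;

# Split one record into its leading key, the second numeric field,
# and the remaining semicolon-separated names.
# Field names are illustrative assumptions, not taken from the post.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    # split with no limit drops the empty trailing field
    # left behind by the final ';' on each record.
    my ($key, $id, @lineage) = split /;/, $line;
    return { key => $key, id => $id, lineage => \@lineage };
}

my $rec = parse_record(
    '124405058;5888;root;cellular organisms;Eukaryota;Alveolata;Ciliophora;'
  . 'Intramacronucleata;Oligohymenophorea;Peniculida;Parameciidae;Paramecium;'
  . 'Paramecium tetraurelia;'
);

# Prints: 124405058 -> Paramecium tetraurelia
print "$rec->{key} -> $rec->{lineage}[-1]\n";
```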

Re^3: Using indexing for faster lookup in large file
by BrowserUk (Patriarch) on Feb 27, 2015 at 23:48 UTC
    The data is sorted lexicographically on the first number.

    You mean like this?

    C:\Users\HomeAdmin>perl -E"@s = 1..30; say for sort @s"
    1
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    2
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    3
    30
    4
    5
    6
    7
    8
    9

    Also, what are the smallest and largest keys ("first numbers") in the file?
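
The reason the sort order and the key range matter can be sketched as follows. This is only an illustration under assumptions: the key values are made up, and `bsearch`, `$str_cmp` and `$num_cmp` are hypothetical helpers, not anything from the original poster's code. The point is that a binary search has to compare keys the same way the file was sorted; if every key has the same number of digits the two orders coincide, otherwise they do not.

```perl
use strict;
use warnings;

# Made-up keys of differing lengths, purely for illustration.
my @keys = (9, 10, 100, 2, 25);

# Default sort compares as strings: "10" sorts before "2".
my @as_strings = sort @keys;
# Numeric order needs an explicit <=> comparison.
my @as_numbers = sort { $a <=> $b } @keys;

print "string order:  @as_strings\n";   # 10 100 2 25 9
print "numeric order: @as_numbers\n";   # 2 9 10 25 100

# Plain binary search over a sorted array ref.
# $cmp is a code ref returning <0, 0 or >0 for (element, target).
# Returns the index of the match, or -1 if not found.
sub bsearch {
    my ($sorted, $target, $cmp) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c   = $cmp->($sorted->[$mid], $target);
        return $mid if $c == 0;
        if ($c < 0) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return -1;
}

my $str_cmp = sub { $_[0] cmp $_[1] };
my $num_cmp = sub { $_[0] <=> $_[1] };

# Matching comparison finds the key; mismatched comparison misses it.
print bsearch(\@as_strings, 9, $str_cmp), "\n";   # 4
print bsearch(\@as_strings, 9, $num_cmp), "\n";   # -1
```

Searching string-sorted data with a numeric comparison silently misses keys that are present, which is why it helps to know whether the file is really in string order and how wide the keys are.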

