PerlMonks  

Re^2: Using indexing for faster lookup in large file

by anli_ (Novice)
on Feb 27, 2015 at 23:12 UTC ( [id://1118135]=note )


in reply to Re: Using indexing for faster lookup in large file
in thread Using indexing for faster lookup in large file

Hi, and thanks for taking the time.
I've included 200 random lines below:
106896752;384407;root;cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliophyta;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Fabales;Fabaceae;Papilionoideae;Genisteae;Lupinus;Lupinus magnistipulatus;
124405058;5888;root;cellular organisms;Eukaryota;Alveolata;Ciliophora;Intramacronucleata;Oligohymenophorea;Peniculida;Parameciidae;Paramecium;Paramecium tetraurelia;
134053560;349161;root;cellular organisms;Bacteria;Firmicutes;Clostridia;Clostridiales;Peptococcaceae;Desulfotomaculum;Desulfotomaculum reducens;Desulfotomaculum reducens MI-1;
134770321;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
134983127;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
135824808;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
136673613;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
137334024;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
140718400;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
142733873;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
144068143;408172;root;unclassified sequences;metagenomes;ecological metagenomes;marine metagenome;
145690735;391296;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus suis;Streptococcus suis 98HAH33;
150838514;400668;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Oceanospirillales;Oceanospirillaceae;Marinomonas;Marinomonas sp. MWYL1;
150865752;322104;root;cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Saccharomycotina;Saccharomycetes;Saccharomycetales;Debaryomycetaceae;Scheffersomyces;Scheffersomyces stipitis;Scheffersomyces stipitis CBS 6054;
164456855;10941;root;Viruses;dsRNA viruses;Reoviridae;Sedoreovirinae;Rotavirus;Rotavirus A;Human rotavirus A;
167698973;449673;root;cellular organisms;Bacteria;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;Bacteroides stercoris;Bacteroides stercoris ATCC 43183;
183175365;216594;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium marinum;Mycobacterium marinum M;
189914716;537457;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Actinobacillus;Actinobacillus pleuropneumoniae;Actinobacillus pleuropneumoniae serovar 7;Actinobacillus pleuropneumoniae serovar 7 str. AP76;
190628244;7217;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophilina;Drosophiliti;Drosophila;Sophophora;melanogaster group;ananassae subgroup;ananassae species complex;Drosophila ananassae;
219881049;9606;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens;
226184868;234621;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Nocardiaceae;Rhodococcus;Rhodococcus erythropolis;Rhodococcus erythropolis PR4;
226722869;502800;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Yersinia;Yersinia pseudotuberculosis;Yersinia pseudotuberculosis YPIII;
256738591;563193;root;cellular organisms;Bacteria;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Parabacteroides;Parabacteroides sp. D13;
259697659;272844;root;cellular organisms;Archaea;Euryarchaeota;Thermococci;Thermococcales;Thermococcaceae;Pyrococcus;Pyrococcus abyssi;Pyrococcus abyssi GE5;
268622796;528358;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;Neisseria gonorrhoeae;Neisseria gonorrhoeae PID332;
283552125;684497;root;Viruses;ssRNA viruses;ssRNA negative-strand viruses;Orthomyxoviridae;Influenzavirus A;Influenza A virus;H5N1 subtype;Influenza A virus (A/chicken/Nigeria/08RS848-4/2006(H5N1));
285308756;672713;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Coleoptera;Polyphaga;Cucujiformia;Curculionoidea;Curculionidae;Molytinae;Cleogonini;Rhyssomatus;Rhyssomatus sp. SPN-001;
289191770;644281;root;cellular organisms;Archaea;Euryarchaeota;Methanococci;Methanococcales;Methanocaldococcaceae;Methanocaldococcus;Methanocaldococcus sp. FS406-22;
296047827;536227;root;cellular organisms;Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium carboxidivorans;Clostridium carboxidivorans P7;
296153125;703612;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus subtilis group;Bacillus subtilis;Bacillus subtilis subsp. spizizenii;Bacillus subtilis subsp. spizizenii ATCC 6633;
323718996;911237;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis CDC1551A;
326536306;279006;root;Viruses;dsDNA viruses, no RNA stage;Caudovirales;Myoviridae;Tevenvirinae;T4likevirus;Acinetobacter phage 133;
339297830;443150;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis CCDC5180;
358071496;1055524;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Burkholderia;Burkholderia cepacia complex;Burkholderia cenocepacia;Burkholderia cenocepacia H111;
374891102;1140654;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Amphiesmenoptera;Lepidoptera;Glossata;Neolepidoptera;Heteroneura;Ditrysia;Gelechioidea;Elachistidae;Stenomatinae;Antaeotricha;Antaeotricha renselariana;
379588058;760820;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pneumoniae;Streptococcus pneumoniae GA44128;
380562626;887299;root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon subdivisions;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;Campylobacter coli;Campylobacter coli 202/04;
385267822;339919;root;cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliophyta;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Gentianales;Rubiaceae;Ixoroideae;Coffeeae;Coffea;Coffea mauritiana;
387572600;1127122;root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon subdivisions;Epsilonproteobacteria;Campylobacterales;Helicobacteraceae;Helicobacter;Helicobacter pylori;Helicobacter pylori XZ274;
389747673;721885;root;cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Stereaceae;Stereum;Stereum hirsutum;Stereum hirsutum FP-91666 SS1;
393121076;992098;root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon subdivisions;Epsilonproteobacteria;Campylobacterales;Helicobacteraceae;Helicobacter;Helicobacter pylori;Helicobacter pylori Hp P-1b;
396040489;926034;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Salmonella;Salmonella enterica;Salmonella enterica subsp. enterica;Salmonella enterica subsp. enterica serovar Enteritidis;Salmonella enterica subsp. enterica serovar Enteritidis str. 77-1427;
397481412;9597;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Pan;Pan paniscus;
398034639;1144308;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Paenibacillaceae;Brevibacillus;Brevibacillus sp. BC25;
400405249;1199245;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;unclassified Enterobacteriaceae;ant, tsetse, mealybug, aphid, etc. endosymbionts;aphid secondary symbionts;secondary endosymbiont of Ctenarytaina eucalypti;
402315106;1069608;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;Neisseria meningitidis;Neisseria meningitidis 93003;
404230400;882096;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;Listeria monocytogenes;Listeria monocytogenes SLCC5850;
404528809;1163393;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas aeruginosa group;Pseudomonas aeruginosa;Pseudomonas aeruginosa ATCC 25324;
428676914;272123;root;cellular organisms;Bacteria;Cyanobacteria;Nostocales;Nostocaceae;Anabaena;Anabaena cylindrica;Anabaena cylindrica PCC 7122;
429500426;1249621;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Cupriavidus;Cupriavidus sp. HMR-1;
431380119;1182718;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli KTE135;
440531082;1071761;root;cellular organisms;Bacteria;Chlamydiae/Verrucomicrobia group;Chlamydiae;Chlamydiia;Chlamydiales;Chlamydiaceae;Chlamydia/Chlamydophila group;Chlamydia;Chlamydia trachomatis;Chlamydia trachomatis E/SotonE8;
446524402;1396;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus cereus group;Bacillus cereus;
451040035;1292141;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Actinopterygii;Actinopteri;Neopterygii;Teleostei;Osteoglossocephalai;Clupeocephala;Otomorpha;Ostariophysi;Otophysa;Cypriniphysae;Cypriniformes;Cyprinoidea;Cyprinidae;Gobiobotia;Gobiobotia naktongensis;
452961598;1278076;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Nocardiaceae;Rhodococcus;Rhodococcus ruber;Rhodococcus ruber BKS 20-38;
46913848;298386;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium;Photobacterium profundum;Photobacterium profundum SS9;
469755908;1095657;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Vibrio;Vibrio cholerae;Vibrio cholerae O1;Vibrio cholerae O1 str. NHCC-008D;
476647593;1125698;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas sp. P179;
477236524;1116170;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli P0304777.9;
477629628;1157001;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus HI168;
479543454;1160232;root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Brucellaceae;Brucella;Brucella suis;Brucella suis 63/252;
486853813;1158606;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus asini;Enterococcus asini ATCC 700915;
487306098;1157487;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus faecium;Enterococcus faecium EnGen0192;
488089847;1396;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus cereus group;Bacillus cereus;
489621104;1531;root;cellular organisms;Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;Lachnoclostridium;[Clostridium] clostridioforme;
490122440;2055;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Gordoniaceae;Gordonia;Gordonia terrae;
490653205;28442;root;cellular organisms;Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;Haloarcula;Haloarcula vallismortis;
49242409;282458;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus subsp. aureus;Staphylococcus aureus subsp. aureus MRSA252;
492458986;816;root;cellular organisms;Bacteria;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;
492558388;850;root;cellular organisms;Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;Fusobacterium mortiferum;
493010542;63128;root;cellular organisms;Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;Natronorubrum;Natronorubrum tibetense;
493301692;76832;root;cellular organisms;Bacteria;Bacteroidetes/Chlorobi group;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Myroides;Myroides odoratimimus;
493614004;207244;root;cellular organisms;Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;Anaerostipes;
497578686;57975;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Burkholderia;pseudomallei group;Burkholderia thailandensis;
497880899;406124;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. m3-13;
498472651;1352;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus faecium;
500403985;1053208;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus cereus group;Bacillus cereus;Bacillus cereus HuB13-1;
500681908;42879;root;cellular organisms;Archaea;Euryarchaeota;Methanococci;Methanococcales;Methanococcaceae;Methanococcus;Methanococcus aeolicus;
505284647;346;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Xanthomonas;Xanthomonas citri group;Xanthomonas citri;
506269758;146826;root;cellular organisms;Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;Halorhabdus;Halorhabdus utahensis;
512387990;1262999;root;cellular organisms;Bacteria;Firmicutes;environmental samples;Firmicutes bacterium CAG:103;
514629109;1192587;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Salmonella;Salmonella enterica;Salmonella enterica subsp. enterica;Salmonella enterica subsp. enterica serovar Enteritidis;Salmonella enterica subsp. enterica serovar Enteritidis str. 2009K1651;
516046151;1240676;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas sp. PAMC 26793;
516365606;1191699;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ZYK;
517212734;1131274;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;unclassified Gammaproteobacteria;unclassified Gammaproteobacteria (miscellaneous);gamma proteobacterium SCGC AB-629-P17;
518391533;196015;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Caldimonas;Caldimonas manganoxidans;
51971731;3702;root;cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliophyta;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;malvids;Brassicales;Brassicaceae;Camelineae;Arabidopsis;Arabidopsis thaliana;
521583151;1277687;root;cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Ustilaginomycotina;Ustilaginomycetes;Ustilaginales;Ustilaginaceae;mitosporic Ustilaginaceae;Pseudozyma;Pseudozyma flocculosa;Pseudozyma flocculosa PF-1;
524662827;1263014;root;cellular organisms;Bacteria;Firmicutes;environmental samples;Firmicutes bacterium CAG:270;
529392420;1373546;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Amphiesmenoptera;Lepidoptera;Glossata;Neolepidoptera;Heteroneura;Ditrysia;Obtectomera;Noctuoidea;Lymantriidae;Euproctis;Euproctis euthysana;
530786913;1383057;root;Viruses;dsDNA viruses, no RNA stage;Caudovirales;Siphoviridae;unclassified Siphoviridae;Mycobacterium phage Goku;
536810625;1151226;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli B26-2;
542084436;1390361;root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Brucellaceae;Ochrobactrum;Ochrobactrum sp. EGD-AQ16;
547308260;1262747;root;cellular organisms;Bacteria;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;environmental samples;Bacteroides sp. CAG:702;
549122781;1219013;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Nocardiaceae;Rhodococcus;Rhodococcus equi;Rhodococcus equi NBRC 101255 = C 7;
552352664;1402522;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas aeruginosa group;Pseudomonas aeruginosa;Pseudomonas aeruginosa BWHPSA022;
557036471;1385625;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Chromatiales;Chromatiaceae;Thiohalocapsa;environmental samples;uncultured Thiohalocapsa sp. PB-PSB1;
558184793;13735;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Archelosauria;Testudines;Cryptodira;Trionychia;Trionychidae;Pelodiscus;Pelodiscus sinensis;
563074710;1287281;root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Phyllobacteriaceae;Mesorhizobium;Mesorhizobium sp. LNJC405B00;
563166710;1287240;root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Phyllobacteriaceae;Mesorhizobium;Mesorhizobium sp. LNHC229A00;
564723032;1073376;root;cellular organisms;Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Ruminococcus;Ruminococcus lactaris;Ruminococcus lactaris CC59_002D;
572994802;672190;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus suis;Streptococcus suis 05HAS68;
573045401;1379689;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Klebsiella;Klebsiella pneumoniae;Klebsiella pneumoniae subsp. pneumoniae;Klebsiella pneumoniae subsp. pneumoniae T69;
574104107;112090;root;cellular organisms;Eukaryota;Stramenopiles;Oomycetes;Saprolegniales;Saprolegniaceae;Aphanomyces;Aphanomyces astaci;
576043156;759272;root;cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;sordariomyceta;Sordariomycetes;Sordariomycetidae;Sordariales;Chaetomiaceae;Chaetomium;Chaetomium thermophilum;Chaetomium thermophilum var. thermophilum;Chaetomium thermophilum var. thermophilum DSM 1495;
577388362;1388514;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus M0524;
578517975;1388644;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus M0867;
580141981;1388400;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus M0604;
580467338;1409495;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus MISS6048;
580616592;1409579;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus SMMC6076;
580903490;1409729;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus CHAP6102;
581224096;1388685;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus M0914;
581436498;1392060;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus M1054;
581540625;1393714;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus M1577;
581586411;1397998;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus M1369;
582277000;1410750;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus H69239;
584078504;225400;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Chiroptera;Microchiroptera;Vespertilionidae;Myotis;Myotis davidii;
584932513;1412684;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus T16664;
584987529;1412661;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus H68244;
587429795;1417127;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus H49917;
593089099;1422369;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus DAR5804;
593275121;1421911;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus W45757;
593366026;1421898;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus W12583;
593426936;1421862;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus H75421;
595299069;1310971;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Acinetobacter;Acinetobacter calcoaceticus/baumannii complex;Acinetobacter baumannii;Acinetobacter baumannii 25750_7;
599685217;1418294;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus F71576;
599855070;1054950;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Salmonella;Salmonella enterica;Salmonella enterica subsp. enterica;Salmonella enterica subsp. enterica serovar Heidelberg;Salmonella enterica subsp. enterica serovar Heidelberg str. SARA 39;
600609210;1422173;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus DAR3799;
604803403;34607;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Chelicerata;Arachnida;Acari;Parasitiformes;Ixodida;Ixodoidea;Ixodidae;Amblyomminae;Amblyomma;Amblyomma cajennense;
606782342;1446570;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli O121:H19;Escherichia coli O121:H19 str. 2011C-3108;
610935085;1311138;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Acinetobacter;Acinetobacter calcoaceticus/baumannii complex;Acinetobacter baumannii;Acinetobacter baumannii 42057_6;
612869129;1413434;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;Staphylococcus aureus 1111200013;
620950088;9258;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Prototheria;Monotremata;Ornithorhynchidae;Ornithorhynchus;Ornithorhynchus anatinus;
621278911;1447455;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis MD19051;
621296575;1447460;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis MD17903;
621869142;1448502;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis TKK_05MA_0026;
622051763;1448562;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis TKK_04_0090;
624850951;1324221;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis TKK-01-0004;
624946284;1324245;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis TKK-01-0035;
625852057;1400893;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis KT-0039;
626729949;1423483;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis BTB08-072;
626764864;1423492;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis BTB08-356;
627354757;1427254;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium tuberculosis;Mycobacterium tuberculosis XTB13-175;
633436231;1331260;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Alcaligenaceae;Bordetella;Bordetella bronchiseptica;Bordetella bronchiseptica F-1;
635903490;1438693;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli UCI 57;
635969792;1438706;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Klebsiella;Klebsiella pneumoniae;Klebsiella pneumoniae BWH 48;
635990591;1438711;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Klebsiella;Klebsiella pneumoniae;Klebsiella pneumoniae CHS 05;
636064842;1438729;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Klebsiella;Klebsiella pneumoniae;Klebsiella pneumoniae CHS 23;
644983342;1469486;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Palaeoptera;Ephemeroptera;Pisciforma;Baetidae;Baetis;Baetis cf. rhodani BR-34-FR(AP);
645921707;632;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Yersinia;Yersinia pestis;
652046346;1444263;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli 2-210-07_S4_C3;
654297784;95486;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Burkholderia;Burkholderia cepacia complex;Burkholderia cenocepacia;
654304662;765414;root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Burkholderia;Burkholderia bannensis;
655673603;670;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Vibrio;Vibrio harveyi group;Vibrio parahaemolyticus;
655935553;1927;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Streptomycineae;Streptomycetaceae;Streptomyces;Streptomyces rimosus;
656329600;1433126;root;cellular organisms;Bacteria;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Rikenellaceae;Mucinivorans;Mucinivorans hirudinis;
658528040;110321;root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Sinorhizobium/Ensifer group;Sinorhizobium;Sinorhizobium medicae;
663254050;66875;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Streptomycineae;Streptomycetaceae;Streptomyces;Streptomyces catenulae;
663727468;47872;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Micromonosporineae;Micromonosporaceae;Micromonospora;Micromonospora purpureochromogenes;
665448997;1765;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Corynebacterineae;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis complex;Mycobacterium bovis;
666606103;1398202;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Xenorhabdus;Xenorhabdus bovienii;Xenorhabdus bovienii str. oregonense;
667063294;480784;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Lepidosauria;Squamata;Bifurcata;Unidentata;Scinciformata;Scincidae;Sphenomorphinae;Ctenotus;Ctenotus spaldingi;
673000630;762211;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;Bifidobacterium stellenboschense;
674805746;559305;root;cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;Eurotiomycetes;Eurotiomycetidae;Onygenales;Arthrodermataceae;Trichophyton;Trichophyton rubrum;Trichophyton rubrum CBS 118892;
675336429;672;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Vibrio;Vibrio vulnificus;
675597565;1121013;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Arenimonas;Arenimonas composti;Arenimonas composti TR7-09 = DSM 18010;
675814043;1302649;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Tetragenococcus;Tetragenococcus muriaticus;Tetragenococcus muriaticus PMC-11-5;
678106523;8897;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Archelosauria;Archosauria;Dinosauria;Saurischia;Theropoda;Coelurosauria;Aves;Neognathae;Apodiformes;Apodidae;Chaetura;Chaetura pelagica;
685161672;1545093;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Paraneoptera;Hemiptera;Euhemiptera;Neohemiptera;Prosorrhyncha;Heteroptera;Euheteroptera;Neoheteroptera;Panheteroptera;Cimicomorpha;Cimicoidea;Miridae;Mirinae;Mirini;Charagochilus;Charagochilus weberi;
686422209;1280;root;cellular organisms;Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus aureus;
695883435;1487921;root;cellular organisms;Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium sp. HMP27;
696303948;1583098;root;cellular organisms;Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;Fusobacterium hwasookii;
700546014;1121898;root;cellular organisms;Bacteria;Bacteroidetes/Chlorobi group;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Flavobacterium;Flavobacterium subsaxonicum;Flavobacterium subsaxonicum WB 4.1-42 = DSM 21790;
703240764;2030;root;cellular organisms;Bacteria;Actinobacteria;Actinobacteria;Actinobacteridae;Actinomycetales;Pseudonocardineae;Pseudonocardiaceae;Kibdelosporangium;Kibdelosporangium aridum;
713638906;623;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Shigella;Shigella flexneri;
71558575;264730;root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas syringae group;Pseudomonas syringae group genomosp. 2;Pseudomonas savastanoi;Pseudomonas syringae pv. phaseolicola;Pseudomonas syringae pv.
phaseolicola 1448A; 72118286;264198;root;cellular organisms;Bacteria;Proteobacteria;Betapr +oteobacteria;Burkholderiales;Burkholderiaceae;Cupriavidus;Cupriavidus + pinatubonensis;Ralstonia eutropha JMP134; 723221091;550;root;cellular organisms;Bacteria;Proteobacteria;Gammapro +teobacteria;Enterobacteriales;Enterobacteriaceae;Enterobacter;Enterob +acter cloacae complex;Enterobacter cloacae; 728072750;35708;root;cellular organisms;Eukaryota;Viridiplantae;Strept +ophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermato +phyta;Magnoliophyta;Mesangiospermae;Liliopsida;Petrosaviidae;commelin +ids;Poales;Poaceae;PACMAD clade;Arundinoideae;Arundineae;Arundo;Arund +o donax; 728138150;35708;root;cellular organisms;Eukaryota;Viridiplantae;Strept +ophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermato +phyta;Magnoliophyta;Mesangiospermae;Liliopsida;Petrosaviidae;commelin +ids;Poales;Poaceae;PACMAD clade;Arundinoideae;Arundineae;Arundo;Arund +o donax; 728450293;35708;root;cellular organisms;Eukaryota;Viridiplantae;Strept +ophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermato +phyta;Magnoliophyta;Mesangiospermae;Liliopsida;Petrosaviidae;commelin +ids;Poales;Poaceae;PACMAD clade;Arundinoideae;Arundineae;Arundo;Arund +o donax; 729312810;28532;root;cellular organisms;Eukaryota;Viridiplantae;Strept +ophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermato +phyta;Magnoliophyta;Mesangiospermae;eudicotyledons;Gunneridae;Pentape +talae;rosids;malvids;Brassicales;Cleomaceae;Tarenaya;Tarenaya hassler +iana; 735805624;59201;root;cellular organisms;Bacteria;Proteobacteria;Gammap +roteobacteria;Enterobacteriales;Enterobacteriaceae;Salmonella;Salmone +lla enterica;Salmonella enterica subsp. 
enterica; 736965689;198;root;cellular organisms;Bacteria;Proteobacteria;delta/ep +silon subdivisions;Epsilonproteobacteria;Campylobacterales;Campylobac +teraceae;Campylobacter;Campylobacter hyointestinalis; 737957273;476529;root;cellular organisms;Bacteria;Proteobacteria;Alpha +proteobacteria;Rhodobacterales;Rhodobacteraceae;Leisingera;Leisingera + aquimarina; 738124599;1494961;root;cellular organisms;Bacteria;Firmicutes;Bacilli; +Bacillales;Listeriaceae;Listeria;Listeria cornellensis; 738873881;1219626;root;cellular organisms;Bacteria;Firmicutes;Clostrid +ia;Clostridiales;Peptostreptococcaceae;Peptostreptococcus;Peptostrept +ococcus sp. MV1; 739751474;405783;root;cellular organisms;Bacteria;Actinobacteria;Actin +obacteria;Actinobacteridae;Actinomycetales;Streptomycineae;Streptomyc +etaceae;Streptacidiphilus;Streptacidiphilus rugosus; 739834416;68194;root;cellular organisms;Bacteria;Actinobacteria;Actino +bacteria;Actinobacteridae;Actinomycetales;Streptomycineae;Streptomyce +taceae;Streptomyces;Streptomyces durhamensis; 740142740;680197;root;cellular organisms;Bacteria;Proteobacteria;Alpha +proteobacteria;Rhodospirillales;Rhodospirillaceae;Thalassospira;Thala +ssospira permensis; 746833659;103372;root;cellular organisms;Eukaryota;Opisthokonta;Metazo +a;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda; +Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neopte +ra;Endopterygota;Hymenoptera;Apocrita;Aculeata;Vespoidea;Formicidae;M +yrmicinae;Attini;Acromyrmex;Acromyrmex echinatior; 749922863;46506;root;cellular organisms;Bacteria;Bacteroidetes/Chlorob +i group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacter +oides;Bacteroides stercoris; 750171272;553386;root;cellular organisms;Bacteria;Proteobacteria;Gamma +proteobacteria;Oceanospirillales;Halomonadaceae;Halomonas;Halomonas j +eotgali; 754100795;32054;root;cellular organisms;Bacteria;Cyanobacteria;Nostoca +les;Rivulariaceae;Calothrix;Calothrix parietina; 
754113026;335659;root;cellular organisms;Bacteria;Proteobacteria;Alpha +proteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;Bradyrhiz +obium sp. S23321; 754166196;480;root;cellular organisms;Bacteria;Proteobacteria;Gammapro +teobacteria;Pseudomonadales;Moraxellaceae;Moraxella;Branhamella;Morax +ella catarrhalis; 755943753;64838;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa +;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;M +andibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neopter +a;Endopterygota;Hymenoptera;Apocrita;Ichneumonoidea;Braconidae;Opiina +e;Fopius;Fopius arisanus; 755969867;64838;root;cellular organisms;Eukaryota;Opisthokonta;Metazoa +;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;M +andibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neopter +a;Endopterygota;Hymenoptera;Apocrita;Ichneumonoidea;Braconidae;Opiina +e;Fopius;Fopius arisanus; 757571032;285562;root;cellular organisms;Bacteria;Actinobacteria;Actin +obacteria;Actinobacteridae;Actinomycetales;Streptomycineae;Streptomyc +etaceae;Streptomyces;Streptomyces coelicoflavus; 93353281;266264;root;cellular organisms;Bacteria;Proteobacteria;Betapr +oteobacteria;Burkholderiales;Burkholderiaceae;Cupriavidus;Cupriavidus + metallidurans;Cupriavidus metallidurans CH34;

Replies are listed 'Best First'.
Re^3: Using indexing for faster lookup in large file
by roboticus (Chancellor) on Feb 28, 2015 at 18:14 UTC

    anli_:

    Considering that the bulk of your data appears to be the text representation of various paths through a taxonomy tree, I thought that you might be able to fit it all into memory (taking advantage of all the redundancy) if you built a pair of trees and connected them together at the leaves. For example, if your data looked like this:

    1;2;8;root;xyzzy;cat
    1;2;5;root;xyzzy;dog
    1;9;root;bird

    Then we could build an index tree (on top) and the taxonomy tree (below), tying them together (shown by vertical lines) like this:

              1
             / \
            /   \
           2     \
         --^-     \
        /    \     \
       8      5     9
       |      |     |
      cat    dog   bird
        \    /     /
        xyzzy     /
           \     /
            root

    If your tree is relatively shallow but broad, you should be able to save a considerable amount of space.

    Here's some code that builds the two trees and looks up some of the numeric keys to display the data. Let me know if it does the trick for you; I'd be interested in knowing how much memory it takes to hold your database.

    The taxonomy tree has parent links in it to let you get from the leaf to the root of the tree, and the traceback function will walk the tree back to the root for you.
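    A minimal sketch of the two-tree idea follows. It is not the code originally posted, just an illustration under some assumptions: records are semicolon-separated, the leading all-numeric fields form the index path, and everything from `root` onward is the taxonomy path. The names `add_record`, `lookup`, `%index`, and `$tax_root` are hypothetical; only `traceback` is named in the post, and its body here is a guess at the behaviour described.

    ```perl
    use strict;
    use warnings;

    # Taxonomy tree: each node is { name, parent, kids }. The parent link lets us
    # walk from a leaf back to the root. (Parent and kids reference each other, so
    # a long-running program might want Scalar::Util::weaken on the parent link.)
    my $tax_root = { name => 'root', parent => undef, kids => {} };

    # Index tree: nested hashes keyed by the numeric fields. A node's {leaf}
    # entry points at the matching node in the taxonomy tree.
    my %index;

    sub add_record {
        my ($line) = @_;
        my @fields = grep { length } split /;/, $line;

        # Assumption: the leading all-numeric fields are the index keys.
        my @keys;
        push @keys, shift @fields while @fields && $fields[0] =~ /^\d+$/;

        # Walk (and extend) the taxonomy tree; shared prefixes are stored once.
        shift @fields if @fields && $fields[0] eq 'root';
        my $node = $tax_root;
        for my $name (@fields) {
            $node = $node->{kids}{$name}
                //= { name => $name, parent => $node, kids => {} };
        }

        # Walk (and extend) the index tree, then tie its leaf to the taxonomy leaf.
        my $slot = \%index;
        $slot = $slot->{kids}{$_} //= {} for @keys;
        $slot->{leaf} = $node;
        return;
    }

    # Follow parent links from a taxonomy node back to the root and
    # return the path as a semicolon-separated string.
    sub traceback {
        my ($node) = @_;
        my @path;
        for (; $node; $node = $node->{parent}) {
            unshift @path, $node->{name};
        }
        return join ';', @path;
    }

    # Look up a sequence of numeric keys; returns nothing if the key is unknown.
    sub lookup {
        my @keys = @_;
        my $slot = \%index;
        for my $k (@keys) {
            my $kids = $slot->{kids} or return;
            $slot = $kids->{$k} or return;
        }
        return unless $slot->{leaf};
        return traceback($slot->{leaf});
    }

    add_record($_) for '1;2;8;root;xyzzy;cat', '1;2;5;root;xyzzy;dog', '1;9;root;bird';
    print lookup(1, 2, 5), "\n";    # prints root;xyzzy;dog
    print lookup(1, 9),    "\n";    # prints root;bird
    ```

    In this sketch `xyzzy` is stored once even though two records pass through it, which is where the space saving on a shallow, broad tree would come from.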

    Update: I added a couple comments to the code to clarify it a little.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Hi. And thanks for taking the time.

      I think this is a nice approach because, as you say, there is a lot of redundancy in the data.

      I'll try to take a closer look at this later on, but for now I went with the Lucy approach.

        Do you have a copy of your database somewhere online? I'd love to give it a try to see how much memory my code consumes.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.
