perlaintdead has asked for the wisdom of the Perl Monks concerning the following question:
Greetings, oh wise monks! I downloaded the whole of Wikipedia and have built an index of every entry. I built the index with the following code:
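    use strict;
    use warnings;

    open(WIKI, "<", "F:/wiki/enwiki-20130102-pages-articles.xml")
        or die "Cannot open dump: $!";
    # "+<" opens read/write without truncating, so wiki.index must already exist.
    open(INDEX, "+<", "F:/wiki/wiki.index")
        or die "Cannot open index: $!";

    my $entry;
    my $title;
    while (<WIKI>) {
        if ( (index $_, "<title>") > -1 ) {
            $title = $_;
            $title =~ s/.*?<title>//;
            $title =~ s/<\/title>.*//s;    # also drops the trailing newline
            # Note: as written, this emits "title::line", not "line::title".
            $entry = $title . "::" . $. . "\n";
            syswrite INDEX, $entry;
            print "line ", $. , " : $title done\n";
        }
    }
    close(INDEX);
    close(WIKI);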
So each entry begins with the line number where the title was found, then "::", and then the title. My question is: how would I "jump" to a specific line without having to rifle through every line of the file? I am familiar with things like binary search and would also like to implement search functionality (but that's not very relevant to the question).
Any help would be appreciated.
Update: The index just finished, and it ended up being almost half a gigabyte.
Update: Turns out I put the vars in the wrong places in the index code. No troubles; Notepad++ has regexes.
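One possible way to get the "jump" asked about above is to index byte offsets rather than line numbers, since Perl's `seek` moves to a byte position and cannot skip to line N directly when lines vary in length. The following is only a minimal sketch under stated assumptions: the sample data is made up, the sub names `build_offset_index` and `read_line_at` are hypothetical, and the index is kept in an in-memory hash here instead of being written to wiki.index.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Tiny stand-in for the real dump (hypothetical sample data).
my ($tmp_fh, $dump_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;    # write bytes as-is so offsets are the same on every platform
print {$tmp_fh} "<page>\n", "  <title>Perl</title>\n", "  <text>a</text>\n",
                "</page>\n", "<page>\n", "  <title>Raku</title>\n", "</page>\n";
close $tmp_fh or die "Cannot close sample file: $!";

# Pass 1: remember the byte offset at which each <title> line starts.
sub build_offset_index {
    my ($path) = @_;
    # :raw avoids CRLF translation, which would skew offsets on Windows.
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    my %offset_of;
    my $pos = tell $in;                 # offset of the line about to be read
    while (my $line = <$in>) {
        if ($line =~ m{<title>(.*?)</title>}) {
            $offset_of{$1} = $pos;
        }
        $pos = tell $in;                # offset of the next line
    }
    close $in;
    return \%offset_of;
}

# Later: jump straight to a recorded offset and read a single line.
sub read_line_at {
    my ($path, $offset) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    seek($in, $offset, SEEK_SET) or die "Cannot seek to $offset: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $index = build_offset_index($dump_path);
print read_line_at($dump_path, $index->{Raku});
```

For the full dump, the same loop could presumably write `title::offset` lines to the index file instead of filling a hash, and a lookup would then need only one `seek` per article rather than a scan from the top of the file.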
Re: Read specific line(s)
by MidLifeXis (Monsignor) on Dec 12, 2013 at 13:52 UTC
by perlaintdead (Scribe) on Dec 12, 2013 at 13:56 UTC

Re: Read specific line(s)
by ww (Archbishop) on Dec 12, 2013 at 17:23 UTC
by BrowserUk (Patriarch) on Dec 12, 2013 at 18:20 UTC
by ww (Archbishop) on Dec 12, 2013 at 22:31 UTC