random search in file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: random search in file by GrandFather (Saint) on Oct 20, 2008 at 22:09 UTC
Give us a little more to go on. Have you multiple searches to do? How big is the file? Are the lines fixed length? Is the search time-critical? What is the likely lifetime of the script? Unsurprisingly, this is a fairly frequently asked question, so Super Search is your friend. A couple of searches turn up "Specified Line Searching in file!" and "Picking Random Lines from a File" (among many other nodes), which may be of help.
Perl reduces RSI - it saves typing
Re: random search in file by ikegami (Patriarch) on Oct 20, 2008 at 22:13 UTC
There's no way to know where a line starts, so you have to read the file until you find the line. `my $id = 'data4'; my $result; while (<$fh>) { /^\Q$id\E\s/ or next; ($result) = /\s(\S+)$/; last; } die "not found" if !defined($result);`
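For anyone who wants to try the line-by-line scan above as a standalone script, here is a minimal sketch. It assumes whitespace-separated fields with the key in the first column. The helper name `find_last_field` and the in-memory file are only for illustration, and the sample rows are the ones posted elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last field of the first line whose
# first field equals $id, or undef if no line matches.
sub find_last_field {
    my ($fh, $id) = @_;
    while (my $line = <$fh>) {
        # \Q...\E quotes the key; the trailing \s keeps "data4" from
        # also matching a longer key such as "data42".
        next unless $line =~ /^\s*\Q$id\E\s/;
        my ($last) = $line =~ /(\S+)\s*$/;
        return $last;
    }
    return;
}

# Sample rows held in a string so the sketch needs no external file.
my $data = <<'END';
data1 122 1223 12223 12223
data2 12122 12223 122223 122223
data3 13422 134223 4512223 982223
data4 23432 3432 234234 789879
data5 5635 9786 23423 2323423
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $result = find_last_field($fh, 'data4');
close $fh;

print defined $result ? "$result\n" : "not found\n";
```

The scan stops at the first match, so the cost depends on how far into the file the key sits.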
Re: random search in file by swampyankee (Parson) on Oct 20, 2008 at 22:19 UTC
If the file is small (a few megabytes, but YMMV), just read it into an array and split the lines as needed. If it's too large for that, try Tie::File, and if it's humongous, it's time for a database. In any of these cases, you can go directly (this is why I prefer the term "direct access" to "random access") to a specific line. That is, of course, if I'm interpreting your posting correctly: go to line #n and read a specific value.
emc
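A rough sketch of the Tie::File route, assuming the goal really is "go to line #n and read the last value". The helper name `last_field_of_line` and the temporary file are made up for the example. As far as I know, Tie::File does not load the whole file, but it still has to scan from the start to locate line n the first time it is asked for.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempfile);
use Tie::File;

# Temporary file standing in for the real input (rows from this thread).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} <<'END';
data1 122 1223 12223 12223
data2 12122 12223 122223 122223
data3 13422 134223 4512223 982223
data4 23432 3432 234234 789879
data5 5635 9786 23423 2323423
END
close $out or die "Cannot close $path: $!";

# Hypothetical helper: last field of line number $n (1-based), or undef
# if the file has no such line.
sub last_field_of_line {
    my ($file, $n) = @_;
    return if $n < 1;
    tie(my @lines, 'Tie::File', $file, mode => O_RDONLY)
        or die "Cannot tie $file: $!";
    my $line = $lines[$n - 1];    # array index is 0-based
    untie @lines;
    return unless defined $line;
    my @fields = split ' ', $line;
    return $fields[-1];
}

print last_field_of_line($path, 4), "\n";
```

If the file fits in memory, a plain `my @lines = <$fh>;` gives the same array-style access without the module.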
Re: random search in file by billward (Initiate) on Oct 20, 2008 at 23:11 UTC
Is it a one-time search? If so, read one line at a time, look for the pattern you want, and then extract the last value. The above comment by ikegami would be a good way of doing that. If you need to repeatedly search for different lines, then you would need to do something more like what swampyankee suggests. Or you could build a hash with the first column as keys (assuming they're unique) and the last column as values, like this: `my %hash; while (<$fh>) { my ($key, $data) = /^(\S+).*\s(\S+)$/ or next; $hash{$key} = $data; }`
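Here is one way the hash idea above might look as a complete script, assuming unique keys in the first column and whitespace-separated fields. The helper name `load_last_fields` is invented for this sketch, and the data again comes from the rows posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash mapping first field => last field.
sub load_last_fields {
    my ($fh) = @_;
    my %last_of;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        next if @fields < 2;    # skip blank or single-field lines
        $last_of{ $fields[0] } = $fields[-1];
    }
    return \%last_of;
}

my $data = <<'END';
data1 122 1223 12223 12223
data2 12122 12223 122223 122223
data3 13422 134223 4512223 982223
data4 23432 3432 234234 789879
data5 5635 9786 23423 2323423
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $last_of = load_last_fields($fh);
close $fh;

# Every lookup after the single pass over the file is a hash access.
print "$last_of->{data4}\n";
print "$last_of->{data2}\n";
```

The file is read once up front, so this only pays off when there are several lookups.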
Re: random search in file by luckypower (Beadle) on Oct 21, 2008 at 05:31 UTC
Hi, you can build a hash while reading the file line by line. `my %hash; while (my $line = <$fh>) { $line =~ /^(\w+)\s+(.+)/ or next; }` This leaves the first field (data1, data2, ...) in $1 and the rest of the line in $2. Inside the loop, split $2 to get an array: `my @arr = split(" ", $2);` Then use it to fill the hash: `$hash{$1} = [@arr];` To get the last value you can use `pop`: `pop @{$hash{"data4"}};` Note that `pop` also removes the value from the stored array; `$hash{"data4"}[-1]` reads it without modifying the array.
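Putting the fragments above together, a hash-of-arrays version could look something like this. It is a sketch only: `load_rows` is a made-up helper name, and it uses `split` on the whole line instead of the `$1`/`$2` captures so that nothing depends on match variables surviving between statements.

```perl
use strict;
use warnings;

# Hypothetical helper: map each key to an array ref of its remaining fields.
sub load_rows {
    my ($fh) = @_;
    my %row_for;
    while (my $line = <$fh>) {
        my ($key, @values) = split ' ', $line;
        next unless defined $key;    # skip blank lines
        $row_for{$key} = \@values;   # @values is fresh on each iteration
    }
    return \%row_for;
}

my $data = <<'END';
data1 122 1223 12223 12223
data4 23432 3432 234234 789879
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $rows = load_rows($fh);
close $fh;

# Index -1 reads the last value and leaves the array intact.
print "last: $rows->{data4}[-1]\n";
print "count: ", scalar @{ $rows->{data4} }, "\n";
```

Keeping the whole row costs more memory than storing just the last column, but any field can be read later.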
Re: random search in file by Perlbotics (Archbishop) on Oct 21, 2008 at 20:29 UTC
"... how to go to directly that line ..." inspired this seek-by-key approach: you could create a simple index and then seek to the beginning of a line in order to re-read it. It might be suitable in a situation where you can keep the index in memory but not the whole file. But since the ratio of key length to line length is only approx. 1:5 (the smaller the ratio the better), there is probably no real gain for the given example data...

    use strict;
    my %idx;

    # create index...
    while (<DATA>) {
        last if /# Prints/;    # for this demo
        $idx{$1} = (tell(DATA) - length) if /^\s*(\S+)/;
    }

    # E.g., access each line in "natural" hash order (more/less random)...
    print "offset: <key> <line>\n";
    foreach (keys %idx) {
        seek(DATA, $idx{$_}, 0);
        chomp (my $line = <DATA>);
        printf("%6d: %-7s <%s>\n", $idx{$_}, "<$_>", $line);
    }
    __DATA__
     data1 122 1223 12223 12223
     data2 12122 12223 122223 122223
     data3 13422 134223 4512223 982223
     data4 23432 3432 234234 789879
     data5 5635 9786 23423 2323423
    # Prints
    offset: <key> <line>
       486: <data4> < data4 23432 3432 234234 789879>
       418: <data2> < data2 12122 12223 122223 122223>
       451: <data3> < data3 13422 134223 4512223 982223>
       518: <data5> < data5 5635 9786 23423 2323423>
       390: <data1> < data1 122 1223 12223 12223>
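The same index idea can be tried against an ordinary file instead of the DATA handle. The following is a minimal sketch under that assumption: `build_index` and `fetch_line` are hypothetical helper names, and the temporary file stands in for the real input. It records `tell` before each read instead of subtracting the line length afterwards, which should give the same offsets.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: map each line's first field to the byte offset
# at which that line starts.
sub build_index {
    my ($fh) = @_;
    my %offset_of;
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        $offset_of{$1} = $pos if $line =~ /^\s*(\S+)/;
        $pos = tell $fh;    # start of the next line
    }
    return \%offset_of;
}

# Hypothetical helper: seek to the stored offset and re-read one line.
sub fetch_line {
    my ($fh, $index, $key) = @_;
    return unless exists $index->{$key};
    seek($fh, $index->{$key}, SEEK_SET) or die "seek failed: $!";
    my $line = <$fh>;
    chomp $line;
    return $line;
}

# Temporary file standing in for the real input (rows from this thread).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} <<'END';
data1 122 1223 12223 12223
data2 12122 12223 122223 122223
data3 13422 134223 4512223 982223
data4 23432 3432 234234 789879
data5 5635 9786 23423 2323423
END
close $out or die "Cannot close $path: $!";

open my $in, '<', $path or die "Cannot open $path: $!";
my $index = build_index($in);
print fetch_line($in, $index, 'data4'), "\n";
close $in;
```

Only the keys and offsets stay in memory. Each lookup costs one seek plus one line read, which is where the key-length to line-length ratio mentioned above comes in.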