Re: reading a random line from a file
by belg4mit (Prior) on Nov 17, 2002 at 17:29 UTC
|
| [reply] |
|
srand;
rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading the whole file in. A simple proof by induction is available upon request if you doubt the algorithm's correctness.
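For anyone who wants the induction argument without asking: line n replaces the current pick with probability 1/n, and each earlier line survives that step with probability (n-1)/n, so after n lines every line seen so far is held with probability 1/n. Below is a minimal sketch of the same idea as a subroutine. The name pick_random_line and the in-memory sample text are illustrative assumptions, not part of the FAQ code, and it uses an explicit counter instead of $. so it does not depend on the global line number.

```perl
use strict;
use warnings;

# Hypothetical helper: return one line chosen uniformly at random from $fh,
# holding only a single line in memory at any time.
sub pick_random_line {
    my ($fh) = @_;
    my ($count, $pick) = (0);
    while (my $line = <$fh>) {
        $count++;
        # Keep this line with probability 1/$count.
        $pick = $line if rand($count) < 1;
    }
    return $pick;    # undef if the handle had no lines
}

# Illustrative data only: read from an in-memory "file".
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die $!;
my $line = pick_random_line($fh);
close $fh;
print $line;
```

Calling srand explicitly is not needed on modern perls, since rand seeds itself on first use.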
| [reply] [d/l] |
|
Scalability? If the OP only ever wants to read one random line from the file, the FAQ solution is fine.
If, however, he wishes to read a second or subsequent random line, the FAQ solution is far from efficient, because it rereads the whole file for every pick.
Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.
| [reply] |
Re: reading a random line from a file
by BrowserUk (Patriarch) on Nov 17, 2002 at 17:30 UTC
|
If you can ensure that every line has the same length, by space-padding each one to the length of the longest piece of data, you could use seek to move directly to the line you pick randomly.
A second way would be to pre-read the file line by line and record the start position of each line in an array. You could do this using something like
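my @positions = (0);                  # the first line starts at offset 0
open FILE, '<', 'file' or die $!;
while(<FILE>) {
    push @positions, tell FILE;       # offset where the next line starts
}
pop @positions;                       # the last offset is end-of-file, not a line start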
You then select your line by picking a random position from the array, seeking to it, and reading your data.
my $randData = do { seek FILE, $positions[rand @positions], 0; <FILE> };
Note: Untested code. Read the docs and check the syntax etc.
| [reply] [d/l] [select] |
Re: reading a random line from a file
by fruiture (Curate) on Nov 17, 2002 at 17:32 UTC
|
open my $fh,'<',$some_file or die $!;
my $size = -s $fh;
seek $fh, rand($size) , 0;
<$fh>; #throw away current line;
my $randomline = <$fh>;
close $fh;
Only problem is that you might seek() to a position in the last line of the file and will get no random line. But it's not too hard to solve this problem.
--
http://fruiture.de | [reply] [d/l] |
|
a
bcdefghijklmnopqrstuvwxy
z
Most random seeks will end up in the second line. If you move backwards, you'll almost always pick the long line. If you move forwards, you'll almost always pick 'z'. One way or the other, your distribution isn't random at all but determined by line length.
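To put numbers on that, here is a small sketch that counts, for the three-line file above, how many byte offsets select each line under the two strategies. It assumes every line ends in a single "\n" byte; the variable names are only for illustration.

```perl
use strict;
use warnings;

# The three-line example file, assuming one-byte newline terminators.
my @lines = ("a\n", join('', 'b' .. 'y') . "\n", "z\n");
my $size = 0;
$size += length $_ for @lines;

# For each line, count the byte offsets that end up selecting it.
my @backward = (0) x @lines;   # seek, then back up to the start of the current line
my @forward  = (0) x @lines;   # seek, discard the current line, read the next one
my $missed   = 0;              # offsets inside the last line: nothing left to read

for my $i (0 .. $#lines) {
    my $len = length $lines[$i];
    $backward[$i] += $len;
    if ($i < $#lines) { $forward[$i + 1] += $len }
    else              { $missed += $len }
}

for my $i (0 .. $#lines) {
    printf "line %d: backward %2d/%d, forward %2d/%d\n",
        $i + 1, $backward[$i], $size, $forward[$i], $size;
}
printf "forward finds no line: %d/%d\n", $missed, $size;
```

Under that assumption the file is 29 bytes: moving backwards picks the long line for 25 of the 29 offsets, and moving forwards picks 'z' for 25 of 29 and can never pick 'a'.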
BrowserUk had the good sense to point out that the lines would have to be padded out to equal length to use this method.
-sauoq
"My two cents aren't worth a dime.";
| [reply] [d/l] |
Re: reading a random line from a file
by broquaint (Abbot) on Nov 17, 2002 at 17:33 UTC
|
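Just pick a random position in the file and seek() backwards until you find the line separator or the beginning of the file, e.g.
open(my $f, '<', $your_file) or die("ack: $!");
my $pos = int rand(-s $f);
{
    local $/ = \1;    # read one byte at a time
    ## anyone care to optimise this?
    ## step back until the byte before $pos is a newline or we reach the start of the file
    while ($pos > 0) {
        seek($f, $pos - 1, 0);
        last if <$f> eq "\n";
        $pos--;
    }
}
seek($f, $pos, 0);    # $pos is now the start of the chosen line
print scalar <$f>;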
That should print out a random line in a file without having to do a slurp.
HTH
_________ broquaint | [reply] [d/l] |
Re: reading a random line from a file
by sauoq (Abbot) on Nov 17, 2002 at 22:47 UTC
|
If you have to pick a random line from the file many times over, the FAQ answer is not very good, as it would require reading the file once for each pick. If the file is big enough that slurping it isn't desirable, then using the second method BrowserUk suggested makes a lot of sense.
I'd probably use Tie::File though. It would be easier and likely almost if not just as efficient because Dominus went to some length to make it fast.
#!/usr/bin/perl -w
use strict;
use Tie::File;
my @lines;
my $o = tie @lines, 'Tie::File', 'somefile.txt' or die $!;
print "$lines[rand @lines]\n" for 1..10;
-sauoq
"My two cents aren't worth a dime.";
| [reply] [d/l] |
Probability of reading a random line from a file
by UnderMine (Friar) on Nov 17, 2002 at 20:14 UTC
|
The two major methods :-
Prefetching the line locations and then randomly picking one gives an even distribution, so the chance of picking a short line is the same as that of picking a long line.
Randomly picking a position and moving to the head/end of the current record will be distorted by the length of the record. Longer records will have a higher probability of being picked than short records.
However, you can combine the methods if you are using fixed-length records.
my $record_length=200;                               # set the record length (including the newline)
open my $fh,'<',$filename or die $!;                 # open handle
my $size = -s $fh;                                   # get the file size
my $records=int($size/$record_length);               # work out the number of whole records
seek $fh, int(rand($records))*$record_length, 0;     # move to the start of a random record
my $randomline = <$fh>;                              # read that record
close $fh;                                           # close handle
Technically you should lock the file so it does not change between getting its length and getting the random record, but generally that is not a massive issue.
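If you do want the lock, a rough sketch might look like the following. It assumes flock is available on your platform and that whoever writes the file also uses flock, since the lock is only advisory. The temporary file and the 4-byte record length are made up purely so the example runs on its own.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical demo data: three newline-terminated records of 4 bytes each.
my $record_length = 4;
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "aaa\nbbb\nccc\n";
close $tmp or die $!;

open my $fh, '<', $filename or die $!;
# Shared advisory lock, held while we measure the file and read from it.
flock($fh, LOCK_SH) or die "cannot lock: $!";
my $records = int((-s $fh) / $record_length);
seek($fh, int(rand $records) * $record_length, SEEK_SET) or die $!;
my $randomline = <$fh>;
close $fh;    # closing the handle releases the lock
print $randomline;
```

A writer would need to take LOCK_EX on the same file for this to protect anything.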
Hope this helps
UnderMine | [reply] [d/l] |
Re: reading a random line from a file
by pg (Canon) on Nov 17, 2002 at 17:34 UTC
|
One thing you can do is to seek to a random position, and then read what is between two newlines. | [reply] |