Re^3: A series of random number and others

Thanks, GrandFather. Your understanding is right: these lines must not be duplicated. I have only just started getting used to Perl, and I am not familiar with these modules. It seems I will have to study them thoroughly.

As for your suggestion of using a hash or an array, I have tried that, but I ran into memory problems. My machine cannot even slurp 40 million lines into an array. That is why I create this index file first.

So, given the memory limit (3GB), what would be the fastest solution for such a case? Thank you.


Re^4: A series of random number and others
by GrandFather (Saint) on Oct 09, 2008 at 06:07 UTC

That depends on your actual application. If you need an exact number of lines and the distribution must be uniform, then your current approach of generating an index file in some fashion seems appropriate. If that file is sorted numerically, you can open both the index file and the data file at the same time: read the next index from the index file, read lines from the data file until you reach that line number, and repeat until you reach the end of the index file. Consider:

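use warnings;
use strict;

# Sorted list of wanted line numbers, one per line.
open my $rndLines, '<', "rand_sorted.txt" or die "Can't open rand_sorted.txt: $!";

while (defined (my $nextLine = <$rndLines>)) {
    chomp $nextLine;

    # Skip index entries that do not start with a number.
    next unless $nextLine =~ /^\d+/;

    # Read the data file (named on the command line) up to the wanted line.
    # $. is the line number of the last handle read, here the data file.
    my $line;
    while (defined ($line = <>)) {
        last if $. >= $nextLine;
    }

    print $line if defined $line;
}

close $rndLines;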
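
If the total number of lines is known in advance, the index file may not be needed at all. The following is only a sketch of a different technique, selection sampling (Knuth's Algorithm S), not the approach used above. It picks an exact number of distinct lines in one pass with constant memory, and the output keeps the original line order. The name pick_lines and the demo data are hypothetical, and the sketch assumes $total is the exact line count of the input.

```perl
use strict;
use warnings;

# Selection sampling: pick exactly $want of $total lines in a single pass.
# pick_lines is a hypothetical helper name, not part of the original post.
# Assumes $total is the exact number of lines available from $in.
sub pick_lines {
    my ($in, $out, $total, $want) = @_;
    my $seen   = 0;    # lines read so far
    my $picked = 0;    # lines selected so far

    while ($picked < $want and defined (my $line = <$in>)) {
        # Keep this line with probability (still needed) / (still available).
        if (rand ($total - $seen) < $want - $picked) {
            print {$out} $line;
            ++$picked;
        }
        ++$seen;
    }

    return $picked;
}

# Demo on an in-memory "file" of 1000 lines, keeping exactly 10 of them.
my $data = join '', map {"line $_\n"} 1 .. 1000;
open my $in, '<', \$data or die "Can't open in-memory input: $!";
my $sample = '';
open my $out, '>', \$sample or die "Can't open in-memory output: $!";
my $count = pick_lines ($in, $out, 1000, 10);
close $out;
close $in;
```

For the 40 million line case, $in would be the data file handle and $total the line count obtained beforehand, for example from an earlier counting pass. Only one line is held in memory at a time.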

I presume the approximate-number-of-lines solution I gave earlier (rand () < 0.5 and print while <>;) doesn't do what you need?
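
For comparison, here is a rough expansion of that one-liner with the probability made a parameter. Each line is kept independently, so the number of lines written is only approximately the keep probability times the input size. The name filter_lines and the demo data are hypothetical, added for illustration.

```perl
use strict;
use warnings;

# filter_lines is a hypothetical helper name. It copies each line from $in
# to $out with probability $keep and returns how many lines it kept.
sub filter_lines {
    my ($in, $out, $keep) = @_;
    my $kept = 0;

    while (defined (my $line = <$in>)) {
        next unless rand () < $keep;
        print {$out} $line;
        ++$kept;
    }

    return $kept;
}

# Demo on an in-memory "file" of 1000 lines, keeping roughly half of them,
# which is what the one-liner does with its fixed 0.5.
my $data = join '', map {"line $_\n"} 1 .. 1000;
open my $in, '<', \$data or die "Can't open in-memory input: $!";
my $sample = '';
open my $out, '>', \$sample or die "Can't open in-memory output: $!";
my $kept = filter_lines ($in, $out, 0.5);
close $out;
close $in;
```

Because the count varies from run to run, this only suits cases where an approximate sample size is acceptable.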

Perl reduces RSI - it saves typing
