The fact that your sets are grouped is a great benefit. We can work with a set at a time.
The fact that your sets are of varying length is a great hindrance. Work needs to be done to locate the end of each set.
I made the following assumptions:
- m (and thus j) is rather small. Specifically, keeping m lines in memory is not a problem. (Confirmed in "Update 2".)
- m is not the same for every set. (Confirmed in "Update 2".)
- You don't want the same random j lines from every set. It's a minor change if you do.
- You don't care if the random j lines are in their original order. It's a minor change if you do.
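For the last assumption: if you do want the picked lines in their original order, one way (a sketch, not the only option) is to sort the randomly chosen indices before using them. The helper name pick_in_order is hypothetical. In the solution's loop it could stand in for the shuffle-and-print line as print pick_in_order($j, @m); and it does its own clamping for sets shorter than $j.

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical helper (not part of the original post): pick up to $j
# random lines, but return them in their original relative order.
sub pick_in_order {
    my ($j, @lines) = @_;
    $j = @lines if $j > @lines;                        # a set may be shorter than $j
    my @idx = (shuffle(0 .. $#lines))[0 .. $j - 1];    # $j distinct random indices
    return @lines[ sort { $a <=> $b } @idx ];          # back in file order
}

print pick_in_order(2, "a\n", "b\n", "c\n", "d\n");
```

Sorting numerically matters here: the indices are positions in @m, so an ascending sort restores the order in which the lines appeared in the file.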
My solution:
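use strict;
use warnings;
use List::Util qw( shuffle );

my $j = 90;    # number of lines to pick from each set

sub extract_id {
    my ($line) = @_;
    # Placeholder: parse the set ID out of $line and return it.
    # How to do that depends on your data format.
    ...
}

my @m;         # lines of the current set
my $id;
my $last_id;
for (;;) {
    my $line = <DATA>;    # reads the data placed after __DATA__
    $id = extract_id($line)
        if defined($line);

    # A new ID or end of file means the current set is complete.
    if (@m) {
        if (!defined($line) || $id ne $last_id) {
            # Pick at most $j lines; a set may have fewer than $j.
            my $j = $j < @m ? $j : @m;
            print $m[$_] foreach (shuffle(0..$#m))[0..$j-1];
            @m = ();
        }
    }

    last if !defined($line);
    push(@m, $line);
    $last_id = $id;
}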
(Update: Tested and fixed.)
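To try the loop end to end, here is a self-contained sketch. It assumes, purely for illustration, that the set ID is the first whitespace-separated field of each line (a hypothetical format; your extract_id will differ). It reads from an in-memory filehandle and returns the picked lines instead of printing them, so the result is easy to check. The name sample_sets is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical ID format: the first whitespace-separated field.
sub extract_id {
    my ($line) = @_;
    my ($id) = split ' ', $line;
    return $id;
}

# Same grouping logic as the loop above, but reads from $fh and
# returns the picked lines instead of printing them.
sub sample_sets {
    my ($j, $fh) = @_;
    my (@out, @m, $id, $last_id);
    for (;;) {
        my $line = <$fh>;
        $id = extract_id($line) if defined($line);
        if (@m && (!defined($line) || $id ne $last_id)) {
            my $k = $j < @m ? $j : @m;    # a set may have fewer than $j lines
            push @out, @m[ (shuffle(0 .. $#m))[0 .. $k - 1] ];
            @m = ();
        }
        last if !defined($line);
        push @m, $line;
        $last_id = $id;
    }
    return @out;
}

my $data = join '', map { "A $_\n" } 1 .. 5;
$data   .= join '', map { "B $_\n" } 1 .. 2;
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
print sample_sets(3, $fh);    # 3 random "A" lines, then both "B" lines in random order
```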
Memory can be saved by storing file positions in @m instead of the actual lines, but based on your "Update 2", that's not needed.
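If memory ever does become a concern, here is a sketch of the file-position idea: record tell() before each read, push that offset onto @m instead of the line, then seek back to fetch only the chosen lines. This assumes a seekable handle (a regular file or an in-memory string, not a pipe). read_lines_at is a hypothetical helper; it restores the read position afterwards so the scan can continue.

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical helper: return up to $j random lines, given the byte
# offsets (from tell) at which those lines start in $fh.
sub read_lines_at {
    my ($fh, $j, @pos) = @_;
    $j = @pos if $j > @pos;
    my $resume = tell($fh);                  # where the scan had got to
    my @lines;
    for my $p ((shuffle(@pos))[0 .. $j - 1]) {
        seek($fh, $p, 0) or die "Can't seek: $!";
        push @lines, scalar(readline($fh));
    }
    seek($fh, $resume, 0) or die "Can't seek: $!";    # continue scanning from here
    return @lines;
}

my $data = "A 1\nA 2\nA 3\nB 1\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
my @pos;
for (;;) {
    my $pos = tell($fh);                     # offset of the line about to be read
    my $line = <$fh>;
    last if !defined($line);
    push @pos, $pos if $line =~ /^A /;       # collect set "A" only
}
print read_lines_at($fh, 2, @pos);           # two random "A" lines
```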
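Alternative to the shuffle line (it counts down the inner copy of $j and removes the picked lines from @m, which is emptied right afterwards anyway):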
print splice(@m, rand(@m), 1) while $j--;
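The splice alternative can also be wrapped as a standalone helper that needs no List::Util. This is a sketch; the name pick_by_splice is hypothetical. Each splice removes the line it returns, so no line is picked twice, and the extra @m test stops early when a set has fewer than $j lines, which would make the separate clamp unnecessary.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the splice one-liner.
sub pick_by_splice {
    my ($j, @m) = @_;    # @m is a copy; the caller's array is untouched
    my @picked;
    # rand(@m) is rand(scalar @m); splice truncates it to an integer index.
    push @picked, splice(@m, rand(@m), 1) while $j-- > 0 && @m;
    return @picked;
}

print pick_by_splice(2, "a\n", "b\n", "c\n");
```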