in reply to Slurping a large (>65 gb) into buffer
If the line separating the pages is always the same, you could set $/ (aka $INPUT_RECORD_SEPARATOR under "use English"; see perlvar) to that line and read a page at a time pretty easily. Note that $/ is a literal string, not a regex.
    $/ = "----- PAGE SEPARATAAAR -----\n";
    while (<>) {
        chomp;
        # $_ now contains the HTML page
    }
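One thing to watch: assigning to $/ changes it for the rest of the program. Here is a minimal sketch of the same idea with the change scoped by local, reading from an in-memory filehandle so it runs standalone. The helper name for_each_page and the sample data are made up for illustration; the separator is just the placeholder from above.

```perl
use strict;
use warnings;

# Hypothetical helper: calls $on_page->($page) once per page read from $fh.
# Only one page is held in memory at a time.
sub for_each_page {
    my ( $fh, $sep, $on_page ) = @_;
    local $/ = $sep;              # restored automatically when the sub returns
    while ( my $page = <$fh> ) {
        chomp $page;              # chomp removes the current value of $/
        $on_page->($page);
    }
    return;
}

# Made-up sample data standing in for the real file.
my $sample = "<html>one</html>\n----- PAGE SEPARATAAAR -----\n<html>two</html>\n";
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";

my @pages;
for_each_page( $in, "----- PAGE SEPARATAAAR -----\n", sub { push @pages, $_[0] } );
close $in;

print scalar(@pages), " pages read\n";
```

For the real file you would pass a handle opened on the file and do the per-page work inside the callback rather than collecting pages in an array.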
If the separator is something you'd have to match with a regex, you could read a line at a time and detect page boundaries.
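    my $page = '';
    while (<>) {
        if ( /xxx PAGE BOUNDARY \d+ xxx/ ) {
            output( $page );
            $page = '';
            next;
        }
        $page .= $_;
    }
    # don't lose the last page if the file doesn't end with a boundary line
    output( $page ) if length $page;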
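If you want to try the line-at-a-time approach without the big file, here is a rough standalone sketch. The helper name each_page, the callback, and the sample data are all assumptions for illustration; the boundary regex is the placeholder from above. It also handles the last page, which has no boundary line after it.

```perl
use strict;
use warnings;

# Hypothetical helper: accumulates lines into a page and calls
# $on_page->($page) each time a line matching $boundary is seen.
sub each_page {
    my ( $fh, $boundary, $on_page ) = @_;
    my $page = '';
    while ( my $line = <$fh> ) {
        if ( $line =~ $boundary ) {
            $on_page->($page) if length $page;
            $page = '';
            next;
        }
        $page .= $line;
    }
    # The final page is not followed by a boundary line, so emit it here.
    $on_page->($page) if length $page;
    return;
}

# Made-up sample data standing in for the real file.
my $text = "<p>a</p>\nxxx PAGE BOUNDARY 1 xxx\n<p>b</p>\nxxx PAGE BOUNDARY 2 xxx\n<p>c</p>\n";
open my $src, '<', \$text or die "Cannot open in-memory handle: $!";

my @seen;
each_page( $src, qr/xxx PAGE BOUNDARY \d+ xxx/, sub { push @seen, $_[0] } );
close $src;

print scalar(@seen), " pages seen\n";
```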