Hello good Monks,

I am having difficulty parsing a huge data file. Since it is a huge file, I can only read line-by-line (with the exception of the small buffer I'm using).

The rule I am trying to implement has these preconditions (please refer to the @sourcedata array below):
  1. A literal "1" occurs at the beginning of the line, (and is the only char on that line)
  2. ... and is immediately followed by *any* number of newlines
  3. ... and is terminated by the literal "_____ 2"

IFF these conditions are met, then the newlines between the "1" and "_____ 2" lines are removed. Any other condition just prints everything (including the buffer.)

For some reason, I cannot execute the loop where I am buffering the intermediate newlines.

Also please note that for ease of this discussion, I have described this question in terms of arrays, rather than file I/O -- this is not germaine to the solution I am seeking.

Here's some sample data

#!/usr/bin/perl use strict; use warnings; # NOTE: The @sourcedata array is the representation of # the data as if it were read from a file by: # # open(FH, $sourcefilename) || die ... # my @sourcedata = <FH>; # close FH; # # Since the source file is huge, I need to process the # file line by line # NOTE: I updated this array to reflect an array of lines my @sourcedata = ( "\n", "1\n", "\n", "\n", "b\n", "\n", "1\n", "\n", "\n", "\n", "\n", "_____ 2\n", "\n", "\n", "\n" ); # The desired result of processing a small data sample: my @desiredoutput = qq( 1 b 1 _____ 2 # NOTE: preceding newlines have been collapsed );

Here's my source code...

my @buffer = (); my $length = scalar @sourcedata; for (my $I = 0; $I < $length; $I++) { my $line = $sourcedata[$I]; if ($line =~ /^1$/) { push(@buffer, $line); $I++; $line = $sourcedata[$I]; # Here's the loop I can't seem to execute: while ($line =~ /^\n$/ && $I != $length) { print "Buffering...\n"; push(@buffer, $line); $I++; last if $I == $length; $line = $sourcedata[$I]; } if ($line =~ /_____ 2/) { # print only the first and last items in the buffer, # effectively removing the empty lines print shift(@buffer), pop(@buffer); print $line; } else { print join(@buffer); } } else { print $line; } @buffer = (); }
Where do you want *them* to go today?

In reply to Implementing a parsing rule in a huge data file by thezip

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.