in reply to unreadline function?

I believe there's an easier solution that hasn't been mentioned yet.

You said in your question that "special tag starts record off".

That's your answer. Right now you're reading the file line by line, but you should instead read it record by record. That's easy to do if the special tag is uniform. Let's say the special tag is "<RECORD>". Set the input record separator ($/) to that instead of newline, and each read returns one whole record. If you still need to break a record down further using newlines as delimiters, you can split on newline at that point. Here's how:

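    {
        local $/ = "<RECORD>";    # read up to the next tag instead of the next newline
        open INFILE, "<in.dat" or die "Bleah!\n$!";
        while ( my $record = <INFILE> ) {
            chomp $record;                # strip off the record separator.
            next unless length $record;   # the tag starts each record, so the first chunk is empty
            my @rec_lines = split /\n/, $record;
            # process each record line here.
        }
        close INFILE;
    }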
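To make the idea concrete, here is a minimal, self-contained sketch of the same technique. It reads from an in-memory string instead of in.dat so it can be run as-is; the sample data, the read_records helper, and the "name=/size=" fields are all made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data: every record starts with the tag "<RECORD>".
my $data = "<RECORD>\nname=alpha\nsize=1\n<RECORD>\nname=beta\nsize=2\n";

# Hypothetical helper: returns one array reference of lines per record.
sub read_records {
    my ( $text, $tag ) = @_;
    my @records;
    local $/ = $tag;    # the record separator is the tag, not "\n"
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while ( my $record = <$fh> ) {
        chomp $record;    # removes the trailing tag, if there is one
        my @lines = grep { length } split /\n/, $record;
        next unless @lines;    # skip the empty chunk before the first tag
        push @records, \@lines;
    }
    close $fh;
    return @records;
}

my @records = read_records( $data, "<RECORD>" );
print scalar(@records), " records\n";
print join( ", ", @$_ ), "\n" for @records;
```

Because the tag comes at the start of each record, the first read returns only the text before the first tag, which is why the sketch skips empty chunks. The local keeps the change to $/ confined to the helper.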

I hope this helps!


Dave

Re: Re: unreadline function?
by aquarium (Curate) on Mar 01, 2004 at 04:38 UTC
    Thank you all for your responses. I've fashioned some code, shown here, into a separate script that runs as the first script in a pipeline, e.g. perl pre_script <bibfile | perl processing_script. This nicely separates the complex/multipath processing from the multiline problem. I know I could have been more concise, but this problem was all about converting the data to get a job done, not about making pretty code. Thanks once again; +es all round.
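    while ( $line = <> ) {
        chomp $line;
        # Boundary and FORM= lines pass through unchanged.
        if ( $line =~ /^\*\*\* DOCUMENT BOUNDARY \*\*\*/ ) {
            check();
            print "$line\n";
            next;
        }
        if ( $line =~ /^FORM=/ ) {
            check();
            print "$line\n";
            next;
        }
        # A ".nnn." marker starts a new tag; flush the previous one first.
        if ( $line =~ /\.\d\d\d\./ ) {
            check();
            $started_tag = 1;
        }
        $tag .= $line;    # continuation lines are joined onto the current tag
    }
    check();    # flush the tag still pending at end of input

    # Print the accumulated tag, if any, and reset the state.
    sub check {
        if ($started_tag) {
            print "$tag\n";
            undef $tag;
            $started_tag = 0;
        }
    }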
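    For anyone who wants to try the joining logic without a real bibfile, here is a rough sketch of the same idea wrapped in a function that takes a list of lines and returns the joined lines. The function name join_tag_lines and the sample records are invented for illustration. It is also slightly simplified: it flushes any pending text, where the script above only flushes once a tag has started.

```perl
use strict;
use warnings;

# Hypothetical helper: joins continuation lines onto the tag line they follow.
sub join_tag_lines {
    my @in = @_;
    my ( @out, $tag );

    # Emit the pending tag, if there is one, and reset it.
    my $flush = sub {
        push @out, $tag if defined $tag;
        undef $tag;
    };

    for my $line (@in) {
        # Boundary and FORM= lines pass through unchanged.
        if ( $line =~ /^\*\*\* DOCUMENT BOUNDARY \*\*\*/ or $line =~ /^FORM=/ ) {
            $flush->();
            push @out, $line;
            next;
        }
        $flush->() if $line =~ /\.\d\d\d\./;    # a new tag starts here
        $tag = defined $tag ? $tag . $line : $line;
    }
    $flush->();    # emit the tag still pending at end of input
    return @out;
}

# Made-up sample input, not taken from the real data.
my @sample = (
    "*** DOCUMENT BOUNDARY ***",
    "FORM=BOOK",
    ".245. A title that",
    " wraps over two lines",
    ".260. Publisher",
);
print "$_\n" for join_tag_lines(@sample);
```

    Returning a list instead of printing makes the logic easy to check on a handful of lines before running it over the whole file.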
      It could be faster if you use index for the two fixed-string tests:
      if ( index($line, "*** DOCUMENT BOUNDARY ***") == 0
        or index($line, "FORM=") == 0 ) {
          check();
          print "$line\n";
          next;
      }
      elsif ( $line =~ /\.\d{3}\./ ) {
          # ...
      }
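      Whether index really beats the anchored regexes depends on the Perl version and the data, so it is worth measuring. A small sketch using the core Benchmark module follows; the sample lines and the by_regex/by_index helper names are made up for illustration, and no timing result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample lines; substitute lines from the real file.
my @lines = (
    "*** DOCUMENT BOUNDARY ***",
    "FORM=BOOK",
    ".245. A title",
    " continuation text",
);

# Pass-through test written with anchored regexes.
sub by_regex {
    my $l = shift;
    return ( $l =~ /^\*\*\* DOCUMENT BOUNDARY \*\*\*/ or $l =~ /^FORM=/ ) ? 1 : 0;
}

# The same test written with index; 0 means "found at the start of the line".
sub by_index {
    my $l = shift;
    return ( index( $l, "*** DOCUMENT BOUNDARY ***" ) == 0
          or index( $l, "FORM=" ) == 0 ) ? 1 : 0;
}

# Check that both versions agree before comparing their speed.
for my $l (@lines) {
    die "mismatch on '$l'" unless by_regex($l) == by_index($l);
}

# Run each version for at least one CPU second and print a comparison table.
cmpthese( -1, {
    regex => sub { by_regex($_) for @lines },
    index => sub { by_index($_) for @lines },
} );
```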
      Oh, almost forgot: I did find an IO-Unread module on CPAN, but it is only at version 0.06.
Re: Re: unreadline function?
by esskar (Deacon) on Mar 01, 2004 at 03:46 UTC
    pretty nice!