Re: unreadline function?
by davido (Cardinal) on Mar 01, 2004 at 03:38 UTC
|
I believe there's an easier solution that hasn't been mentioned yet.
You said in your question that "special tag starts record off".
That's your answer. Right now you're reading line by line, but you should instead read the file record by record. That's easy to do if the special tag is uniform. Let's say the special tag is "<RECORD>". Set the input record separator ($/) to that instead of newline, and read each record in its entirety. If you still need to break a record down further using newlines as delimiters, you can split on newline at that point. Here's how:
{
    local $/ = "<RECORD>";
    open my $in, '<', 'in.dat' or die "Bleah!\n$!";
    while ( my $record = <$in> ) {
        chomp $record;    # strip off the record separator.
        next unless length $record;    # if the file starts with the tag, the first chunk is empty.
        my @rec_lines = split /\n/, $record;
        # process each record line here.
    }
    close $in;
}
I hope this helps!
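For anyone who wants to try the idea without a data file, here is a minimal self-contained sketch. The sample data, the field lines, and the read_records helper are assumptions made up for illustration; only the use of $/ and chomp comes from the post above. It reads from an in-memory filehandle and skips the empty chunk that appears when the data begins with the tag.

```perl
use strict;
use warnings;

# Hypothetical sample data: the "<RECORD>" tag and the field lines are assumptions.
my $data = "<RECORD>\ntitle=one\nyear=2003\n<RECORD>\ntitle=two\nyear=2004\n";

# Hypothetical helper: returns one array reference of lines per record.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "<RECORD>";              # read up to each tag instead of each newline
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;                  # chomp removes the current $/, i.e. the tag
        next unless $record =~ /\S/;    # the chunk before the first tag is empty
        push @records, [ grep { length } split /\n/, $record ];
    }
    close $fh;
    return @records;
}

my @records = read_records($data);
printf "record %d: %s\n", $_ + 1, join( ' | ', @{ $records[$_] } ) for 0 .. $#records;
```

Because the tag starts a record rather than ending it, every chunk except the last ends with the tag of the following record, which is why chomp is enough to clean it up.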
| [reply] [d/l] [select] |
|
|
Thank you all for your responses. I've fashioned the code below into a separate script that runs as the first stage of a pipeline, e.g. perl pre_script <bibfile | perl processing_script. This nicely separates the complex, multipath processing from the multiline problem. I know I could have been more concise, but this problem was all about converting the data to get a job done, not about making pretty code. Thanks once again. +es all round.
while ($line = <>) {
    chomp $line;
    if ($line =~ /^\*\*\* DOCUMENT BOUNDARY \*\*\*/) {
        check();
        print "$line\n";
        next;
    }
    if ($line =~ /^FORM=/) {
        check();
        print "$line\n";
        next;
    }
    if ($line =~ /\.\d\d\d\./) {
        check();
        $started_tag = 1;
    }
    $tag .= $line;
}
check();    # flush the tag still pending at end of input
sub check {
    if ($started_tag) {
        print "$tag\n";
        undef $tag;
        $started_tag = 0;
    }
}
| [reply] [d/l] |
|
|
It could be faster if you do:
if ((index($line, "*** DOCUMENT BOUNDARY ***") == 0) or
    (index($line, "FORM=") == 0))
{
    check();
    print "$line\n";
    next;
} elsif ($line =~ /\.\d{3}\./) {
    # ...
| [reply] [d/l] |
|
|
Oh, almost forgot: I did find an IO-Unread module on CPAN, but its version is 0.06.
| [reply] |
Re: unreadline function?
by Roger (Parson) on Mar 01, 2004 at 03:19 UTC
|
You could always use tell and seek...
my $lpos = tell(DATA);        # remember where I was
while (<DATA>) {
    print "$_";
    if ($_ =~ /another/) {    # if this line has what I want
        seek DATA, $lpos, 0;  # rewind to beginning of line
        print scalar <DATA>;  # and read just that line again
    }
    $lpos = tell(DATA);       # save offset to the beginning
                              # of the next line
}
__DATA__
This is a line
This is another line
|
|
how did you know about seek and tell?
| [reply] |
|
|
perldoc -f seek
perldoc -f tell
| [reply] [d/l] |
Re: unreadline function?
by esskar (Deacon) on Mar 01, 2004 at 03:25 UTC
|
One idea is to do the loop like this:
my $line = '';
while (1)
{
    $line = $line ? $line : <FH>;
    last unless defined $line;
    unless (isNewRecordLine($line))
    {
        addToRecord($line);
        $line = '';
    }
    else
    {
        processRecord();
    }
}
Or you could use tell to get the current file position and store it in some variable; when you detect a new record, you just use seek (SEEK_SET) to reposition the file pointer.
Have fun!
| [reply] [d/l] |
|
|
I really hate unless (cond) { ... } else { ... } constructs. I think they're better written as if (not cond) { ... } else { ... }. It could be I'm an old C programmer, but I find it easier to follow.
More importantly, I think your code will fail to process the last record. I would do something like the following:
my @record;
while( <FH> ) {
    chomp;
    if( /start-delim/ ) {
        @record and process( @record );
        @record = ();
    }
    push @record, $_;
}
@record and process( @record );
If there were an end delimiter, the loop could be flipped around to a do/while, and one could do away with the two calls to process().
I note in passing that davido's idea of setting $/ is excellent, but it only works if the delimiter is a fixed string. If you need an RE to match the delimiter, you need a different approach.
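To illustrate the regex case: $/ is always treated as a literal string, so one possible workaround, assuming the input is small enough to hold in memory, is to slurp it and split on a lookahead. The "REC n:" header format and the sample text below are assumptions invented for this sketch, not the original poster's data.

```perl
use strict;
use warnings;

# Hypothetical input: each record starts with a header line such as "REC 17:".
my $text = "REC 1:\nalpha\nbeta\nREC 22:\ngamma\n";

# Split just before every line that matches the header pattern. The lookahead
# consumes nothing, so each header stays at the top of its own record.
my @records = grep { length } split /^(?=REC \d+:$)/m, $text;

for my $record (@records) {
    my ( $header, @lines ) = split /\n/, $record;
    print "$header ", scalar(@lines), " line(s)\n";
}
```

For files too large to slurp, the line-by-line loop shown earlier in this reply does the same job with a regex test on each line.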
| [reply] [d/l] [select] |
Re: unreadline function?
by dragonchild (Archbishop) on Mar 01, 2004 at 03:21 UTC
|
There is no "peek", per se. I see two options for you - neither of which is probably very appealing.
The easiest way is to create a subclass of IO::File that would cache the next line for you. So, it would always be a line ahead of where you are. Then, you would just treat that object as your file and add a call to $file->peek whenever you needed to peek at the next line.
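A rough, lightly tested sketch of what such a subclass might look like follows. The class name PeekableFile and the peek_buf key are made up for illustration. IO::File objects are glob references, so the cached line is kept in the glob's hash slot. Note that this only works if you read through the getline method; the <$fh> operator does not dispatch to an overridden getline.

```perl
use strict;
use warnings;

{
    package PeekableFile;    # hypothetical class name
    use parent 'IO::File';

    # Return the next line without consuming it.
    sub peek {
        my ($self) = @_;
        ${*$self}{peek_buf} = $self->SUPER::getline()
            unless exists ${*$self}{peek_buf};
        return ${*$self}{peek_buf};
    }

    # Hand back the cached line first, if there is one.
    sub getline {
        my ($self) = @_;
        return delete ${*$self}{peek_buf} if exists ${*$self}{peek_buf};
        return $self->SUPER::getline();
    }
}

use File::Temp qw(tempfile);

# Write a small throwaway file to read back (sample content is an assumption).
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "first\nsecond\n";
close $tmp or die "Cannot close $path: $!";

my $fh = PeekableFile->new( $path, 'r' ) or die "Cannot open $path: $!";
my @seen;
while ( defined( my $line = $fh->getline ) ) {
    chomp $line;
    my $next = $fh->peek;             # look ahead without consuming
    chomp $next if defined $next;
    push @seen, [ $line, $next ];
    print "$line (next: ", ( defined $next ? $next : 'EOF' ), ")\n";
}
$fh->close;
```

The cache holds at most one line, so peek can be called repeatedly without advancing the file.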
The better way, in my humble opinion, is to rewrite your main loop as such:
- Read the line.
- See if it's part of the same record or not.
- If it isn't, then deal with the end of the record (pushing it onto a global variable or whatever)
- Clear the temporary variables used for building a given record
- Handle the line as if the above handling wasn't there.
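The steps above can be sketched roughly as follows. The "ID=" rule for spotting a new record, the sample input, and the helper names are all assumptions for the sake of a runnable example. The one addition to the list is a final flush after the loop, so that the last record is not lost at end of file.

```perl
use strict;
use warnings;

# Hypothetical rule: a line starting with "ID=" begins a new record.
sub starts_new_record { return $_[0] =~ /^ID=/ }

# Hypothetical helper: returns one array reference of lines per record.
sub collect_records {
    my ($fh) = @_;
    my ( @records, @current );
    while ( my $line = <$fh> ) {                  # read the line
        chomp $line;
        if ( starts_new_record($line) && @current ) {
            push @records, [@current];            # deal with the end of the record
            @current = ();                        # clear the temporary state
        }
        push @current, $line;                     # handle the line as usual
    }
    push @records, [@current] if @current;        # flush the final record at EOF
    return @records;
}

my $input = "ID=1\nname=foo\nID=2\nname=bar\nnote=baz\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @records = collect_records($fh);
close $fh;
print scalar(@records), " records\n";
```

No line ever has to be pushed back onto the filehandle, because the line that signals the end of one record is simply treated as the first line of the next.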
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
| [reply] |