Re: while loop logic
by Fletch (Bishop) on Jan 13, 2006 at 14:03 UTC
my $in_comment = undef;
while( <CMDTXTF> ) {
my @fld = split /\|/, $_;
do { $in_comment = 1; next } if $fld[5] =~ m{ /\* }x;
do { $in_comment = undef; next } if $fld[5] =~ m{ \*/ }x;
print "$fld[0] $fld[2] sequence=$fld[4] $fld[5]"
if /$regex/ and not $in_comment;
}
This is exactly the sort of situation for which a flip-flop is intended:
while( <CMDTXTF> ) {
my @fld = split /\|/, $_;
print "$fld[0] $fld[2] sequence=$fld[4] $fld[5]"
if /$regex/ and not (($fld[5] =~ m{ /\* }x)..($fld[5] =~ m{ \*/ }x));
}
Caution: Contents may have been coded under pressure.
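For anyone who wants to try the flip-flop in isolation, here is a minimal, self-contained sketch. The records in @records are made up for illustration, and it skips with next rather than printing conditionally. The range operator becomes true on the record where the left-hand match succeeds and stays true up to and including the record where the right-hand match succeeds.

```perl
use strict;
use warnings;

# Hypothetical sample records; field 5 may open or close a comment.
my @records = (
    "a|b|c|d|e|keep one\n",
    "a|b|c|d|e|start /* here\n",
    "a|b|c|d|e|inside\n",
    "a|b|c|d|e|end */ here\n",
    "a|b|c|d|e|keep two\n",
);

my @kept;
for (@records) {
    my @fld = split /\|/, $_;
    # True from the record containing "/*" through the record containing "*/".
    next if ($fld[5] =~ m{/\*}) .. ($fld[5] =~ m{\*/});
    push @kept, $fld[5];
}
print @kept;
```

Note that the records holding the start and end markers are skipped as a whole, so any text outside the comment on those records is lost too.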
I've tried out all the solutions suggested in this post, but there's still one problem I'm struggling to cope with. It transpires that, whilst ignoring data between comments, there are some cases where I need to run the regexp against what's left of the data in the record after the comments have been stripped, e.g.
asdfgh|kjkhg|poioiu|ytr|kkk|aaa /* vbfew */ kkkwwwqqqsss
In this case the comment is complete within the record but I still need to check the rest of the record (aaa kkkwwwqqqsss) for any regexp matches.
In fact this issue applies throughout the data sets I'm trying to deal with. It was my mistake not to explain this properly when asking for assistance, which by the way has been excellent.
Re: while loop logic
by wfsp (Abbot) on Jan 13, 2006 at 14:03 UTC
my $comment;
while ($line = <CMDTXTF>) {
@fld = split /\|/,$line;
$comment++, next if $fld[5] =~ /\/\*/;
$comment--, next if $fld[5] =~ / \*\//;
next if $comment;
print "$fld[0] $fld[2] sequence=$fld[4] $fld[5]" if $line =~ /$regexp/;
}
Re: while loop logic
by jonadab (Parson) on Jan 13, 2006 at 14:38 UTC
As soon as I saw a label (LOOP:) in your code, I knew we were in for something confusing. Sure enough, a couple of lines later, there's a next, but it's not just a plain next; it has to use the label, because there are intervening control-flow structures. The while loop logic in itself is not your problem. It's the logic of next that gets things all confused, and then to make it even more confusing you slap an until on the next. That's at least three levels of control-flow structures involved, which IMO is too many. (The amount of confusion surrounding next increases geometrically with the amount of control-flow structure involved.) I'm sure someone here is clever enough to iron out exactly how to make next skip the desired iterations, but I'm even more sure that there are less confusing ways to achieve what you want.
Others have suggested only printing if you aren't inside a comment. That'll work fine, and is a good way to do it. In the interest of TMTOWTDI, another solution would be to do something like this when you detect the beginning of a comment:
{ local $/ = '*/'; <CMDTXTF>; } <CMDTXTF>;
The first diamond operator slurps everything up
to the close of the comment in one fell swoop.
The localized $/ tells it how far to go. The
braces keep the change in $/ from leaking out
to spoil the rest of your code.
The second diamond operator should clean up the
rest of the line after the */, even if it's just
a newline character, so that you're ready for
the next line when the loop resumes.
One caveat:
if any of the lines might have $/ in other fields
besides the fifth one, which don't apply to the
fifth field (e.g., perhaps the fourth field of
several consecutive records is commented out in
this fashion, with no implications for the fifth),
then this solution isn't smart enough to deal with
that correctly.
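Here is a small runnable sketch of the $/ trick described above. It is only an illustration: the sample data is hypothetical, it is read through an in-memory filehandle instead of CMDTXTF, and the text after the closing */ is kept in $rest rather than thrown away.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle.
my $data = <<'END';
aaaa|bbbb|cccc|dddd|eeee|some text to check /* a comment
qqqq|wwww|eeee|rrrr|tttt| I'm not interested in */ more text to check
zzzz|yyyy|xxxx|wwww|vvvv|plain record
END

open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {
    my @fld = split /\|/, $line;
    if (defined $fld[5] and $fld[5] =~ m{/\*} and $fld[5] !~ m{\*/}) {
        # The comment opens here and is not closed on this line:
        # read (and discard) everything up to and including "*/".
        {
            local $/ = '*/';
            my $discarded = <$fh>;
        }
        # $/ is back to normal here, so this reads the rest of that line.
        my $rest = <$fh>;
        push @seen, "after comment:$rest";
        next;
    }
    push @seen, $line;
}
print @seen;
```

As the caveat says, the fields in front of the closing */ are discarded along with the comment text, so this only suits cases where they are not needed.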
I am intrigued by this answer. Being fairly new to Perl this has opened up a new ball game for me. I have a further question based on the answer you've given.
What happens if all I want to do is to chop out the comment from the fifth field (which is the only one that will ever contain comments) but apply the regex to what remains of the field? e.g.
aaaa|bbbb|cccc|dddd|eeee|some text to check /* a comment
qqqq|wwww|eeee|rrrr|tttt| I'm not interested in */ more text to check
Can I do this using your solution?
What happens if all I want to do is to chop out the comment from the fifth field (which is the only one that will ever contain comments) but apply the regex to what remains of the field?
Potentially, if you don't care about the other fields.
What $/ does is control where the <> file input operator stops reading. Normally it stops at the end of each line (after the newline character), but with $/ set to "*/", it will stop reading there instead. So with one read you can throw away everything up to that point, let your localized change to $/ fall out of scope at the closing brace so that the behavior of <> is back to normal, and then read the remainder of the field after the end of the comment. You'll miss fields 1-4 as well, since they were already skipped, but you can pick up the 'more text to check' part if there is something you want to do with that. My code as written calls the <> operator once more in void context to throw that and the following newline away, but you could instead assign that to a variable and do something with it.
However, if you need to save fields 1-4 from the line where the comment ends, then you need to go with the other solution, i.e., keep track of whether you're in a comment or not and act accordingly. In that case if you are in a comment and detect */ in field five you could do two things: clear the comment flag, and remove the part up to the */ from the field. Similarly, if you are not in a comment but detect /* in field 5, you could remove the part starting with /* from the field and either set the comment flag right away (if you don't want to process that line at all) or else set a different flag that causes code at the end of the loop to set the comment flag after the line is processed.
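A rough sketch of that flag-based approach follows. It is untested against real data and makes several assumptions: $regex and the sample records are hypothetical, the pattern is matched against what remains of the comment field rather than the whole line, and the comment flag is set right away when a comment is left open.

```perl
use strict;
use warnings;

my $regex = qr/check/;    # hypothetical pattern, stands in for the real one

# Hypothetical records, modelled on the examples in this thread.
my @records = (
    "asdfgh|kjkhg|poioiu|ytr|kkk|aaa /* check */ kkkwwwqqqsss\n",
    "aaaa|bbbb|cccc|dddd|eeee|some text to check /* a comment\n",
    "qqqq|wwww|eeee|rrrr|tttt| I'm not interested in */ more text to check\n",
);

my $in_comment = 0;
my @out;
for my $line (@records) {
    chomp(my $record = $line);
    my @fld  = split /\|/, $record, 6;
    my $text = defined $fld[5] ? $fld[5] : '';

    if ($in_comment) {
        # Drop everything through the closing "*/"; if there is none,
        # the whole field is still comment, so skip the record.
        next unless $text =~ s{^.*?\*/}{};
        $in_comment = 0;
    }

    $text =~ s{/\*.*?\*/}{}g;                   # comments complete within the field
    $in_comment = 1 if $text =~ s{/\*.*$}{};    # comment left open: trim it, set flag

    push @out, "$fld[0] $fld[2] sequence=$fld[4] $text\n" if $text =~ $regex;
}
print @out;
```

The first record is not printed because its only match sits inside a comment; the second and third are printed with the comment text removed, and the other fields from both records stay available.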
Re: while loop logic
by ptum (Priest) on Jan 13, 2006 at 14:09 UTC
It looks to me as though you've got your 'resume' condition (the fifth field has an end-comment marker) inside the condition wherein you detect a start marker. This means that you'll only skip the first line after a start marker. I don't see a straightforward way to avoid checking the fifth field every iteration.
How 'bout something like this (untested):
my $skip = 0;
LOOP: while ($line = <CMDTXTF>) {
@fld = split /\|/,$line;
if ($fld[5] =~ / \*\//) {
$skip = 0;
# you found an end-comment, turn skipping off for the next line
next LOOP;
}
if ($fld[5] =~ /\/\*/) {
$skip = 1;
# you found a start-comment, turn skipping on
next LOOP;
}
unless ($skip) {
print "$fld[0] $fld[2] sequence=$fld[4] $fld[5]" if $line =~ /$regexp/;
}
}
No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: while loop logic
by holli (Abbot) on Jan 13, 2006 at 16:08 UTC
use warnings;
use strict;
use File::Comments;
my $snoop = File::Comments->new( default_plugin => "File::Comments::Plugin::C");
my $strip = $snoop->stripped('yourfile');
open IN, "<", \$strip;
while ( <IN> )
{
chomp;
@_ = split /\|/;
next unless $_[5];
#Your processing here
}
Re: while loop logic
by ikegami (Patriarch) on Jan 13, 2006 at 16:46 UTC
You could do it in two phases:
use 5.006000; # Perl 5.6.0+
use strict;
use warnings;
use Regexp::List ();
my $txt;
{ # Remove comments.
open(my $txtfh, '<', $txtf)
or die("Can't open $txtf : $!\n");
# Read entire file into memory.
local $/;
$txt = <$txtfh>;
$txt =~ s{/\*.*?\*/}{}sg; # Remove comments.
$txt =~ s{/\*.*}{}s; # Remove unmatched (unterminated) comment.
}
my @matches = ...;
my $regexp = Regexp::List
->new(modifiers => 'i', quotemeta => 0)
->list2re(@matches);
open(my $txtfh, '<', \$txt);
while (my $line = <$txtfh>) {
my @fld = split(/\|/, $line);
print "$fld[0] $fld[2] sequence=$fld[4] $fld[5]"
if $line =~ $regexp;
}
Update: Oops! holli already posted the same thing.