in reply to Printing ten characters succeeding a matching string
Another way... perhaps more suited as a tutorial than as production code:
```perl
#!/usr/bin/perl
use Modern::Perl;
# 934221
# find a string, then print the next ten chars

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD1234567890/;

for my $content (@content) {
    $content =~ /.+?ABCD(.{10}).*/;
    say "Current array element is: $content";
    if ($1) {
        say "\t Next 10 char after the match: $1";
        next;
    }
}
```
And the result of executing this script is:
```
Current array element is: abcdABCD1234567890xyz
     Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
Current array element is: qwertyABCD1234567890
     Next 10 char after the match: 1234567890
```
The second and third array elements don't satisfy the regex because no instance of the sequence ABCD in either of them is followed by 10 more chars.
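For comparison, here is a sketch of the same task without a regex, using Perl's built-in index and substr. The helper name next_ten is a hypothetical name for illustration, not something from the original post. Because any later ABCD has fewer characters after it than the first one, checking only the first occurrence is enough to decide whether 10 characters follow some ABCD.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the 10 characters that follow the first
# occurrence of $needle in $str, or undef if $needle is absent or fewer
# than 10 characters follow it.
sub next_ten {
    my ($str, $needle) = @_;
    my $pos = index $str, $needle;            # -1 when $needle is absent
    return if $pos < 0;                       # bare return: undef in scalar context
    my $start = $pos + length $needle;
    return if length($str) - $start < 10;     # not enough characters left
    return substr $str, $start, 10;
}

for my $s (qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC/) {
    my $ten = next_ten($s, 'ABCD');
    if (defined $ten) {
        print "$s: $ten\n";
    } else {
        print "$s: no match\n";
    }
}
```

Unlike the original /.+?ABCD(.{10}).*/, this helper also accepts an ABCD at the very start of the string, so next_ten('ABCD 123 456 789', 'ABCD') returns ' 123 456 7'.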
(SOLVED, below.) Now a question for wiser heads: immediately after creating the array, add another element to @content, namely $content[4], with this line: push @content, 'ABCD 123 456 789'; Run the code. This is the output from Perl 5.012 on a Win32 box:
```
Current array element is: abcdABCD1234567890xyz
     Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
Current array element is: qwertyABCD1234567890
     Next 10 char after the match: 1234567890
Current array element is: ABCD 123 456 789
```
Why doesn't the regex see a match in $content[4]?
Update: And WTH is going on with this minor revision and its output?
```perl
#!/usr/bin/perl
use Modern::Perl;

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD1234567890/;
push @content, 'ABCD 123 456 789';
say "===> \$content[4]: $content[4] \n\n";

for my $content (@content) {
    $content =~ /.+?ABCD(.{10}).*/;
    say "Current array element is: $content";
    if ($1) {
        say "\t Next 10 char after the match: $1";
    } else {
        say "No match on $content";
    }
}
```
```
===> $content[4]: ABCD 123 456 789


Current array element is: abcdABCD1234567890xyz
     Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
     Next 10 char after the match: 1234567890
Current array element is: ABCD1234ABC
     Next 10 char after the match: 1234567890
Current array element is: qwertyABCD1234567890
     Next 10 char after the match: 1234567890
Current array element is: ABCD 123 456 789
     Next 10 char after the match: 1234567890
```
Duh! The answer to the stricken question is that a failed match does not reset $1; it keeps its value until the next successful match. So $1 still holds the result of the initial (successful) match when the regex fails on $content[1] and $content[2], is replaced (with the exact same string) by the match on $content[3], and stays unchanged when the regex fails on $content[4].
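A minimal sketch of that behavior, using two strings taken from the examples in this node: after one successful match and one failed match in the same scope, $1 still holds the capture from the successful one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $good = 'abcdABCD1234567890xyz';   # will match: $1 becomes '1234567890'
my $bad  = 'ABCD1234ABC';             # will not match

$good =~ /.+?ABCD(.{10}).*/;          # successful match sets $1
$bad  =~ /.+?ABCD(.{10}).*/;          # failed match leaves $1 untouched
my $stale = $1;                       # still '1234567890' from $good

print "After the failed match, \$1 is still: $stale\n";
```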
Update 2: For a discussion of the "defensive programming" required to avoid the dumb coding in the stricken material, see What happens with empty $1 in regular expressions? (was: Regular Expression Question). The following code uses that practice, as best I understand it with regard to numbered captures:
```perl
for my $content (@content) {
    # my $match = '';   # explicit but verbose
    # my $match;        # still explicit and only slightly less verbose; same effect
    my ($match) = $content =~ /.+?ABCD(.{10}).*/;   # less code; same effect
    say "Current array element is: $content";
    if ($match) {
        # $match is undef on a failed match, so it can never hold a stale capture
        say "\t Next 10 char after the match: $match";
    } else {
        say "No match on $content";
    }
}
```
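The other common guard, sketched here as an alternative to the list-context assignment, is to test the match itself and read $1 only inside the branch that runs when that match succeeded. The %next_ten hash is a hypothetical addition for illustration, and the trailing .* is dropped because it does not change what is captured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD1234567890/;
push @content, 'ABCD 123 456 789';

my %next_ten;    # element => captured 10 chars, only for elements that match

for my $content (@content) {
    if ($content =~ /.+?ABCD(.{10})/) {
        # $1 is read only here, where the match just tested is known to have succeeded
        $next_ten{$content} = $1;
        print "Next 10 char after the match in $content: $1\n";
    } else {
        print "No match on $content\n";
    }
}
```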
:-) (...and apologies to all the electrons inconvenienced by the posting of the stricken part of this node)
Update 3 (10/30): Moritz pointed out that the initial code (the code I first tested) failed because $1 is not reset unless there is a new match. His kind comments led me to notice that I had already solved that (in Update 2, above, on or about 10/27), which got me off that track. Lo and behold, curing the tunnel vision led to a more open-minded review of the regex. Aha! (The explanation appears in the note "SOLVED!" in the code below.)
```perl
#!/usr/bin/perl
use Modern::Perl;

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD12diff7890/;
# push @content, 'ABCD 123 456 789';   # See note "SOLVED!" below
push @content, 'x ABCD 123 456 789';   # afterthought addition

for my $content (@content) {
    my ($match) = $content =~ /.+?ABCD(.{10}).*/;   # avoids problems with the non-reset of $1
    say "Current array element is: $content";
    if ($match) {
        say "\t MATCH! Next 10 char after the match: $match";
    } else {
        say "\t No match on $content";
    }
}

=pod

SOLVED! Why the last array element failed to match: it initially began
with 'ABCD...', BUT the regex required at least one character ('.+?')
before 'ABCD(.{10})'.
A better fix would be to write the regex as /.*?ABCD(.{10}).*/
or simply as /ABCD(.{10}).*/

C:\>934221.pl
Current array element is: abcdABCD1234567890xyz
     MATCH! Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
     No match on abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
     No match on ABCD1234ABC
Current array element is: qwertyABCD12diff7890
     MATCH! Next 10 char after the match: 12diff7890
Current array element is: x ABCD 123 456 789
     MATCH! Next 10 char after the match:  123 456 7
C:\>

=cut
```
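To see the SOLVED! explanation in action, this sketch runs the original pattern (.+? before ABCD) and the two alternatives from the note against three strings from this node. The %patterns and %result hashes and their labels are hypothetical names for illustration. Only the .+? version rejects 'ABCD 123 456 789', because .+? must consume at least one character before ABCD.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @strings = ('abcdABCD1234567890xyz', 'ABCD 123 456 789', 'x ABCD 123 456 789');

# The original pattern and the two alternatives from the "SOLVED!" note
# (the trailing .* is omitted; it does not affect the capture).
my %patterns = (
    plus => qr/.+?ABCD(.{10})/,    # needs at least one char before ABCD
    star => qr/.*?ABCD(.{10})/,    # allows zero chars before ABCD
    bare => qr/ABCD(.{10})/,       # no prefix at all; same capture as 'star'
);

my %result;    # $result{label}{string} = captured text, or undef on failure
for my $label (sort keys %patterns) {
    for my $str (@strings) {
        my ($ten) = $str =~ $patterns{$label};    # list context: () on failure
        $result{$label}{$str} = $ten;
        printf "%-4s  %-22s  %s\n", $label, $str,
            defined $ten ? "'$ten'" : 'no match';
    }
}
```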
Re^2: Printing ten characters succeeding a matching string (+ followup question)
by bluray (Sexton) on Oct 27, 2011 at 21:23 UTC
by Caio (Acolyte) on Oct 28, 2011 at 10:29 UTC