Another way... perhaps more suited as a tutorial than as production code:

#!/usr/bin/perl
use Modern::Perl;
# 934221

# find a string, then print the next ten chars

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD1234567890/;

for my $content(@content) {
    $content =~ /.+?ABCD(.{10}).*/;
    say "Current array element is: $content";
    if ($1) {
        say "\t Next 10 char after the match: $1";
        next;
    }
}

And the result of executing this script is:

Current array element is: abcdABCD1234567890xyz
         Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
Current array element is: qwertyABCD1234567890
         Next 10 char after the match: 1234567890

The second and third array elements don't satisfy the regex because no instance of the sequence ABCD in either of them is followed by at least 10 chars.
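One way to check that claim is to count the characters remaining after each occurrence of ABCD. This is only a sketch: the helper name `chars_after_each` is hypothetical and not part of the original script, and it uses core `strict`/`warnings`/`feature 'say'` in place of Modern::Perl so that it runs without extra modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the original script): for every
# occurrence of 'ABCD', report how many characters follow it.
sub chars_after_each {
    my ($string) = @_;
    my @remaining;
    # With /g in scalar context, each successful match leaves pos() just
    # past 'ABCD', so the difference is the number of trailing characters.
    while ( $string =~ /ABCD/g ) {
        push @remaining, length($string) - pos($string);
    }
    return @remaining;
}

for my $string (qw/abcd12345ABCD0ABCD ABCD1234ABC/) {
    say "$string: ", join ', ', chars_after_each($string);
}
# prints:
# abcd12345ABCD0ABCD: 5, 0
# ABCD1234ABC: 7
```

Every count is below 10, so `(.{10})` has nothing to capture in either string.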

(SOLVED, below.) Now a question for wiser heads: immediately after creating the array, add another element to @content, namely $content[4], with this line: push @content, 'ABCD 123 456 789';. Run the code. This is the output from 5.012 on a win32 box:

Current array element is: abcdABCD1234567890xyz
         Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
Current array element is: qwertyABCD1234567890
         Next 10 char after the match: 1234567890
Current array element is: ABCD 123 456 789

Why doesn't the regex see a match in $content[4]?

<UPDATE:> ~~And WTH does this minor revision,~~

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD1234567890/;
push @content, 'ABCD 123 456 789';
say "===> \$content[4]: $content[4] \n\n";

for my $content(@content) {
    $content =~ /.+?ABCD(.{10}).*/;
    say "Current array element is: $content";
    if ($1) {
        say "\t Next 10 char after the match: $1";
    }else{
        say "No match on $content";
    }
}
~~...produce this:~~

===> $content[4]: ABCD 123 456 789


Current array element is: abcdABCD1234567890xyz
         Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
         Next 10 char after the match: 1234567890
Current array element is: ABCD1234ABC
         Next 10 char after the match: 1234567890
Current array element is: qwertyABCD1234567890
         Next 10 char after the match: 1234567890
Current array element is: ABCD 123 456 789
         Next 10 char after the match: 1234567890

Duh! The answer to the stricken question is that $1 remains unchanged unless a new match is found... so its content is unchanged from the initial (successful) match when the regex fails on $content[1] and $content[2], then gets replaced (with the exact same thing) in $content[3] and remains unchanged when the regex fails on $content[4].
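A common way to sidestep a stale $1 is to read it only inside the branch where the match itself succeeded. The following is a minimal sketch under that assumption; the helper name `next_ten` is hypothetical, and core pragmas stand in for Modern::Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not from the original post): returns the 10
# characters that follow 'ABCD', or undef when the regex does not match.
sub next_ten {
    my ($string) = @_;
    # $1 is read only when this particular match succeeded, so a value
    # left over from an earlier match can never leak through.
    if ( $string =~ /.+?ABCD(.{10})/ ) {
        return $1;
    }
    return;    # undef in scalar context
}

for my $string ( 'abcdABCD1234567890xyz', 'abcd12345ABCD0ABCD', 'ABCD 123 456 789' ) {
    my $found = next_ten($string);
    say defined $found ? "$string => $found" : "$string => no match";
}
# prints:
# abcdABCD1234567890xyz => 1234567890
# abcd12345ABCD0ABCD => no match
# ABCD 123 456 789 => no match
```

Testing with `defined` rather than plain truth also keeps a legitimate capture of "0" from being mistaken for a failure, although a 10-character capture cannot hit that case.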

Update2: For a discussion of the "defensive programming" required to avoid the dumb coding in the stricken material, see What happens with empty $1 in regular expressions? (was: Regular Expression Question). The following code uses that practice, as best I understand it with regard to numbered captures:

for my $content(@content) {
    # my $match = '';  # explicit but verbose
    # my $match;         # still explicit and only slightly less verbose; same effect
    my ($match) = $content =~ /.+?ABCD(.{10}).*/;  # less code; same effect
    say "Current array element is: $content";
    if ($match) {
        say "\t Next 10 char after the match: $match";
    }else{        
        say "No match on $content";
    }
}

:-) (...and apologies to all the electrons inconvenienced by the posting of the stricken part of this node)

Update 3 (10/30): Moritz pointed out that the initial code (the version I first tested) failed because $1 is not reset unless there is a new match. His kind comments led me to discover that I had already solved that (as in Update 2, above, on or about 10/27), and thus got me off that track. Lo and behold, curing the tunnel vision led to a more open-minded review of the regex. Aha! The explanation appears in the note 'SOLVED!' in the code below:

my @content = qw/abcdABCD1234567890xyz 
            abcd12345ABCD0ABCD 
            ABCD1234ABC 
            qwertyABCD12diff7890/;

# push @content,  'ABCD 123 456 789';       # See note "SOLVED!" below
push @content,  'x ABCD 123 456 789';       # afterthought addition 

for my $content(@content) {
    my ($match) = $content =~ /.+?ABCD(.{10}).*/;   # avoid probs w/non-reset of $1
    say "Current array element is: $content";
    if ( $match ) {
        say "\t MATCH! Next 10 char after the match: $match";
    } else {        
        say "\t No match on $content";
    }
}

=head

# SOLVED! why last array element failed to match: it initially began with 'ABCD...'
# BUT the regex required something ( '.+?' ) before 'ABCD(.{10})'
# And a better fix would be to write the regex as:
#        '/.*?ABCD(.{10}).*/'
# or as: '/ABCD(.{10}).*/'

C:\>934221.pl

Current array element is: abcdABCD1234567890xyz
         MATCH! Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
         No match on abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
         No match on ABCD1234ABC
Current array element is: qwertyABCD12diff7890
         MATCH! Next 10 char after the match: 12diff7890
Current array element is: x ABCD 123 456 789
         MATCH! Next 10 char after the match:  123 456 7


C:\>

=cut
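The 'SOLVED!' note can be checked directly against the string that originally failed. This sketch, with a hypothetical helper `capture_after`, runs three patterns over 'ABCD 123 456 789': the original `.+?` prefix, a relaxed `.*?` prefix that allows zero characters before ABCD, and no prefix at all. Core pragmas are used here in place of Modern::Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not from the original post): apply a compiled regex
# in list context and return the first capture, or undef on failure.
sub capture_after {
    my ( $string, $regex ) = @_;
    my ($match) = $string =~ $regex;
    return $match;
}

my $string = 'ABCD 123 456 789';

for my $regex ( qr/.+?ABCD(.{10})/, qr/.*?ABCD(.{10})/, qr/ABCD(.{10})/ ) {
    my $match = capture_after( $string, $regex );
    say "$regex: ", defined $match ? "'$match'" : 'no match';
}
```

Only the first pattern fails, because `.+?` must consume at least one character before ABCD and the string starts with ABCD. The other two capture ' 123 456 7'.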

In reply to Re: Printing ten characters succeeding a matching string (+ followup question) by ww
in thread Printing ten characters succeeding a matching string by bluray
