Re: Printing ten characters succeeding a matching string (+ followup question)

Another way... perhaps more suited as a tutorial than as production code:

#!/usr/bin/perl
use Modern::Perl;
# 934221

# find a string, then print the next ten chars

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC
                 qwertyABCD1234567890/;

for my $content(@content) {
    $content =~ /.+?ABCD(.{10}).*/;
    say "Current array element is: $content";
    if ($1) {
        say "\t Next 10 char after the match: $1";
        next;
    }
}

And the result of executing this script is:

Current array element is: abcdABCD1234567890xyz
         Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
Current array element is: qwertyABCD1234567890
         Next 10 char after the match: 1234567890

The second and third array elements don't satisfy the regex because no instance of the sequence ABCD in them is followed by at least 10 chars.

SOLVED (see below). Now a question for wiser heads: immediately after creation of the array, add another element to @content, namely $content[4], with this line: push @content, 'ABCD 123 456 789'; then run the code. This is the output from 5.012 on a win32 box:

Current array element is: abcdABCD1234567890xyz
         Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
Current array element is: qwertyABCD1234567890
         Next 10 char after the match: 1234567890
Current array element is: ABCD 123 456 789

Why doesn't the regex see a match in $content[4]?

<UPDATE:> ~~And WTH does this minor revision,~~

my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC
                 qwertyABCD1234567890/;
push @content, 'ABCD 123 456 789';
say "===> \$content[4]: $content[4] \n\n";

for my $content(@content) {
    $content =~ /.+?ABCD(.{10}).*/;
    say "Current array element is: $content";
    if ($1) {
        say "\t Next 10 char after the match: $1";
    }else{
        say "No match on $content";
    }
}
~~...produce this:~~

===> $content[4]: ABCD 123 456 789


Current array element is: abcdABCD1234567890xyz
         Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
         Next 10 char after the match: 1234567890
Current array element is: ABCD1234ABC
         Next 10 char after the match: 1234567890
Current array element is: qwertyABCD1234567890
         Next 10 char after the match: 1234567890
Current array element is: ABCD 123 456 789
         Next 10 char after the match: 1234567890

Duh! The answer to the stricken question is that $1 remains unchanged unless a new match is found... so its content is unchanged from the initial (successful) match when the regex fails on $content[1] and $content[2], then gets replaced (with the exact same thing) in $content[3] and remains unchanged when the regex fails on $content[4].
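One way to sidestep the stale $1, sketched here under my own assumptions: only read $1 when the match operator itself reports success. The helper name next_ten is hypothetical (it is not from the code above), and the sample strings are simply the first two elements of @content.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# next_ten() is a hypothetical helper, not part of the original script.
# It returns the ten characters following "ABCD", or undef when the regex
# fails, so the caller never reads a $1 left over from an earlier match.
sub next_ten {
    my ($string) = @_;
    return $string =~ /.+?ABCD(.{10})/ ? $1 : undef;
}

for my $content (qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD/) {
    my $ten = next_ten($content);
    say defined $ten ? "$content: $ten" : "$content: no match";
}
```

Because $1 is consulted only on the "true" branch of the match, a failed match can never report the capture from a previous string.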

Update2: For a discussion of the "defensive programming" required to avoid the dumb coding in the stricken material, see What happens with empty $1 in regular expressions? (was: Regular Expression Question). The following code uses that practice, as best I understand it with regard to numbered captures:

for my $content(@content) {
    # my $match = '';  # explicit but verbose
    # my $match;         # still explicit and only slightly less verbose; same effect
    my ($match) = $content =~ /.+?ABCD(.{10}).*/;  # less code; same effect
    say "Current array element is: $content";
    if ($match) {
        say "\t Next 10 char after the match: $match";
    }else{        
        say "No match on $content";
    }
}

:-) (...and apologies to all the electrons inconvenienced by the posting of the stricken part of this node)

Update 3 (10/30): Moritz pointed out that the initial code -- that which I initially tested -- failed because $1 is not reset unless there is a new match. His kind comments led me to discover that I had solved that (as in Update 2, above, o/a 10/27) and thus got me off that track. Lo and behold, curing the tunnel vision led to a more open-minded review of the regex. Aha! (The explanation appears in the note 'SOLVED!' in the code below.)

my @content = qw/abcdABCD1234567890xyz 
            abcd12345ABCD0ABCD 
            ABCD1234ABC 
            qwertyABCD12diff7890/;

# push @content,  'ABCD 123 456 789';       # See note "SOLVED!" below
push @content,  'x ABCD 123 456 789';       # afterthought addition 

for my $content(@content) {
    my ($match) = $content =~ /.+?ABCD(.{10}).*/;   # avoid probs w/non-reset of $1
    say "Current array element is: $content";
    if ( $match ) {
        say "\t MATCH! Next 10 char after the match: $match";
    } else {        
        say "\t No match on $content";
    }
}

=head

# SOLVED! why last array element failed to match: it initially began with 'ABCD...'
# BUT the regex required something ( '.+?' ) before 'ABCD(.{10})'
# And a better fix would be to write the regex as:
#        '/.*?ABCD(.{10}).*/'
# or as: '/ABCD(.{10}).*/'

C:\>934221.pl

Current array element is: abcdABCD1234567890xyz
         MATCH! Next 10 char after the match: 1234567890
Current array element is: abcd12345ABCD0ABCD
         No match on abcd12345ABCD0ABCD
Current array element is: ABCD1234ABC
         No match on ABCD1234ABC
Current array element is: qwertyABCD12diff7890
         MATCH! Next 10 char after the match: 12diff7890
Current array element is: x ABCD 123 456 789
         MATCH! Next 10 char after the match:  123 456 7


C:\>

=cut
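To make the 'SOLVED!' note concrete, here is a small sketch that runs three variants of the regex against the string that originally failed, 'ABCD 123 456 789'. The helper capture_with and the labels are my own illustrative names, and I have dropped the trailing '.*' because it does not affect what is captured.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my $string = 'ABCD 123 456 789';    # begins with ABCD; nothing precedes it

# Label/regex pairs, kept in an array so they print in a fixed order.
my @patterns = (
    [ 'one or more chars first'  => qr/.+?ABCD(.{10})/ ],
    [ 'zero or more chars first' => qr/.*?ABCD(.{10})/ ],
    [ 'no leading pattern'       => qr/ABCD(.{10})/    ],
);

# capture_with() is a hypothetical helper: it returns the first capture,
# or undef when the regex does not match (list-context match).
sub capture_with {
    my ($regex, $text) = @_;
    my ($match) = $text =~ $regex;
    return $match;
}

for my $pattern (@patterns) {
    my ($label, $regex) = @$pattern;
    my $match = capture_with($regex, $string);
    say sprintf '%-26s %s', $label,
        defined $match ? "matched: '$match'" : 'no match';
}
```

Only the first variant fails here, since '.+?' insists on at least one character before ABCD and this string has none.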


Re^2: Printing ten characters succeeding a matching string (+ followup question) by bluray (Sexton) on Oct 27, 2011 at 21:23 UTC
Hello ww, Thanks for your input. Although I was able to compile it using the previous replies, I still have to create a header in the output file. The input file doesn't have a header; the data starts on the first line. All the input files I have worked with before had headings, so it was easy to reference one or add a new column header. I am also thinking about getting the frequency of the ten characters. That is, if the same ten-character sequence is matched more than once, I would like to print it only once and, in the second column, the number of times it was found.
Re^3: Printing ten characters succeeding a matching string (+ followup question) by Caio (Acolyte) on Oct 28, 2011 at 10:29 UTC
If you want to count and calculate frequencies for the 10 characters after matches, I'd suggest you make a hash and populate it dynamically, with something like: `if ($1) { say "\t Next 10 char after the match: $1"; $hash{$1}++; next; }` Just a slight modification... ;) Update: Corrected typo pointed out by roboticus. Thanks ;)
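A fuller sketch of that hash idea, in case it helps. Everything here is an assumption on my part: the sub name count_followers, the column headings "sequence" and "count", the tab-separated layout, and the sample lines. It also uses the plain /ABCD(.{10})/ form of the regex and counts only the first match on each line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# count_followers() is a hypothetical helper: it tallies how often each
# ten-character sequence follows "ABCD", using the first match per line.
sub count_followers {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my ($ten) = $line =~ /ABCD(.{10})/;    # undef when there is no match
        $count{$ten}++ if defined $ten;
    }
    return \%count;
}

# Made-up sample data standing in for the real input file.
my @lines = qw/abcdABCD1234567890xyz qwertyABCD1234567890
               zzABCD12diff7890 ABCD1234ABC/;
my $count = count_followers(@lines);

# Header row first, then each distinct sequence once with its frequency.
say join "\t", 'sequence', 'count';
for my $ten (sort keys %$count) {
    say join "\t", $ten, $count->{$ten};
}
```

Taking the capture from a list-context match means the hash is only touched when the regex actually matched, so a leftover $1 cannot inflate the counts.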