Printing ten characters succeeding a matching string

bluray has asked for the wisdom of the Perl Monks concerning the following question:

Re: Printing ten characters succeeding a matching string by roboticus (Chancellor) on Oct 27, 2011 at 19:51 UTC
bluray: Look at the "Capture Buffers" section of `perldoc perlre` and you'll see how to do it. Here's a quick example: `$ cat foo.pl my $t='the quick red fox jumped over the lazy brown dog.'; if ($t=~/fox(.{10})/) { print "The 10 characters after fox are '$1'\n"; } $ perl foo.pl The 10 characters after fox are ' jumped ov'`

...roboticus

When your only tool is a hammer, all problems look like your thumb.
Re: Printing ten characters succeeding a matching string by hbm (Hermit) on Oct 27, 2011 at 20:00 UTC
Unrelated to your question, dots are literal on the right side of s///; no need to escape them: `#$outfile =~ s/\.txt/\.tag\.txt/gi; $outfile =~ s/\.txt/.tag.txt/gi;`
Re^2: Printing ten characters succeeding a matching string by bluray (Sexton) on Oct 27, 2011 at 20:22 UTC
Thanks hbm and roboticus for the suggestions.
Re: Printing ten characters succeeding a matching string (+ followup question) by ww (Archbishop) on Oct 27, 2011 at 20:31 UTC
Another way... perhaps more suited as a tutorial than as production code:

`#!/usr/bin/perl use Modern::Perl; # 934221 # find a string, then print the next ten chars my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD1234567890/; for my $content(@content) { $content =~ /.+?ABCD(.{10})./; say "Current array element is: $content"; if ($1) { say "\t Next 10 char after the match: $1"; next; } }`

And the result of executing this script is:

`Current array element is: abcdABCD1234567890xyz Next 10 char after the match: 1234567890 Current array element is: abcd12345ABCD0ABCD Current array element is: ABCD1234ABC Current array element is: qwertyABCD1234567890 Next 10 char after the match: 1234567890`

The second and third array elements don't satisfy the regex because there are NOT 10 chars after the last instance of the sequence `ABCD`.

SOLVED, below* Now a question for wiser heads: add, immediately after creation of the array, another element to `@content`, namely, `$content[4]` with this line: `push @content, 'ABCD 123 456 789';`. Run the code.
This is the output from 5.012 on a win32 box:

`Current array element is: abcdABCD1234567890xyz Next 10 char after the match: 1234567890 Current array element is: abcd12345ABCD0ABCD Current array element is: ABCD1234ABC Current array element is: qwertyABCD1234567890 Next 10 char after the match: 1234567890 Current array element is: ABCD 123 456 789`

Why doesn't the regex see a match in $content[4]?

<UPDATE:> ~~And WTH does this minor revision,~~

`my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD1234567890/; push @content, 'ABCD 123 456 789'; say "===> \$content[4]: $content[4] \n\n"; for my $content(@content) { $content =~ /.+?ABCD(.{10})./; say "Current array element is: $content"; if ($1) { say "\t Next 10 char after the match: $1"; }else{ say "No match on $content"; } }`

~~...produce this:~~

`===> $content[4]: ABCD 123 456 789 Current array element is: abcdABCD1234567890xyz Next 10 char after the match: 1234567890 Current array element is: abcd12345ABCD0ABCD Next 10 char after the match: 1234567890 Current array element is: ABCD1234ABC Next 10 char after the match: 1234567890 Current array element is: qwertyABCD1234567890 Next 10 char after the match: 1234567890 Current array element is: ABCD 123 456 789 Next 10 char after the match: 1234567890`

Duh! The answer to the stricken question is that $1 remains unchanged unless a new match is found... so its content is unchanged from the initial (successful) match when the regex fails on $content[1] and $content[2], then gets replaced (with the exact same thing) in $content[3] and remains unchanged when the regex fails on $content[4].

Update2:* For a discussion of the "defensive programming" required to avoid the dumb coding in the stricken material, see What happens with empty $1 in regular expressions? (was: Regular Expression Question).
The following code uses that practice, as best I understand it with regard to numbered captures:

`for my $content(@content) { # my $match = ''; # explicit but verbose # my $match; # still explicit and only slightly less verbose; same effect my ($match) = $content =~ /.+?ABCD(.{10})./; # less code; same effect say "Current array element is: $content"; if ($match) { say "\t Next 10 char after the match: $1"; }else{ say "No match on $content"; } }`

:-) (...and apologies to all the electrons inconvenienced by the posting of the stricken part of this node)

Update 3 (10/30):* Moritz pointed out that the initial code -- that which I initially tested -- failed because `$1` is not reset unless there is a new match. His kind comments led me to discover that I had solved that (as in Update 2, above, on or about 10/27) and thus to get me off that track. Lo and behold, curing the tunnel vision led to a more open-minded review of the regex. Aha! The explanation appears in the note 'SOLVED!':

`my @content = qw/abcdABCD1234567890xyz abcd12345ABCD0ABCD ABCD1234ABC qwertyABCD12diff7890/; # push @content, 'ABCD 123 456 789'; # See note "SOLVED!" below push @content, 'x ABCD 123 456 789'; # afterthought addition for my $content(@content) { my ($match) = $content =~ /.+?ABCD(.{10})./; # avoid probs w/non-reset of $1 say "Current array element is: $content"; if ( $match ) { say "\t MATCH! Next 10 char after the match: $1"; } else { say "\t No match on $content"; } } =head # SOLVED! why last array element failed to match: it initially began with 'ABCD...' # BUT the regex required something ( '.+?' ) before ' ABCD(.{10} ' # And a better fix would be to write the regex as: # '/.*?ABCD(.{10})./' # or as: '/ABCD(.{10}).*/' C:\>934221.pl Current array element is: abcdABCD1234567890xyz MATCH! Next 10 char after the match: 1234567890 Current array element is: abcd12345ABCD0ABCD No match on abcd12345ABCD0ABCD Current array element is: ABCD1234ABC No match on ABCD1234ABC Current array element is: qwertyABCD12diff7890 MATCH! Next 10 char after the match: 12diff7890 Current array element is: x ABCD 123 456 789 MATCH! Next 10 char after the match: 123 456 7 C:\> =cut`
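The stale-`$1` pitfall discussed above can be sidestepped by capturing in list context, so that a failed match yields `undef` instead of a leftover value. The following is only a minimal sketch, not the poster's code: the helper name `ten_after` and the sample strings are made up for illustration, and the regex drops the leading `.+?` and the trailing `.` so that a string starting with `ABCD` or ending exactly ten characters later can still match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the ten characters
# that follow 'ABCD' in $text, or undef when no 'ABCD' has ten
# characters after it.
sub ten_after {
    my ($text) = @_;
    # In list context a failed match returns an empty list, so $match
    # ends up undef rather than holding a stale $1 from an earlier match.
    my ($match) = $text =~ /ABCD(.{10})/;
    return $match;
}

# Illustrative inputs only.
my @samples = (
    'abcdABCD1234567890xyz',
    'abcd12345ABCD0ABCD',
    'ABCD1234ABC',
    'ABCD 123 456 789',
);

for my $item (@samples) {
    my $match = ten_after($item);
    print defined $match ? "$item: '$match'\n" : "$item: no match\n";
}
```

Testing with `defined` rather than plain truthiness keeps the check about whether the match succeeded, not about what the captured text happens to contain.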
Re^2: Printing ten characters succeeding a matching string (+ followup question) by bluray (Sexton) on Oct 27, 2011 at 21:23 UTC
Hello ww, Thanks for your input. Although I was able to compile it using the previous replies, I still have to create a header in the output file. The input file doesn't have a header; the data starts on the first line. All the input files I have worked with before had headings, so it was easy to reference them or add a new column header. I am also thinking about getting the frequency of the ten characters. That is, if the same ten-character string is matched more than once, I would like to print it only once, with the number of times it was found in the second column.
Re^3: Printing ten characters succeeding a matching string (+ followup question) by Caio (Acolyte) on Oct 28, 2011 at 10:29 UTC
If you want to count and calculate frequencies for the 10 characters after matches, I'd suggest you make a hash and populate it dynamically, with something like: `if ($1) { say "\t Next 10 char after the match: $1"; $hash{$1}++; next; }` Just a slight modification... ;)

Update: Corrected typo pointed out by roboticus. Thanks ;)
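Putting the hash idea together with the header line bluray asked about might look roughly like the sketch below. Everything named here is an assumption for illustration: the helper `count_followers`, the sample lines, and the column titles `Sequence` and `Count` are not from the thread, and the output goes to STDOUT rather than to a real output file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally each ten-character string that follows $key
# in the given lines. Returns a hash of string => number of occurrences.
sub count_followers {
    my ($key, @lines) = @_;
    my %count;
    for my $line (@lines) {
        # The /g modifier in a while loop visits every match on the line;
        # \Q...\E quotes any regex metacharacters in $key.
        while ($line =~ /\Q$key\E(.{10})/g) {
            $count{$1}++;    # $1 is safe here: the match just succeeded
        }
    }
    return %count;
}

# Illustrative input only; real data would be read from the input file.
my @lines = ('xxABCD1234567890yy', 'ABCD1234567890', 'zzABCDabcdefghijkl');
my %count = count_followers('ABCD', @lines);

# Print the header once, then one row per distinct string.
print "Sequence\tCount\n";
for my $seq (sort keys %count) {
    print "$seq\t$count{$seq}\n";
}
```

Because each successful match consumes its ten captured characters, an occurrence of the key inside those ten characters is not counted separately; whether that matters depends on the data.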