in reply to Performing a grep-like action multiple times on a single line.
If you actually want to store all the matches, here's one possible approach using index and substr. It assumes your "$graph" and "$num" hold the target character(s) and the context size:
```perl
while (<>) {
    chomp;
    my $offset = 0;
    my $limit  = length();    # length of the current line ($_)
    while ( ( my $found = index( $_, $graph, $offset ) ) >= 0 ) {
        # clamp the context window to the start and end of the line
        my $bgn = ( $found - $num > 0 ) ? $found - $num : 0;
        my $end = ( $found + length($graph) + $num < $limit )
                ? $found + length($graph) + $num
                : $limit;
        # key: the match plus its context; value: the lines it came from
        push @{ $graph_contexts{ substr( $_, $bgn, $end - $bgn ) } }, $_;
        $offset = $found + 1;    # resume just past this hit's start
    }
}
```

(It takes a little practice to avoid "off-by-one" errors with this kind of approach, but once you've sorted those out, it works fine. Here, if the context size is, say, 4 characters before and after, and the target shows up as the 2nd or last character in the string, the target is still captured, just with a shorter context.)
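To check the boundary behaviour described above, here is the same index/substr loop wrapped in a small subroutine that returns the contexts for one line instead of filling a hash. The name `contexts_for` and the test string are made up for illustration, and the sketch assumes `$graph` is a non-empty literal string.

```perl
use strict;
use warnings;

# Hypothetical helper: return every occurrence of $graph in $line together
# with up to $num characters of context on each side. Contexts are cut short
# (not dropped) when the match is near the start or end of the line.
sub contexts_for {
    my ( $line, $graph, $num ) = @_;
    my @contexts;
    my $limit  = length $line;
    my $offset = 0;
    while ( ( my $found = index( $line, $graph, $offset ) ) >= 0 ) {
        my $bgn = $found - $num > 0 ? $found - $num : 0;
        my $end = $found + length($graph) + $num;
        $end = $limit if $end > $limit;
        push @contexts, substr( $line, $bgn, $end - $bgn );
        $offset = $found + 1;    # step one char so overlapping hits are found
    }
    return @contexts;
}

my @got = contexts_for( "abcXdefXg", "X", 2 );
print join( ",", @got ), "\n";    # prints "bcXde,efXg"
```

The second `X` has only one character after it, so its context comes back as the shorter `efXg` rather than being skipped.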
A more "brute force" (effective but perhaps less efficient) approach would be to simply go through all the substrings of $num*2+1 characters, and keep the ones that have $graph in the center position:
```perl
my $sublen = $num * 2 + 1;
while (<INPUT>) {
    chomp;
    for my $ofs ( 0 .. length() - $sublen ) {
        my $ngram = substr( $_, $ofs, $sublen );
        # \Q...\E makes any regex metacharacters in $graph match literally
        next unless $ngram =~ /^.{$num}\Q$graph\E/;
        # store this ngram to your hash
    }
}
```

(This one only keeps matches that have full context, i.e. $num characters both before and after the character being matched.)
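The trade-off between the two approaches is easiest to see on a concrete line. Below, the brute-force loop is wrapped in a hypothetical `full_context_ngrams` subroutine (name and test string invented for illustration). As an assumption beyond the original, the window width uses `length $graph` instead of `+ 1`, so multi-character targets also work. Run on `abcXdefXg` with a context of 2, it keeps only `bcXde`: the second `X` has just one character after it, so no full-width window puts it in the center, whereas the index/substr approach above would also return the shorter `efXg`.

```perl
use strict;
use warnings;

# Hypothetical helper: slide a window of $num*2 + length($graph) characters
# across $line and keep only windows with $graph right after the first $num
# characters, i.e. matches that have full context on both sides.
sub full_context_ngrams {
    my ( $line, $graph, $num ) = @_;
    my $sublen = $num * 2 + length $graph;
    my @ngrams;
    # an empty range (no iterations) when the line is shorter than a window
    for my $ofs ( 0 .. length($line) - $sublen ) {
        my $ngram = substr( $line, $ofs, $sublen );
        push @ngrams, $ngram if $ngram =~ /^.{$num}\Q$graph\E/;
    }
    return @ngrams;
}

print join( ",", full_context_ngrams( "abcXdefXg", "X", 2 ) ), "\n";    # prints "bcXde"
```

Every line costs length minus window-size regex checks here, regardless of how many matches it contains, which is where the "less efficient" part comes from.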