You already got some good advice, so I just want to clarify a few things.
> should grab the 25 characters before and after it
That is not what the regex you posted does: it grabs from 0 to 25 chars before and after the string. As already said, the gsix modifiers must go outside the regular expression, after the closing delimiter: m/.../gsix
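To make the placement of the modifiers concrete, here is a small sketch. The input text and the target word are made up for illustration only; they are not from your data.

```perl
use strict;
use warnings;

# Hypothetical input and target, chosen only for illustration.
my $text   = "aaaTARGETbbb\ncccTargetddd";
my $target = 'target';

# The modifiers follow the closing delimiter:
#   g = find all matches      s = . also matches a newline
#   i = case-insensitive      x = whitespace allowed inside the pattern
my @hits = $text =~ m/ .{0,2} \Q$target\E .{0,2} /gsix;

print "$_\n" for @hits;
```

In list context with /g the match returns every matched substring, so @hits ends up holding one entry per occurrence of the target together with up to 2 chars on each side.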
Let's use your regex to match 0-3 chars before and after the letter X using: /.{0,3}X.{0,3}/ against some strings:
    # regex /.{0,3}X.{0,3}/
    #
    # string     matched part
    123X123      123X123
    12X123       12X123
    1X123        1X123
    X123         X123
    X123456      X123
And now compare the output of the /.{3}X.{3}/ regex against the same set of strings:
    # regex /.{3}X.{3}/
    #
    # string     matched part
    123X123      123X123
    12X123       -no match-
    1X123        -no match-
    X123         -no match-
    X123456      -no match-
In fact the second version requires exactly 3 chars before and after X, so it matches only when at least 3 chars are available on each side.
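If you want to reproduce the comparison yourself, a minimal sketch along these lines should do it. The variable names are mine, and the // operator assumes Perl 5.10 or newer.

```perl
use strict;
use warnings;

my @strings = qw(123X123 12X123 1X123 X123 X123456);

my %result;
for my $s (@strings) {
    # list-context match returns the captured group, or nothing on failure
    my ($loose)  = $s =~ /(.{0,3}X.{0,3})/;
    my ($strict) = $s =~ /(.{3}X.{3})/;

    $result{$s} = [ $loose, $strict // '-no match-' ];
    printf "%-8s  %-8s  %s\n", $s, @{ $result{$s} };
}
```

The first column is the string, the second the part matched by the {0,3} version, the third the part matched by the {3} version.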
Now a little note about slurping files. When you slurp, the whole file goes directly into memory, probably with some overhead, so 100 MB of file data will use at least 100 MB of RAM. Since you will be working in bioinformatics, possibly with big files, it's better to understand this early.
If you process the file one line at a time, the memory consumption is minimal. The diamond operator <> is a powerful beast in Perl and, like many other things in Perl, it acts differently depending on the context it is used in.
    open my $fh, '<', $file_path or die "unable to read $file_path: $!";

    # list context: every line goes into the array
    my @all_lines = <$fh>;

    # scalar context: just the next line goes into a scalar
    # (<> acts as an iterator here)
    my $line = <$fh>;

    # so, to read a file one line at a time:
    while ( defined( my $line = <$fh> ) ) {
        # process $line here
    }
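Putting the two topics together, here is a minimal sketch of reading line by line and applying the regex to each line. To keep it self-contained it reads from an in-memory string instead of a real file; the sample data is invented, and with a real file you would pass its path to open instead of \$data.

```perl
use strict;
use warnings;

# In-memory stand-in for a real file, so the sketch runs on its own.
my $data = "line one\nhas X inside\nlast line\n";
open my $fh, '<', \$data or die "unable to open in-memory file: $!";

my @matches;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;

    # $. holds the line number of the line just read
    push @matches, "$.: $1" if $line =~ /(.{0,3}X.{0,3})/;
}
close $fh;

print "$_\n" for @matches;
```

Only one line is held in memory at a time. The price is that a match can never see context from the previous or the next line, so this fits only when the string and its surrounding chars sit on the same line.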
See How to read in large files
L*
In reply to Re: question about finding strings (regexes and slurping files)
by Discipulus
in thread question about finding strings?
by SaraBetsy