
Hello SaraBetsy and welcome to the monastery and to the wonderful world of Perl!

You already got some good advice, so I just want to clarify a few things.

> should grab the 25 characters before and after it

That is not what the regex you posted does: it grabs from 0 to 25 characters before and after the string. As already said, the gsix modifiers must go outside the regular expression, after the closing delimiter: m/.../gsix
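To show where the modifiers go, here is a minimal sketch. The helper name context_around and the pattern itself are my assumptions, not your original code; the only point is that g, s, i and x follow the closing delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: return every occurrence of $target together with
# up to 25 characters of context on each side.
sub context_around {
    my ($text, $target) = @_;

    # The modifiers come AFTER the closing delimiter:
    #   g = all matches, s = dot also matches newline,
    #   i = case insensitive, x = whitespace in the pattern is ignored.
    # \Q...\E quotes any regex metacharacters inside $target.
    my @hits = $text =~ m/ .{0,25} \Q$target\E .{0,25} /gsix;

    return @hits;
}

print "$_\n" for context_around("some text with a TARGET inside", "target");
```

In list context a /g match without capture groups returns the list of matched substrings, which is why the result can be assigned straight to an array.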

Let's adapt your regex to match 0 to 3 characters before and after the letter X, /.{0,3}X.{0,3}/, and try it against some strings:

# regex /.{0,3}X.{0,3}/
#
# string       matched part

123X123         123X123
12X123          12X123
1X123           1X123
X123            X123
X123456         X123

Now compare the output of the /.{3}X.{3}/ regex against the same set of strings:

# regex /.{3}X.{3}/
#
# string       matched part

123X123         123X123
12X123          -no match-
1X123           -no match-
X123            -no match-
X123456         -no match-

In fact, the second version requires exactly 3 characters on each side of X, so it only matches when the string has at least 3 characters both before and after X.
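If you want to check the two tables yourself, something like the following should reproduce them. It is only a sketch: the helper name matched_part and the output layout are my own choices.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of $string matched by $regex,
# or the text '-no match-' when the regex does not match at all.
sub matched_part {
    my ($regex, $string) = @_;
    my ($match) = $string =~ /($regex)/;
    return defined $match ? $match : '-no match-';
}

my @strings = qw(123X123 12X123 1X123 X123 X123456);

# Each entry pairs a printable label with the compiled regex.
my @regexes = (
    [ '/.{0,3}X.{0,3}/', qr/.{0,3}X.{0,3}/ ],
    [ '/.{3}X.{3}/',     qr/.{3}X.{3}/     ],
);

for my $pair (@regexes) {
    my ($label, $regex) = @$pair;
    print "# regex $label\n";
    printf "%-15s %s\n", $_, matched_part($regex, $_) for @strings;
    print "\n";
}
```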

Now a little note about slurping files. When you slurp, the whole file goes directly into memory, probably with some overhead, so 100 MB of file data will use at least 100 MB of RAM. As you will be working in bioinformatics, possibly with big files, it's better to understand this early.
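For reference, slurping usually looks something like the sketch below. The helper name slurp is mine, and the in-memory filehandle only stands in for a real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything left on a filehandle into one scalar.
sub slurp {
    my ($fh) = @_;
    local $/;              # undefined record separator: no line splitting
    return scalar <$fh>;   # one read returns the whole remaining content
}

# An in-memory filehandle stands in for a real file here (assumption).
my $data = "line one\nline two\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $content = slurp($fh);
printf "slurped %d characters in one go\n", length $content;
```

Whatever the file size is, the scalar holding the content has to be at least that big, which is where the memory cost comes from.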

If you process the file one line at a time, memory consumption stays minimal. The <> operator (readline, often loosely called the diamond operator) is a powerful beast in Perl and, like many other things in Perl, it behaves differently depending on the context it is used in.

# assuming a filehandle opened with something like:
# open my $fh, '<', $file_path or die "unable to read $file_path: $!";

# list context: every line goes into the array
my @all_lines = <$fh>;

# scalar context: just the next line goes into a scalar
# (<$fh> acts as an iterator here)
my $line = <$fh>;

# so, to read a file one line at a time:
while (defined( my $line = <$fh> )) {
    # process $line here
}

See How to read in large files
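Putting the regex and the line-by-line reading together could look roughly like this. It is a sketch under my own assumptions: the helper name find_with_context, its parameters and the sample data are invented for illustration, and the in-memory filehandle stands in for your real file.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle one line at a time and collect every
# occurrence of $target with up to $width characters of context on each side.
sub find_with_context {
    my ($fh, $target, $width) = @_;
    my @hits;
    while (defined( my $line = <$fh> )) {
        chomp $line;
        # list-context /g match: returns all matched substrings of this line
        push @hits, $line =~ /.{0,$width}\Q$target\E.{0,$width}/g;
    }
    return @hits;
}

# In-memory sample data instead of a real file (assumption).
my $data = "aaaXbbb\nno hit here\ncXd\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

print "$_\n" for find_with_context($fh, 'X', 2);
```

Only one line is held in memory at a time. The price is that a match whose context spans a line break will not be found this way, so whether it fits depends on how your data is laid out.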

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

In reply to Re: question about finding strings (regexes and slurping files) by Discipulus
in thread question about finding strings? by SaraBetsy
