karavay: I need to extract large chunks of text from files, each larger than 20MB...
What would you suggest using: regex or an external module (and which one)?

20MB is small nowadays. Last week I extracted data from a ~100MB file with a
Perl regex, and it took less than 2 seconds (on an old 3.4GHz Athlon).
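A minimal Perl sketch of the regex approach: slurp the whole file into one scalar and pull out every chunk with a single /g match. The BEGIN/END marker lines are hypothetical, because the question does not say how the chunks are delimited; the pattern would need to be adapted to the real data.

```perl
use strict;
use warnings;

# Read a whole file into one scalar and return a reference to it
# (passing a reference avoids copying tens of MB around).
sub slurp {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # undef input record separator: read the entire file at once
    my $data = <$fh>;
    close $fh;
    return \$data;
}

# HYPOTHETICAL format: each chunk sits between a line "BEGIN" and a line "END".
# /m makes ^ and $ match at line boundaries, /s lets . cross newlines,
# .*? stops at the first following END, and /g walks through the whole string.
sub extract_chunks {
    my ($text_ref) = @_;
    my @chunks;
    push @chunks, $1 while $$text_ref =~ /^BEGIN\n(.*?)^END$/msg;
    return @chunks;
}

# Usage (the file name is a placeholder):
# my @chunks = extract_chunks(slurp('big_input.txt'));
```

Holding 20MB (or 100MB) in a single scalar is usually fine on current machines, and one pass of a compiled regex over it is typically fast.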

What are your constraints? Is this done on the fly? Is speed important?

If so, I'd write a small (few lines) Inline::C wrapper around the C strstr()
function and search for "AZII"... (provided your C library's strstr()
does DWORD- or QWORD-aligned accesses and reads a machine word at a time).
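Before reaching for C, it may be worth timing Perl's built-in index(), which is a plain literal substring search with no regex engine involved. The sketch below swaps in index() for the strstr() wrapper suggested above (it is not the Inline::C approach) and collects every offset of a literal marker such as "AZII"; the helper name find_all_offsets is hypothetical. One difference worth knowing: strstr() stops at the first NUL byte, while index() searches the whole Perl string.

```perl
use strict;
use warnings;

# Hypothetical helper: return every 0-based offset at which the literal
# $needle occurs in $$hay_ref. Uses Perl's built-in index() in place of
# a C strstr() wrapper.
sub find_all_offsets {
    my ($hay_ref, $needle) = @_;
    die "needle must not be empty\n" unless length $needle;  # avoid an endless loop
    my @offsets;
    my $pos = index($$hay_ref, $needle);
    while ($pos >= 0) {
        push @offsets, $pos;
        # resume one character past the last hit (overlapping hits are found too)
        $pos = index($$hay_ref, $needle, $pos + 1);
    }
    return @offsets;
}

my $text = "xxAZIIyyAZII";
my @hits = find_all_offsets(\$text, 'AZII');
print "@hits\n";    # prints "2 8"
```

If index() turns out to be fast enough, the Inline::C step (and its need for a C compiler) can be skipped entirely.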

But how to do that depends *strongly* on the context. What is to be done
with the found text? Extract it? Or just find it and report it?

Can you provide one small, minimal but exact sample of the text in question?

Regards
mwa

In reply to Re: Text Extraction by mwah
in thread Text Extraction by karavay