
The first thing to do is drop the /g option from all of your regexes. You only need to know whether a match exists, not whether it occurs more than once, unless you plan to do something with the latter information, which you currently don't. That could save some processing.
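As a rough illustration (the original code isn't shown here, so the sample string and variable names are made up), a plain match in scalar context already answers the yes/no question. With /g in scalar context, Perl also remembers the match position on the string, so a repeated test against the same string can behave differently:

```perl
use strict;
use warnings;

my $line = 'Letter Simplex';    # made-up sample data, not from the original post

# Plain match: a simple "is it there?" test, repeatable on the same string.
my $plain_first  = $line =~ /Letter/ ? 1 : 0;
my $plain_second = $line =~ /Letter/ ? 1 : 0;

# With /g in scalar context the search resumes from pos($line),
# so the second test starts after the first "Letter" and fails.
my $g_first  = $line =~ /Letter/g ? 1 : 0;
my $g_second = $line =~ /Letter/g ? 1 : 0;

print "plain: $plain_first $plain_second\n";
print "with /g: $g_first $g_second\n";
```

So for an existence check the /g buys nothing, and dropping it avoids that position bookkeeping.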

You could save some more time by not performing the checks for "Legal" or "Tabloid" if you have already found "Letter". The same goes for the other categories. That ought to cut the processing time by around half (guess!!).
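A minimal sketch of that short-circuiting, assuming the sizes are mutually exclusive. The sub name paper_size is hypothetical, and the order of the tests is only a guess at which size is most common in your data:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first paper size found in $text,
# or nothing if none matches. Later tests are skipped once one succeeds.
sub paper_size {
    my ($text) = @_;
    if    ( $text =~ /Letter/ )  { return 'Letter' }
    elsif ( $text =~ /Legal/ )   { return 'Legal' }
    elsif ( $text =~ /Tabloid/ ) { return 'Tabloid' }
    return;
}

# Made-up sample line; only the /Letter/ test runs for it.
my $size = paper_size('PageSize Letter');
print "size: $size\n";
```

Put whichever size really is the most frequent at the top of the chain, so the common case costs a single regex.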

If you order the checks so that the most frequently used types are tested first, it might save a bit more.

Finally, if you have 60+ MB of RAM to spare, you might save some time by slurping the file into a scalar and then running your regexes against that. If you do this, make sure that you don't use the /g option or apply more regexes than you need to (i.e. no Duplex check if you already found Simplex, etc.).
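Something along these lines would do the slurp. It is only a sketch: the sub names slurp and job_attributes and the size/sides keys are hypothetical, and it assumes one file describes one job, so a whole-file match is meaningful:

```perl
use strict;
use warnings;

# Hypothetical helper: read the whole file into one scalar.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                 # no record separator: one read gets everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Hypothetical helper: one pass per category, no /g,
# stopping at the first candidate that matches.
sub job_attributes {
    my ($content) = @_;
    my %found;
    for my $size (qw(Letter Legal Tabloid)) {
        if ( $content =~ /$size/ ) { $found{size} = $size; last }
    }
    for my $sides (qw(Simplex Duplex)) {
        if ( $content =~ /$sides/ ) { $found{sides} = $sides; last }
    }
    return \%found;
}
```

Usage would be something like my $attr = job_attributes( slurp($file) ); where $file is your input path. Note that the first candidate in each list wins, not the first one to appear in the file.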

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

In reply to Re: Fast file parsing by BrowserUk
in thread Fast file parsing by Anonymous Monk
