in reply to Re: Processing LARGE text files
in thread Processing LARGE text files

Forgive me, but m/regex/gix was an oversimplification.

To expand upon the logic and to be more accurate, I use:

while($file =~ m/<DELIMITER>(.*?)<\/DELIMITER>/gs)

to capture the text areas I need to search, and I use:

if($searcharea =~ m/$regex/gm)

to see if the selected areas of text contain any keywords.
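
For completeness, here is a minimal sketch of how those two pieces fit together. The filename and the keyword pattern are hypothetical placeholders, and slurping the whole file into $file is an assumption based on the m//gs match above:

use strict;
use warnings;

# Hypothetical keyword pattern and filename -- substitute your own.
my $regex = qr/keyword1|keyword2/i;

open my $fh, '<', 'large_file.txt' or die "Cannot open: $!";
my $file = do { local $/; <$fh> };    # slurp the whole file
close $fh;

# First pass: capture each delimited text area ...
while ( $file =~ m/<DELIMITER>(.*?)<\/DELIMITER>/gs ) {
    my $searcharea = $1;
    # ... then test whether that area contains any keywords.
    if ( $searcharea =~ m/$regex/gm ) {
        print "Keyword match found in a delimited section\n";
    }
}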

Re^3: Processing LARGE text files
by thedoe (Monk) on Mar 07, 2006 at 21:19 UTC

    I notice you put in your example: <DELIMITER>(.*?)<\/DELIMITER>. Is this because you are working with very large XML files, or is this simply your own way of separating sections?

    The reason I ask is because I have recently dealt with very large XML files and found XML::Twig to be very helpful. It lets you read in smaller chunks of XML data at a time and process each one with the same ease as a tree-based parser such as XML::Simple. Once you are done processing a chunk, simply flush it (which prints the chunk) or purge it (which does not print), freeing the memory.
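
    For example, a minimal sketch of that pattern might look like the following; the <record> element name and the filename are hypothetical, so adapt them to your data:

        use strict;
        use warnings;
        use XML::Twig;

        # Process one <record> element at a time, then free its memory.
        my $twig = XML::Twig->new(
            twig_handlers => {
                record => sub {
                    my ( $t, $elt ) = @_;
                    # ... work with $elt here, e.g. $elt->text ...
                    $t->purge;    # discard the processed chunk (use flush to print it instead)
                },
            },
        );
        $twig->parsefile('big.xml');

    The handler fires once per complete <record> element, so memory use stays proportional to one chunk rather than to the whole document.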

      The delimiters are words in angle brackets, such as <BOUNDARY> and </BOUNDARY>.

      Can the XML modules you mentioned be rigged to operate on very large text files containing non-standard XML? My experience with XML is minimal.