in reply to Re: Processing LARGE text files
in thread Processing LARGE text files

Forgive me, but m/regex/gix was an oversimplification.

To expand upon the logic and to be more accurate, I use:

while($file =~ m/<DELIMITER>(.*?)<\/DELIMITER>/gs)

to capture the text areas I need to search, and I use:

if($searcharea =~ m/$regex/gm)

to see if the selected areas of text contain any keywords.
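
For completeness, here is a minimal sketch of how those two pieces fit together. The filename and the keyword pattern are hypothetical placeholders, and slurping the whole file into $file is an assumption based on the m//gs match above:

use strict;
use warnings;

# Hypothetical keyword pattern and filename -- substitute your own.
my $regex = qr/keyword1|keyword2/i;

open my $fh, '<', 'large_file.txt' or die "Cannot open: $!";
my $file = do { local $/; <$fh> };    # slurp the whole file
close $fh;

# First pass: capture each delimited text area ...
while ( $file =~ m/<DELIMITER>(.*?)<\/DELIMITER>/gs ) {
    my $searcharea = $1;
    # ... then test whether that area contains any keywords.
    if ( $searcharea =~ m/$regex/gm ) {
        print "Keyword match found in a delimited section\n";
    }
}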

Re^3: Processing LARGE text files
by thedoe (Monk) on Mar 07, 2006 at 21:19 UTC

    I notice you put in your example: <DELIMITER>(.*?)<\/DELIMITER>. Is this because you are working with very large XML files, or is this simply your own way of separating sections?

    The reason I ask is because I have recently dealt with very large XML files and found XML::Twig to be very helpful. It lets you read in smaller chunks of XML data at a time and process each one with the same ease as a tree-based parser such as XML::Simple. Once you are done processing a chunk, simply flush it (which prints the chunk) or purge it (which does not print), freeing the memory.
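
    For example, a minimal sketch of that pattern might look like the following; the <record> element name and the filename are hypothetical, so adapt them to your data:

        use strict;
        use warnings;
        use XML::Twig;

        # Process one <record> element at a time, then free its memory.
        my $twig = XML::Twig->new(
            twig_handlers => {
                record => sub {
                    my ( $t, $elt ) = @_;
                    # ... work with $elt here, e.g. $elt->text ...
                    $t->purge;    # discard the processed chunk (use flush to print it instead)
                },
            },
        );
        $twig->parsefile('big.xml');

    The handler fires once per complete <record> element, so memory use stays proportional to one chunk rather than to the whole document.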

      The delimiters are words in angle brackets, such as <BOUNDARY> and </BOUNDARY>.

      Can the XML modules you mentioned be rigged to operate on very large text files containing non-standard XML? My experience with XML is minimal.