in reply to Peeling Data with Reserved Characters and Long Lines

You guys are fast.

I've been working through the first examples.

Starting from the suggestions, I wrote several scripts (variants that use $& for the match, and one that uses substring), but... the "real world" files include lines that are 12,000 characters long, often with no word breaks. When I run the scripts on a test file with short lines, they work. When I run them on the dense-text file, the output comes up blank. Am I hitting a 2048-character limit? Or is it the lack of word breaks?

Maybe the more recent posts will help. I'll read them and check back.

Replies are listed 'Best First'.
Re^2: Peeling Data with Reserved Characters and Long Lines
by Eliya (Vicar) on Mar 12, 2011 at 20:18 UTC

    There is definitely no 2048-character limit, and lines of 12,000 characters aren't exactly long, given that machines have several gigabytes of RAM these days.  A lack of word breaks shouldn't matter either, as your match pattern is independent of them.  So it must be something else... Are your real-world files maybe UTF-16 encoded, or some such?
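
    To make both points concrete, here is a rough sketch rather than a fix for your actual script, which I haven't seen. The first part matches a pattern inside a 12,000-character line with no spaces. The second part checks whether a file starts with a UTF-16 byte order mark and, if so, reads it through an encoding layer. In UTF-16 every ASCII character is stored with an extra NUL byte, so a pattern that works on plain text finds nothing when the file is read as raw bytes. The name has_utf16_bom and the pattern NEEDLE are made up for illustration, and a UTF-16 file without a BOM would slip past this check.

```perl
use strict;
use warnings;

# 1) A 12,000-character line with no word breaks matches like any other string.
my $long = ('x' x 11_990) . 'NEEDLE' . ('y' x 4);
my $pos  = $long =~ /NEEDLE/ ? $-[0] : -1;    # $-[0] is the offset where the match starts
print "match at offset $pos in a line of ", length($long), " characters\n";

# 2) Hypothetical helper: true if the file begins with a UTF-16 byte order mark.
sub has_utf16_bom {
    my ($path) = @_;
    open my $raw, '<:raw', $path or die "Cannot open $path: $!";
    my $head = '';
    read $raw, $head, 2;                      # look at the first two bytes only
    close $raw;
    return $head eq "\xFF\xFE" || $head eq "\xFE\xFF";
}

# Optional: pass a file name to report the decoded length of every line.
if (defined(my $file = shift @ARGV)) {
    # Assumption: anything without a UTF-16 BOM is read with the default layer.
    my $mode = has_utf16_bom($file) ? '<:encoding(UTF-16)' : '<';
    open my $in, $mode, $file or die "Cannot open $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        printf "line %d: %d characters\n", $., length $line;
    }
    close $in;
}
```

    Run without arguments, it prints "match at offset 11990 in a line of 12000 characters". Run with one of your real files as the argument, it shows what line lengths Perl actually sees, which should tell you quickly whether the encoding is the culprit.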