OK, I've been away from Perl programming for some time - and I was never that experienced with it anyway. Well, I came across a situation that seems to be begging for some Perl code to solve it, and I can't remember how to do what I was thinking. Maybe some kind monk here can point me in the right direction.
The problem is I have a ton of text files (ebooks, but in plain text) that need to be cleaned up. The biggest problem with the files is that the text is hard-wrapped: there is a newline character every 60 or 70 characters or so. This makes each "paragraph" in the text actually a collection of single lines. The paragraphs themselves are usually separated by a couple of newline characters in a row.
I would like to do a search and replace on the file as a whole, removing every single newline character (one that isn't next to another newline) while leaving the paragraph breaks alone.
I recall that there is a way to tell Perl not to treat the newline character as the end of a record, so that the whole file can be read into one string for processing and then written back out easily. But I can't remember where I read that, or what it was called.
Does anyone know what I'm talking about?
I'm also game for hearing suggestions on how best to approach the logic of this code. (I'm not asking people to write it - just for tips on good tactics to use to approach it.)
Thanks in advance,
Tom
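For anyone landing here with the same question: the feature described above is most likely Perl's input record separator, `$/`. When it is undefined, a single read returns the whole file ("slurp mode"). What follows is only a minimal sketch of one possible approach, not a definitive solution. The sub names `unwrap_paragraphs` and `unwrap_file` are made up for illustration, and the code assumes Unix line endings and that a wrapped line should be joined to the next with a single space.

```perl
use strict;
use warnings;

# Hypothetical helper: join the hard-wrapped lines inside each paragraph.
# A newline counts as a "wrap" when it is not next to another newline
# and is not the very last character of the text.
sub unwrap_paragraphs {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n(?!\n|\z)/ /g;   # replace single newlines with a space
    return $text;
}

# Hypothetical helper: slurp a file, unwrap it, write the result elsewhere.
sub unwrap_file {
    my ($in_path, $out_path) = @_;

    open my $in, '<', $in_path or die "Cannot open $in_path: $!";
    # Localizing $/ leaves it undefined inside this block, so the
    # read below returns the entire file as one string.
    my $text = do { local $/; <$in> };
    close $in;

    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    print {$out} unwrap_paragraphs($text);
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Assumed usage: perl unwrap.pl input.txt output.txt
unwrap_file(@ARGV) if @ARGV == 2;
```

Writing to a separate output file rather than overwriting the input keeps the original around in case the result needs checking. Setting `$/ = ""` instead selects paragraph mode, which returns one blank-line-separated paragraph per read and may suit very large files better.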