in reply to Can't remember the term to search for help on!

See perlrun and look for "paragraph mode" (the -00 switch). In that mode, the boundary between paragraphs may consist of several blank lines, not just one.

Given a file named "text" as follows

this is the first paragraph.
It has three lines.
Of which this one is the third.

A blank line denotes a paragraph.
This one has got two lines only.



The third paragraph is preceded
with several blank lines,
and it has itself three, no, wait,
four lines.

Last paragraph.
All paragraphs should be shown each as one line running "perl -p00 -le 's/\n/ /gs;s/\s+/ /g;'" on that file, with multiple blanks condensed into one.

the snippet

perl -p00 -le 's/\n/ /gs;s/\s+/ /g;' text

does what you want. If you want to edit the file in place (see the -i switch in perlrun), say

perl -p00 -i.bak -le 's/\n/ /gs;s/\s+/ /g;' text

to have the original file backed up as text.bak (the suffix given to -i is appended to the file name).

You can provide multiple files on the command line; each will be processed in turn (and backed up, if requested).

Re^2: Can't remember the term to search for help on! (paragraph mode)
by tom2112 (Novice) on Dec 07, 2009 at 20:53 UTC
    Thank you to all you kind Perl Monks!! All of the above are great solutions. The INPUT_RECORD_SEPARATOR was what I was trying to recall, but you've all offered good solutions. I've only had a chance to try the last solution, and it works like a charm. I didn't realize I could do search and replace directly from the command line like that. That's awesome! Thanks again!

      For anyone else who needs to clean up poorly formatted ebooks in text files, here's what I came up with from the help I received above:


      perl -p00 -i.bak -le "s/-\n//gs;s/([^!\?\.\"\'\`])\n/\1 /gs;" myfile.txt

      Within each paragraph, this replaces the newline with a space at the end of every line that does NOT end in a period, question mark, exclamation point or some form of quote (double quote, single quote or backtick). Before doing that, it removes any newline preceded by a hyphen, together with the hyphen itself, so that words split across lines are joined back up.

      It works great. Thanks Perl Monks!