thpfft has asked for the wisdom of the Perl Monks concerning the following question:

This is an embarrassingly basic question, but here goes. There are a lot of messages in here related to the subject of line endings, but its very simplicity (and ubiquity) means that it's hard to find an authoritative answer, and I'm hoping that the combined wisdom and bickering of the monastery can provide me with a standard best way.

In short, I want to match on line endings regardless of the combination of newlines, carriage returns and what have you in the text entered or uploaded. For example, the most common case is that I want to insert an HTML paragraph break at each occurrence of two or more returns.

The code I'm using at the moment is this (extracted without the bits that prettify output):

sub paraphrase {
    my $text = shift;
    $text =~ s/\r/\n/gs;
    $text =~ s/(?:\n\s*){2,}/<\/p>\n<p>/gs;
    return "<p>$text</p>";
}

Which seems to work, but I put it in a very long time ago and now it feels clunky and brittle. Can anyone set my mind at rest?

Replies are listed 'Best First'.
Re: line endings
by blm (Hermit) on Sep 17, 2002 at 13:26 UTC

    If I were in your shoes I would read perlport and then experiment. Sorry, I don't have your data to test my theories on!

    I am assuming that the end of a line is created with the Enter key and doesn't rely on the editor's word wrap.

    Here goes....

    From your code:

    $text =~ s/\r/\n/gs;

    If there is a \r, this replaces it with a \n, so a \r\n becomes \n\n, which will trigger an HTML paragraph break in the next substitution. If you have a DOS text file on Unix you will get a paragraph break inserted at every end of line. Also, do you need the /g and /s modifiers (global matching and single-line mode)? I didn't need them, but I am not a regexp pro. Check with others, but why not:

    $text =~ s/\r\n/\n/;

    When reading a DOS text file on UNIX it will replace \r\n with a single \n, which is what I believe you want. UNIX files shouldn't be touched. This is what I did in my project.
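A minimal sketch of the difference between these substitutions, assuming $text holds the whole input rather than one line at a time. The helper name normalize_newlines and the sample strings are hypothetical, not from the thread. In that whole-text case the /g modifier does matter, because without it only the first \r\n is replaced.

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF (DOS) and lone CR (old Mac) to LF.
# \r\n? matches a CR plus an optional LF as one unit, so CRLF is never
# turned into two newlines. A bare LF is left untouched.
sub normalize_newlines {
    my $text = shift;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Hypothetical sample: one DOS line, one old-Mac line, one Unix line.
my $sample = "one\r\ntwo\rthree\n";

# The original approach: every CR becomes LF, so CRLF ends up as LF LF.
(my $cr_to_lf = $sample) =~ s/\r/\n/g;

# Without /g, only the first CRLF in the string is replaced.
(my $first_only = "a\r\nb\r\n") =~ s/\r\n/\n/;

print "s/\\r/\\n/g doubles CRLF\n"       if $cr_to_lf eq "one\n\ntwo\nthree\n";
print "no /g fixes first CRLF only\n"    if $first_only eq "a\nb\r\n";
print "s/\\r\\n?/\\n/g normalizes all\n" if normalize_newlines($sample) eq "one\ntwo\nthree\n";
```

If the file is read and processed line by line instead, a single CRLF per string is the normal case and the version without /g behaves the same.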

    The second one is just supposed to replace two \n's with a </p>\n<p>? I am not a regexp pro, so check with the others. I would use:

    $text =~ s/(\n\s*){2}/<\/p>\n<p>/;
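As a rough check of the paragraph-break step, here is one possible reworking of paraphrase, assuming the whole text is passed in as a single string and contains no markup of its own. It normalizes line endings first and keeps the /g modifier from the original question, since without /g only the first blank-line gap would be converted. The test strings are made up for illustration.

```perl
use strict;
use warnings;

# Sketch only: one way paraphrase() might look, not a definitive version.
sub paraphrase {
    my $text = shift;

    # Treat CRLF and lone CR as a single LF before looking for gaps.
    $text =~ s/\r\n?/\n/g;

    # Two or more newlines, each optionally followed by whitespace,
    # become one paragraph boundary. /g converts every such gap.
    $text =~ s{(?:\n\s*){2,}}{</p>\n<p>}g;

    return "<p>$text</p>";
}

print paraphrase("first\r\n\r\nsecond\n\nthird"), "\n";
print paraphrase("one line\r\nsame paragraph"), "\n";
```

A single line ending inside a paragraph is left as a plain \n, so only blank-line gaps start a new paragraph.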

    I can't comment on reading both Unix and DOS text files under Windows, as I haven't done this.

Re: line endings
by Helter (Chaplain) on Sep 17, 2002 at 13:33 UTC