in reply to regex to replace linefeeds with <p> tags

Your [\r\n] is suspicious. Does the data have \r or \n or \r\n? If the latter, you don't want a character class: a class counts \r and \n as separate characters, so a "two or more" match succeeds on even a single "\r\n" when presumably you mean it to match only "\r\n\r\n...".

You might try stripping out all \r's before doing the regex, so that \n is the only line ending, and then have your regex look for just \n{2,}.
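
A minimal sketch of that suggestion, assuming the text is held in a plain scalar. The sub name paragraphize and the sample input are made up for illustration and do not come from the original post. Note that dropping every \r also removes any bare-CR line breaks, so this only suits data that uses \n or \r\n.

```perl
use strict;
use warnings;

# Hypothetical helper: make "\n" the only line ending, then mark paragraph breaks.
sub paragraphize {
    my ($text) = @_;
    $text =~ tr/\r//d;                     # delete every CR
    $text =~ s{\n{2,}}{\n</p>\n<p>\n}g;    # two or more LFs become a paragraph break
    return $text;
}

# A blank line separates the paragraphs; the single "\r\n" does not.
print paragraphize("first para\r\n\r\nsecond para\r\nsame para\r\n");
```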

Re^2: regex to replace linefeeds with <p> tags
by Joost (Canon) on Dec 25, 2006 at 22:16 UTC
      I'll second Joost's approach, but make a few suggestions for readability / maintainability.
      • Use qw to de-clutter the list of strings.
      • Use an explicit variable in the for; they are cheap and they make your intention clear.
      • Use [ ] brackets for the regex separator so you won't have to backslash the slash. This de-emphasizes some of the executable line noise effect.
      • Use the regex x modifier to put some whitespace and comments in here.
      foreach my $field (qw(postby title teaser content)) {
          $in{$field} =~ s[ (\r? \n){2,} ]           # two or more line breaks
                          [ \n </p> \n <p> \n]gx;    # Close one para, open another
      }
      throop
        Use [ ] brackets for the regex separator so you won't have to backslash the slash. This de-emphasizes some of the executable line noise effect.
        I've long been of two minds about this. Consistency makes for readability, and I don't think I could sell everyone I share code with on always using some other delimiter, so I see a lot of advantage to always using / and escaping as needed (and the escaping does the job of pointing out that the following / is indeed literal, which is just what I want done.) On the other hand, that \ seems to bother a lot of other people, and they can't all be wrong, can they?
      I didn't suggest that because then you are (assuming the data was consistent in the first place) leaving most lineends as "\r\n" but those at paragraph breaks as "\n", and that bothered me.
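
The mixed result described here can be made visible with a small sketch. The sample text and variable names are invented for illustration, and it assumes the input consistently uses "\r\n".

```perl
use strict;
use warnings;

# Sample input that consistently uses CRLF (an assumption for this sketch).
my $text = "one\r\ntwo\r\n\r\nthree\r\n";

# Replace runs of two or more line breaks, accepting an optional CR before each LF.
(my $marked = $text) =~ s[(?:\r?\n){2,}][\n</p>\n<p>\n]g;

# Show the control characters as literal \r and \n so the mix is easy to see.
(my $visible = $marked) =~ s/\r/\\r/g;
$visible =~ s/\n/\\n/g;
print "$visible\n";    # prints: one\r\ntwo\n</p>\n<p>\nthree\r\n
```

Ordinary line ends keep their "\r\n", while the paragraph break is written with bare "\n".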
Re^2: regex to replace linefeeds with <p> tags
by jck (Scribe) on Dec 26, 2006 at 02:47 UTC
    actually, i DO want to match "\r\n" (or "\n\r" for that matter).

    the bottom line, is that i want to identify two+ linebreaks as a paragraph break, whether the linebreaks are \r or \n (or a mixture of both).
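
One possible way to express that requirement, offered as a sketch rather than a definitive fix: treat "\r\n", a lone "\r", or a lone "\n" as one line break each, and require two or more of them. The atomic group (?>...) matters here, because without it the regex engine can backtrack and split a single "\r\n" into "\r" plus "\n", which counts as two breaks. The sub name mark_paragraphs is hypothetical, and this sketch assumes "\n\r" should count as two breaks rather than one.

```perl
use strict;
use warnings;

# Hypothetical helper: CRLF, lone CR, or lone LF each count as ONE line break.
sub mark_paragraphs {
    my ($text) = @_;
    # (?>...) is atomic: once "\r\n" has matched as a unit, it cannot be
    # re-split into "\r" followed by "\n" to satisfy the {2,} quantifier.
    $text =~ s{(?>\r\n|\r|\n){2,}}{\n</p>\n<p>\n}g;
    return $text;
}

print mark_paragraphs("para one\r\nstill one\r\n\r\npara two\n\npara three\r\rpara four");
```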

      But the behaviour you describe indicates that the user input is coming back as the sequence \r\n for a single line break.
        sorry - i know it's sloppy.

        i originally used the \r\n and it worked for a while, but then it started generating all the additional tags, so i was trying to make it more generalized.

        i think that the input usually has a CR from the word processor (which would be a \r), and then the user adds a second line feed in the form when they are inputting it - since i guess they see only one linefeed, and i have instructions to the user to separate paras by two lines....leading to the \r\n that i expected before, but maybe now they're doing something different?
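
Rather than guessing what the form submits, one option is to count each kind of line ending in a captured sample of the input. This is only a sketch: the sub name count_line_endings and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: tally CRLF, lone CR, and lone LF line endings.
sub count_line_endings {
    my ($text) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    # "\r\n" is tried first, so a CRLF pair is counted once, not as CR plus LF.
    while ($text =~ /(\r\n|\r|\n)/g) {
        my $eol = $1;
        if    ($eol eq "\r\n") { $count{crlf}++ }
        elsif ($eol eq "\r")   { $count{cr}++ }
        else                   { $count{lf}++ }
    }
    return \%count;
}

my $seen = count_line_endings("line one\r\nline two\rline three\n\n");
print "$_: $seen->{$_}\n" for sort keys %$seen;
```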