Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I've spent some time working with files that are written from the contents of HTML TEXTAREA boxes to files on the (UNIX) webserver.

More or less by trial and error, I've discovered that when I read them back in, by doing this:

open(FILEHANDLE,'filename.txt');
@theFile = <FILEHANDLE>;
$thefile = join('',@theFile);

...the file's lines are separated by \r\n. More precisely, the only way I can transform the text into HTML (replacing line breaks with <BR> tags, or with P tags for double line breaks) is by replacing those two characters together.

Why is this? Why isn't it just \n for instance?
--

($_='jjjuuusssttt annootthhrer pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;

Re: The Mysteries Of LineBreaks
by Tardis (Pilgrim) on Mar 12, 2002 at 11:06 UTC
    Something worth mentioning on this issue is the sort of TEXTAREA used in your forms.

    In many cases, where the TEXTAREA is used for some sort of textual description entry by the user, the HTML attribute 'WRAP' should be set to 'VIRTUAL'.

    When this is done, entered lines will word wrap at the TEXTAREA boundary, without the user pressing return.

    With this, provided your users are well trained, you can be sure that EOL markers represent the end of paragraphs, in other words each paragraph is a single line of text.

    This is very useful for clean formatting later, if it is inserted back into an HTML document or used in some other package.

    Note that this attribute is browser dependent, and is not in any of the HTML specifications. http://www.utexas.edu/learn/forms/boxes.html provides more information on this feature.

      Thank you all for your help.

      For the record, this behaviour happened when using Internet Explorer for Mac, and on Windows IE too.

      What I should really do is test out a few browsers and see what happens. Maybe if I get the time.

      The regex with the optional \r will be very useful, as will the idea of locally setting $\ and $/. I'd completely forgotten about the WRAP options for TEXTAREA boxes, so I'll investigate that too.

      Thanks again.
      --

      ($_='jjjuuusssttt annootthhrer pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;
Re: The Mysteries Of LineBreaks
by broquaint (Abbot) on Mar 12, 2002 at 10:24 UTC
    You are probably getting form submissions from a Windows-based machine, whose line ending is \r\n. If you simply want to "HTMLify" your text, you could change the output record separator (see $\).

    use strict;
    {
        local $/ = "\r\n";    # read records terminated by CRLF
        open(FOO, "somefile.txt") or die("open died - $!");
        chomp( my @data = <FOO> );    # chomp removes the current $/
        local $\ = "<br>";    # appended to every print
        print for @data;
    }

    Although you may want to put a \n after the <br> for a nicer source output.
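
    As a rough sketch of that variation: the code below reads CRLF-terminated records and emits "<br>\n" after each one. It builds a string with join rather than setting $\, so the result can be reused. The sub name htmlify and the sample data are made up for illustration, and an in-memory filehandle stands in for the real file.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: turn CRLF-separated text into <br>-separated HTML.
    sub htmlify {
        my ($content) = @_;
        local $/ = "\r\n";                       # treat CRLF as the record separator
        open(my $fh, '<', \$content) or die "open failed: $!";
        chomp(my @lines = <$fh>);                # chomp strips the current $/
        close $fh;
        return join '', map { "$_<br>\n" } @lines;
    }

    # Sample data standing in for saved TEXTAREA contents (an assumption).
    print htmlify("first line\r\nsecond line\r\n");
    ```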
    HTH

    broquaint

Re: The Mysteries Of LineBreaks
by Dog and Pony (Priest) on Mar 12, 2002 at 09:55 UTC
    First off, what browser are you using to submit the form? What the browser sends, your script will receive, so that is probably the cause. Different browsers on different platforms send different line breaks.

    If you are having trouble with replacing two characters, you could for instance strip out all \r characters, either before replacing, or even before you are saving that input to file.
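
    A minimal sketch of that clean-up step, assuming you want everything reduced to a plain \n. Note that it goes slightly further than simply deleting \r: a lone \r (as some older Mac software produces) is turned into \n rather than dropped, so the line break is not lost. The sub name normalize_newlines is made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: convert CRLF and lone CR line endings to "\n".
    sub normalize_newlines {
        my ($text) = @_;
        $text =~ s/\r\n?/\n/g;    # CRLF or a lone CR becomes LF
        return $text;
    }
    ```

    You could call this on the form value before writing it to the file, so that later code only ever has to deal with \n.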

    But probably the simplest approach is to do something like this (untested and ugly code):

    s/(\r?\n){2}/<p>/g; s/\r?\n/<br>/g;
    That is, replace occurrences of an optional \r followed by a mandatory \n: first with <p> for double matches, and then with <br> for the single ones left after the first pass.
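
    Wrapped up as a sub, the two passes might look like the sketch below. The name linebreaks_to_html and the sample string are invented for illustration, and the group is made non-capturing since nothing uses the capture.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper wrapping the two substitutions.
    sub linebreaks_to_html {
        my ($text) = @_;
        $text =~ s/(?:\r?\n){2}/<p>/g;    # two consecutive breaks become <p>
        $text =~ s/\r?\n/<br>/g;          # any remaining single break becomes <br>
        return $text;
    }

    print linebreaks_to_html("one\r\ntwo\r\n\r\nthree\n"), "\n";
    # prints: one<br>two<p>three<br>
    ```

    The order matters: if the single-break pass ran first, there would be no double breaks left for the <p> pass to find.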

    Hope that helps.


    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.