kenclark has asked for the wisdom of the Perl Monks concerning the following question:

I am parsing a file that contains a multi-line section, and that section may have zero or more newline characters at its beginning and/or its end. I want to trim these newlines but leave intact any newlines in the body of the text.

For example, the relevant section of the source file might look like this:

...
Some Field:some value
Example Field:

This is a sentence in a paragraph. This is a sentence in a paragraph. This is a sentence in a paragraph.
This is a sentence in a paragraph. This is a sentence in a paragraph. This is a sentence in a paragraph.

another Field:some value

When I parse "Example Field" the resulting string should be:

$example_string = "This is a sentence in a paragraph. This is a sentence in a paragraph. This is a sentence in a paragraph.
This is a sentence in a paragraph. This is a sentence in a paragraph. This is a sentence in a paragraph.";

and not:

$example_string = "
This is a sentence in a paragraph. This is a sentence in a paragraph. This is a sentence in a paragraph.
This is a sentence in a paragraph. This is a sentence in a paragraph. This is a sentence in a paragraph.
";

This is how I am trimming the leading newlines (and it works):

$body =~ s/^\n+//m;

However, I tried this regular expression to trim the trailing newlines, and it does not work:

$body =~ s/\n+$//m;

I know my syntax is off, but I'm not sure where.

I'd appreciate any help to get it right.

Thanks in advance.

Re: Trim blanks from the beginning and end of a multi-line string
by JavaFan (Canon) on Jan 29, 2012 at 13:59 UTC
    Get rid of the /m, and it should work. The /m modifier changes the meaning of ^ and $ so that they also match at internal newlines (at the start and end of every line), which is exactly what you do not want here.
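    To see why the /m matters, here is a small demonstration. The sample strings and the show() helper are made up for illustration. It runs the OP's trailing regex with and without /m on a body that contains an internal blank line, and also shows that the leading regex only appears to work because the body really does start with newlines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Show a string with its newlines made visible as the two characters \n.
sub show { (my $s = shift) =~ s/\n/\\n/g; return qq{"$s"} }

# A body with an internal blank line and two trailing newlines.
my $body = "line 1\n\nline 2\n\n";

# With /m, $ may match just before ANY newline, so the leftmost match is the
# "\n" right before the internal blank line; the trailing newlines survive.
(my $with_m = $body) =~ s/\n+$//m;

# Without /m, $ matches only at the end of the string (or just before a
# final newline), so only the trailing run of newlines is removed.
(my $without_m = $body) =~ s/\n+$//;

# The same trap hits the leading-newline regex when there is no leading
# newline: under /m, ^ also matches right after an internal newline.
(my $lead_m = "line 1\n\nline 2") =~ s/^\n+//m;

print "with /m:     ", show($with_m),    "\n";   # "line 1\nline 2\n\n"
print "without /m:  ", show($without_m), "\n";   # "line 1\n\nline 2"
print "^ with /m:   ", show($lead_m),    "\n";   # "line 1\nline 2"
```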

      kenclark: Another approach to the problem is to say exactly what you want to do: trim all whitespace at the absolute beginning (\A) or at the absolute end (\z) of the string. This avoids any confusion with the various meanings of the ^ and $ metacharacters. Here it is with a PBP-style (Perl Best Practices) substitution regex:

      >perl -wMstrict -le "my $para = qq{\n\n \t \nsentence 1\nsentence 2\n\nsentence 3\n\n \t\n}; print qq{[[$para]]}; ;; $para =~ s{ \A \s+ | \s+ \z }{}xmsg; print qq{[[$para]]}; "
      [[

         
      sentence 1
      sentence 2

      sentence 3

        
      ]]
      [[sentence 1
      sentence 2

      sentence 3]]
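      To make the \A/\z point concrete: without /m, $ is not the absolute end of the string. It also matches just before a final newline (and so does \Z), while \z never does. A small check on a made-up string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "sentence 3\n";

# Without /m, $ matches at the very end OR just before a final newline...
my $dollar  = ($str =~ /3$/)  ? 1 : 0;
# ...\Z behaves the same way...
my $big_z   = ($str =~ /3\Z/) ? 1 : 0;
# ...while \z matches only at the absolute end of the string.
my $small_z = ($str =~ /3\z/) ? 1 : 0;

print "\$ => $dollar, \\Z => $big_z, \\z => $small_z\n";   # $ => 1, \Z => 1, \z => 0
```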
        s{ \A \s+ | \s+ \z }{}xmsg;
        I don't think this deserves any prize for clarity.

        You're using /m and /s while you aren't using any construct for which they are relevant (/m only affects ^ and $, and /s only affects .). This just leads to more people like the OP who think that using /m and /s is a good idea without understanding what they mean, and who then use them at the wrong time. Furthermore, I don't see the point of /g. It's just an artificial way of putting two constructs into one. The /x j u s t  m a k e s  i t  l o o k  l i k e  y o u r  r e g e x p  s u f f e r s  f r o m  b a d  k e r n i n g .

        I'd write it as:

        $str =~ s/^\n+//;
        $str =~ s/\n+$//;
        if only because it's idiomatic.
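        JavaFan's two substitutions can be wrapped in a small helper. The name trim_newlines and the sample field are hypothetical. The point is that only "\n" is stripped, and only at the two ends, so internal blank lines and any leading indentation on the first line are left alone (unlike the \s+ version, which also removes spaces and tabs):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the two substitutions suggested above.
# Only "\n" is removed, so leading spaces on the first line are kept.
sub trim_newlines {
    my ($str) = @_;
    $str =~ s/^\n+//;   # no /m: ^ is only the start of the string
    $str =~ s/\n+$//;   # no /m: $ is only the end (or before a final "\n")
    return $str;
}

my $field = "\n\nThis is a sentence.\n\nThis is a sentence.\n\n";
print "[[", trim_newlines($field), "]]\n";
```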
        Terrific. Thank you.
      Perfect. That did the trick. Thank you.
Re: Trim blanks from the beginning and end of a multi-line string
by Khen1950fx (Canon) on Jan 29, 2012 at 15:59 UTC
    Another way to do it: Text::Trim.
#!/usr/bin/perl -l
use strict;
use warnings;
use Text::Trim;

$| = 1;

my @strings = <<"EOF";


This is a sentence in a paragraph. This is a sentence in a paragraph.
This is a sentence in a paragraph.
This is a sentence in a paragraph. This is a sentence in a paragraph.
This is a sentence in a paragraph.


EOF
my $trimmed = trim(@strings);
print "\"$trimmed\"";
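    If installing Text::Trim isn't an option, a core-Perl loop gives a similar result. This is a sketch, not the module's own code; it assumes "whitespace" means Perl's \s class and trims every element of a made-up @fields array in place:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fields = ("\n\nfirst value\n", "  second value \t\n\n", "third\n\nvalue\n");

# foreach aliases $_ to each array element, so these substitutions modify
# @fields in place. \s matches spaces, tabs, "\r" and "\n", among others.
for (@fields) {
    s/\A\s+//;   # strip whitespace at the absolute start
    s/\s+\z//;   # strip whitespace at the absolute end
}

print "[[$_]]\n" for @fields;
```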
      And another great solution. Thanks everyone for the help.
Re: Trim blanks from the beginning and end of a multi-line string
by chessgui (Scribe) on Jan 29, 2012 at 14:04 UTC
    Are you sure that there are no other invisible characters at the end of the file (such as a carriage return, "\r")? If there are, you should include them in the regexp:
    $body=~s/[\n\r]+$//;
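    Here is the situation the [\n\r] class guards against, assuming the text came from a file with Windows <CR><LF> endings and was read without any line-ending translation (for example on Unix). The sample string and the show() helper are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Make "\r" and "\n" visible when printing.
sub show { (my $s = shift) =~ s/\r/\\r/g; $s =~ s/\n/\\n/g; return qq{"$s"} }

# A field as it might look after reading a Windows-format file on Unix,
# where no CRLF translation happens: lines still end in "\r\n".
my $body = "line 1\r\nline 2\r\n\r\n";

(my $lf_only = $body) =~ s/\n+$//;       # strips only the final "\n"
(my $cr_lf   = $body) =~ s/[\n\r]+$//;   # strips the whole trailing "\r\n\r\n"

print "\\n+\$:     ", show($lf_only), "\n";   # "line 1\r\nline 2\r\n\r"
print "[\\n\\r]+\$: ", show($cr_lf),   "\n";   # "line 1\r\nline 2"
```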
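      Normally that is not needed, because "\n" is "magical": on Unix it is simply <LF>, and on Windows Perl's default :crlf I/O layer translates between "\n" and the <CR><LF> "new line". If you write a "\n", the platform-specific "new line" ends up in the file, and reading that file back on the same platform gives you a plain "\n" again. The exception is a file moved between platforms: a file with Windows <CR><LF> endings that is read on Unix keeps its "\r" characters, and that is the case a pattern like [\n\r] handles.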

      Update: For folks who may not be up on the terminology: CR (Carriage Return) is what \r is, and LF (Line Feed) is the character Unix writes for "\n". Windows writes both, <CR><LF>, for a "new line". As trivia, the convention for transmitting lines of text over a network (such as over a socket) is the same as on Windows, <CR><LF>, even on Unix systems; sockets are not translated, so perlport recommends writing "\015\012" explicitly there. For ordinary file I/O, Perl handles all this oddness in a very nice and magical way; basically the "right thing" happens (DWIM, Do What I Mean).
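      Perl's :crlf I/O layer can also be requested explicitly, which gives the <CR><LF> to "\n" conversion on input regardless of platform. A minimal sketch, assuming a throwaway temp file created with the core File::Temp module stands in for the OP's input file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a file with Windows-style line endings, bypassing any translation.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;                         # raw bytes: "\r\n" is written as-is
print {$fh} "line 1\r\nline 2\r\n";
close $fh or die "close: $!";

# Reading through the :crlf layer turns each "\r\n" into "\n" on input,
# on Unix as well as on Windows.
open my $in, '<:crlf', $filename or die "open: $!";
my @lines = <$in>;
close $in;

# Print each line with "\r" and "\n" made visible.
for my $line (@lines) {
    (my $shown = $line) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$shown\n";
}
```

With the layer in place, the trailing-newline regexes discussed above only ever see "\n".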