andye has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks,

Quite a simple one, but it's puzzling me.

I'm trying to match each line of a string, including empty lines. I've found that foreach (/(.*)/gm) does the job - because if it sees two newlines in a row, then dot star matches (0 copies of any-character-but-newline) and it returns an empty string.

And that's fine. But then I thought to myself, hang on, I should be using split here, so I changed it to foreach (split /\n/) ...and it doesn't match empty lines.

I've looked in the Cameliad, but either it doesn't explain or (more likely) I'm not understanding correctly.

Could someone explain?

Cheers,

andy.

Re: Split and empty strings
by Rhandom (Curate) on Apr 10, 2001 at 19:27 UTC
    Never underestimate the importance of trying a small test script. Notice what the following code reveals.
    #!/usr/bin/perl -w
    $_ = "The quick\n\nbrown fox\njumped.";
    print "-------------\n";
    foreach (/(.*)/gm){
        print "[$_]\n";
    }
    print "-------------\n";
    foreach (split/\n/){
        print "[$_]\n";
    }
    print "-------------\n";
    This code prints out the following:
    -------------
    [The quick]
    []
    []
    [brown fox]
    []
    [jumped.]
    []
    -------------
    [The quick]
    []
    [brown fox]
    [jumped.]
    -------------
    So, while it may look the same, it really isn't.
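To see where the extra empty strings come from, here is a small sketch (my own, not from the thread) that records the start and end offset of every /(.*)/g match on the same sample string, using the @- and @+ arrays. The "line start" label is an illustrative heuristic: a match counts as starting a line if it begins at offset 0 or right after a "\n".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample string as Rhandom's test script.
my $text = "The quick\n\nbrown fox\njumped.";

# Collect every match of /(.*)/g with its start and end offsets.
# $-[0] and $+[0] are the start and end of the last successful match.
my @matches;
while ($text =~ /(.*)/g) {
    push @matches, [ $1, $-[0], $+[0] ];
}

for my $m (@matches) {
    my ($str, $start, $end) = @$m;
    # A match beginning at offset 0 or right after a "\n" starts a line;
    # anything else begins at the end of the previous non-empty line.
    my $at_line_start = $start == 0 || substr($text, $start - 1, 1) eq "\n";
    printf "[%s] %d..%d %s\n", $str, $start, $end,
        $at_line_start ? "line start" : "end of previous line (extra)";
}
```

On this string the empty matches at offsets 9, 20 and 28 should be labelled as extras: .* is allowed to match zero characters just before each "\n" (and at the very end of the string). Only the empty match at offset 10 corresponds to the real blank line that split also returns.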
      Thanks Rhandom and davorg,

      Looks like I had my problem the wrong way round - split is behaving as I originally thought it ought to - i.e. it *is* matching blank lines perfectly normally - but the regexp is producing more matches, as in the Quick Brown Fox above.

      But I can't see (feeling particularly thick today) why the regexp produces those extra matches, even in the simple example above. Could someone elucidate?

      Cheers,

      andy.

        It's probably a Death to Dot Star! issue. You should read that node and see if you can recast your regex to better define the data that you're trying to match. At the moment, it's matching "string\n\n" as three separate strings when you probably only want two.
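One way to follow that advice, sketched here under the assumption that you want exactly one element per line of the sample string: anchor each match with ^ under /m, so a match can only begin at the start of the string or just after a "\n". The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "The quick\n\nbrown fox\njumped.";

# Unanchored: .* may also match the empty string sitting just before each "\n".
my @unanchored = $text =~ /(.*)/g;

# Anchored: under /m, ^ only matches at the start of the string or right
# after a "\n", so the empty match at the end of a non-empty line is ruled out.
my @anchored = $text =~ /^(.*)$/mg;

my @split = split /\n/, $text;

print scalar(@unanchored), " unanchored, ", scalar(@anchored),
      " anchored, ", scalar(@split), " from split\n";
print "anchored regex agrees with split\n"
    if join("\0", @anchored) eq join("\0", @split);
```

On this sample the anchored pattern should give the same four elements as split. The two are not guaranteed to agree on every input: split also discards trailing empty fields, so a string ending in blank lines may come out differently.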

        --
        <http://www.dave.org.uk>

        "Perl makes the fun jobs fun
        and the boring jobs bearable" - me

        This is a little weird. Can somebody explain this?
        #!/usr/bin/perl -w
        $_ = "The quick\n\nbrown fox\njumped.";
        print "-------------\n";
        foreach (/(.*)$/gm){
            print "[$_]\n";
        }
        print "-------------\n";
        foreach (/^(.*)/gm){
            print "[$_]\n";
        }
        print "-------------\n";
        foreach (/^(.*)$/gm){
            print "[$_]\n";
        }
        print "-------------\n";
        Produces the following
        -------------
        [The quick]
        []
        []
        [brown fox]
        []
        [jumped.]
        []
        -------------
        [The quick]
        []
        [brown fox]
        [jumped.]
        -------------
        [The quick]
        []
        [brown fox]
        [jumped.]
        -------------
        Even though I found a solution to the problem, I still don't have an explanation for it. Does anybody know what is going on in the engine?

        I added use re qw(debug); to the script and looked at the trace. In the first example it says it matched 0 of 32767 times while the position is still on character 9; in the second set, this doesn't happen. How is it matching twice at that position in the first case? I also tried this:
        #!/usr/bin/perl -w
        $_ = "The quick\n\nbrown fox\njumped.";
        print "-------------\n";
        foreach (/(?:^|\G)(.*)/gm){
            print "[$_]\n";
        }
        print "-------------\n";
        And look what it did
        -------------
        [The quick]
        []
        []
        [brown fox]
        []
        [jumped.]
        []
        -------------
        Hmmmmmmm.
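What seems to be happening, as far as I can tell: in a //g loop every new attempt starts at pos(), and \G asserts exactly that position. So after "The quick" ends at offset 9, the \G branch is true there and (.*) may match the empty string. Perl then refuses another zero-length match at the same position, so the engine moves on to offset 10, where ^ matches the real blank line. The sketch below (trace format my own) prints the start offset and pos() after each match of the \G variant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "The quick\n\nbrown fox\njumped.";

# Trace where each match of the \G variant starts and where pos() ends up.
my @trace;
while ($text =~ /(?:^|\G)(.*)/gm) {
    push @trace, sprintf "[%s] start=%d pos=%d", $1, $-[0], pos($text);
}
print "$_\n" for @trace;

# Compare with the plain unanchored pattern on the same input.
my @plain  = $text =~ /(.*)/g;
my @with_g = $text =~ /(?:^|\G)(.*)/gm;
print "same matches as /(.*)/g\n" if join("\0", @plain) eq join("\0", @with_g);
```

The zero-length matches leave pos() where the previous match ended (9, 20 and 28), and \G is true there, so on this input (?:^|\G) turns out no stricter than having no anchor at all. On other inputs the two patterns are not necessarily identical.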
Re: Split and empty strings
by davorg (Chancellor) on Apr 10, 2001 at 19:27 UTC

    Can you post some code? I don't think I understand what you're saying. I tried the following:

    #!/usr/bin/perl -w
    use strict;

    $_ = 'This is
    a string

    with
    newlines

    and

    blank lines';

    my @arr = split /\n/;

    print map { "|$_|\n" } @arr;

    and it gave this output:

    |This is|
    |a string|
    ||
    |with|
    |newlines|
    ||
    |and|
    ||
    |blank lines|

    which looks right to me. What were you hoping for?

Re: Split and empty strings
by TheoPetersen (Priest) on Apr 10, 2001 at 19:22 UTC
    Are the empty lines at the end, perhaps? By default, split strips trailing empty fields from the returned list. See perlfunc:split for the exact terms.
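A small sketch of that default behaviour, with a sample string chosen for illustration: empty fields between separators are kept, trailing empty fields are stripped, and passing a negative LIMIT keeps them all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Blank lines in the middle and at the end of the string.
my $text = "one\n\ntwo\n\n\n";

# Default split (no LIMIT): empty fields in the middle survive,
# but trailing empty fields are stripped.
my @default = split /\n/, $text;

# A negative LIMIT keeps the trailing empty fields as well.
my @keep_all = split /\n/, $text, -1;

print scalar(@default), " fields by default: ",
      join(",", map { "[$_]" } @default), "\n";
print scalar(@keep_all), " fields with LIMIT -1: ",
      join(",", map { "[$_]" } @keep_all), "\n";
```

This should print 3 fields by default ([one],[],[two]) and 6 fields with a LIMIT of -1.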

    Update: I just tried this code, and it works as I expect.

    $_ = "one
    two
    three

    four
    ";
    foreach my $line (split(/\n/)) {
        print "$line\n";
    }
    The output for my Perl 5.6.1 system is:

    $ perl ~/split.pl
    one
    two
    three
    
    four
    
    with the blank printed between three and four, and nothing thereafter.
      No, they're scattered randomly through the string. But thanks for the idea, TheoPetersen. I've looked at perlfunc:split again - still no joy.

      Slight modification - I've realised I don't need the /m on the regexp. Works the same without.
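That matches perlre: /m only changes what ^ and $ match, while . never matches "\n" unless /s is given. A quick sketch to check this on the thread's sample string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "The quick\n\nbrown fox\njumped.";

# /m only changes what ^ and $ match; it does not change what . matches,
# so these two produce the same list.
my @with_m    = $text =~ /(.*)/gm;
my @without_m = $text =~ /(.*)/g;

# /s is the modifier that lets . match "\n"; with it, .* swallows the rest
# of the string in one go, followed by one empty match at the very end.
my @with_s = $text =~ /(.*)/gs;

print "same with or without /m\n"
    if join("\0", @with_m) eq join("\0", @without_m);
print scalar(@with_s), " matches with /s\n";
```

With /s you should see 2 matches: the whole string, then a single empty match at the end.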

      andy.

      update: thanks again. I've realised that I was being stupid - the onset of tunnel vision. split was doing exactly what it ought, as your example shows, but the regexp was returning extra blank lines.