SavannahLion has asked for the wisdom of the Perl Monks concerning the following question:

I'm merely writing a report module in Perl and I have the following code bit
$val =~ s/^(.*)\n/$1/;
My Regex is a bit weak (hey, I'm kind of new at this. :p ) but this looks like it should work. I'm anchoring the search to the beginning of the string, picking up all the characters until I hit the first \n, then using what was found before the newline to replace the existing value in $val.
Unfortunately, it doesn't work... at all. It gobbles up everything, i.e. it doesn't stop at the first \n.
I thought maybe I misunderstood just what . meant so I checked my llama book and it says that, "... it matches any single character except a newline (which is represented by \n)."

So I dug around and found that the following code snippet will work.

$val =~ s/\n.*//s;

I've done searches around the monastery, but I'm at a complete loss as to what I'm supposed to be looking for. :( I'm pretty sure I saw a very similar question regarding this sort of thing, but I can't remember where I saw it, much less the answer.

Is it fair to stick a link to my site here?

Thanks for your patience.

Re: Junking excess string.. junk with s///.
by sauoq (Abbot) on Oct 16, 2003 at 05:53 UTC

    You have to keep in mind that you are replacing a portion of the string, not the whole thing. Only the portion you match is replaced.

    Your code just removes the first newline. You are matching anything from the beginning up to and including the first newline and replacing with everything you match excepting that newline. Consider "foo\nbar\n". You replace "foo\n" with "foo" leaving "foobar\n".
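As a quick check of the point above, here is a minimal sketch that runs the original substitution on the sample string "foo\nbar\n". The variable name $result is just for illustration; it is not from the original post.

```perl
use strict;
use warnings;

# Sample string from the explanation above.
my $val = "foo\nbar\n";

# Copy $val, then run the original substitution on the copy.
# The pattern matches "foo\n" and replaces it with $1, which is "foo".
(my $result = $val) =~ s/^(.*)\n/$1/;

# Only the matched portion changed; "bar\n" was never part of the match.
if ($result eq "foobar\n") {
    print "only the first newline was removed\n";
} else {
    print "unexpected result\n";
}
```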

    It seems that what you really want is something like this:

    $val =~ /(.*)/ and $val = $1;
    or
    $val = ( $val =~ /(.*)/ )[0];
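For comparison, a small sketch of both match-and-reassign idioms on the same sample string. The names $first and $second are only there so the two forms can be run side by side; in real code you would assign straight back to $val as shown above.

```perl
use strict;
use warnings;

my $val = "foo\nbar\n";

# Idiom 1: match, then reassign only if the match succeeded.
# Without /s the dot stops at the first newline, so $1 is "foo".
my $first = $val;
$first =~ /(.*)/ and $first = $1;

# Idiom 2: a match in list context returns its captures;
# the [0] subscript picks out the first (and only) one.
my $second = ( $val =~ /(.*)/ )[0];

print "first:  '$first'\n";
print "second: '$second'\n";
```

Both forms keep what was matched instead of deleting what was not, which is why no s/// is needed.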

    Edit: Changed wording for clarity.

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Junking excess string.. junk with s///.
by Enlil (Parson) on Oct 16, 2003 at 05:59 UTC
    What the first snippet of code does (i.e. $val =~ s/^(.*)\n/$1/; ) is capture everything from the beginning up to the first "\n" (as you mentioned, . does not match \n without the /s modifier on the regex) and put it in $1. To complete the match it also matches the "\n". What you are missing is that it does nothing to the rest of the string; it only substitutes the replacement for the part that was matched (i.e. $var =~ s/match/replace/; finds the first instance of match in $var and replaces it with replace).

    To put it another way, if you had the string $val = "dogs and cats\nfoo and bar\ntime and taxes", it would capture "dogs and cats" in $1 and then also match (but not capture) the \n after it. It would then replace just that matched portion with "dogs and cats", resulting in the string "dogs and catsfoo and bar\ntime and taxes". The second snippet works because it matches the first "\n" and everything after it, then replaces that with nothing (effectively getting rid of the first \n and everything after it, which is what I think you are after).
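The example string above can be run through both snippets to see the difference. This is only a sketch; $first and $second are illustrative names for the two results.

```perl
use strict;
use warnings;

my $val = "dogs and cats\nfoo and bar\ntime and taxes";

# First snippet: only "dogs and cats\n" is matched, and it is
# replaced with $1, so just that one newline disappears.
(my $first = $val) =~ s/^(.*)\n/$1/;

# Second snippet: with /s the dot also matches newlines, so the
# match runs from the first "\n" to the end and is replaced with nothing.
(my $second = $val) =~ s/\n.*//s;

print "first:  [$first]\n";
print "second: [$second]\n";
```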

    -enlil

Re: Junking excess string.. junk with s///.
by pg (Canon) on Oct 16, 2003 at 06:06 UTC
    This is made similar to your regexp, so you can easily compare:
    $var =~ s/^(.*?)\n(?:.*)/$1/s;
    tips:
    • the s modifier at the end makes . match everything, including \n;
    • since . now matches everything, we need to make it non-greedy;
    • match the entire string, so that the entire string is substituted (in your original code, only the part up to and including the first \n is matched, and it is replaced with itself minus that \n, so only the first newline is removed);
    • the part after the first \n is matched but not captured, hence the ?:
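To see why the non-greedy quantifier matters once /s is in play, here is a small sketch comparing the regex above with a greedy variant. The greedy variant and the three-line sample string are assumptions added purely for comparison.

```perl
use strict;
use warnings;

my $var = "one\ntwo\nthree";

# Non-greedy: (.*?) stops at the first "\n", the rest is matched and discarded.
(my $lazy = $var) =~ s/^(.*?)\n(?:.*)/$1/s;

# Greedy (for comparison only): under /s, (.*) runs across newlines and
# backtracks only as far as the LAST "\n", so $1 holds the first two lines.
(my $greedy = $var) =~ s/^(.*)\n(?:.*)/$1/s;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
```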

      I didn't think you were serious, but you included those "tips"...

      • the part after the first \n is matched but not captured, hence the ?:

      So why group it at all? $var =~ s/^(.*?)\n.*/$1/s;

      • match the entire string, so that the entire string is substituted (in your original code, only the part up to and including the first \n is matched, and it is replaced with itself minus that \n, so only the first newline is removed);

      Since we aren't doing anything with the first part of the string, it seems pointless to capture it at all. This:

      $var =~ s/\n.*//s;
      does the trick just fine. But... the OP already came upon that solution and said so in his node. My tip: either do it that way or match what you want and reassign. No point in obfuscating unnecessarily.
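As a rough sanity check that the approaches discussed in this thread agree, the sketch below runs the simplified non-greedy substitution, the short s/\n.*//s form, and a plain match-and-capture over a few sample strings. The sample strings and the $mismatches counter are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs: a normal case, no newline, a leading newline, empty.
my @samples = ("foo\nbar\n", "no newline here", "\nleading newline", "");

my $mismatches = 0;
for my $s (@samples) {
    (my $lazy = $s) =~ s/^(.*?)\n.*/$1/s;   # simplified non-greedy form
    (my $trim = $s) =~ s/\n.*//s;           # the short form found by the OP
    my ($match) = $s =~ /(.*)/;             # match what you want and keep it

    $mismatches++ unless ($lazy eq $trim) && ($trim eq $match);
}

print "mismatches: $mismatches\n";
```

On these inputs all three give the same first line, so the choice comes down to readability.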

      -sauoq
      "My two cents aren't worth a dime.";
      
        Cool great. I really appreciated the lessons from both sides. The code by pg requires a bit more reading on my part, but otherwise it's been very informative.

        I honestly wouldn't have guessed that this was how the Regex behaved. I'm still new to the idea of Regex, since doing anything similar in any other language would involve writing about a hundred lines of code to accomplish the same thing.

        Is it fair to stick a link to my site here?

        Thanks for your patience.

Re: Junking excess string.. junk with s///.
by Yendor (Pilgrim) on Oct 16, 2003 at 06:02 UTC

    First, a link. Read about how regexp's are greedy.

    Keeping that in mind, what your first example does is match as much as it can until it finds the last \n. Since you have anchored it to the beginning of your string, it starts at the beginning of your string, and puts everything until the last \n into $1.

    AIUI, the second piece of code does the following:

    1. Find the first \n in your string.
    2. Grab everything after it.
    3. Junk it
    4. Assign the altered string (which is everything before the first \n) back to $val.

    Update: OK, so I don't quite understand regexp's like I thought I did (and I wasn't even willing to give myself that much credit!) I sit corrected; thanks sauoq. Thank you, drive through.

      Keeping that in mind, what your first example does is match as much as it can until it finds the last \n.

      That is incorrect. The dot (".") does not match a newline.¹ Therefore, /^(.*)/ will capture everything up to the first newline, not the last one.

      1. Unless the regex is modified with /s. The OP's was not.
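A minimal sketch of the footnote's point, capturing with and without /s. The three-line sample string and the names $plain and $dotall are assumptions for illustration.

```perl
use strict;
use warnings;

my $val = "foo\nbar\nbaz";

# Without /s the dot refuses to cross a newline, so the capture
# ends just before the FIRST "\n".
my ($plain) = $val =~ /^(.*)/;

# With /s the dot matches newlines too, so the capture is the whole string.
my ($dotall) = $val =~ /^(.*)/s;

print "plain:  [$plain]\n";
print "dotall: [$dotall]\n";
```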

      -sauoq
      "My two cents aren't worth a dime.";
      
      So . matches \n as well?

      What's really strange is that, so far, everyone seems to be making sense here. :(

      If you say that Regex's are greedy by stopping at the last \n and someone above you says that it stops at the first \n (as in foo\nbar\n becomes foobar\n) how do I know which way the Regex is behaving?

      Is it fair to stick a link to my site here?

      Thanks for your patience.

        Listen to everyone else. I did some further reading and was corrected by others, and have found myself to be wrong. Sorry for the confusion.