Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file from which I'm trying to extract a number out of a pattern that can have a newline at different points within it. The following are a few examples of this:

dist: 45 km;

dist: 45
km;

dist:
45 km;

For some reason, I haven't been able to get the /s at the end of the pattern-matching string to work. I haven't been doing Perl long and I've been poring over lots of books, but nothing seems to work. I could definitely use some help. Thanks!

Re: newline in pattern matching
by Wonko the sane (Curate) on Nov 15, 2002 at 17:40 UTC
    This pattern successfully matches all of the sample lines.
    my ($dist) = $str =~ /dist:.([0-9]+)/s;

    Your regex IS getting the numerical part of the string, but it is capturing the newlines (and whatever follows the number) along with it. Maybe that is what is causing your problems?
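To see the difference described above, a small script can run both patterns over the three sample records from the question and print what each captures. The variable names are illustrative only; the "original" pattern is the (/dist:(.+)$/ms) posted further down the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three sample records from the question.
my @samples = ("dist: 45 km;", "dist: 45\nkm;", "dist:\n45 km;");
my (@wonko, @original);

for my $str (@samples) {
    # Wonko's pattern: exactly one character between "dist:" and the
    # digits; /s lets that one character be a newline.
    my ($num) = $str =~ /dist:.([0-9]+)/s;

    # The pattern from the question: (.+) runs on to the end of the
    # string, so it also captures any newline and the trailing "km;".
    my ($rest) = $str =~ /dist:(.+)$/ms;

    push @wonko,    $num;
    push @original, $rest;

    # Make newlines visible when printing the second capture.
    (my $shown = $rest) =~ s/\n/\\n/g;
    print "wonko=[$num] original=[$shown]\n";
}
```

The first capture is always just the digits, while the second drags the rest of the record along with it.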
    Wonko
      Hi everyone. I've tried your suggestions (and combinations of them) and the one that seems to work best is

      (/dist:.(\d+)/s)

      The only case I haven't gotten this one to work on is:

      dist:
      45.3 km;

      Any suggestions?

      I should have mentioned earlier that the number involved is a real number and that I'm reading these numbers from an input data file. Right now, only the integer part of the number is being read. I'm in the process of looking up how to handle that one right now. Thanks again!
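        You're really close. Try this:
        ( /dist:\s*([\d.]+)/ )
        Note that the period, when inside square brackets, is treated as a literal, not as a wildcard. Use \s* rather than .* to get past the gap: \s matches newlines even without /s, whereas a greedy .* runs to the end of the string and backs off only far enough to leave a single digit for [\d.]+ to capture.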
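A quick way to check a candidate pattern is to run it over all four sample records, including the 45.3 case. This sketch compares a greedy .* before the capture with \s*; both variants use the [\d.] character class discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three original samples plus the real-number case.
my @samples = ("dist: 45 km;", "dist: 45\nkm;", "dist:\n45 km;", "dist:\n45.3 km;");
my (@greedy, @skip_space);

for my $str (@samples) {
    # Greedy .* first swallows the whole string, then gives back
    # characters one at a time, so [\d.]+ only gets the last digit.
    my ($g) = $str =~ /dist:.*([\d.]+)/s;

    # \s* skips spaces and newlines (no /s needed), so the capture
    # starts at the first digit and takes the whole number.
    my ($num) = $str =~ /dist:\s*([\d.]+)/;

    push @greedy,     $g;
    push @skip_space, $num;
    print "greedy=[$g] skip_space=[$num]\n";
}
```

If stray dots could appear in the data, \d+(?:\.\d+)? is a stricter choice than [\d.]+, since it only accepts digits with at most one decimal part.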
Re: newline in pattern matching
by zigdon (Deacon) on Nov 15, 2002 at 17:25 UTC
    What regex are you trying? Show us what you have so far, and we could probably help you find the problem.

    -- Dan

Re: newline in pattern matching
by Enlil (Parson) on Nov 15, 2002 at 17:30 UTC
    I think if you posted a little bit of code it would be easier to see exactly what is going on, as the /s not working could be attributed to a couple of things. To name one: does the variable you are matching the pattern against contain the whole of what you are trying to match, or are you doing a line-by-line comparison?
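The line-by-line question matters because a number on the line after "dist:" is never in the same string when a file is read one line at a time. A minimal sketch, assuming records are separated by blank lines as in the question; an in-memory filehandle stands in for the real input file so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the input data file (assumed layout). For a real file,
# use open(my $fh, '<', $path) instead of the in-memory handle.
my $data = "dist: 45 km;\n\ndist: 45\nkm;\n\ndist:\n45.3 km;\n";

# Line by line: "dist:" and "45.3" are on different lines and never meet.
open my $fh, '<', \$data or die "open: $!";
my @per_line;
while (my $line = <$fh>) {
    push @per_line, $1 if $line =~ /dist:\s*([\d.]+)/;
}
close $fh;

# Whole file at once: with $/ undefined, <$fh> returns everything,
# so \s* can step over the newline between "dist:" and the number.
open $fh, '<', \$data or die "open: $!";
my $all = do { local $/; <$fh> };
close $fh;
my @whole = $all =~ /dist:\s*([\d.]+)/g;

print "line by line: @per_line\n";
print "whole file:   @whole\n";
```

Reading line by line silently drops the third record; matching against the whole file with /g finds all three.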

    The /s modifier makes the . character in the regex match newlines as well, which it otherwise would not.
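The effect of /s on . can be checked directly on the sample where the number sits on the next line; only the modifier differs between the two matches.

```perl
use strict;
use warnings;

my $str = "dist:\n45 km;";

# Without /s, . refuses to match "\n", so nothing bridges the gap.
my ($without) = $str =~ /dist:.(\d+)/;

# With /s, . matches any character, newline included.
my ($with) = $str =~ /dist:.(\d+)/s;

print(defined $without ? "without /s: $without\n" : "without /s: no match\n");
print "with /s: $with\n";
```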

    -enlil

      The latest version of the code I've been trying is:

      (/dist:(.+)$/ms)

      This works for all but the case where "dist:" and the number are on different lines.
        This code seems to do what you're looking for:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my @data = ( "dist: 45 km;", "dist: 45\nkm;", "dist:\n45 km;" );

        foreach my $chunk( @data ){
            my( $distance ) = $chunk =~ /dist:.*?(\d+).*?km;/s;
            print STDOUT "$distance\n";
        }

        Here's a breakdown of the regexp:
        /dist:  # match 'dist:'
        .*?     # match anything (including newlines), but only until the next part of the pattern starts to match
        (\d+)   # match 1 or more digits, and put these into $1
        .*?     # same as before
        km;     # match 'km;' (not needed unless you want to match a number of these in one string)
        /s      # treat the string as a single line, so . also matches newlines
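The same breakdown can live inside the regex itself with the /x modifier, which ignores whitespace and allows # comments in the pattern. This sketch uses m{...} delimiters so the comments cannot accidentally end the pattern, and it keeps $ and @ out of the comments because variables interpolate before /x strips them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data = ("dist: 45 km;", "dist: 45\nkm;", "dist:\n45 km;");
my @found;

foreach my $chunk (@data) {
    my ($distance) = $chunk =~ m{
        dist:     # match 'dist:'
        .*?       # anything, newlines included, but as little as possible
        (\d+)     # one or more digits (the first capture group)
        .*?       # same as before
        km;       # match 'km;'
    }sx;
    push @found, $distance;
    print "$distance\n";
}
```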


        Wonko's code works pretty much the same way. I think part of your problem may be the (.+)$ in your pattern: with /s, it grabs everything after "dist:" up to the end of the string, newlines and "km;" included, rather than just the number. The "/ms" you appended is not contradictory, though. 'm' makes ^ and $ match at the start and end of each line inside the string, while 's' lets . match newlines; the two are independent and can be combined. For this problem only the 's' behaviour matters.
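To illustrate that /m and /s control different things, this sketch tests each one separately and then together. The leading "id: 7" line is a hypothetical extra line added only so that "dist:" does not start the string.

```perl
use strict;
use warnings;

# "dist:" starts the second line of this (hypothetical) string.
my $str = "id: 7\ndist: 45\nkm;";

# /m only changes ^ and $: they may now match at line boundaries.
my $caret_plain = $str =~ /^dist:/  ? 1 : 0;   # ^ means start of string only
my $caret_m     = $str =~ /^dist:/m ? 1 : 0;   # ^ also matches after "\n"

# /s only changes . : it may now match "\n".
my $dot_plain = $str =~ /dist:.+km;/  ? 1 : 0; # . stops at the newline
my $dot_s     = $str =~ /dist:.+km;/s ? 1 : 0; # . crosses the newline

# The two are independent, so /ms simply applies both changes.
my $both = $str =~ /^dist:.+km;$/ms ? 1 : 0;

print "$caret_plain $caret_m $dot_plain $dot_s $both\n";
```

Each modifier turns on exactly one of the two matches it affects, and combining them makes the pattern that needs both succeed.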

        One other option is to run $chunk =~ s/\n/ /g; over each chunk. The /g replaces every newline in the chunk (without it, only the first one is replaced), so all of the strings you gave as examples become the same string, "dist: 45 km;". That makes the regexp much easier to write.
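A sketch of that normalize-then-match approach. The fourth record, with two newlines, is a hypothetical addition showing why /g is needed: without it the record would still contain a newline and the one-line pattern would miss it.

```perl
use strict;
use warnings;

# The question's samples, plus a hypothetical record with two newlines.
my @data = ("dist: 45 km;", "dist: 45\nkm;", "dist:\n45 km;", "dist:\n45\nkm;");
my (@flattened, @dist);

for my $chunk (@data) {
    # Work on a copy; /g replaces every newline, not just the first.
    (my $copy = $chunk) =~ s/\n/ /g;
    push @flattened, $copy;

    # Every record now has the same one-line shape, so a plain
    # pattern without /s is enough.
    push @dist, $1 if $copy =~ /dist: (\d+) km;/;
}

print "$_\n" for @flattened;
print "@dist\n";
```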
        --
        Love justice; desire mercy.
        I believe your problem is the /m flag: you're telling perl to let '$' match at the end of lines in the middle of a string. Try removing it.

        -- Dan