legin has asked for the wisdom of the Perl Monks concerning the following question:

Dear Most Wise Monks,

I'm trying to write a regex that extracts three lines from the following:

my $string = <<STRING;
one.
two \n \n .
three.
STRING
I'd like a regex that returns each logical line and ignores the embedded newlines. Something like:
my @lines = $string =~ m/^(.*?)$/msg;

# I need this test to pass
is(scalar(@lines), 3, 'found 3 lines only');
How can I write a regex that ignores the two embedded newline characters (see line two)?
I need to match on line endings while skipping over the embedded newlines.
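A minimal sketch of why the pattern finds more than three lines, assuming the heredoc is exactly as written above. `<<STRING` interpolates like a double-quoted string, so by the time the regex runs, the two `\n` escapes on line two are ordinary newline characters and the string holds five physical lines. The variable names are the ones from the question.

```perl
use strict;
use warnings;

# The heredoc from the question: <<STRING interpolates, so each "\n"
# on line two becomes a real newline character.
my $string = <<STRING;
one.
two \n \n .
three.
STRING

my @lines = $string =~ m/^(.*?)$/msg;

# Five real newlines, hence five matches instead of three.
# prints: 5 newlines, 5 matches
printf "%d newlines, %d matches\n", ($string =~ tr/\n//), scalar @lines;
```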

Re: extracting lines from a string - while ignoring the \n character
by svenXY (Deacon) on Oct 29, 2007 at 09:33 UTC
    Hi,
    would this help you?
    ## single quotes around heredoc
    my $string = <<'STRING';
    one.
    two \n \n .
    three.
    STRING
    my @lines = $string =~ m/^(.*?)$/msg;
    print scalar @lines, " lines\n";
    results in: 3 lines
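    A minimal sketch that runs svenXY's single-quoted heredoc through the Test::More check from the question (Test::More ships with Perl). The second `is` is an extra check added here, not part of the reply: it shows that line two keeps each `\n` as two literal characters, a backslash followed by "n".

```perl
use strict;
use warnings;
use Test::More;

# Single quotes around the terminator make the heredoc non-interpolating,
# so "\n" on line two stays as a backslash followed by "n".
my $string = <<'STRING';
one.
two \n \n .
three.
STRING

my @lines = $string =~ m/^(.*?)$/msg;

is(scalar(@lines), 3, 'found 3 lines only');
is($lines[1], 'two \n \n .', 'line two keeps its literal \n sequences');

done_testing();
```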
    Regards,
    svenXY
Re: extracting lines from a string - while ignoring the \n character
by moritz (Cardinal) on Oct 29, 2007 at 09:50 UTC
    If by "logical line" you mean "anything up to a period (.)", you can use this regexp:

    m/[^.]*\./s

    If periods are allowed within a line, you could try matching a . that is followed by a newline:

    m/.*?\.(?=\n)/s

    If you mean to do something completely different, you should be more explicit about your goals.

    Update: to get all matches, use the /g modifier.
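    A sketch applying both patterns with /g to the question's interpolating heredoc, where line two contains real newlines. Each match after the first starts with the newline left over from the previous line; the `s/^\s+//` cleanup that removes it is an addition of mine, as are the array names. The /s flag on the first pattern is kept as posted, though it has no effect on a character class.

```perl
use strict;
use warnings;

# The question's interpolating heredoc: line two holds two real newlines.
my $string = <<STRING;
one.
two \n \n .
three.
STRING

# First pattern with /g: everything up to and including a period.
my @by_period = $string =~ m/[^.]*\./sg;

# Second pattern with /g: shortest run ending in a period that is
# immediately followed by a newline.
my @by_period_eol = $string =~ m/.*?\.(?=\n)/sg;

# Strip the leading newline carried over from the previous record.
s/^\s+// for @by_period, @by_period_eol;

# prints: 3 / 3 records
print scalar(@by_period), " / ", scalar(@by_period_eol), " records\n";
```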

Re: extracting lines from a string - while ignoring the \n character
by erroneousBollock (Curate) on Oct 29, 2007 at 09:31 UTC
    I may be missing something, but I believe that the literal newlines and the embedded newlines are equivalent in $string (the <<STRING heredoc interpolates, so each \n becomes a real newline), so there's no way to differentiate between them.

    I tried to muck with $/ for a while before realising that such an approach would only work if the string were being read from a filehandle rather than defined in the source code.

    Will the string be defined inline in your real use case?

    -David

      Your approach works. IO::String comes to your rescue ;-)
      $ cat test.txt
      bla.
      fasel
      .
      foo
      .
      bar.
      $ perl -e ' use IO::String; $str=`cat test.txt`; $fh = IO::String->new($str); $/="."; print "$_|" for <$fh>'
      bla.|
      fasel
      .|
      foo
      .|
      bar.|
      |
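      Since Perl 5.8 the same $/ approach also works without IO::String, because open accepts a reference to a scalar as an in-memory file. A minimal sketch, assuming a period really is the record terminator; trimming the leading newline and skipping the empty trailing record (the lone "|" in the output above) are my additions, and @records is an illustrative name.

```perl
use strict;
use warnings;

my $string = <<STRING;
one.
two \n \n .
three.
STRING

# open() on a scalar reference gives an in-memory filehandle,
# so the string does not have to come from a file.
open my $fh, '<', \$string or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = '.';                        # a "line" now ends at a period
    while (my $rec = <$fh>) {
        $rec =~ s/^\s+//;                  # drop newline left from previous record
        push @records, $rec if length $rec;  # skip the trailing "\n" remnant
    }
}
close $fh;

# prints: 3 records
print scalar(@records), " records\n";
```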

      print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
        That's fine if you're sure that '.' == EOL is what the OP meant, but that's not clear to me.

        Maybe I misunderstood, but it seemed to me like the OP wants to distinguish between what s/he perceives to be two kinds of newline.

        -David

Re: extracting lines from a string - while ignoring the \n character
by Anonymous Monk on Oct 30, 2007 at 01:15 UTC
    I'm not sure I understand what you're trying to do. Shouldn't it be this simple?

    my @lines = split /\n/, $string;
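    Whether split gives three elements depends on how the heredoc is quoted. A short sketch comparing both forms (the variable names are illustrative); note that split discards trailing empty fields, so the final newline does not add an extra element.

```perl
use strict;
use warnings;

# Non-interpolating heredoc: "\n" on line two stays literal.
my $literal = <<'STRING';
one.
two \n \n .
three.
STRING

# Interpolating heredoc, as in the question: "\n" becomes a real newline.
my $interpolated = <<STRING;
one.
two \n \n .
three.
STRING

# Trailing empty fields are dropped by split.
my @from_literal      = split /\n/, $literal;
my @from_interpolated = split /\n/, $interpolated;

# prints: 3 vs 5
print scalar(@from_literal), " vs ", scalar(@from_interpolated), "\n";
```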