kleinbiker7 has asked for the wisdom of the Perl Monks concerning the following question:

I need some help constructing a regular expression. I want to match up to 4 lines of up to 40 characters of text each. This is what I've got, but it's not working:

/(.{0,40}\n){0,4}/

The text is contained in a string, for example I would like to do this:

$foo = "1234567890123456789012345678901234567890
1234567890123456789012345678901234567890
1234567890123456789012345678901234567890
1234567890123456789012345678901234567890";

if ($foo =~ m/4 lines of 40 chars/) { do something}

Can I accomplish this without splitting the string and adding a loop to process each entry? That would save processing time. Thanks!

Robert
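
A small sketch of one likely reason the pattern seems not to work, assuming the complaint is that it accepts input it should reject. Both quantifiers allow zero repetitions and nothing is anchored, so the pattern can succeed with an empty match. The variable names and the 50-character sample line are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $re = qr/(.{0,40}\n){0,4}/;

# Hypothetical sample: one line of 50 characters, over the 40-character limit.
my $too_long = ("x" x 50) . "\n";

# {0,4} permits zero repetitions, so the match succeeds at offset 0
# with an empty string instead of rejecting the overlong line.
my $matched = $too_long =~ $re ? 1 : 0;
my $length  = $matched ? length($&) : -1;

print "matched: $matched, length of match: $length\n";
```

A successful match here says nothing about the line lengths, which is why the replies below add an anchor and require at least one line.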

Replies are listed 'Best First'.
Re: How do I match lines of 40 characters long in a block of text?
by sauoq (Abbot) on Sep 25, 2002 at 20:06 UTC

    This really isn't as hard as everyone has made it out to be. You almost had it, but you need to anchor so as not to match partial lines. You also probably want to avoid matching the empty string. I'm guessing that you really want "1 to 4 lines each consisting of up to 40 characters followed by a newline."

    /
      (                 # Assuming you want to capture these lines.
        (?:             # Group each line.
          ^             # Beginning of the line.
          .{0,40}\n     # 0 to 40 characters followed by a newline.
        ){1,4}          # 1 to 4 lines. (0 will permit an empty match.)
      )                 # Done capturing.
    /mx;                # /m so that ^ anchor works, /x for comments.
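
A minimal sketch of how an anchored pattern of this shape behaves on a block of text. The sample text and variable names are hypothetical; the point is that an overlong first line is skipped and the capture starts at the next line that fits.

```perl
use strict;
use warnings;

# Hypothetical sample: one overlong line followed by three short ones.
my $text = ("x" x 45) . "\n" . "alpha\n" . "beta\n" . "gamma\n";

my $re = qr/
    (                 # capture the block of lines
      (?:
        ^             # start of a line (needs the m flag)
        .{0,40}\n     # up to 40 characters, then a newline
      ){1,4}          # 1 to 4 such lines
    )
/mx;

# In list context the match returns the captured block.
my ($block) = $text =~ $re;
print defined $block ? $block : "no match\n";
```

Because of the ^ anchor, the 45-character line cannot contribute its last 40 characters to the match.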
    -sauoq
    "My two cents aren't worth a dime.";
    

      I think you are probably right about what he actually needs, re: 1 to 4 rather than 0 to 4, but there is a possibility that yours won't cater for: A string containing < 40 chars but no newline..

      It's probably a spurious requirement, but trying to achieve it hung me up for ages.

      (Knowing you, you'll add a 4 character, positively backward, forward-looking, zero-width assertion to your regex and achieve that too:)


      Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
        but there is a possibility that yours won't cater for: A string containing < 40 chars but no newline..

        You're right. Mine doesn't account for it. I guess I assumed they all would end with newlines. That was a bad assumption on my part. Of course, I might blame it on poorly stated requirements. :-)

        (Knowing you, you'll add a 4 character, positively backward, forward-looking, zero-width assertion to your regex and achieve that too:)

        Nah... it should be easier than that. Use a $ to match the end of the line (not including the newline) and then \n? to match an optional newline. So, I tried that:

        /
          (                 # Assuming you want to capture these lines.
            (?:             # Group each line.
              ^             # Beginning of the line.
              .{0,40}$\n?   # 0 to 40 chars, an end-of-line and optional newline.
            ){1,4}          # 1 to 4 lines. (0 will permit an empty match.)
          )                 # Done capturing.
        /mx;                # /m so that ^ anchor works, /x for comments.

        But that didn't work! I was vexed until I realized that it looks an awful lot like "match 0 to 40 characters followed by $\ followed by an optional n". So, then I tried:

        /
          (                 # Assuming you want to capture these lines.
            (?:             # Group each line.
              ^             # Beginning of the line.
              .{0,40}$      # 0 to 40 characters followed by an end-of-line.
              \n?           # An optional newline.
            ){1,4}          # 1 to 4 lines. (0 will permit an empty match.)
          )                 # Done capturing.
        /mx;                # /m so that ^ anchor works, /x for comments.

        And that worked like a charm.

        That additional requirement did make the whole exercise more fun. There is another workaround. Sometime before I actually figured out why it was breaking, I tried (?:\n|\Z) and that worked as well but I thought it was ugly. So, I'm left wondering whether there is a better way around it than using /x and whitespace.
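
A short sketch comparing the three variants discussed in this reply on a string whose last line has no newline. The sample text and variable names are hypothetical. The $ and \n? are kept on separate lines under /x, as in the working version above, so that Perl does not read them as the $\ variable.

```perl
use strict;
use warnings;

# Hypothetical input: two short lines, the last one without a newline.
my $text = "first line\nlast line, no newline";

# Original form: every line must end in a newline.
my $strict = qr/((?:^.{0,40}\n){1,4})/m;

# $ and \n? kept apart under the x flag.
my $spaced = qr/
    ((?:
        ^
        .{0,40}$      # up to 40 characters, then end of line
        \n?           # optional newline
    ){1,4})
/mx;

# The alternative mentioned above: a newline or the end of the string.
my $alt = qr/((?:^.{0,40}(?:\n|\Z)){1,4})/m;

my ($got_strict) = $text =~ $strict;
my ($got_spaced) = $text =~ $spaced;
my ($got_alt)    = $text =~ $alt;

print "strict captured ", length($got_strict), " characters\n";
print "spaced captured ", length($got_spaced), " characters\n";
print "alt captured ",    length($got_alt),    " characters\n";
```

Only the first variant stops short of the final line; the other two capture the whole string.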

        Thanks for making this so much more entertaining. :-)

        Update: This was my 300th node! :-)

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: How do I match lines of 40 characters long in a block of text?
by Zaxo (Archbishop) on Sep 25, 2002 at 19:25 UTC

    Repeated applications of index would do it:

    my $foo = "1234567890123456789012345678901234567890
    1234567890123456789012345678901234567890
    12345678901234567890123456789012345678901
    123456789012345678901234567890123456789012
    1234567890123456789012345678901234567890
    1234567890123456789012345678901234567890
    1234567890123456789012345678901234567890
    1234567890123456789012345678901234567890";

    my ($this, $prev, @short) = (0,0);
    while (@short < 4) {
        $this = index $foo, "\n", $prev;
        push @short, substr $foo, $prev, $this - $prev
            if $this - $prev <= 40;
        $prev = $this + 1;
    }
    {
        local $, = $/;
        print @short;
    }

    I expanded the data to show it's doing the right thing.
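
A hedged variation on the same index idea, wrapped in a hypothetical helper named short_lines. It is not Zaxo's code: the extra checks are assumptions added so the loop also stops at the end of the text and copes with a final line that has no newline.

```perl
use strict;
use warnings;

# Hypothetical helper: collect up to $want lines of at most $max characters.
sub short_lines {
    my ($text, $max, $want) = @_;
    my @short;
    my $prev = 0;
    while (@short < $want && $prev < length($text)) {
        my $this = index $text, "\n", $prev;
        $this = length($text) if $this < 0;   # final line without a newline
        push @short, substr($text, $prev, $this - $prev)
            if $this - $prev <= $max;
        $prev = $this + 1;
    }
    return @short;
}

# Made-up sample: lines of 40, 41, 10, 42 and 5 characters.
my $text  = join "\n", "a" x 40, "b" x 41, "c" x 10, "d" x 42, "e" x 5;
my @found = short_lines($text, 40, 4);
print scalar(@found), " short lines found\n";
```

With this sample only three lines fit the limit, so the helper returns three entries rather than looping past the end of the string.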

    After Compline,
    Zaxo

Re: How do I match lines of 40 characters long in a block of text?
by fglock (Vicar) on Sep 25, 2002 at 19:04 UTC

    if ($foo =~ m/(?:.{0,40}\n){0,4}/m) { print "ok" }

    /m is for "multi-line" match

    update: see bart's reply below for why this is wrong. See sauoq's answer instead.

    update: now I've got it (I did learn something today!) :

    The RE is:  /((?:[^\n]{0,40}\n){0,4})/

    or:  /((?:.{0,40}\n){0,4})/     # thanks sauoq!

    roughly meaning: group( (up-to-40 non-line-breaks, line-break) up-to-4 times)close-group

    Test code:

    $line = "x" x 38 . "\n";

    sub test {
        ($res) = ($_[0] =~ /((?:[^\n]{0,40}\n){0,4})/);
        if ($res) { print "yes\n"; }
        else      { print "no\n"; }
    }

    test( "x$line"   x 4 );  # 39 x 4
    test( "xx$line"  x 4 );  # 40 x 4
    test( "xxx$line" x 4 );  # 41 x 4
    test( "xx$line"  x 3 );  # 40 x 3
    test( "x$line"   x 5 );  # 39 x 5

    output:

    yes
    yes
    no
    yes
    yes

    Thanks thelenm and BrowserUk. sauoq got something very similar too.

    sauoq noted that  [^\n] is the same as a "dot".

      Won't that match 0 to 4 lines of (0 to 40 characters + nl)?

      Update: Of course it will, cos that's what he asked for! D'oh!


      Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
        I'm still curious why he would want to match zero lines. I'd expect him to want to match at least one line.

        And: as soon as you ask for at least one line, without an "^" anchor, your first line might contain more than 40 characters, but the regex will only grab the last 40 of them!

        In short: it is definitely not a bad idea to add an anchor.

        /^((?:.{0,40}\n){0,4})/;
        or
        /^((?:.{0,40}\n){1,4})/m;
        The latter case can grab 1 to 4 whole lines anywhere in your text.
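
A small sketch of the difference the anchor makes, using a made-up 50-character first line. The variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: a 50-character first line, then a short line.
my $text = ("a" x 10) . ("b" x 40) . "\n" . "short\n";

# No anchor: the match may start in the middle of the long line.
my ($unanchored) = $text =~ /((?:.{0,40}\n){1,4})/;

# With ^ and the m flag: the match must start at the beginning of a line.
my ($anchored) = $text =~ /^((?:.{0,40}\n){1,4})/m;

print "unanchored: $unanchored";
print "anchored:   $anchored";
```

The unanchored capture begins with the last 40 characters of the overlong line, while the anchored one skips that line entirely.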

        Sure!

        I want to match up to 4 lines of up to 40 characters of text each

      The /m modifier only changes the behavior of "^" and "$" (whether they match at embedded newlines). Since your regex doesn't use either of those, the /m has no effect.

      -- Mike

      --
      just,my${.02}

      Your update isn't really any different than your first crack at it. A dot is equivalent to your character class, [^\n], as long as there is no /s modifier on the regex. A dot means "match any character except for a newline."
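
A brief sketch of the two modifier points made in these replies: the dot behaves like [^\n] unless /s is given, and /m changes only where ^ and $ can match. The sample string and variable names are made up.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# Without the s flag, "." and [^\n] stop at the same place.
my ($dot)   = $text =~ /(.*)/;
my ($class) = $text =~ /([^\n]*)/;

# With the s flag, "." also matches newlines and runs to the end.
my ($dot_s) = $text =~ /(.*)/s;

# The m flag changes only ^ and $: here ^ can also match after
# the embedded newline, so a second line start is found.
my @starts   = $text =~ /^(\w+)/g;
my @starts_m = $text =~ /^(\w+)/mg;

print "line starts without m: ", scalar(@starts), "\n";
print "line starts with m:    ", scalar(@starts_m), "\n";
```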

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: How do I match lines of 40 characters long in a block of text?
by BrowserUk (Patriarch) on Sep 25, 2002 at 19:05 UTC

    Update: Answered the wrong question.

    This is tougher than it looks.

    Try m/(.{40}\n){4}/. It works for me...

    perl> $" = ''
    perl> $s = ((qq(@{[0..8]}) x 4).$/)x4
    perl> print $s
    012345678012345678012345678012345678
    012345678012345678012345678012345678
    012345678012345678012345678012345678
    012345678012345678012345678012345678
    perl> print 'Matched'.$/ if ( $s =~ m/(.{40}\n){4}/ )
    perl> $s = ((qq(@{[0..9]}) x 4).$/)x4
    perl> print $s
    0123456789012345678901234567890123456789
    0123456789012345678901234567890123456789
    0123456789012345678901234567890123456789
    0123456789012345678901234567890123456789
    perl> print 'Matched'.$/ if ( $s =~ m/(.{40}\n){4}/ )
    Matched
    perl>

    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
      Sorry, I phrased my question wrong. I wanted it to match lines of 0 to 40 characters, but no more than 40. Thanks.
Re: How do I match lines of 40 characters long in a block of text?
by BrowserUk (Patriarch) on Sep 25, 2002 at 21:19 UTC

    This (finally) will do what you asked. It will test a string and determine whether it contains 0 to 4 lines and, if it has 1 or more lines, that each of those lines is no more than 40 characters.

    It won't search a scalar that has more than 4 lines and determine if there are 4 consecutive lines of at most forty characters, which I think is what Zaxo's solution does, and could be what you want, but it ain't what you asked for:)

    To put it another way, it will fail if the string has more than 4 lines or if any of the lines it contains are > 40 chars. Is that what you wanted? Should it be true if it contains no lines? Or rather, some chars with no newline?

    Oh, and it's a 'pure regex' solution. (For some definition of pure:)

    #! perl -sw
    use strict;
    local $"='';#"

    my ($tmp1, $tmp2);

    for my $lines (0..5) {
        my $willmatch = ((qq(@{[0..9]}) x 4).$/)x $lines;
        my $half = int($lines/2);
        my $wontmatch = ((qq(@{[0..10]}) x 4).$/)x$half
                      . 'X'
                      . ((qq(@{[0..9]}) x 4).$/)x(1+$lines-$half);

        for ($willmatch, $wontmatch) {
            # if the number of lines in the string is at most 4
            if( ($tmp1=()=m/\n/mg) <=4
                # and they are all <= 40 chars
                and ( $tmp2=()=m/^(?:[^\n]{0,40}\n)/mg ) == $tmp1 ) {
                print "\nMatches! ($tmp1 lines with $tmp2 < 40 chars)\n";
            }
            else {
                print "\nNo Match! ($tmp1 lines with only $tmp2 < 40 chars)\n";
            }
            print $_.$/;
        }
    }
    __END__
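
For comparison, a sketch of a single-pattern check for the same whole-string condition. This is not BrowserUk's code: the helper name fits_limits is hypothetical, and the sketch assumes every line must end in a newline and that an empty string counts as zero lines.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the whole string is 0 to 4 lines,
# each at most 40 characters and each ending in a newline.
sub fits_limits {
    my ($text) = @_;
    # \A and \z pin the pattern to the start and end of the string,
    # so nothing outside the 0-to-4 lines can be left over.
    return $text =~ /\A(?:.{0,40}\n){0,4}\z/ ? 1 : 0;
}

my $line = ("x" x 40) . "\n";

my $four_lines = fits_limits($line x 4);
my $five_lines = fits_limits($line x 5);
my $long_line  = fits_limits(("x" x 41) . "\n");
my $empty      = fits_limits("");
my $no_newline = fits_limits("no newline");

print "four lines: $four_lines, five lines: $five_lines, ",
      "41 chars: $long_line, empty: $empty, no newline: $no_newline\n";
```

Whether the empty string and an unterminated line should pass is exactly the open question raised above; this sketch accepts the first and rejects the second.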
