Text::ParseWords regex doesn't work when text is too long?

edan has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

After spending quite a while running and re-running my program with perl -d, I finally narrowed down this problem I'm having with Text::ParseWords, whereby the comma-separated, quoted string I'm passing to parse_line doesn't get parsed (I get back an empty list).

Can someone please tell me why the following code behaves this way?

my $string = "'" . 'v' x 35_000 . "z'";
print "length of the string is: ", length($string), "\n";

my ($quote, $quoted);
($quote, $quoted) = $string =~ m/^(["'])(.*)\1/;
print "dot-star works!\n" if $quoted;

# a copy of part of the Text::ParseWords::parse_line() regex
($quote, $quoted) = $string =~ m/^(["'])((?:\\.|(?!\1)[^\\])*)\1/;
print "Text::ParseWords fails!\n" unless $quoted;

I don't know if the results are system-dependent, but if I shorten the string to 30,000 characters, the second regex works, too. Is there some sort of size limit with zero-width negative look-ahead assertions or character classes that I didn't know about?

Any ideas?

--
3dan

Replies are listed 'Best First'.

Re: Text::ParseWords regex doesn't work when text is too long?
by PodMaster (Abbot) on May 11, 2003 at 17:08 UTC

Complex regular subexpression recursion limit (32766) exceeded

perldiag

/(?<!\\)"/
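A minimal sketch of how that diagnostic can be surfaced, assuming it is delivered as an ordinary warning under `use warnings`. The handler collects whatever perl emits while the regex from the question runs. The alternation inside `(?: ... )*` consumes one character (or one escaped pair) per pass, so the 35,001-character body needs more passes than the 32766 quoted in the message. Whether the match fails, and whether anything is reported at all, depends on the perl build, so no output is shown here.

```perl
use strict;
use warnings;

# Same test string as in the question: 35,001 characters between the quotes.
my $string = "'" . 'v' x 35_000 . "z'";

my @warnings;
my $quoted;
{
    # Collect warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # The regex fragment copied from Text::ParseWords in the question.
    (undef, $quoted) = $string =~ m/^(["'])((?:\\.|(?!\1)[^\\])*)\1/;
}

print defined($quoted) ? "matched\n" : "no match\n";
print "perl said: $_" for @warnings;
```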



Re: Text::ParseWords regex doesn't work when text is too long? (fixes)
by tye (Sage) on May 11, 2003 at 17:44 UTC

This isn't too hard to fix:

    my( $quote, $quoted, $end )=  $string =~
        /(['"])((?:\\.|[^'"\\]+|(?!\1)['"])*)(\1?)/;
    die "Unclosed quote: $quote$quoted\n"
        if  $quote  &&  ! $end;

and not

     /(['"])((?:\\.|[^\1\\]+)*)(\1?)/

as I first posted, since [^\1] does not act as a backreference inside a character class.
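A quick check of the first regex above, as a sketch rather than a drop-in replacement for parse_line. The variable names and the short sample string are made up for illustration. Because `[^'"\\]+` swallows a whole run of ordinary characters in a single pass, the outer `*` only has to go around once for the 35,000-character test string.

```perl
use strict;
use warnings;

# tye's first regex: runs of ordinary characters are consumed in one step.
my $re = qr/(['"])((?:\\.|[^'"\\]+|(?!\1)['"])*)(\1?)/;

# The long string from the question.
my $long = "'" . 'v' x 35_000 . "z'";
my ($quote, $quoted, $end) = $long =~ $re;
print "long: got ", length($quoted), " characters\n";

# A short string with an escaped quote and the other kind of quote inside.
my $short = q{'it\'s "ok"' tail};
my ($quote2, $quoted2, $end2) = $short =~ $re;
print "short: [$quoted2]\n";
```

The long case should report 35001 characters, and the short case should print `[it\'s "ok"]` with the backslash still in place, since the regex only locates the quoted text and does not unescape it.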

If you have a string that contains a huge sequence of backslash-escaped characters, then you might have to add a + to that part of the regex as well:

    /(['"])((?:(?:\\.)+|[^\1\\]+)*)(\1?)/

or rather, with the character class corrected:

    /(['"])((?:(?:\\.)+|[^'"\\]+|(?!\1)['"])*)(\1?)/

    "'" . '\vv'x35_000 . "z'"

updated

    my( $quote, $quoted );
    if(  $str =~ /(['"])/g  ) {
        my $beg= pos($str);
        $quote= $1;
        if(  $str !~ /(?<!\\)((?:\\\\)*)\Q$quote/g  ) {
            die "Unclosed quote: ", substr($str,$beg), $/;
        }
        my $end= pos($str);
        $quoted= substr( $str, $beg, $end-$beg-1 );
    }
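Here is the position-based method above wrapped in a sub so it can be tried on a few inputs. This is only a sketch: `extract_quoted` is a hypothetical name, and it uses `unless (... =~ ...)` in place of `!~`, which should behave the same way. The idea is that the first `//g` match leaves `pos` just after the opening quote, and the second match continues from there looking for the same quote preceded by an even number of backslashes.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the position-based approach.
sub extract_quoted {
    my ($str) = @_;

    # Find the opening quote; a scalar-context //g match records pos($str).
    return unless $str =~ /(['"])/g;
    my $quote = $1;
    my $beg   = pos($str);

    # Continue from pos($str): a matching quote preceded by an even number
    # of backslashes (possibly zero) is unescaped and closes the string.
    unless ($str =~ /(?<!\\)(?:\\\\)*\Q$quote/g) {
        die "Unclosed quote: ", substr($str, $beg), "\n";
    }
    my $end = pos($str);

    # Everything between the two quotes, escapes left untouched.
    return ($quote, substr($str, $beg, $end - $beg - 1));
}

my ($q, $body) = extract_quoted("'" . 'v' x 35_000 . "z'");
print "long: ", length($body), " characters\n";

($q, $body) = extract_quoted(q{say "a \"b\" c" now});
print "short: [$body]\n";
```

No quantified group has to loop over the body here, so the length of the quoted text should not matter.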

Update: Thanks, merlyn. I knew that had failed in my previous testing but had also run into people thinking it should work enough times that when it "worked" in my test case that didn't test that part of it at all, I jumped to the wrong conclusion.

tye


•Re: Re: Text::ParseWords regex doesn't work when text is too long? (fixes)

by merlyn (Sage) on May 11, 2003 at 18:47 UTC

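[^\1\\] matches any character other than chr(1) or a backslash; inside a character class, \1 is an octal escape, not a backreference.

In other words, in the words of Inigo Montoya in The Princess Bride, "I don't think that means what you think that means".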

-- Randal L. Schwartz, Perl hacker

"\1" =~ /[^\1]/

"\1X" =~ /[^\1]/
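The two matches above can be run directly. The sketch below adds a third case of my own (the `"''"` test) to show that the class does not look at what group 1 captured. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Inside a bracketed character class, \1 is an octal escape for chr(1),
# so [^\1] means "any character except chr(1)".
my $only_ctrl_a = "\1"  =~ /[^\1]/ ? 1 : 0;   # nothing but chr(1) to look at
my $ctrl_a_x    = "\1X" =~ /[^\1]/ ? 1 : 0;   # the X satisfies the class

# If \1 were a backreference here, the second quote could not match.
my $two_quotes  = "''"  =~ /(')[^\1]/ ? 1 : 0;

print "chr(1) alone matches:      $only_ctrl_a\n";
print "chr(1) followed by X:      $ctrl_a_x\n";
print "quote followed by a quote: $two_quotes\n";
```

This should print 0, 1 and 1: the class rejects only chr(1), and it happily accepts a second copy of the captured quote.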


regex bottom line?

by edan (Curate) on May 12, 2003 at 09:58 UTC

So, assuming that I'll need to roll my own parse_line by modifying the regex... what regex will provide the same functionality but work for arbitrarily large strings?

Since I still don't really understand what /(?!\1)[^\\]/ does, I am having trouble with this... I reason that it should match anything that's not a quote (whichever quote was opened at the start of the match), but I don't see how it does this...

Should I use tye's first regex? I also don't get how /((?:\\.|[^'"\\]+|(?!\1)['"])*)/ works...
Does
/[^'"\\]+|(?!\1)['"]/
do the same thing as
/(?!\1)[^\\]/
?

3dan

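On the question of whether the two fragments accept the same characters, a small experiment may help. `(?!\1)` is a zero-width negative look-ahead: it succeeds only when the text at the current position is not a copy of the opening quote, and `[^\\]` then consumes one non-backslash character. The split form says the same thing in two parts: a run of characters that are neither quote nor backslash, or a single quote character that is not the opening one. The sketch below (variable names are illustrative) compares both on one character following each kind of opening quote.

```perl
use strict;
use warnings;

# One character after the opening quote, in the style quoted from Text::ParseWords.
my $single_step = qr/\A(['"])(?:(?!\1)[^\\])\z/;
# The same position expressed with the split form from tye's regex.
my $split_form  = qr/\A(['"])(?:[^'"\\]+|(?!\1)['"])\z/;

my @disagreements;
for my $open (q{'}, q{"}) {
    for my $char ('a', q{'}, q{"}, '\\') {
        my $text   = $open . $char;
        my $first  = $text =~ $single_step ? 1 : 0;
        my $second = $text =~ $split_form  ? 1 : 0;
        push @disagreements, $text if $first != $second;
        printf "open %s, next %s: %d %d\n", $open, $char, $first, $second;
    }
}
print "disagreements: ", scalar(@disagreements), "\n";
```

Both forms should reject the backslash and the matching quote and accept everything else, so the disagreement count should be 0. The practical difference is that the split form eats a whole run per pass of the outer `*`, while the look-ahead form advances one character per pass.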

Re: regex bottom line? (bottom method)

by tye (Sage) on May 12, 2003 at 16:13 UTC

Re: Text::ParseWords regex doesn't work when text is too long?
by benn (Vicar) on May 11, 2003 at 17:16 UTC

:( Ben.

~~PS - I notice search.cpan.org has Text::ParseWords 3.1, while my builds have 3.21 ...has it been 'rolled back'?~~ or just in the CORE...doh!

Update: or just listen to da PodMasta
