patgas has asked for the wisdom of the Perl Monks concerning the following question:

I'm extracting some information from a line in a data file, and I came up with the following regex to do it. I have two questions: what would break it, and why doesn't $title include the space between END OF THE CENTURY and <BASIC>? I'd have thought that using the negated character class to grab END OF THE CENTURY would take everything up to the next <, but it knows to leave out the space immediately before it.

$_ = "(KONAMI ORIGINAL) END OF THE CENTURY <BASIC> / NO.9";
m|^\(([^\)]*)\) ([^<]*) <([^>]*)> / (.*)$|;
( $source, $title, $mode, $artist ) = ( $1, $2, $3, $4 );
print ">$source< >$title< >$mode< >$artist<\n";

Re: Break my regex, please
by davorg (Chancellor) on Oct 30, 2001 at 20:17 UTC

    That space is used to match the literal space in your regex between the second and third sets of parentheses.
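A minimal sketch of that behaviour, using shortened versions of the original pattern (the variable names are only for illustration). The negated class first grabs everything up to the `<`, trailing space included, and then gives one character back so that the literal space in the pattern can match:

```perl
use strict;
use warnings;

my $line = "(KONAMI ORIGINAL) END OF THE CENTURY <BASIC> / NO.9";

# Literal space before '<': the class has to give the trailing space back.
my ($with_space) = $line =~ m|^\([^)]*\) ([^<]*) <|;

# No space before '<': the class keeps everything up to '<'.
my ($without_space) = $line =~ m|^\([^)]*\) ([^<]*)<|;

print ">$with_space<\n";     # >END OF THE CENTURY<
print ">$without_space<\n";  # >END OF THE CENTURY <
```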

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you don't talk about Perl club."

Re: Break my regex, please
by tommyw (Hermit) on Oct 30, 2001 at 20:20 UTC

    The second question is easier to answer: the spaces in your pattern match the spaces in the text. So if you want $title to include the surrounding spaces, use: m|^\(([^\)]*)\)([^<]*)<([^>]*)> / (.*)$|;

    As to what could break it, that depends on what you're trying to do :-). Anything not in that format won't behave the way you expect, particularly if the title happens to include a < character.
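To illustrate the `<` case, here is a small sketch with a made-up title (the input line is hypothetical, not from the original data file). The pattern still matches, but the fields come out split in the wrong place, which is arguably worse than an outright failure:

```perl
use strict;
use warnings;

# The pattern from the original question.
my $re = qr|^\(([^\)]*)\) ([^<]*) <([^>]*)> / (.*)$|;

# Hypothetical line whose title contains a '<'.
my $line = "(KONAMI ORIGINAL) LOVE <3 YOU <BASIC> / NO.9";

my ($source, $title, $mode, $artist) = $line =~ $re;

# The title is cut at the first '<' and the rest leaks into the mode.
print ">$source< >$title< >$mode< >$artist<\n";
# prints: >KONAMI ORIGINAL< >LOVE< >3 YOU <BASIC< >NO.9<
```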

    --
    Tommy
    Too stupid to live.
    Too stubborn to die.

Re: Break my regex, please
by doc (Scribe) on Oct 30, 2001 at 20:32 UTC

    You are using literal spaces to match literal spaces. Adding or removing a single space will destroy the match, as will the presence of extra parentheses. These are a few of the things that break your regex:

    (KONAMI ORIGINAL) END OF THE CENTURY <BASIC> / NO.9
    (KONAMI ORIGINAL) END OF THE CENTURY <BASIC>/ NO.9
    (KONAMI ORIGINAL) END OF THE CENTURY<BASIC> / NO.9
    (KONAMI ORIGINAL)END OF THE CENTURY <BASIC> / NO.9
    (KONAMI ORIGINAL (no 9) ) END OF THE CENTURY <BASIC> / NO.9

    Using \s+ where one or more spaces are likely and \s* where zero or more are OK is better if humans are involved in the typing!
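One way this advice might look in practice is sketched below. It is not a drop-in replacement: the `/x` layout, the lazy title capture, and the `parse_line` helper are assumptions made for illustration. It only addresses the spacing variants; the line with nested parentheses would still be split in the wrong place.

```perl
use strict;
use warnings;

# With /x, whitespace inside the pattern is ignored, so every space
# we want to match has to be written explicitly as \s.
my $re = qr{
    ^ \s* \( ( [^)]* ) \)    # source, between ( and )
    \s* ( [^<]*? )           # title, lazy so trailing spaces are left out
    \s* < ( [^>]* ) >        # mode, between < and >
    \s* / \s* ( .* ) $       # artist, after the slash
}x;

# Hypothetical helper: returns the four fields, or an empty list on no match.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ $re;
    return @fields;
}

my @variants = (
    "(KONAMI ORIGINAL) END OF THE CENTURY <BASIC> / NO.9",
    "(KONAMI ORIGINAL) END OF THE CENTURY <BASIC>/ NO.9",
    "(KONAMI ORIGINAL) END OF THE CENTURY<BASIC> / NO.9",
    "(KONAMI ORIGINAL)END OF THE CENTURY <BASIC> / NO.9",
);

for my $line (@variants) {
    if ( my ($source, $title, $mode, $artist) = parse_line($line) ) {
        print ">$source< >$title< >$mode< >$artist<\n";
    }
    else {
        warn "no match: $line\n";
    }
}
```

Each of the four spacing variants should print the same four fields.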

    doc

(jeffa) Re: Break my regex, please
by jeffa (Bishop) on Oct 30, 2001 at 20:21 UTC
    Lots could break this - you'll just have to test it more.

    As for that space, the reason you are not grabbing it in $title is that you are specifying a literal space before the <. Try this (altered to be strict compliant):

    $_ = "(KONAMI ORIGINAL) END OF THE CENTURY <BASIC> / NO.9";
    my ($source,$title,$mode,$artist) = m|^\(([^\)]*)\) ([^<]*)<([^>]*)> / (.*)$|;
    print ">$source< >$title< >$mode< >$artist<\n";
    Also, I recommend using \s with a * or a + instead of literal spaces in your regex.
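Since plenty of input lines could fail to match, it may also help to check the result of the list-context match before using the fields. The sketch below assumes a hypothetical `parse_line` helper wrapped around the pattern above; on a failed match it returns nothing rather than leaving stale or undefined values in play.

```perl
use strict;
use warnings;

# Same pattern as above: no literal space before the '<'.
my $re = qr|^\(([^\)]*)\) ([^<]*)<([^>]*)> / (.*)$|;

# Hypothetical helper: returns a hash reference, or undef when the line does not match.
sub parse_line {
    my ($line) = @_;
    my ($source, $title, $mode, $artist) = $line =~ $re
        or return;
    return { source => $source, title => $title, mode => $mode, artist => $artist };
}

for my $line (
    "(KONAMI ORIGINAL) END OF THE CENTURY <BASIC> / NO.9",
    "(KONAMI ORIGINAL) END OF THE CENTURY <BASIC>/ NO.9",   # no space before the slash
) {
    if ( my $rec = parse_line($line) ) {
        print ">$rec->{source}< >$rec->{title}< >$rec->{mode}< >$rec->{artist}<\n";
    }
    else {
        warn "unparsed line: $line\n";
    }
}
```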

    jeffa