Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Sample:
xxx: No problem. (23 minutes ago) yyy: I'm new... just today (23 minutes ago)
I need a regex that finds every entry: a name of any length (letters, numbers, spaces, hyphens) followed by a single colon, a few spaces, and then text, up to and including the closing ) after the "... ago" part.

Thanks for your help.

Replies are listed 'Best First'.
Re: need a complex regex
by BrowserUk (Patriarch) on Jul 30, 2003 at 07:14 UTC

    You don't say whether you are processing this line by line, or need to extract your bits from a multiline lump of data. You are also not at all clear as to which parts of the lines you wish to keep, and whether you want to capture them as 2 or 3 pieces.

    This assumes that you are processing line by line and want 3 pieces from each line.

    if( $line =~ /^([^:]+): (.+) \((\d+) minutes ago\)$/ ) {
        my( $name, $text, $delay ) = ( $1, $2, $3 );
    }

    The regex is saying:

    1. Capture one or more non-':' characters between the start of the line and the first colon into $1.
    2. Then insist on, but skip, a ': '.
    3. Then capture one or more characters after that space, up to the next sequence, into $2.
    4. Match but skip a space preceding a literal '(', followed by one or more digits (captured to $3), followed by the literal text ' minutes ago)', followed by the end of the line.
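
    As a rough illustration of the steps above, here is a small self-contained sketch that applies the same regex to two lines taken from the sample. The helper name parse_line and the one-entry-per-line input are assumptions made for the example, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, text, minutes) for a matching line,
# or an empty list when the line does not have the expected shape.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^([^:]+): (.+) \((\d+) minutes ago\)$/
        ? ($1, $2, $3)
        : ();
}

# Assumed input: one entry per line, as in the line-by-line case.
my @lines = (
    "xxx: No problem. (23 minutes ago)",
    "yyy: I'm new... just today (23 minutes ago)",
);

for my $line (@lines) {
    my @parts = parse_line($line) or next;   # skip lines that do not match
    my ($name, $text, $delay) = @parts;
    print "$name | $text | $delay\n";
}
```

    Because (.+) is greedy and the pattern is anchored with $, only the last "(N minutes ago)" on the line is treated as the timestamp, so parentheses inside the text itself should not confuse it.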


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

Re: need a complex regex
by Anonymous Monk on Jul 30, 2003 at 12:43 UTC
    Instead of only using a regex, you could use split:
    my($this, $that) = split ':', $line, 2;
    my $other = (split '\(', $that)[1];
    $other =~ s/\)$//;
    Just a thought.
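
    One possible way to flesh out the split idea so that it also returns the message text, and copes with parentheses inside the text, is sketched below. The helper name split_entry and the use of rindex to find the last '(' are assumptions made for illustration, not what the poster wrote.

```perl
use strict;
use warnings;

# Hypothetical helper: splits "name: text (when)" without one big regex.
# Returns (name, text, when), or an empty list if the line does not fit.
sub split_entry {
    my ($line) = @_;

    # Split on the first colon only; the limit of 2 keeps later colons in the text.
    my ($name, $rest) = split /:\s*/, $line, 2;
    return unless defined $rest;

    # Use the LAST '(' so parentheses inside the text are left alone.
    my $open = rindex $rest, '(';
    return if $open < 0;

    my $text = substr $rest, 0, $open;
    my $when = substr $rest, $open + 1;
    $text =~ s/\s+$//;      # trim the space before '('
    $when =~ s/\)\s*$//;    # drop the closing ')'
    return ($name, $text, $when);
}

my ($name, $text, $when) = split_entry("yyy: I'm new (sort of) (23 minutes ago)");
print "$name | $text | $when\n";
```

    Like the original snippet, this assumes one entry per line.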
Re: need a complex regex
by Anonymous Monk on Jul 30, 2003 at 06:47 UTC
    I forgot to mention, the text between the space after the colon and the space before the parentheses can be pretty much any value (text, numbers, spaces, hyphens, etc.) as well.
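
    Since the sample shows two entries on a single line, a global match may be closer to what is being asked for. The following is only a sketch under stated assumptions: names never contain a colon, every entry ends in a parenthesised "... ago" phrase, and the hash keys name, text, and when are made up for the example.

```perl
use strict;
use warnings;

my $line = "xxx: No problem. (23 minutes ago) yyy: I'm new... just today (23 minutes ago)";

my @entries;
# /g in a while loop resumes after the previous match, so each pass
# picks up the next "name: text (... ago)" entry on the same line.
while ( $line =~ /\s*([^:]+?):\s+(.+?)\s+\(([^()]*\bago)\)/g ) {
    push @entries, { name => $1, text => $2, when => $3 };
}

printf "%s | %s | %s\n", @{$_}{qw(name text when)} for @entries;
```

    The non-greedy (.+?) stops at the first "(... ago)" it can reach, so message text that itself contains such a phrase would be cut short.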