in reply to capturing words

Any advice is appreciated

There have already been good, solid solutions, but on a regex thread I can't just sit on my hands ;-)

I like johngg's solution and tried to fancify his code further:

...

my $data = '
Timed out (reason: in while loop)
::expect_out(0,string) = >
::expect_out(1,string) = RF
use CSWT#RF###
dis qremote(MQSI.3PL846) RNAME1 : dis qremote(MQSI.3PL846) RNAME
AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)
No commands have a syntax error.
AMQ8409: Display Queue details.QUEUE(MQSI.3PL944)TYPE(QREMOTE)RNAME(MQSI.3PL944)
end2 : end
';

my $match = qr/^AMQ8409 .+? (Q.+?(\w+)\)) .+? (R.+?\2\)) /mx;

for ( split /\n/, $data ) {
    print "$1 ----> $3\n" if /$match/
}

...

... but the question here is: is there a speed requirement, e.g. do you have to parse some 100 MB and spit out the results in one second? If so, all the solutions so far (including mine) would look rather slow, because they depend heavily on (.+?) or (.*?).
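Whether that matters for a given input is easy enough to measure. Below is a minimal sketch using the core Benchmark module, assuming a made-up workload of 1000 copies of one AMQ8409 line. The `negated` pattern and the `count_matches` helper are hypothetical, written only for this comparison, and the `negated` pattern further assumes that TYPE(...) always sits directly between QUEUE(...) and RNAME(...).

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One sample record in the shape of the data above, repeated to give
# the regex engine a non-trivial amount of text (workload size is arbitrary).
my $line  = 'AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)';
my @lines = ($line) x 1000;

# The lazy-quantifier pattern from this post.
my $lazy = qr/^AMQ8409 .+? (Q.+?(\w+)\)) .+? (R.+?\2\)) /mx;

# Hypothetical alternative: literal keywords and negated character classes.
# Assumes TYPE(...) is always the only thing between QUEUE(...) and RNAME(...).
my $negated = qr/^AMQ8409 [^(]* (QUEUE\([^)]+\)) TYPE\([^)]+\) (RNAME\([^)]+\)) /mx;

# Count how many of the sample lines a pattern matches.
sub count_matches {
    my ($re) = @_;
    my $n = 0;
    for (@lines) { $n++ if /$re/ }
    return $n;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    lazy    => sub { count_matches($lazy) },
    negated => sub { count_matches($negated) },
} );
```

The numbers will depend on the Perl version and on what the real input looks like, so it is worth swapping in a slice of the actual log before drawing conclusions.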

Regards

mwa

Re^2: capturing words
by johngg (Canon) on Nov 08, 2007 at 23:55 UTC
    I'm pleased you liked my solution and wanted to take it further, but I think you've introduced a slight bugette. Because you have used .+? R.+? in your pattern, it matches from the first "R" it encounters after the "QUEUE(...)" sequence, so the output from your script is actually

    Queue details.QUEUE(MQSI.3PL846) ----> REMOTE)RNAME(MQSI.3PL846)
    Queue details.QUEUE(MQSI.3PL944) ----> REMOTE)RNAME(MQSI.3PL944)

    I am not familiar with whatever application produced the text so I don't know if it is wise to rely on the "QUEUE" and "RNAME" being the same. By the same token, I don't know if "QUEUE" and "RNAME" can appear more than once in one line. If they could then a different approach with a global match might be appropriate. However, if they are unique in a line then you can avoid non-greedy matching.
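    For the repeated case, a minimal sketch of the global-match idea might look like the following. The input line with two QUEUE/RNAME pairs is made up for illustration (nothing in the posted data shows that the application ever produces one), and pairing each QUEUE with the next RNAME that follows it is an assumption about how such a line should be read.

```perl
use strict;
use warnings;

# Hypothetical line carrying two QUEUE/RNAME pairs (invented for this sketch).
my $line = 'AMQ8409: Display Queue details.'
         . 'QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)'
         . 'QUEUE(MQSI.3PL944)TYPE(QREMOTE)RNAME(MQSI.3PL944)';

# In list context, a /g match returns every captured token in order.
my @tokens = $line =~ / \b ( (?:QUEUE|RNAME) \( [^)]+ \) ) /gx;

# Pair each QUEUE(...) with the next RNAME(...) that follows it.
my @pairs;
my $queue;
for my $tok (@tokens) {
    if ( $tok =~ /^QUEUE/ ) {
        $queue = $tok;
    }
    elsif ( defined $queue ) {
        push @pairs, "$queue ----> $tok";
        undef $queue;
    }
}

print "$_\n" for @pairs;
```

    Like the pattern in the code that follows, this relies on [^)]+ rather than a non-greedy dot, so each token stops at its own closing parenthesis.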

    my $data = '
    Timed out (reason: in while loop)
    ::expect_out(0,string) = >
    ::expect_out(1,string) = RF
    use CSWT#RF###
    dis qremote(MQSI.3PL846) RNAME1 : dis qremote(MQSI.3PL846) RNAME
    AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)
    No commands have a syntax error.
    AMQ8409: Display Queue details.QUEUE(MQSI.3PL944)TYPE(QREMOTE)RNAME(MQSI.3PL944)
    end2 : end
    ';

    my $match = qr {(?mx)
        ^ AMQ8409 .+
        (QUEUE\([^)]+\)) .+
        (RNAME\([^)]+\))
    };

    for ( split /\n/, $data ) {
        print "$1 ----> $2\n" if /$match/
    }

    This produces

    QUEUE(MQSI.3PL846) ----> RNAME(MQSI.3PL846)
    QUEUE(MQSI.3PL944) ----> RNAME(MQSI.3PL944)

    I hope this is of interest.

    Cheers,

    JohnGG