capturing words

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have the following input file, which contains:

 
Timed out (reason: in while loop)
::expect_out(0,string) = >
::expect_out(1,string) = RF use CSWT#RF###
dis qremote(MQSI.3PL846) RNAME1 : dis qremote(MQSI.3PL846) RNAME
AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)
No commands have a syntax error.
AMQ8409: Display Queue details.QUEUE(MQSI.3PL944)TYPE(QREMOTE)RNAME(MQSI.3PL944)
end2 : end

From this input file I am trying to capture the values QUEUE(#####) and RNAME(#####) from each line that starts with AMQ8409. I am able to find the line and capture it, but I am not sure how to split it, or what other trick to use, to get only the QUEUE and RNAME values. My output should look like:

QUEUE(MQSI.3PL846) ----- >  RNAME(MQSI.3PL846)
QUEUE(MQSI.3PL944) ------> RNAME(MQSI.3PL944)

Here is what I have so far:

open (INPUT, "out") || die "couldn't open the file!";
open (OUTPUT, ">outF") || die "couldn't open the file!";

foreach $line (<INPUT>) {
    chomp $line;
    if ( $line =~ "AMQ8409" ) {
        print OUTPUT "$line\n";
    }
}
close(OUTPUT);
close(INPUT);

However, this captures the whole line. Any advice is appreciated. Thanks.


Replies are listed 'Best First'.
Re: capturing words by kyle (Abbot) on Nov 08, 2007 at 16:06 UTC
This should work.

    my $match_queue = qr{
        QUEUE   # literal word 'QUEUE'
        \(      # literal open paren
        (.*?)   # non-greedy capture of >=0 characters
        \)      # literal close paren
    }xms;

    my $match_rname = qr{
        RNAME   # literal word 'RNAME'
        \(      # literal open paren
        (.*?)   # non-greedy capture of >=0 characters
        \)      # literal close paren
    }xms;

    if ( $line =~ /AMQ8409/ ) {
        my ($queue) = ($line =~ $match_queue);
        my ($rname) = ($line =~ $match_rname);
        printf OUTPUT "QUEUE(%s) ------> RNAME(%s)\n", $queue, $rname;
    }

It could be quite a bit shorter, but I thought some `/x` clarity would be good.
Re^2: capturing words by Anonymous Monk on Nov 08, 2007 at 16:13 UTC
Thanks guys, that should help a lot.
Re: capturing words by gamache (Friar) on Nov 08, 2007 at 16:01 UTC
Try:

    while (<INPUT>) {
        if (/^AMQ8409 .* QUEUE\( ([^\)]+) \) .* RNAME\( ([^\)]+) \)/x) {
            print OUTPUT "QUEUE($1) ------> RNAME($2)\n";
        }
    }
Re^2: capturing words by johngg (Canon) on Nov 08, 2007 at 16:17 UTC
Why not save a little typing and put the captures around the whole "QUEUE(...)" and "RNAME(...)"?

    while ( <INPUT> ) {
        print OUTPUT qq{$1 ------> $2\n}
            if m{(?x) ^ AMQ8409 .*? ( QUEUE .*? \) ) .* ( RNAME .*? \) ) };
    }

You don't need to escape the closing parenthesis in your negated character class, BTW.

Cheers,
JohnGG
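The side note about the negated character class can be checked with a small sketch. The sample line is taken from the question (with the wrapped RNAME value joined back up), and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = 'AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)';

# Inside a bracketed character class, ')' has no special meaning,
# so the escaped and unescaped forms capture the same text.
my ($escaped)   = $line =~ /QUEUE\(([^\)]+)\)/;
my ($unescaped) = $line =~ /QUEUE\(([^)]+)\)/;

print "escaped:   $escaped\n";
print "unescaped: $unescaped\n";
print "same result\n" if $escaped eq $unescaped;
```

Both forms are valid; the unescaped one is simply a little easier to read.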
Re: capturing words by mwah (Hermit) on Nov 08, 2007 at 19:21 UTC
"Any advice is appreciated"

There have already been good and solid solutions, but on a regex thread I can't just sit on my hands ;-) I like johngg's solution and tried to fancify his code further:

    ...
    my $data = '
    Timed out (reason: in while loop)
    ::expect_out(0,string) = >
    ::expect_out(1,string) = RF use CSWT#RF###
    dis qremote(MQSI.3PL846) RNAME1 : dis qremote(MQSI.3PL846) RNAME
    AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)
    No commands have a syntax error.
    AMQ8409: Display Queue details.QUEUE(MQSI.3PL944)TYPE(QREMOTE)RNAME(MQSI.3PL944)
    end2 : end
    ';

    my $match = qr/^AMQ8409 .+? (Q.+?(\w+)\)) .+? (R.+?\2\)) /mx;

    for( split /\n/, $data ) {
        print "$1 ----> $3\n" if /$match/
    }
    ...

... but the question here is: is there a speed requirement, e.g. do you have to parse some 100 MB and spit out the results in one second? Then all the solutions so far (including mine) would look rather bad (slow), because they depend heavily on (.+?) or (.*?).

Regards
mwa
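If speed did turn out to matter, one option is to skip the regex engine for the extraction and use the built-in `index` and `substr` functions. The following is only a sketch under that assumption: the helper name `field` is hypothetical, the code has not been benchmarked against the regex solutions in this thread, and it assumes each value contains no closing parenthesis.

```perl
use strict;
use warnings;

# Hypothetical helper: return the "KEY(value)" text for $key in $line,
# or undef in scalar context if the key or its closing paren is missing.
sub field {
    my ($line, $key) = @_;
    my $start = index($line, $key . '(');
    return if $start < 0;
    my $end = index($line, ')', $start);
    return if $end < 0;
    return substr($line, $start, $end - $start + 1);
}

my @lines = (
    'No commands have a syntax error.',
    'AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)',
);

my @report;
for my $line (@lines) {
    next unless index($line, 'AMQ8409') == 0;   # line must start with AMQ8409
    my $queue = field($line, 'QUEUE');
    my $rname = field($line, 'RNAME');
    push @report, "$queue ------> $rname" if defined $queue && defined $rname;
}
print "$_\n" for @report;
```

Whether this actually beats a well-anchored regex depends on the data and the Perl version, so it is worth measuring (for example with the core `Benchmark` module) before committing to it.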
Re^2: capturing words by johngg (Canon) on Nov 08, 2007 at 23:55 UTC
I'm pleased you liked my solution and wanted to take it further, but I think you've introduced a slight bugette. Because you have used `.+? R.+?` in your pattern, it matches from the first "R" it encounters after the "QUEUE(...)" sequence, so the output from your script is actually

    Queue details.QUEUE(MQSI.3PL846) ----> REMOTE)RNAME(MQSI.3PL846)
    Queue details.QUEUE(MQSI.3PL944) ----> REMOTE)RNAME(MQSI.3PL944)

I am not familiar with whatever application produced the text, so I don't know if it is wise to rely on the "QUEUE" and "RNAME" values being the same. By the same token, I don't know if "QUEUE" and "RNAME" can appear more than once in one line. If they could, then a different approach with a global match might be appropriate. However, if they are unique in a line then you can avoid non-greedy matching.

    my $data = '
    Timed out (reason: in while loop)
    ::expect_out(0,string) = >
    ::expect_out(1,string) = RF use CSWT#RF###
    dis qremote(MQSI.3PL846) RNAME1 : dis qremote(MQSI.3PL846) RNAME
    AMQ8409: Display Queue details.QUEUE(MQSI.3PL846)TYPE(QREMOTE)RNAME(MQSI.3PL846)
    No commands have a syntax error.
    AMQ8409: Display Queue details.QUEUE(MQSI.3PL944)TYPE(QREMOTE)RNAME(MQSI.3PL944)
    end2 : end
    ';

    my $match = qr
        {(?mx)
            ^
            AMQ8409
            .+
            (QUEUE\([^)]+\))
            .+
            (RNAME\([^)]+\))
        };

    for ( split /\n/, $data ) {
        print "$1 ----> $2\n" if /$match/
    }

This produces

    QUEUE(MQSI.3PL846) ----> RNAME(MQSI.3PL846)
    QUEUE(MQSI.3PL944) ----> RNAME(MQSI.3PL944)

I hope this is of interest.

Cheers,
JohnGG
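The "global match" idea mentioned above could look roughly like the sketch below. The input line is invented for illustration (the question's data only ever shows one QUEUE and one RNAME per line), and it assumes each QUEUE(...) is followed by its own RNAME(...).

```perl
use strict;
use warnings;

# Invented input with two QUEUE/RNAME pairs on one line.
my $line = 'AMQ8409: QUEUE(A.1)RNAME(B.1) QUEUE(A.2)RNAME(B.2)';

my @pairs;
if ($line =~ /^AMQ8409/) {
    # With /g in scalar context, each pass of the while loop resumes
    # where the previous match ended, so every pair is visited once.
    while ($line =~ /(QUEUE\([^)]+\)) .*? (RNAME\([^)]+\))/gx) {
        push @pairs, "$1 ----> $2";
    }
}
print "$_\n" for @pairs;
```

The `.*?` between the two captures has to stay non-greedy here; a greedy `.+` would skip ahead to the last RNAME on the line and pair it with the first QUEUE.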