Assuming the data you want is the organism information, and that it is always found in the last pair of brackets on the line:
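$org = $desc =~ /.+\[(.+)\]\s*$/ ? $1 : '';
Breaking it down:
.+ is greedy, so the \[ that follows ends up matching the last opening bracket,
\[(.+)\] captures everything within that last pair of brackets,
\s*$ makes sure nothing but whitespace follows, i.e. we're at the end of the line, and
the ?: operator stores the captured text in $org if the match succeeded, or an empty string if it didn't. (A $org = $1 or '' inside the pattern wouldn't give you that default: or binds more loosely than =, so the '' is never assigned.)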
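Here is a small, self-contained sketch of the single-description case, wrapped in a hypothetical organism_of() helper so it can be tried on a few made-up descriptions. It uses the same .+\[(.+)\]\s*$ pattern and assumes, as above, that the organism is always in the last pair of brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside the last [...] at the end
# of a description, or '' when there is no trailing bracketed part.
sub organism_of {
    my ($desc) = @_;
    # .+ is greedy, so \[ ends up matching the LAST opening bracket;
    # \s*$ requires the closing bracket to sit at the end of the line.
    return $desc =~ /.+\[(.+)\]\s*$/ ? $1 : '';
}

# Made-up sample descriptions, for illustration only.
my @descs = (
    'alcohol dehydrogenase [Homo sapiens]',
    'hypothetical protein [strain X] [Escherichia coli]',
    'no organism here',
);

for my $desc (@descs) {
    printf "%-50s => '%s'\n", $desc, organism_of($desc);
}
```

The second sample shows why the leading .+ matters: it yields 'Escherichia coli' rather than 'strain X', whereas a bare \[(.+)\]\s*$ would start at the first bracket and capture "strain X] [Escherichia coli".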
However, if you're doing this in a loop, I'd recommend the following:
my ($enz, $org);
my $rx_enz_org = qr/(.+)\[(.+)\]\s*$/;
foreach my $desc (@descs) {    # @descs: your list of descriptions
    ($enz, $org) = $desc =~ $rx_enz_org ? ($1, $2) : ('', '');
}
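qr// compiles the regex into an object you can store and reuse. Don't expect much of a speed gain from it here, though: Perl already compiles a constant pattern like this one only once. The regex also captures both the enzyme and the organism name (if you don't need the enzyme, just use the previous example).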
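A runnable sketch of the loop version, assuming each description has the form "enzyme name [organism]". It uses an ordinary list-context match to pull out both captures, and tightens the pattern a little: (.*\S) drops the space before the bracket from the enzyme name, and [^\]]+ keeps the organism from spanning a "]". The sample strings and the @descs/@parsed names are made up for illustration; //= needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Compiled once, reused for every description.
#   ^(.*\S)      enzyme: text before the last bracket, trailing spaces dropped
#   \[([^\]]+)\] organism: text inside the final bracket pair
#   \s*$         only whitespace may follow
my $rx_enz_org = qr/^(.*\S)\s*\[([^\]]+)\]\s*$/;

# Hypothetical input; in practice this would come from your data file.
my @descs = (
    'alcohol dehydrogenase [Homo sapiens]',
    'DNA polymerase I [Escherichia coli] ',
    'unannotated sequence',
);

my @parsed;
for my $desc (@descs) {
    # In list context a successful match returns ($1, $2);
    # a failed match returns an empty list, leaving both undef.
    my ($enz, $org) = $desc =~ $rx_enz_org;
    $enz //= '';    # default to '' when the line didn't match
    $org //= '';
    push @parsed, [ $enz, $org ];
    printf "enzyme=%s organism=%s\n", $enz, $org;
}
```

Declaring $enz and $org inside the loop also means a non-matching line can't silently keep the values from the previous iteration.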
Hope this helps!
P.S. I haven't tested this, but I think it should work for you.
In reply to Re: regular expression by muppetjones, in thread regular expression by ssharma