in reply to regular expression
Assuming the data you want is the organism information, and assuming that this information is always found in the last bracket:
$desc =~ /.+\[(.+)\]\s*$ (?{ $org = $1; })/x;
Breaking it down:
.+ greedily matches as much as it can, so the bracket that follows is the last one on the line,
\[(.+)\] captures everything within the brackets,
\s*$ makes sure we're at the end of the line (trailing whitespace is allowed),
(?{ $org = $1; }) runs only once everything before it has matched, and stores the capture in $org, and
/x tells the regex engine to ignore whitespace inside the pattern.
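If you'd rather avoid the embedded code block (perlre attaches warnings to it), the same capture can be picked up in list context. This is only a sketch: the sample description string is made up, and I'm assuming your lines look roughly like it.

```perl
use strict;
use warnings;

# Hypothetical description line; the real data format is an assumption.
my $desc = 'alcohol dehydrogenase [ADH1] [Homo sapiens]';

# In list context a successful match returns its captures, so no
# embedded code block is needed. On failure the list is empty and
# $org stays undefined, so fall back to an empty string.
my ($org) = $desc =~ / .+ \[ (.+) \] \s* $ /x;
$org = '' unless defined $org;

print "organism: $org\n";
```

Because the leading .+ is greedy, the bracket that gets captured here is the last one ("Homo sapiens"), not "ADH1".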
However, if you're doing this in a loop, I'd recommend the following:
my ($enz, $org);
my $rx_enz_org = qr/ (.+)\[(.+)\]\s*$ (?{ $enz = $1; $org = $2; }) /x;

<begin $desc loop>
    $desc =~ /$rx_enz_org/;
<end loop>
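qr// compiles the regex once, up front, and gives it a name you can reuse; any speed gain is small, since Perl also compiles a literal pattern only once. Also, the regex saves both the enzyme and organism name (if you don't need the enzyme, then just use the previous example). Note that the enzyme capture keeps any whitespace that sits before the bracket.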
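Here is a rough, self-contained sketch of the loop version. The @descs list and its contents are placeholders I made up, and I've swapped the embedded code block for a list-context match so that lines without a bracket are easy to skip.

```perl
use strict;
use warnings;

# Hypothetical input lines; the real data format is an assumption.
my @descs = (
    'alcohol dehydrogenase [Homo sapiens]',
    'lactase [LCT] [Mus musculus]  ',
    'no organism given',
);

# Compiled once with qr//, then reused on every iteration.
my $rx_enz_org = qr/ (.+) \[ (.+) \] \s* $ /x;

my @results;
for my $desc (@descs) {
    # List-context match: captures on success, empty list on failure.
    my ($enz, $org) = $desc =~ $rx_enz_org;
    next unless defined $org;    # no trailing [organism] on this line

    $enz =~ s/\s+$//;            # drop the space left before the bracket
    push @results, [ $enz, $org ];
    print "$enz => $org\n";
}
```

Each entry in @results is an [enzyme, organism] pair; the third sample line has no bracket, so it is skipped.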
Hope this helps!
P.S. I haven't tested this, but I think it should work for you.
Re^2: regular expression
by CountZero (Bishop) on Mar 17, 2012 at 12:32 UTC