in reply to regular expression

Assuming the data you want is the organism information, and assuming that this information is always found in the last bracket:

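$desc =~ /.+\[(.+)\]\s*$ (?{ $org = $1 // ''; })/x;

Breaking it down:
/.+ greedily matches as much as it can, so the \[ that follows lands on the last opening bracket,
\[(.+)\] captures everything within the brackets,
\s*$ makes sure we're at the end of the line (trailing whitespace is allowed),
(?{ $org = $1 // ''; }) stores the data if any was found (// rather than or, because or binds more loosely than = and the fallback would never be assigned), and
/x tells the regex to ignore whitespace inside the pattern.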
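
For comparison, here is a minimal sketch of the same pattern using an ordinary list-context match instead of the (?{ ... }) block. The sample description is made up for illustration; the real input format is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sample description (not taken from the original question).
my $desc = 'alcohol dehydrogenase [NAD+] [Saccharomyces cerevisiae]';

# In list context the match returns its captures, or an empty list on failure.
my ($org) = $desc =~ /.+\[(.+)\]\s*$/;
$org //= '';    # fall back to an empty string when nothing matched

print "$org\n";    # prints "Saccharomyces cerevisiae"
```

The leading .+ is what pushes the match to the last opening bracket here, so the earlier [NAD+] is skipped.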

However, if you're doing this in a loop, I'd recommend the following:

my ($enz, $org);
my $rx_enz_org = qr/ (.+)\[(.+)\]\s*$ (?{ $enz = $1 // ''; $org = $2 // ''; }) /x;
<begin $desc loop>
    $desc =~ /$rx_enz_org/;
<end loop>

qr// compiles the regex once, before the loop starts, which may give an extra bit of speed. Also, the regex saves both the enzyme and organism name (if you don't need the enzyme, then just use the previous example).
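
A runnable sketch of the loop version, again with plain captures rather than the embedded code block. The descriptions and the "enzyme [organism]" layout are assumptions made for the example, as is the @pairs array used to collect results.

```perl
use strict;
use warnings;

# Hypothetical descriptions; the real data may look different.
my @descs = (
    'alcohol dehydrogenase [Saccharomyces cerevisiae]',
    'DNA polymerase I [Escherichia coli]  ',
    'no organism given',
);

# Compile the pattern once, outside the loop.
my $rx_enz_org = qr/ (.+)\[(.+)\]\s*$ /x;

my @pairs;
for my $desc (@descs) {
    # The list assignment is false when the match fails.
    if ( my ($enz, $org) = $desc =~ $rx_enz_org ) {
        $enz =~ s/\s+$//;    # the greedy (.+) keeps the space before '['
        push @pairs, [ $enz, $org ];
        print "$enz => $org\n";
    }
    else {
        print "no match: $desc\n";
    }
}
```

Testing the result of the match inside the loop also avoids reusing stale values of $enz and $org from a previous iteration when a line has no bracket.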

Hope this helps!
P.S. I haven't tested this, but I think it should work for you.

Re^2: regular expression
by CountZero (Bishop) on Mar 17, 2012 at 12:32 UTC
    Some comments:
    1. /.+ is not needed. You have anchored your regex at the end of the string, so there is no need to check whether there is anything before the part you are looking for.
    2. \[(.+)\] will work if there are no other ] characters after the next ]. Your pattern is greedy and will go all the way to the last ]. I would have written this as either \[(.+?)\] (the non-greedy solution) or \[([^\]]+) (everything up to the first closing bracket, in which case you can drop the next \] in your regex).
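
One way to check how these variants behave is to run them against a description with more than one bracket pair. The sketch below uses a made-up input and keeps the \]\s*$ end anchor on all three patterns, which is an assumption about how they would be combined with the original regex.

```perl
use strict;
use warnings;

# Hypothetical description with two bracketed parts.
my $desc = 'enzyme [EC 1.1.1.1] [Homo sapiens]';

# Leading .+ dropped, greedy capture: starts at the first '['.
my ($greedy) = $desc =~ /\[(.+)\]\s*$/;

# Non-greedy capture: still starts at the first '[' and is
# stretched to the end of the string by the anchor.
my ($lazy) = $desc =~ /\[(.+?)\]\s*$/;

# Negated character class: cannot cross a ']', so only the
# last bracket pair can satisfy the anchor.
my ($negated) = $desc =~ /\[([^\]]+)\]\s*$/;

print "greedy : $greedy\n";     # EC 1.1.1.1] [Homo sapiens
print "lazy   : $lazy\n";       # EC 1.1.1.1] [Homo sapiens
print "negated: $negated\n";    # Homo sapiens
```

With this input, only the negated character class isolates the last bracket once the leading .+ is removed, because the regex engine always tries the leftmost starting position first.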

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics