in reply to Regex / "Sophisticated" End of Line

I don't fully understand what you are trying to do. Could you tell us what you want to extract from this line?

  ACADM, Homo sapiensacyl-Coenzyme A dehydrogenase, C-4 to C-12 straight chain NP_001120800.1425 aa

Is it acyl-Coenzyme or NP_001120800.1425?

Re^2: Regex / "Sophisticated" End of Line
by nofutur45 (Initiate) on Oct 27, 2010 at 01:17 UTC

    Hi zwon,

    Thank you for your reply. In your example, it would be NP_001120800.1425.

    The following script by james2vegas - with a little modification - solved my problem:

    elsif (m/, Homo sapiens/) {
        # capture the NP_/XP_ identifier that is followed by whitespace
        my ($human) = m/((?:XP_|NP_)[\d. ]+)\s+/;
        print OUTFILE $human . "\t";
    }

    So it works in two steps: first select the lines that contain ", Homo sapiens", then look for the NP_ or XP_ identifier within them.
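
    For anyone who wants to try the two-step idea outside my script, here is a minimal self-contained sketch. The sub name extract_accession and the hard-coded sample line are only for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the
# NP_/XP_ identifier of a ", Homo sapiens" line, or undef otherwise.
sub extract_accession {
    my ($line) = @_;

    # Step 1: only consider lines that mention ", Homo sapiens".
    return unless $line =~ /, Homo sapiens/;

    # Step 2: capture NP_ or XP_ followed by digits, dots or spaces.
    # The trailing \s+ means the identifier must be followed by whitespace.
    my ($acc) = $line =~ /((?:XP_|NP_)[\d. ]+)\s+/;
    return $acc;
}

my $line = 'ACADM, Homo sapiensacyl-Coenzyme A dehydrogenase, '
         . 'C-4 to C-12 straight chain NP_001120800.1425 aa';

my $acc = extract_accession($line);
print defined $acc ? "$acc\n" : "no match\n";
```

    Note that a line ending directly after the identifier, with no trailing whitespace, would not match this pattern as written.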

    I was also curious whether there is a way to pick, say, the second "compact word" from the end of a line (I do not know the proper term; I mean a run of non-space characters). Another kind user answered this question: his solution splits the line on the space character, puts the pieces into an array, and takes the element at index -2, which is the second from the end.
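
    A rough sketch of that split approach, as I understand it. The sub name field_from_end is made up for this example, and I use split ' ' (which splits on any run of whitespace and skips leading whitespace) rather than a single literal space, so this is a variation and not the other poster's exact code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the $n-th whitespace-separated field
# counted from the end of the line (1 = last, 2 = second from the end),
# or undef if the line has fewer than $n fields.
sub field_from_end {
    my ($line, $n) = @_;
    my @fields = split ' ', $line;    # split on runs of whitespace
    return $n >= 1 && $n <= @fields ? $fields[-$n] : undef;
}

my $sample = 'ACADM, Homo sapiensacyl-Coenzyme A dehydrogenase, '
           . 'C-4 to C-12 straight chain NP_001120800.1425 aa';

my $field = field_from_end($sample, 2);
print defined $field ? "$field\n" : "not enough fields\n";
```

    A negative index in Perl counts from the end of the array, so $fields[-2] is the second from the end without needing to know the array length.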

    Thank you for your help, guys.