Hello shabird,
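Your regex says: match one or more word characters, followed by one or more non-word characters, followed immediately by a newline; and return the characters matched minus the newline. This won't work. In your data, each ID is followed by a space and a description, not by a newline, and \w doesn't match the "." inside IDs such as NM_030643.4.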
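To see the problem concretely, here is a minimal sketch. Your actual regex isn't quoted in this reply, so the pattern /(\w+\W+)\n/ below is an assumption reconstructed from the description above, and naive_capture is a hypothetical helper name. Applied to two of the header lines, it captures the text just before the newline, or nothing at all, but never the ID:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: the original regex isn't shown; this pattern is reconstructed
# from the description: \w+, then \W+, then a newline, capturing all but the newline.
sub naive_capture {
    my ($line) = @_;
    return $line =~ /(\w+\W+)\n/ ? $1 : undef;
}

for my $line (
    ">NM_030643.4 Homo sapiens apolipoprotein L4 (APOL4)\n",
    ">NR_029834.1 Homo sapiens microRNA 200a (MIR200A), microRNA\n",
) {
    my $got = naive_capture($line);
    # First header: captures "APOL4)", the last word before the newline.
    # Second header: no match, because the line ends in a word character.
    print defined $got ? "$got\n" : "(no match)\n";
}
```
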
What you need is a way to uniquely identify the IDs. From the file contents shown, it looks as though each ID is immediately preceded by a > character. If so, you could use something like this:
#! perl

use strict;
use warnings;
use Data::Dump;

my @matches;

# Collect the IDs from every line of the data below
push @matches, mysub($_) for <DATA>;

# Print the results (dd comes from Data::Dump)
dd \@matches;

sub mysub
{
    # In list context, //g returns every capture: the run of non-whitespace
    # characters following each '>' (the /x flag allows spaces in the pattern)
    return shift =~ / > (\S+) \s /gx;
}

__DATA__
>NM_030643.4 Homo sapiens apolipoprotein L4 (APOL4)
GAGGTGCTGGGGAGCAGCGTGTTTGCTGTGCTTGATTGTGAGCTGCTGGGAAGTTGTGACTTTCATTTTA
CCTTTCGAATTCCTGGGTATATCTTGGGGGCTGGAGGACGTGTCTGGTTATTATATAGGTGCACAGCTGG
>NM_001198855.1 Homo sapiens cytochrome P450 family 2 subfamily C member 8 (CYP2C8)
ACATGTCAAAGAGACACACAC
>NR_029834.1 Homo sapiens microRNA 200a (MIR200A), microRNA
CCGGGCCCCTGTGAGCATC
>AC067940.1 Homo sapiens clone RP11-818E9, LOW-PASS SEQUENCE SAMPLING
AAATACAACTTTAAATCAAAACGGTAAAAATTCCACTCTTTCATACTAACTTCAAAAGTATTTGCTTTAA
AAAAAAAGNNNNNNNNN
Output:
23:46 >perl 2038_SoPW.pl
["NM_030643.4", "NM_001198855.1", "NR_029834.1", "AC067940.1"]

23:46 >
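If the IDs only ever appear at the start of header lines (the data looks like FASTA format), a variant that anchors the match with ^ and reads line by line from a filehandle might look like the sketch below. Assumptions: extract_ids is a hypothetical name, the sample text is a shortened copy of the data above, and an in-memory filehandle stands in for a real file so the script runs as-is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the ID from every header line read from $fh.
# ^ anchors the match at the start of the line, so a '>' elsewhere is ignored.
sub extract_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        push @ids, $1 if $line =~ /^>(\S+)/;
    }
    return @ids;
}

# Demo input held in memory. For a real file you would instead write
#   open my $fh, '<', $path or die "Cannot open '$path': $!";
# where $path is a placeholder for your file's name.
my $sample = <<'END';
>NM_030643.4 Homo sapiens apolipoprotein L4 (APOL4)
GAGGTGCTGGGGAGCAGCGTGTTTGCTGTG
>NR_029834.1 Homo sapiens microRNA 200a (MIR200A), microRNA
CCGGGCCCCTGTGAGCATC
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for extract_ids($fh);    # prints NM_030643.4 and NR_029834.1
close $fh;
```

Unlike the /g version above, this returns at most one ID per line, which matches how header lines are laid out in this data.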
Hope that helps,
Athanasius <°(((>< contra mundum    Iustus alius egestas vitae, eros Piratica,
In reply to "Re: Extracting string and numbers from a file" by Athanasius, in thread "Extracting string and numbers from a file" by shabird