in reply to Re^2: Extacting lines where one column matches a name from a list of names
in thread Extacting lines where one column matches a name from a list of names

One feature of your example data here is that the field of interest (the "name" field) is always the first field in the record, i.e., always at the start of a string read from a file. (Update: This approach assumes that the  $rx_sep field separator pattern cannot possibly appear in a "name" field!) This anchor can be very useful. If you build a regex to match all the names "of interest" (see haukex's article Building Regex Alternations Dynamically), it's a one-pass process to read all records in a file and match and extract only those records of interest.

c:\@Work\Perl\monks>perl use strict; use warnings; my @names = qw(1000567/1 1000567/2 122854/1 1000574/2); my $rx_sep = qr{ , }xms; # adjust to match real field separator my ($rx_interesting) = map qr{ \A (?: $_) (?= $rx_sep) }xms, join q{ | }, map quotemeta, # proper! reverse sort # order! @names ; print "$rx_interesting \n"; # for debug my @data = ( '1083978/2,284224,284292,chrX,255,+,284224,284292,255,0,0,1,68,0', '122854/1,284224,284277,chrX,255,+,284224,284277,255,0,0,1,53,0', '641613/1,284224,284290,chrX,255,+,284224,284290,255,0,0,1,66,0', ); while (my $datum = shift @data) { print "interesting: >$datum< \n" if $datum =~ $rx_interesting; } __END__ (?msx-i: \A (?: 122854\/1 | 1000574\/2 | 1000567\/2 | 1000567\/1) (?= +(?msx-i: , )) ) interesting: >122854/1,284224,284277,chrX,255,+,284224,284277,255,0,0, +1,53,0<
Note that  $rx_sep has to be adjusted to match whatever field separator your data records actually use.

Update 1: I didn't notice that haukex already suggested this approach here. Oh well... At least you have a worked example :)

Update 2: I've noticed a stupid mistake in my code as originally posted. A part of the sequence of operations to build  $rx_interesting was incorrectly given as
    reverse sort
    map quotemeta,
The code has been corrected. The error (quotemeta-ing before sort-ing) should have made no difference in this particular application, but there are corner cases (update: in other potential applications) in which it would (although I'm unable to think of a good example of such a case ATM).


Give a man a fish:  <%-{-{-{-<