Alessandro has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks, here is my question. I have 2 files, one containing a list of IDs that looks like this:

    GSAD1234
    GSAD2345
    GSAD4567

And another one that looks like this. It is a csv with tab as field delimiter (which I symbolize here with " \t " as somehow I can't add a real tab here; also please note that I did not forget a tab in "no match", there really are fields that contain white space):

    GSAD1234 \t 123 \t 45 \t no match \t fungus \t protein_x
    GSAD5678 \t 123 \t 51 \t plant \t fungus \t protein_y \t transporter

It is worth mentioning this second file contains more than 50 000 lines.

I would like to extract from the second file the lines corresponding to the IDs from the first file. So here the desired output would be:

    GSAD1234 \t 123 \t 45 \t no match \t fungus \t protein_x

How do I do that? I had thought of reading the second file into a hash with the IDs as keys and the rest of the fields as values, but I can't find a way to do it due to the multiple fields per line. So far I have read the 2 files into arrays and tried to match the lines, but it doesn't work and, again, I am not sure it is the right strategy. Here is the code that seems to simply output the whole csv file:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Text::CSV;
    use File::Slurp;

    my $csv = Text::CSV->new({ sep_char => '\t' });
    #end of preparation

    #read data
    my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
    open(my $data,'<',$file) or die "Could not open file \n";
    chomp (my @strings = <$data>);
    close $data;

    # read ID list
    my $id = 'id.txt';
    my @ids = read_file("$id", chomp =>1);

    foreach(@ids) {
        my @matches = grep(/^($_)/,@strings);
        print join ",",@matches;
    }

I would be grateful for any help.
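One likely reason the posted loop prints everything: inside `grep`, `$_` is aliased to each element of `@strings`, so `/^($_)/` tests every line against itself rather than against the current ID. A minimal sketch of the hash idea from the question follows, turned around so that the IDs (not the data lines) become the hash keys. It assumes one ID per line in the ID file and that the ID is the first tab-separated field of each data line; `select_lines` is a hypothetical helper name, and the two file names are taken from the command line rather than hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines whose first tab-separated field is one of the wanted IDs.
# select_lines is a hypothetical helper name, not from the original post.
sub select_lines {
    my ($ids, $lines) = @_;

    # Build a lookup table: one hash key per wanted ID.
    my %wanted = map { $_ => 1 } @$ids;

    # Keep a line when the text before its first tab is a wanted ID.
    return grep {
        my ($key) = split /\t/, $_, 2;
        defined $key && exists $wanted{$key};
    } @$lines;
}

# Run as a script only when two file names are given: ID file, then data file.
if (@ARGV == 2) {
    my ($id_file, $data_file) = @ARGV;

    open my $id_fh, '<', $id_file or die "Could not open $id_file: $!\n";
    chomp(my @ids = <$id_fh>);
    close $id_fh;

    open my $data_fh, '<', $data_file or die "Could not open $data_file: $!\n";
    chomp(my @lines = <$data_fh>);
    close $data_fh;

    print "$_\n" for select_lines(\@ids, \@lines);
}
```

Because each data line costs a single hash lookup, the number of IDs hardly matters, and the whole line is kept as one string, so the varying number of fields per line is not a problem. Stray trailing whitespace or Windows line endings in the ID file would stop the keys from matching, so it may be worth trimming the IDs first.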
Replies are listed 'Best First'.

Re: Extracting lines starting with a pattern from an array
by choroba (Cardinal) on Dec 16, 2015 at 17:35 UTC

Re: Extracting lines starting with a pattern from an array
by CountZero (Bishop) on Dec 16, 2015 at 21:27 UTC
by Alessandro (Acolyte) on Dec 17, 2015 at 16:15 UTC
by u65 (Chaplain) on Dec 18, 2015 at 11:37 UTC
by hippo (Archbishop) on Dec 18, 2015 at 12:19 UTC

Re: Extracting lines starting with a pattern from an array
by GotToBTru (Prior) on Dec 16, 2015 at 17:44 UTC

Re: Extracting lines starting with a pattern from an array
by Laurent_R (Canon) on Dec 16, 2015 at 22:13 UTC