Read a file which has three columns and store the content in a hash

shabird has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks! I hope you are doing well. I am trying to read a file which has three columns and store the content in a hash. I want to store the first-column data as keys and the third-column data as values. Here is the file:

GeneName    GeneType    Regulation
APOL4    protein_coding    up
CYP2C8    protein_coding    down
NAALADL2    protein_coding    NA
NANOS3    protein_coding    up
C20orf204    protein_coding    up
MIR429    miRNA    up
MIR200A    miRNA    down
MIR200B    miRNA    down
CFL1P4    processed_pseudogene    down
AC091607.1    processed_pseudogene    up
RPL19P20    processed_pseudogene    up
SREK1IP1P1    processed_pseudogene    down
CCT5P2    processed_pseudogene    up
CHTF8P1    processed_pseudogene    NA
FAR1P1    processed_pseudogene    NA
AC067940.1    processed_pseudogene    up
AL662791.1    lncRNA    up

Here is my code, which sets the second column as the value, but I want to set the third column as the value in the hash:

open FILE1, "data.txt" or die;


my %hash;
while (my $line=<FILE1>) {

chomp $line;

(my $word1,my $word2) = split( /\s+/, $line);


$hash{$word1} = $word2;
}


@values = values(%hash);

print @values;

How can I split the second column out of the file?


Re: Split a column by choroba (Cardinal) on Apr 13, 2020 at 16:25 UTC
You already split into three values, but you then throw the third one away. Don't do that: `my ($first, $second, $third) = split ' ', $line;` If you don't want to populate a variable that's never used, either replace it with undef in the assignment: `my ($first, undef, $third) = split ' ', $line;` or extract only the columns you're interested in using a list slice subscript: `my ($first, $third) = (split ' ', $line)[0, 2];`
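As a self-contained sketch of the list-slice variant, the snippet below builds the hash from a few rows of the question's data. To keep it runnable on its own, it reads from an in-memory string rather than `data.txt`; that substitution, and the names `read_regulation` and `$sample`, are assumptions made for illustration and are not part of the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: build a GeneName => Regulation hash from an open filehandle.
sub read_regulation {
    my ($fh) = @_;
    my %regulation;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;    # skip blank lines
        # Keep columns 0 and 2; column 1 (GeneType) is discarded.
        # split ' ' also drops the trailing newline, so no chomp is needed.
        my ($name, $status) = (split ' ', $line)[0, 2];
        $regulation{$name} = $status;
    }
    return %regulation;
}

# Sample rows copied from the question; a real script would open data.txt instead.
my $sample = <<'END';
GeneName    GeneType    Regulation
APOL4    protein_coding    up
CYP2C8    protein_coding    down
NAALADL2    protein_coding    NA
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my %regulation = read_regulation($fh);
close $fh;

# Sorted only so that the printed order is repeatable.
print "$_ => $regulation{$_}\n" for sort keys %regulation;
```

The header row ends up in the hash as `GeneName => Regulation`, just as it does in the original loop; skipping it would need an extra line.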
Re^2: Split a column by shabird (Sexton) on Apr 13, 2020 at 16:35 UTC
I did that and it works, but the output is out of order. I want the data in the same order as in the file, but I get this output: `up up NA up down NA NA Regulation up down down up down up down up up up` I want "Regulation" first and then the data in the order it appears in the column. How can I do that?
Re^3: Split a column by Fletch (Bishop) on Apr 13, 2020 at 16:51 UTC
Hashes are (by definition) unordered, so when you call values you're getting the values in a (pseudo)random order. If you want them back in the order of the keys, you either need to store the keys off into an array as they come in, or use keys on your hash (which will return items in the same order as the corresponding values; alternatively, sort the keys and iterate over that to pull out and print the corresponding value).

Update: Just to expand, merging in the suggestion above, roughly something like this:

    use feature 'say';    # needed for say()

    my %hash;
    my @key_order;

    while( defined( my $line = <> ) ) {
        my ($first, $third) = (split ' ', $line)[0, 2];
        $hash{ $first } = $third;
        push @key_order, $first;
    }

    ## In the order they appeared . . .
    for my $key (@key_order) {
        say $hash{ $key };
    }

    ## Or in their random ordering but with the match . . .
    for my $key ( keys %hash ) {
        say qq{$key => $hash{ $key }};
    }
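The sorted-keys alternative mentioned in this reply could be sketched as follows. The hash contents are a few rows hard-coded from the question purely for illustration; note that sorting gives alphabetical order, not the order of the input file, so it only helps when a repeatable order is enough.

```perl
use strict;
use warnings;

# A few GeneName => Regulation pairs from the question, hard-coded for illustration.
my %hash = (
    MIR429 => 'up',
    APOL4  => 'up',
    CYP2C8 => 'down',
    FAR1P1 => 'NA',
);

# sort yields a repeatable (alphabetical) order that does not depend on
# insertion order or on Perl's internal hash ordering.
my @lines = map { "$_ => $hash{$_}" } sort keys %hash;
print "$_\n" for @lines;
```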
Re^3: Split a column by choroba (Cardinal) on Apr 13, 2020 at 17:08 UTC
Hashes in Perl are unordered. You can use Hash::Ordered instead. See also the module's documentation for other options and their comparison.
Re^4: Split a column by shabird (Sexton) on Apr 13, 2020 at 17:45 UTC
Re^5: Split a column by Fletch (Bishop) on Apr 13, 2020 at 18:03 UTC
Re^5: Split a column by choroba (Cardinal) on Apr 13, 2020 at 18:00 UTC
Re^5: Split a column by AnomalousMonk (Archbishop) on Apr 13, 2020 at 18:44 UTC
Re: ead a file which has three columns and store the content in a hash by BillKSmith (Monsignor) on Apr 16, 2020 at 15:32 UTC
The array-of-arrays returned by the module Text::CSV_XS is a more appropriate data structure. It works well with other modules to compute and print your sums.

    use strict;
    use warnings;
    # node_id=11115481
    use Text::CSV_XS qw( csv );
    use List::MoreUtils qw(first_index true);

    my $aoa    = csv(in => 'data.csv', sep_char => "\t");
    my $header = shift @$aoa;
    # Look up the column index at run time, after the header row has been read
    # (a `use constant` here would be evaluated at compile time, before $header is set).
    my $REGULATION = first_index { /^Regulation$/ } @$header;

    printf "Number of up is: %d\n"
         . "Number of down is: %d\n"
         . "Number of NA is: %d\n",
         map { count($_) } qw(up down NA);
    exit(0);

    sub count {
        my $Regulation = $_[0];
        local $_;
        return true { $_->[$REGULATION] eq $Regulation } @$aoa;
    }

OUTPUT:

    Number of up is: 9
    Number of down is: 5
    Number of NA is: 3

Bill
Re: ead a file which has three columns and store the content in a hash by leszekdubiel (Scribe) on Apr 14, 2020 at 18:24 UTC
`cat file` is maybe slower, maybe not good style, but gives you a list of lines from the file without having to bother about open, close and so on. But if you need such precise control, let somebody else do it: that is Path::Tiny.

    #!/usr/bin/perl -CSDA
    use utf8;
    use Modern::Perl;
    no warnings qw{uninitialized};
    use Data::Dumper;
    use Path::Tiny;

    my %myhash =
        map { ($$_[0], $$_[2]) }
        map { [split /\s+/, $_] }
        `cat myfile`;                   # maybe slower, but don't care about open
        # path("myfile")->lines_utf8;   # very reliable

    print Dumper(\%myhash);

result:

    $VAR1 = {
        'AC067940.1' => 'up',
        'CCT5P2' => 'up',
        'GeneName' => 'Regulation',
        'C20orf204' => 'up',
        'NANOS3' => 'up',
        'MIR200A' => 'down',
        'SREK1IP1P1' => 'down',
        'AC091607.1' => 'up',
        'CYP2C8' => 'down',
        'AL662791.1' => 'up',
        'FAR1P1' => 'NA',
        'MIR200B' => 'down',
        'CHTF8P1' => 'NA',
        'RPL19P20' => 'up',
        'MIR429' => 'up',
        'APOL4' => 'up',
        'CFL1P4' => 'down',
        'NAALADL2' => 'NA'
    };

counting:

    #!/usr/bin/perl -CSDA
    use utf8;
    use Modern::Perl;
    no warnings qw{uninitialized};
    use Data::Dumper;
    use Path::Tiny;

    my %count;
    $count{$_}++ for
        map { $$_[2] }
        map { [split /\s+/, $_] }
        path("myfile")->lines_utf8;

    print Dumper(\%count);

result:

    $VAR1 = {
        'down' => 5,
        'NA' => 3,
        'up' => 9,
        'Regulation' => 1
    };
Re^2: ead a file which has three columns and store the content in a hash by haukex (Archbishop) on Apr 14, 2020 at 18:28 UTC
"`cat file` is maybe slower, maybe not good style, but gives you a list of lines from the file without having to bother about open, close and so on."

It also doesn't catch any errors that might happen, may have issues regarding encoding, and isn't portable. And with variables interpolated into the backticks, it becomes a security risk. So yes, it's not good style. Path::Tiny is definitely the better solution.
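For comparison, here is a core-Perl sketch that avoids backticks entirely and reports failures instead of ignoring them. It uses three-argument `open` with a lexical filehandle and an explicit encoding layer. The helper name `load_columns` and the temporary demo file are assumptions for illustration only; they do not come from the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whitespace-separated file and return a
# reference to a hash of first column => third column. Dies on I/O errors.
sub load_columns {
    my ($path) = @_;
    # Three-argument open: $path is never handed to a shell,
    # unlike a filename interpolated into `cat $path`.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open '$path': $!";
    my %result;
    while (my $line = <$fh>) {
        my ($first, $third) = (split ' ', $line)[0, 2];
        next unless defined $third;    # skip blank or short lines
        $result{$first} = $third;
    }
    close $fh or die "Cannot close '$path': $!";
    return \%result;
}

# Demo: write two rows to a temporary file, then read them back.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "MIR429    miRNA    up\nMIR200A    miRNA    down\n";
close $tmp_fh or die "Cannot close temp file: $!";

my $genes = load_columns($tmp_path);
print "$_ => $genes->{$_}\n" for sort keys %$genes;

# A missing file raises an error instead of silently yielding an empty list.
my $ok = eval { load_columns("$tmp_path.missing"); 1 };
print "error caught: $@" unless $ok;
```

Path::Tiny's `lines_utf8`, as recommended above, wraps roughly the same steps (open, decode, read, error reporting) in a single call.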