shabird has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks! Hope you are doing well. I am trying to read a file that has three columns and store its content in a hash. I want the first column's data as the keys and the third column's data as the values. Here is the file:

GeneName     GeneType               Regulation
APOL4        protein_coding         up
CYP2C8       protein_coding         down
NAALADL2     protein_coding         NA
NANOS3       protein_coding         up
C20orf204    protein_coding         up
MIR429       miRNA                  up
MIR200A      miRNA                  down
MIR200B      miRNA                  down
CFL1P4       processed_pseudogene   down
AC091607.1   processed_pseudogene   up
RPL19P20     processed_pseudogene   up
SREK1IP1P1   processed_pseudogene   down
CCT5P2       processed_pseudogene   up
CHTF8P1      processed_pseudogene   NA
FAR1P1       processed_pseudogene   NA
AC067940.1   processed_pseudogene   up
AL662791.1   lncRNA                 up

Here is my code, which sets the second column as the value, but I want the third column as the value in the hash:

open FILE1, "data.txt" or die;
my %hash;
while (my $line=<FILE1>) {
    chomp $line;
    (my $word1,my $word2) = split( /\s+/, $line);
    $hash{$word1} = $word2;
}
@values = values(%hash);
print @values;

How can I split the second column out of the file?

Replies are listed 'Best First'.
Re: Split a column
by choroba (Cardinal) on Apr 13, 2020 at 16:25 UTC
    You already split into three values, but you then throw the third one away. Don't do that:
    my ($first, $second, $third) = split ' ', $line;

    If you don't want to populate a variable that's never used, either replace it with undef in the assignment:

    my ($first, undef, $third) = split ' ', $line;

    or extract only the columns you're interested in using a list slice subscript:

    my ($first, $third) = (split ' ', $line)[0, 2];
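    To see the list slice in context, here is a minimal sketch of the whole read loop. The sample data, the in-memory filehandle and the names %regulation_of, $gene and $regulation are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Sample data standing in for data.txt (an assumption for this sketch).
my $data = <<'END';
GeneName GeneType Regulation
APOL4 protein_coding up
CYP2C8 protein_coding down
NAALADL2 protein_coding NA
END

# Open the string as an in-memory file so the example is self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my %regulation_of;
while ( my $line = <$fh> ) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines
    # Keep only columns 1 and 3 (indices 0 and 2).
    my ( $gene, $regulation ) = ( split ' ', $line )[ 0, 2 ];
    $regulation_of{$gene} = $regulation;
}
close $fh;

print "$_ => $regulation_of{$_}\n" for sort keys %regulation_of;
```

    Note that the header row ends up in the hash too (GeneName => Regulation), just as it would with the code in the question.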

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      I did that and it works, but the output is out of order. I want the data to be in order, but it comes out like this. Output:

      up up NA up down NA NA Regulation up down down up down up down up up up

      I want "Regulation" first and the rest of the data in the same order as in the column. How can I do that?

        Hashes are (by definition) unordered, so when you call values you get the values in a (pseudo)random order. If you want them back in the order in which the keys were read, you need to store the keys in an array as they come in. Alternatively, call keys on your hash (it returns the keys in the same order as values returns the corresponding values), or sort the keys and iterate over the sorted list to pull out and print each corresponding value.

        Update: To expand on that, merging in the suggestion above, roughly something like this:

        my %hash;
        my @key_order;
        while( defined( my $line = <> ) ) {
            my ($first, $third) = (split ' ', $line)[0, 2];
            $hash{ $first } = $third;
            push @key_order, $first;
        }

        ## In the order they appeared . . .
        for my $key (@key_order) {
            say $hash{ $key };
        }

        ## Or in their random ordering but with the match . . .
        for my $key ( keys %hash ) {
            say qq{$key => $hash{ $key }};
        }
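        The other two options mentioned above (pairing keys with values, and sorting the keys) might look roughly like the sketch below. The three-entry hash is made-up illustration data, and @keys, @values and @sorted are names chosen for this example only.

```perl
use strict;
use warnings;

# Illustrative data only; in the thread the hash is filled from the file.
my %hash = ( MIR429 => 'up', APOL4 => 'up', CYP2C8 => 'down' );

# keys and values walk the hash in the same internal order, so the two
# lists line up element by element as long as the hash is not modified.
my @keys   = keys %hash;
my @values = values %hash;
for my $i ( 0 .. $#keys ) {
    print "$keys[$i] => $values[$i]\n";
}

# Sorting the keys gives a repeatable (alphabetical) order, which is
# not the same thing as the order of the lines in the file.
my @sorted = sort keys %hash;
print "$_ => $hash{$_}\n" for @sorted;
```

        Neither variant restores the file order; for that, the @key_order array is still needed.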

        The cake is a lie.

        Hashes in Perl are unordered. You can use Hash::Ordered instead. See also the module's documentation for other options and their comparison.

Re: Read a file which has three columns and store the content in a hash
by BillKSmith (Monsignor) on Apr 16, 2020 at 15:32 UTC
    The array-of-arrays returned by the module Text::CSV_XS is a more appropriate data structure. It works well with other modules to compute and print your sums.
    use strict;
    use warnings;
    # node_id=11115481
    use Text::CSV_XS qw( csv );
    use List::MoreUtils qw(first_index true);

    my $aoa    = csv(in=>'data.csv', sep_char=>"\t");
    my $header = shift @$aoa;

    # Look up the column index at run time. A "use constant" here would be
    # evaluated at compile time, before $header has been assigned.
    my $regulation_idx = first_index {/^Regulation$/} @$header;

    printf "Number of up is: %d\n"
          ."Number of down is: %d\n"
          ."Number of NA is: %d\n",
          map {count($_)} qw(up down NA);
    exit(0);

    # Count the rows whose Regulation column equals the given value.
    sub count{
        my $Regulation = $_[0];
        local $_;
        return true {$_->[$regulation_idx] eq $Regulation } @$aoa;
    }

    OUTPUT:

    Number of up is: 9
    Number of down is: 5
    Number of NA is: 3
    Bill
Re: Read a file which has three columns and store the content in a hash
by leszekdubiel (Scribe) on Apr 14, 2020 at 18:24 UTC

    `cat file` is maybe slower, maybe not good style, but gives you the list of lines from the file without having to bother about open, close and so on. But if you need such precise control, let somebody else do it: that is what Path::Tiny is for.

    #!/usr/bin/perl -CSDA

    use utf8;
    use Modern::Perl;
    no warnings qw{uninitialized};
    use Data::Dumper;
    use Path::Tiny;

    my %myhash =
        map { ($$_[0], $$_[2]) }
        map { [split /\s+/, $_] }
        `cat myfile`;                  # maybe slower, but don't care about open
        # path("myfile")->lines_utf8;  # very reliable

    print Dumper(\%myhash);

    result:

    $VAR1 = {
        'AC067940.1' => 'up',
        'CCT5P2' => 'up',
        'GeneName' => 'Regulation',
        'C20orf204' => 'up',
        'NANOS3' => 'up',
        'MIR200A' => 'down',
        'SREK1IP1P1' => 'down',
        'AC091607.1' => 'up',
        'CYP2C8' => 'down',
        'AL662791.1' => 'up',
        'FAR1P1' => 'NA',
        'MIR200B' => 'down',
        'CHTF8P1' => 'NA',
        'RPL19P20' => 'up',
        'MIR429' => 'up',
        'APOL4' => 'up',
        'CFL1P4' => 'down',
        'NAALADL2' => 'NA'
    };

    counting:

    #!/usr/bin/perl -CSDA

    use utf8;
    use Modern::Perl;
    no warnings qw{uninitialized};
    use Data::Dumper;
    use Path::Tiny;

    my %count;
    $count{$_}++ for
        map { $$_[2] }
        map { [split /\s+/, $_] }
        path("myfile")->lines_utf8;

    print Dumper(\%count);

    $VAR1 = {
        'down' => 5,
        'NA' => 3,
        'up' => 9,
        'Regulation' => 1
    };
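    The 'Regulation' => 1 entry appears because the header row is counted like any other line. A minimal core-Perl sketch that drops the header first could look like this; the @lines sample is made-up stand-in data, not the full file from the question.

```perl
use strict;
use warnings;

# Sample lines standing in for the file (an assumption for this sketch).
my @lines = (
    "GeneName GeneType Regulation",
    "APOL4 protein_coding up",
    "CYP2C8 protein_coding down",
    "NAALADL2 protein_coding NA",
    "NANOS3 protein_coding up",
);

shift @lines;    # drop the header row so it is not counted

# Tally the third column (index 2) of every remaining line.
my %count;
$count{ ( split ' ', $_ )[2] }++ for @lines;

print "$_: $count{$_}\n" for sort keys %count;
```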
      `cat file` is maybe slower, maybe not good style, but gives you the list of lines from the file without having to bother about open, close and so on.

      It also doesn't catch any errors that might happen, may have issues regarding encoding, and isn't portable. And with variables interpolated into the backticks, it becomes a security risk. So yes, it's not good style. Path::Tiny is definitely the better solution.
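      For comparison, here is a rough core-Perl sketch of reading lines with error checking and an explicit encoding, without going through a shell. The helper name read_lines and the file name no_such_file.txt are hypothetical and exist only for this example; Path::Tiny wraps the same kind of handling for you.

```perl
use strict;
use warnings;

# Hypothetical helper: return all lines of a file, or die with a clear message.
sub read_lines {
    my ($path) = @_;
    # Three-argument open: the file name is never passed to a shell,
    # and the encoding layer is stated explicitly.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open '$path': $!\n";
    chomp( my @lines = <$fh> );
    close $fh;
    return @lines;
}

# A missing file is reported instead of silently yielding an empty list.
my @lines = eval { read_lines('no_such_file.txt') };
print "Error caught: $@" if $@;
```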