PerlMonks  

Creating a Hash using only one column in an imported data file

by Koda1234 (Initiate)
on Feb 13, 2017 at 17:29 UTC (perlquestion, id://1181904)

Koda1234 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm very new to Perl and I'm having a very difficult time organizing my script so that it does what I want. Right now, I have a space-delimited data file with 12 columns and 84000 rows. The only column that I care about is the 9th column. I am trying to organize the information in that column so that I can "count" the number of values that satisfy a conditional if statement (i.e. is the value in the list greater than 2.0, 3.0, ... and so on).

My issue is this. While creating a hash, I know I am supposed to assign a "key" to a "value". How do I specify a value that is >= 2, for example, assuming I'm not going to manually calculate the values greater than 2? And how do I get the hash to pull the information from the column in my file?

Re: Creating a Hash using only one column in an imported data file
by choroba (Cardinal) on Feb 13, 2017 at 17:41 UTC
    I'm not sure I understand you. Does this do what you want? It uses split to extract the column, and counts how often its value is greater than 1, 2, and so on up to 10. This is done with a common technique: incrementing the hash value associated with the number we're comparing against.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    *ARGV = *DATA{IO} unless @ARGV;

    my %greater_than;
    while (<>) {
        my $col9 = (split)[8];
        $col9 > $_ and ++$greater_than{$_} for 1 .. 10;
    }

    for my $num (sort { $a <=> $b } keys %greater_than) {
        say "$greater_than{$num} values in column 9 were greater than $num";
    }

    __DATA__
    0 1 2 3 4 5 6 7 8 9 10 11
    1 2 3 4 5 6 7 8 9 10 11 12
    5 5 5 5 5 5 5 5 5 5 5 5

    Update: Explanation expanded.
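
On the original question of which key goes with which value: in this technique the key is the threshold and the value is the running count. Below is a minimal sketch of the same idea using `>=` and thresholds like those mentioned in the question. The sub name `count_at_least` and the sample values are made up for illustration and are not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash mapping each threshold to the
# number of values that are >= that threshold.
sub count_at_least {
    my ($values, $thresholds) = @_;
    my %count = map { $_ => 0 } @$thresholds;   # start every key at 0
    for my $value (@$values) {
        for my $threshold (@$thresholds) {
            $count{$threshold}++ if $value >= $threshold;
        }
    }
    return %count;
}

# Made-up sample of column-9 values.
my @col9  = (1.5, 2.0, 2.7, 3.1, 4.8);
my %count = count_at_least(\@col9, [2, 3, 4]);

printf "%d values were >= %s\n", $count{$_}, $_
    for sort { $a <=> $b } keys %count;
```

Starting every key at 0 means a threshold that nothing reaches still shows up in the report, which the `++` on first match approach would leave out.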

Re: Creating a Hash using only one column in an imported data file
by 1nickt (Canon) on Feb 13, 2017 at 18:30 UTC

    Hi Koda1234, welcome to the monastery and to Perl, the One True Religion.

    In Perl, to filter a list of values down to a smaller list containing only the elements that match a certain condition, use grep.

    use strict;
    use warnings;
    use feature 'say';

    my @col9 = map { (split)[8] } <DATA>;

    foreach my $test ( 1, 9, 42, 666 ) {
        my $count = scalar grep { $_ >= $test } @col9;
        say sprintf "%d values were >= %d", $count, $test;
    }

    __DATA__
    1 2 3 4 5 6 7 8 9 10 11 12
    1 2 3 4 5 6 7 8 9 10 11 12
    1 2 3 4 5 6 7 8 9 10 11 12
    1 2 3 4 5 6 7 8 42 10 11 12
    1 2 3 4 5 6 7 8 42 10 11 12
    1 2 3 4 5 6 7 8 42 10 11 12
    1 2 3 4 5 6 7 8 1 10 11 12

    Output:

    $ perl 1181904.pl
    7 values were >= 1
    6 values were >= 9
    3 values were >= 42
    0 values were >= 666

    See also:

    • split for splitting text strings
    • map for transforming one list into another
    • scalar for counting how many elements are in a list
    • sprintf for creating strings containing changing values
    • __DATA__ for including a data "file" inside your program code
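
The building blocks listed above can also be tried one at a time. This is only a small sketch: the sample lines are invented, and an ordinary array stands in for the data file so that each step can be inspected separately.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the data file.
my @lines = (
    "a b c d e f g h 9 j k l\n",
    "a b c d e f g h 42 j k l\n",
    "a b c d e f g h 1 j k l\n",
);

# split with no arguments splits $_ on whitespace; [8] picks the 9th field.
# map applies that to every line and collects the results.
my @col9 = map { (split)[8] } @lines;

# grep returns the matching elements in list context ...
my @big = grep { $_ >= 9 } @col9;

# ... and the number of matches in scalar context.
my $count = scalar grep { $_ >= 9 } @col9;

# sprintf builds the strings; print writes them out.
print sprintf("col9 = (%s)\n", join ', ', @col9);
print sprintf("%d values were >= %d\n", $count, 9);
```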

    Hope this helps!


    The way forward always starts with a minimal test.

      Thank you very much!! This was very helpful information. However, my data file is way too large to include inside the code. I tried running it with the format below, and I'm getting "0 values were >= x" for all of the elements. Is there something wrong with the way I'm opening the file?

      open (IN, "<$ARGV[0]") || die ("Cannot open $ARGV[0]: $!");
      @MyData = <IN>;

      use strict;
      use warnings;
      use feature 'say';

      my @col9 = map {(split)[8]} <IN>;

      foreach my $test (2,3,4,5,6,7,8,9) {
          my $count =scalar grep {$_ >= $test} @col9;
          say sprintf "%d values were >= %d", $count, $test;
      }

        Hi Koda1234,

        The preferred (because it's safest) way to open a file is the "three-argument form" (see open). [Suggestion: as a beginner, read the docs for the various functions; don't just copy examples you may see in the wild.]

        Also:

        • Check that you got any input before using it.
        • $! will report why open failed, but you can also check the file yourself first so you can give your own error message.
        • Use while to read from your filehandle one line at a time, so even if it's big it won't fill your memory.
        • Use chomp to trim the newline character off the end of the line. Doesn't matter in your example, but it will soon enough...
        • Declare your array outside the while loop and use push to add the values to it as you split the lines.

        use strict;
        use warnings;
        use feature 'say';

        my $filename = $ARGV[0] or die "You must supply a filename";
        -f $filename or die "You must supply the name of a file that exists!";

        open my $IN, '<', $filename or die "Can't open < $filename: $!";

        my @col9;
        while ( my $line = <$IN> ) {
            chomp $line;
            push @col9, (split / /, $line)[8];
        }

        close $IN or die "Can't close $filename: $!";

        foreach my $test ( 1, 9, 42, 666 ) {
            my $count = scalar grep { $_ >= $test } @col9;
            say sprintf "%d values were >= %d", $count, $test;
        }

        __END__

        Hope this helps!
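
A likely reason the earlier attempt reported 0 for every threshold: `@MyData = <IN>` had already read the whole file, so the later `<IN>` inside `map` had nothing left to return and `@col9` stayed empty. The sketch below reproduces that under stated assumptions: it uses an in-memory filehandle with made-up data instead of a real file, so it can be run as-is.

```perl
use strict;
use warnings;

# In-memory "file" so the sketch is self-contained (made-up data).
my $text = "a b c d e f g h 2 j k l\n" x 3;
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my @first  = <$fh>;    # list context: reads every line up to end-of-file
my @second = <$fh>;    # already at end-of-file, so this list is empty

printf "first read: %d lines, second read: %d lines\n",
    scalar @first, scalar @second;

# Either reuse the lines already read, or rewind before reading again.
seek $fh, 0, 0 or die "Can't rewind: $!";
my @col9 = map { (split)[8] } <$fh>;
printf "after seek: %d values\n", scalar @col9;

close $fh;
```

Reading the file once in a while loop, as in the reply above, avoids the problem entirely.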


Re: Creating a Hash using only one column in an imported data file
by CountZero (Bishop) on Feb 14, 2017 at 14:54 UTC
    I would treat the data file as a database and use SQL and its aggregate functions to condense that 9th field into a list of unique numbers, each with its count. Then it becomes trivially easy to answer your question.
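
The post gives no code. As a rough illustration of the "unique numbers with their counts" idea, here is a pure-Perl stand-in for what a SQL `GROUP BY` with `COUNT(*)` would produce. It does not use a database or any SQL module, and the sample values and the sub name `count_at_least_from_tally` are invented for this sketch.

```perl
use strict;
use warnings;

# Made-up column-9 values.
my @col9 = (2, 2, 3, 5, 5, 5, 8);

# Roughly what "SELECT col9, COUNT(*) ... GROUP BY col9" would return:
# each distinct value mapped to how often it occurs.
my %tally;
$tally{$_}++ for @col9;

# Hypothetical helper: sum the counts of all distinct values >= $threshold.
sub count_at_least_from_tally {
    my ($tally, $threshold) = @_;
    my $total = 0;
    for my $value (keys %$tally) {
        $total += $tally->{$value} if $value >= $threshold;
    }
    return $total;
}

for my $threshold (2, 3, 6) {
    printf "%d values were >= %d\n",
        count_at_least_from_tally(\%tally, $threshold), $threshold;
}
```

With 84000 rows but far fewer distinct values, each threshold then needs only a pass over the distinct values rather than over every row.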

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
