sundeep has asked for the wisdom of the Perl Monks concerning the following question:

There are different columns in the input file. I want to read only a few columns and ignore the others. There are many rows with these columns in the input file. I want to store the required columns in a hash table. Can anyone tell me how to read only the specified required columns and store them in a hash table?

3 9606 34 ACADM 187960098 NP_001120800.1


Replies are listed 'Best First'.
Re: Extract and read different columns from the file
by ikegami (Patriarch) on Oct 27, 2010 at 03:47 UTC
    my %h;
    while (<>) {
        chomp;
        my @rec = split;
        $h{$rec[???]} = [ @rec[???, ???, ???] ];
    }

    Replace the question marks with the appropriate indexes.
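
    As a concrete illustration of the loop above, here is a minimal sketch that assumes column 0 (the leading ID) is the key and that columns 3 and 5 (the gene symbol and the protein accession) are the wanted values. Those index choices and the helper name read_columns are assumptions made for the example only; the question does not say which columns are needed.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash keyed on one column, keeping a few others.
sub read_columns {
    my ($fh, $key_idx, @want_idx) = @_;
    my %h;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;      # skip blank lines
        my @rec = split ' ', $line;     # split on runs of whitespace
        $h{ $rec[$key_idx] } = [ @rec[@want_idx] ];
    }
    return \%h;
}

# Sample rows from the thread, read through an in-memory filehandle.
my $data = <<'END';
3 9606 34 ACADM 187960098 NP_001120800.1
5 9606 37 ACADVL 4557235 NP_000009.1
6 9615 489421 ACAT1 73955189 XP_546539.2
END

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $h = read_columns($fh, 0, 3, 5);   # key on column 0; keep columns 3 and 5 (assumed)
close $fh;

# One line per key, in numeric key order.
print "$_ => @{ $h->{$_} }\n" for sort { $a <=> $b } keys %$h;
```

    To read a real file instead, open it normally and pass that filehandle to read_columns in place of the in-memory one.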

Re: Extract and read different columns from the file
by kcott (Archbishop) on Oct 27, 2010 at 03:53 UTC

    That data looks rather familiar. Are you a colleague of nofutur45? :-)

    Perhaps my solution to his problem (5 minutes ago) may help you.

    -- Ken

Re: Extract and read different columns from the file
by aquarium (Curate) on Oct 27, 2010 at 04:03 UTC
    Can you please show your attempt at doing this? That would further clarify what structure you're trying to arrive at, and it's much better than asking somebody to do all your (home)work.
    the hardest line to type correctly is: stty erase ^H

      The entire text file looks something like this:

      3 9606 34 ACADM 187960098 NP_001120800.1

      5 9606 37 ACADVL 4557235 NP_000009.1

      6 9615 489421 ACAT1 73955189 XP_546539.2

      I know how to read each line of the input. After this, I should store all the desired columns in a hash table with the line number as the hash table index, and then perform some string matching operations.

        with the line number as the hash table index

        huh, why not just use an array?

        my @a;
        while (<>) {
            chomp;
            my @rec = split;
            push @a, [ @rec[???, ???, ???] ];
        }

        i should store all the desired columns

        You keep saying you only want certain columns, yet you don't say which. Again, just use the index of the columns you want for the question marks.
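
        To make the array suggestion concrete, here is a small sketch. It assumes the wanted columns are 3 and 5 (gene symbol and accession); that choice, like the variable names, is only an assumption for illustration. Line N of the input ends up at index N-1 of the array.

```perl
use strict;
use warnings;

# Sample rows from the thread, held in memory for the example.
my @lines = (
    "3 9606 34 ACADM 187960098 NP_001120800.1\n",
    "5 9606 37 ACADVL 4557235 NP_000009.1\n",
    "6 9615 489421 ACAT1 73955189 XP_546539.2\n",
);

my @records;
for my $line (@lines) {
    my @rec = split ' ', $line;        # split on whitespace; the newline is dropped
    push @records, [ @rec[3, 5] ];     # assumed columns: symbol and accession
}

# Line 2 of the input is element 1 of the array.
my ($symbol, $accession) = @{ $records[1] };
print "line 2: $symbol $accession\n";
```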

        my %somehash;
        while (my $line = <>) {
            my ($key, $num1, $num2, $string, $num3, $stringnum) = split ' ', $line;
            $somehash{$key}{$num1}{$num2}{$string}{$num3} = $stringnum;
        }
        that puts the data into a "hash", but probably not what you want.
        whether you use a hash or array structure largely depends on the data available and the logic/processing required. sequential processing and lack of a random access key lends itself to an array structure. when you have a good logical random access key (not a record sequence number) and need to access the records non-sequentially, use a hash. a hash structure, or even a mix of hash and array structure may be suitable. but exactly what structure do you want? both approaches could be out the window if you have millions of records in the file, whereby some much smarter arrangement would be required to achieve the logic/processing required.
        speaking of which..what is the required logic/processing for these records?
        the hardest line to type correctly is: stty erase ^H
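
        The array-versus-hash trade-off described above can be sketched as follows. The example keeps the same records in both structures: an array for sequential scans and a hash for direct lookup. Using the gene symbol (column 3) as the hash key is an assumption for illustration; the thread never says which column would be a natural key.

```perl
use strict;
use warnings;

# Sample rows from the thread, held in memory for the example.
my @lines = (
    "3 9606 34 ACADM 187960098 NP_001120800.1\n",
    "5 9606 37 ACADVL 4557235 NP_000009.1\n",
    "6 9615 489421 ACAT1 73955189 XP_546539.2\n",
);

my (@by_order, %by_symbol);
for my $line (@lines) {
    my @rec = split ' ', $line;
    push @by_order, \@rec;             # sequential access, input order preserved
    $by_symbol{ $rec[3] } = \@rec;     # random access by gene symbol (assumed key)
}

# Sequential scan: every record whose accession starts with NP_.
my @np = grep { $_->[5] =~ /^NP_/ } @by_order;

# Direct lookup: no scan needed.
my $acat1 = $by_symbol{ACAT1};

print scalar(@np), " NP_ records; ACAT1 accession is $acat1->[5]\n";
```

        Both structures hold references to the same records here, so the memory cost of keeping both is small compared with storing the data twice.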
Re: Extract and read different columns from the file
by umasuresh (Hermit) on Oct 27, 2010 at 12:59 UTC
    You can try
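     cut -f1,4,7 file_name > subset_columns
    for example, if you need the first, fourth and seventh columns of a tab-delimited file, on a Linux or Cygwin command line. Tab is cut's default delimiter, so no -d option is needed; a double-quoted "\t" reaches cut as two characters (a backslash and a t), which GNU cut rejects.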
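
    For comparison, roughly the same column selection can be done in Perl. This is only a sketch: the helper name cut_fields and the sample row are made up for illustration, and it pads missing fields with empty strings, which is not exactly what cut does.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the given 1-based fields of a tab-delimited line.
sub cut_fields {
    my ($line, @fields) = @_;
    chomp $line;
    my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    return join "\t", map { $cols[$_ - 1] // '' } @fields;
}

my $row = join("\t", qw(a b c d e f g h)) . "\n";   # made-up sample row
print cut_fields($row, 1, 4, 7), "\n";
```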
Re: Extract and read different columns from the file
by talexb (Chancellor) on Oct 28, 2010 at 15:22 UTC
      Can anyone tell me how to read only the specified required columns and store them in a hash table?

    Your task specification is incomplete. We don't know which columns are the required ones, and we have no idea what kind of data structure you have in mind.

    But I'll make a wild guess that the fifth element ('187960098') is going to be the index or key into the hash, and that you want to store the sixth element ('NP_001120800.1') as the value. In that case, the code would be

    #!/usr/bin/perl
    #
    #
    use common::sense;
    use Data::Dumper;

    {
        my %h;
        while (<DATA>) {
            my @f = split;
            $h{$f[4]} = $f[5];
        }
        print Dumper ( \%h );
    }

    __DATA__
    3 9606 34 ACADM 187960098 NP_001120800.1
    When run, this produces
    $ perl -w 867594.pl
    $VAR1 = {
              '187960098' => 'NP_001120800.1'
            };
    $
    QED.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds