my %h;
while (<>) {
    chomp;
    my @rec = split;
    $h{$rec[???]} = [ @rec[???, ???, ???] ];
}
Replace the question marks with the appropriate indexes.
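As an illustration only (since the wanted columns were never stated), suppose the third column (index 2) is the key and the last three columns (indexes 3..5) are the ones to keep; a minimal sketch under that assumption:
my %h;
while (<>) {
    chomp;
    my @rec = split;
    $h{$rec[2]} = [ @rec[3, 4, 5] ];   # key => anonymous array of the kept columns
}
With the sample data in this thread that would give, for example, $h{34} = [ 'ACADM', '187960098', 'NP_001120800.1' ].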
That data looks rather familiar. Are you a colleague of nofutur45? :-)
Perhaps my solution to his problem (5 minutes ago) may help you.
The entire text file looks something like this:
3 9606 34 ACADM 187960098 NP_001120800.1
5 9606 37 ACADVL 4557235 NP_000009.1
6 9615 489421 ACAT1 73955189 XP_546539.2
I know how to read each line of the input. After this, I should store all the desired columns in a hash table with the line number as the hash table index and then perform some string matching operations....
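A rough sketch of that idea, keyed on the line number; the name %by_line, the column indexes 3..5, and the /^ACAD/ match are only placeholders for whatever you actually need:
my %by_line;
while (<>) {
    chomp;
    my @rec = split;
    $by_line{$.} = [ @rec[3, 4, 5] ];   # $. is the current input line number
}

# example string-matching pass over the stored columns
for my $line_no (sort { $a <=> $b } keys %by_line) {
    print "$line_no: @{ $by_line{$line_no} }\n"
        if $by_line{$line_no}[0] =~ /^ACAD/;
}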
my @a;
while (<>) {
    chomp;
    my @rec = split;
    push @a, [ @rec[???, ???, ???] ];
}
I should store all the desired columns
You keep saying you only want certain columns, yet you don't say which. Again, just use the index of the columns you want for the question marks.
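For example, if (hypothetically) the wanted columns were those at indexes 3, 4 and 5, the loop and the later access would look like this:
my @a;
while (<>) {
    chomp;
    my @rec = split;
    push @a, [ @rec[3, 4, 5] ];   # one anonymous array per input line
}

print "first line, first kept column: $a[0][0]\n";
print "last line, all kept columns:   @{ $a[-1] }\n";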
my %somehash;
while ( my $line = <> ) {
    chomp $line;
    # split ' ' splits on runs of whitespace, so multiple spaces between columns are fine
    my ($key, $num1, $num2, $string, $num3, $stringnum) = split ' ', $line;
    $somehash{$key}{$num1}{$num2}{$string}{$num3} = $stringnum;
}
That puts the data into a "hash", but it is probably not what you want.
Whether you use a hash or an array structure largely depends on the data available and the logic/processing required. Sequential processing and the lack of a random-access key lend themselves to an array structure; when you have a good logical random-access key (not a record sequence number) and need to access the records non-sequentially, use a hash. A hash structure, or even a mix of hash and array structures, may be suitable. But exactly what structure do you want? Both approaches could be out the window if you have millions of records in the file, in which case a much smarter arrangement would be required to achieve the logic/processing you need.
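To make that distinction concrete, here is a small sketch building both structures from the same records; treating the symbol column (index 3) as the random-access key is only an assumption:
my (@records, %by_symbol);
while (<>) {
    chomp;
    my @rec = split;
    push @records, \@rec;          # array: preserves input order for sequential work
    $by_symbol{$rec[3]} = \@rec;   # hash: direct lookup by a logical key
}

# sequential:     for my $r (@records) { ... }
# random access:  my $r = $by_symbol{'ACADM'};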
Speaking of which, what is the required logic/processing for these records?
the hardest line to type correctly is: stty erase ^H
You can try
cut -f1,4,7 file_name > subset_columns
if, for example, you need the first, fourth and seventh columns of a tab-delimited file on a Linux or Cygwin command line (tab is cut's default field delimiter, so no -d option is needed).
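If you would rather do the same thing from Perl, a roughly equivalent one-liner (again assuming a tab-delimited file and the same first, fourth and seventh columns) is:
perl -F'\t' -lane 'print join "\t", @F[0,3,6]' file_name > subset_columns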
Can anyone tell me how to read only the specified required columns... and store them into a hash table...
Your task specification is incomplete. We don't know which columns are the required ones, and we have no idea what kind of data structure you have in mind.
But I'll make a wild guess that the fifth element ('187960098') is going to be the index or key into the hash, and that you want to store the sixth element ('NP_001120800.1') as the value. In that case, the code would be
#!/usr/bin/perl

use common::sense;    # enables strict, warnings and modern features
use Data::Dumper;

{
    my %h;
    while (<DATA>) {
        my @f = split;          # split the record on whitespace
        $h{$f[4]} = $f[5];      # fifth column as key, sixth as value
    }
    print Dumper( \%h );
}
__DATA__
3 9606 34 ACADM 187960098 NP_001120800.1
When run, this produces
$ perl -w 867594.pl
$VAR1 = {
'187960098' => 'NP_001120800.1'
};
$
QED.
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds