Getting columnwise substring from multiple lines

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Sirs,
I am trying to write a script that takes data like the following example:

__DATA__
ABCD
EFGH

My intention is to create a hash of arrays. Each array should contain the columnwise substrings of a given length (2 in this case) taken from every line (all lines have the same length). The keys should be the position (index) of the substring within the full line, so the result would be:

$VAR1 = {
          '0' => [
                   'AB',
                   'EF'
                 ],
          '1' => [
                   'BC',
                   'FG'
                 ],
          '2' => [
                   'CD',
                   'GH'
                 ]
        };

The following script of mine does not produce that result:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $enum_size =0;
my %hash;

my $sub_length = 2;
my $lmer;

while( <DATA>  ) {
 chomp;
 my @array;

 $enum_size = length ($_)  - $sub_length +1;

 for (my $j =0 ;$j <$enum_size ;$j++)
 {
   $lmer = substr ($_, $j, $sub_length);
   push @array, $lmer;
   $hash{$j}=[@array];
 }

}

print Dumper \%hash;
# Then do something with %hash

Please advise how I can approach this problem. Thanks very much in advance.
And Merry Christmas to you all too!

Regards,
Edward

Re: Getting columnwise substring from multiple lines by BrowserUk (Patriarch) on Dec 23, 2004 at 06:15 UTC
Update: It just struck me that using a hash for this is silly. String offsets are integers that run from 0!

```perl
#! perl -slw
use strict;
use Data::Dumper;

my @hash;
while( my $line = <DATA> ) {
    chomp $line;
    push @{ $hash[ $_ ] }, substr $line, $_, 2
        for 0 .. length( $line ) -2;
}
print Dumper \@hash;

__DATA__
ABCDEFGHIJKLM
abcdefghijklm
NOPQRSTUVWXYZ
nopqrstuvwxyz
```

Ignore the code below.

```perl
#! perl -slw
use strict;
use Data::Dumper;

my %hash;
while( my $line = <DATA> ) {
    chomp $line;
    push @{ $hash{ $_ } }, substr $line, $_, 2
        for 0 .. length( $line ) -2;
}
print Dumper \%hash;

__DATA__
ABCDEFGHIJKLM
abcdefghijklm
NOPQRSTUVWXYZ
nopqrstuvwxyz
```

Examine what is said, not who speaks.
The end of an era!
"But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
"Think for yourself!" - Abigail
"Time is a poor substitute for thought"--theorbtwo
"Efficiency is intelligent laziness." -David Dunham
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
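Because the keys are consecutive offsets starting at 0, an array of arrays also lets the results be walked in column order without sorting any keys. The sketch below assumes the lines are already in memory, and the helper name `column_substrings` is hypothetical; unlike the code above, the window length is a parameter rather than a hard-coded 2.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an array of arrays where element $pos holds
# the substring of length $len starting at offset $pos, one per input line.
sub column_substrings {
    my ($len, @lines) = @_;
    my @aoa;
    for my $line (@lines) {
        # Offsets 0 .. length - $len are the only ones with a full window.
        for my $pos (0 .. length($line) - $len) {
            push @{ $aoa[$pos] }, substr($line, $pos, $len);
        }
    }
    return @aoa;
}

my @aoa = column_substrings(2, 'ABCD', 'EFGH');

# Offsets are plain array indices, so 0 .. $#aoa visits them in order.
for my $pos (0 .. $#aoa) {
    print "$pos: @{ $aoa[$pos] }\n";
}
```

With the question's two lines this prints one line per offset (`0: AB EF`, `1: BC FG`, `2: CD GH`).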
Re: Getting columnwise substring from multiple lines by Zaxo (Archbishop) on Dec 23, 2004 at 06:32 UTC
You can push directly onto the hash value,

```perl
use Data::Dumper;
my %hash;
my $sub_length = 2;
while (<DATA>) {
    chomp;
    for my $j (0 .. length($_) - $sub_length) {
        push @{$hash{$j}}, substr $_, $j, $sub_length;
    }
}
print Dumper \%hash;

__DATA__
ABCD
EFGH
```

which does what you want. I've eliminated intermediate steps and temporary variables, and replaced the C-style loop with a more perly one. The code will work even if the data lines are not all the same length. Note that length operates on $_ when called without an argument, but here the argument (or at least the parentheses) must be written out: a bare `length - $sub_length` is parsed as `length(-$sub_length)`, the length of the string "-2".

Your code appears to overwrite the results from previous data lines each time. Is that the result you see?

After Compline,
Zaxo
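The claim that this loop copes with lines of different lengths can be checked with a small sketch. It assumes the lines come from an in-memory list instead of `__DATA__`, and the sample strings are made up purely for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

my $sub_length = 2;
my %hash;
my @lines = ('ABCD', 'EFGHIJ', 'KLM');   # illustrative lines of unequal length

for my $line (@lines) {
    # Parentheses make it explicit that $sub_length is subtracted from the length.
    for my $j (0 .. length($line) - $sub_length) {
        push @{ $hash{$j} }, substr $line, $j, $sub_length;
    }
}

$Data::Dumper::Sortkeys = 1;   # print the keys in a stable order
print Dumper \%hash;
```

Longer lines simply contribute to more offsets, so the per-offset arrays become ragged: offset 4 only receives `IJ` from the longest line, and offset 1 gets `LM` from the third line while offset 2 has nothing from it. The position inside an array therefore no longer identifies the source line once line lengths differ.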
Re: Getting columnwise substring from multiple lines by Skeeve (Parson) on Dec 23, 2004 at 08:39 UTC
1. TMTOWTDI
2. I love regular expressions

```perl
use Data::Dumper;
my %hash;
my $sub_length = '.' x 2;
while (<DATA>) {
    chomp; # remove the chomp. It's unnecessary
    $hash{$.-1}=[ grep /$sub_length/o, split /($sub_length)/o ];
}
print Dumper \%hash;

__DATA__
ABCD
EFGH
```

Update: I just noticed that my first version fails if the real line length is not a multiple of the wanted length. So I had to put in the grep-regex :-(

Update 2: I also like obfuscation. So here we go:

```perl
use Data::Dumper;
my %hash;
my $sub_length = '.' x 2;
$hash{$.-1}=[ grep /$sub_length/o, split /($sub_length)/o ] while <DATA>;
print Dumper \%hash;

__DATA__
ABCDX
EFGHX
```

`$\=~s;s.;q^\|D9JYJ^^qq^\//\\\///^;ex;print`
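Splitting on `(..)` yields non-overlapping pieces keyed by line number, which is a different shape from the overlapping, column-keyed substrings asked for in the question. If a regex approach is still preferred, one option (swapped in here, not taken from the reply above) is to put the capture inside a zero-width lookahead so it matches at every offset, and read each offset from `$-[0]`. This is a sketch that assumes the lines are held in an array rather than read from `__DATA__`.

```perl
use strict;
use warnings;

my $sub_length = 2;
my %hash;
my @lines = ('ABCD', 'EFGH');

for my $line (@lines) {
    # (?=(.{N})) matches an empty string at each offset that still has
    # N characters after it, capturing those N characters in $1.
    while ($line =~ /(?=(.{$sub_length}))/g) {
        # $-[0] is the offset where the current zero-length match starts.
        push @{ $hash{ $-[0] } }, $1;
    }
}

for my $pos (sort { $a <=> $b } keys %hash) {
    print "$pos: @{ $hash{$pos} }\n";
}
```

Perl does not allow a second zero-length match at the same position, so each `//g` iteration advances by one character, which is what produces the overlapping windows.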
Re: Getting columnwise substring from multiple lines by sasikumar (Monk) on Dec 23, 2004 at 06:48 UTC
Hi,
To do it in the same style as your code:

```perl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %hash;
my $sub_length = 2;

# Read every line of the input file into memory, removing the newlines.
open(my $fh, '<', 'C:\\temp.txt') or die "Cannot open C:\\temp.txt: $!";
chomp(my @temp = <$fh>);
close $fh;

# Number of substring positions, based on the first line.
my $enum_size = length($temp[0]) - $sub_length + 1;

for (my $j = 0; $j < $enum_size; $j++) {
    my @array;
    for (my $i = 0; $i < @temp; $i++) {
        my $lmer = substr($temp[$i], $j, $sub_length);
        push @array, $lmer;
        print "Array $j:@array\n";
    }
    $hash{$j} = [@array];
}
print Dumper \%hash;
```

Thanks,
Sasi Kumar
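The reply above builds the hash column by column over lines held in memory. The same column-major order can be written more compactly with nested `map`. This sketch assumes, as the question states, that all lines have the same length, takes the offset count from the first line, and uses an inline list in place of the file.

```perl
use strict;
use warnings;
use Data::Dumper;

my $sub_length = 2;
my @lines = ('ABCD', 'EFGH');   # e.g. chomp(my @lines = <$fh>);

# Number of full-length windows per line, taken from the first line.
my $enum_size = length($lines[0]) - $sub_length + 1;

# For each offset $j, collect that column's substring from every line.
my %hash = map {
    my $j = $_;
    ( $j => [ map { substr $_, $j, $sub_length } @lines ] );
} 0 .. $enum_size - 1;

$Data::Dumper::Sortkeys = 1;   # print the keys in a stable order
print Dumper \%hash;
```

Because every array is built in one pass over `@lines`, element `$i` of each array always comes from line `$i`, which holds only while the equal-length assumption does.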