monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Sirs,
I am trying to write a script that takes data like the example below.
__DATA__
ABCD
EFGH
My intention is to create a hash of arrays. Each array should contain the columnwise substrings of length 2 (in this case) taken from every line (all lines are the same length). The keys should be the position (index) of the substring within the full line, so that the result would be as follows:
$VAR1 = {
          '0' => [ 'AB', 'EF' ],
          '1' => [ 'BC', 'FG' ],
          '2' => [ 'CD', 'GH' ]
        };
The following script of mine does not produce that result.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $enum_size = 0;
my %hash;
my $sub_length = 2;
my $lmer;

while ( <DATA> ) {
    chomp;
    my @array;
    $enum_size = length ($_) - $sub_length + 1;
    for (my $j = 0; $j < $enum_size; $j++) {
        $lmer = substr ($_, $j, $sub_length);
        push @array, $lmer;
        $hash{$j} = [@array];
    }
}
print Dumper \%hash;
# Then do something with the %hash
Please advise how I can approach this problem. Thanks so much in advance.
And Merry Christmas to you all too!
Regards,
Edward

Replies are listed 'Best First'.
Re: Getting columnwise substring from multiple lines
by BrowserUk (Patriarch) on Dec 23, 2004 at 06:15 UTC

    Update: It just struck me that using a hash for this is silly. String offsets are integers that run from 0!

    #! perl -slw
    use strict;
    use Data::Dumper;

    my @hash;
    while( my $line = <DATA> ) {
        chomp $line;
        push @{ $hash[ $_ ] }, substr $line, $_, 2
            for 0 .. length( $line ) -2;
    }
    print Dumper \@hash;

    __DATA__
    ABCDEFGHIJKLM
    abcdefghijklm
    NOPQRSTUVWXYZ
    nopqrstuvwxyz

    Ignore the code below.

    #! perl -slw
    use strict;
    use Data::Dumper;

    my %hash;
    while( my $line = <DATA> ) {
        chomp $line;
        push @{ $hash{ $_ } }, substr $line, $_, 2
            for 0 .. length( $line ) -2;
    }
    print Dumper \%hash;

    __DATA__
    ABCDEFGHIJKLM
    abcdefghijklm
    NOPQRSTUVWXYZ
    nopqrstuvwxyz
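The array-based idea can also be wrapped in a small reusable sub. The following is only a sketch: the name `columnwise_substrings` and its argument order are hypothetical and do not come from the thread. It uses the same push-onto-an-autovivified-array technique as the code above, with the substring length passed in rather than hard-coded.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect every substring of
# length $len at each offset, grouped by offset rather than by line.
sub columnwise_substrings {
    my ( $len, @lines ) = @_;
    my @columns;
    for my $line (@lines) {
        # The range is empty when the line is shorter than $len.
        for my $offset ( 0 .. length($line) - $len ) {
            push @{ $columns[$offset] }, substr( $line, $offset, $len );
        }
    }
    return \@columns;
}

my $columns = columnwise_substrings( 2, 'ABCD', 'EFGH' );
print join( ' ', @$_ ), "\n" for @$columns;
# prints:
# AB EF
# BC FG
# CD GH
```

Because the offsets are consecutive integers starting at 0, the array index carries the same information the hash key would.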

    Examine what is said, not who speaks.        The end of an era!
    "But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
    "Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo         "Efficiency is intelligent laziness." -David Dunham
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Getting columnwise substring from multiple lines
by Zaxo (Archbishop) on Dec 23, 2004 at 06:32 UTC

    You can push directly onto the hash value,

    use Data::Dumper;

    my %hash;
    my $sub_length = 2;
    while (<DATA>) {
        chomp;
        for my $j (0 .. length($_) - $sub_length) {
            push @{$hash{$j}}, substr $_, $j, $sub_length;
        }
    }
    print Dumper \%hash;

    __DATA__
    ABDC
    EFGH

    which does what you want. I've eliminated intermediate steps and temporary variables, and replaced the C-style loop with a more perly one. The code will work even if the data lines are not all the same length. Note that length is written with an explicit ($_) here: a bare length - $sub_length would be parsed as length(-$sub_length).
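A note on using a bare `length` in arithmetic: as far as I know, Perl treats `length` as a named unary operator, so `length - $n` is read as `length(-$n)` rather than `length($_) - $n`. The sketch below, using made-up sample data, shows the difference. The `no warnings 'ambiguous'` block is only there to keep the demonstration quiet.

```perl
use strict;
use warnings;

my $sub_length = 2;
$_ = 'ABCDEFGH';    # made-up sample line, eight characters long

# Explicit parentheses: length of $_ minus $sub_length.
my $last_offset = length($_) - $sub_length;

my $misparsed;
{
    no warnings 'ambiguous';
    # Parsed as length( -$sub_length ), i.e. the length of the string "-2".
    $misparsed = length - $sub_length;
}

print "with parentheses:    $last_offset\n";    # 6
print "without parentheses: $misparsed\n";      # 2
```

With four-character lines and a substring length of 2, both forms happen to give 2, which is why the unparenthesized form can appear to work on the sample data.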

    Your code appears to overwrite the results from previous data lines each time. Is that the result you see?
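The overwriting can be seen in isolation with a simplified sketch. This is not the original script, only a reduced comparison of assigning to `$hash{$j}` versus pushing onto `@{ $hash{$j} }`. The variable names `%assigned` and `%pushed` are made up for the illustration.

```perl
use strict;
use warnings;

my @lines      = ( 'ABCD', 'EFGH' );
my $sub_length = 2;

# Assignment: each line replaces whatever the previous line stored.
my %assigned;
for my $line (@lines) {
    for my $j ( 0 .. length($line) - $sub_length ) {
        $assigned{$j} = [ substr( $line, $j, $sub_length ) ];
    }
}

# push: the array behind each key is autovivified on first use, then grows.
my %pushed;
for my $line (@lines) {
    for my $j ( 0 .. length($line) - $sub_length ) {
        push @{ $pushed{$j} }, substr( $line, $j, $sub_length );
    }
}

print "assigned: @{ $assigned{0} }\n";    # assigned: EF
print "pushed:   @{ $pushed{0} }\n";      # pushed:   AB EF
```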

    After Compline,
    Zaxo

Re: Getting columnwise substring from multiple lines
by Skeeve (Parson) on Dec 23, 2004 at 08:39 UTC
    1. TMTOWTDI
    2. I love regular expressions
    use Data::Dumper;

    my %hash;
    my $sub_length = '.' x 2;
    while (<DATA>) {
        chomp; # remove the chomp. It's unnecessary
        $hash{$.-1}=[ grep /$sub_length/o, split /($sub_length)/o ];
    }
    print Dumper \%hash;

    __DATA__
    ABDC
    EFGH
    Update: I just noticed that my first version fails if the real line length is not a multiple of the wanted length. So I had to put in the grep-regex :-(
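To see why the grep is needed, it helps to look at what `split` returns when the pattern contains a capturing group. The sketch below uses a made-up five-character line. Note that, as far as I can tell, this approach yields non-overlapping chunks per line (keyed by line number in the code above), which is a different shape from the overlapping, per-offset result asked for in the question.

```perl
use strict;
use warnings;

my $chunk = '.' x 2;    # the pattern "..", i.e. any two characters
my $line  = 'ABDCX';    # made-up sample; length is not a multiple of 2

# With a capturing group, split returns the captured separators as well as
# the (here mostly empty) fields between them.
my @raw = split /($chunk)/, $line;

# Keep only the elements that contain at least two characters.
my @pairs = grep { /$chunk/ } @raw;

print join( ',', map( "[$_]", @raw ) ), "\n";    # [],[AB],[],[DC],[X]
print join( ',', @pairs ), "\n";                 # AB,DC
```

The empty strings and the leftover `X` are what the grep filters out. A trailing newline is dropped the same way, since `.` does not match a newline by default.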

    Update 2: I also like obfuscation. So here we go:

    use Data::Dumper;
    my %hash;
    my $sub_length = '.' x 2;
    $hash{$.-1}=[ grep /$sub_length/o, split /($sub_length)/o ] while <DATA>;
    print Dumper \%hash;

    __DATA__
    ABDCX
    EFGHX

    $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print
Re: Getting columnwise substring from multiple lines
by sasikumar (Monk) on Dec 23, 2004 at 06:48 UTC
    Hi,
    To keep it close to the way your own code looks:
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my $enum_size = 0;
    my %hash;
    my $sub_length = 2;
    my $lmer;
    my $i = 0;

    open (DATA, "<C:\\temp.txt");
    my @temp = <DATA>;
    chomp $temp[$i];
    $enum_size = length ($temp[$i]) - $sub_length + 1;

    for (my $j = 0; $j < $enum_size; $j++) {
        my @array;
        for ($i = 0; $i < @temp; $i++) {
            chomp $temp[$i];
            $lmer = substr ($temp[$i], $j, $sub_length);
            push @array, $lmer;
            print "Array $j:@array";
            print "\n";
        }
        $hash{$j} = [@array];
    }
    print Dumper \%hash;

    Thanks
    Sasi Kumar