in reply to How to group by a column and calculate max/min on another

Your logic is not too far off. One way to fix your issue with only minor changes to the code is to add a layer of depth to your hash of arrays, so that each gene key holds separate start, end and ex arrays:

#!/usr/bin/perl -w
use strict;
use List::Util qw(max);
use List::Util qw(min);

#my $input0 = $ARGV[0];
#open (DATA,$input0) || die "cannot open input0";

my %gene_hash;

while(<DATA>) {
    chomp;
    my ($chr, $start, $end, $gene, $ex) = split(/\t/, $_);
    my $gene_key = $chr.":".$gene;
    push( @{ $gene_hash{$gene_key}{start} }, $start );
    push( @{ $gene_hash{$gene_key}{end} }, $end );
    push( @{ $gene_hash{$gene_key}{ex} }, $ex );
}

foreach my $key (keys %gene_hash) {
    my ($c, $g) = split(/\:/, $key );
    print "$c\t$g\t";
    my $Low=min( @{ $gene_hash{$key}{start} } );
    my $High=max( @{ $gene_hash{$key}{end} } );
    my $High_ex=max( @{ $gene_hash{$key}{ex} } );
    {
        print "$Low\t$High\t$High_ex";
    }
    print "\n";
}

__DATA__
chrX	2680092	2744539	XG	1
chrX	2680090	2744529	XG	2
chrX	2680080	2744519	XG	3
chrX	2680070	2744509	XG	4
chrX	2680070	2744509	DT	1
chrX	2680090	2744519	DT	2

If the modification is unclear, you can use Data::Dumper to output the resulting structure by adding the following at the end of your code (before the __DATA__ marker, if you keep one):

use Data::Dumper;
print Dumper \%gene_hash;
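
For orientation, the structure built from the sample data is equivalent to the hand-written literal below. This is only a sketch of the shape, not actual Dumper output (Dumper quotes values read from a file as strings and may list the keys in a different order).

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hand-written equivalent of %gene_hash for the sample data:
# outer key = "chr:gene", inner keys = start/end/ex, each holding an array ref.
my %gene_hash = (
    'chrX:XG' => {
        start => [ 2680092, 2680090, 2680080, 2680070 ],
        end   => [ 2744539, 2744529, 2744519, 2744509 ],
        ex    => [ 1, 2, 3, 4 ],
    },
    'chrX:DT' => {
        start => [ 2680070, 2680090 ],
        end   => [ 2744509, 2744519 ],
        ex    => [ 1, 2 ],
    },
);

# Each inner array ref is dereferenced with @{ ... } before it goes to min/max.
my $low  = min( @{ $gene_hash{'chrX:XG'}{start} } );
my $high = max( @{ $gene_hash{'chrX:XG'}{end} } );
print "$low\t$high\n";
```

The extra hash level is what lets min and max run over each column independently for a given gene.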

A couple of minor things you may consider in addition:

  1. You should probably get into the habit of using 3-argument open instead of 2-argument open; the difference is explained in perlopentut.
  2. You might also consider swapping to Indirect Filehandles. This can become important in larger projects.
  3. split operates on $_ when no string to split is given, so you could change that call on line 13 to = split(/\t/);
  4. You delimit your keys with ':'; if you are going to create an amalgam key, you should use a character that is guaranteed not to appear in your file - might I suggest "\t"? That way you don't have to split it again for output.
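
Taken together, the four suggestions might look something like the sketch below. The helper name summarize_genes is hypothetical, and sorting the keys is an addition made here only so the output order is predictable; neither comes from the original script.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# summarize_genes() is a hypothetical helper name used for illustration.
# It reads a tab-separated file and returns one summary line per chr/gene pair.
sub summarize_genes {
    my ($path) = @_;

    # Suggestions 1 and 2: 3-argument open with a lexical (indirect) filehandle.
    open( my $in, '<', $path ) or die "Cannot open '$path': $!";

    my %gene_hash;
    while (<$in>) {
        chomp;

        # Suggestion 3: split works on $_ when no string is given.
        my ( $chr, $start, $end, $gene, $ex ) = split /\t/;

        # Suggestion 4: a tab cannot occur inside a tab-split field,
        # and the key can be printed as-is without splitting it again.
        my $key = "$chr\t$gene";

        push @{ $gene_hash{$key}{start} }, $start;
        push @{ $gene_hash{$key}{end} },   $end;
        push @{ $gene_hash{$key}{ex} },    $ex;
    }
    close $in;

    my @lines;
    for my $key ( sort keys %gene_hash ) {    # sort added for stable output
        my $g = $gene_hash{$key};
        push @lines, join "\t", $key,
            min( @{ $g->{start} } ),
            max( @{ $g->{end} } ),
            max( @{ $g->{ex} } );
    }
    return @lines;
}

# Run against a file named on the command line, if one was given.
if (@ARGV) {
    print "$_\n" for summarize_genes( $ARGV[0] );
}
```

Because the key already contains the tab, each output line is just the key followed by the three computed values.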

Re^2: How to group by a column and calculate max/min on another
by perl_paduan (Initiate) on Aug 03, 2010 at 14:06 UTC
    Thanks and thanks kennethk!

    I will try to apply all your suggestions to my scripting