Try this. It appears to split the lines as you want them; see the Data::Dumper output it produces. I haven't attempted to verify the sort, as it's not clear to me what you are trying to achieve.
Sorting the array name (a number) and the virus name (a string) in amongst the data values (reals) with a numeric sort doesn't make a lot of sense (to me). (If you had warnings enabled, perl would yell at you about the non-numeric values.)
Your indentation leaves something to be desired, but that may be an artifact of copying and pasting the code.
Using $#{ @profile_names } instead of $#profile_names looks weird too, but perl seems to silently DWYM on that as well.
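For reference, a minimal sketch of the usual ways to get an array's last index. The names `$names_ref`, `$last_direct`, `$last_short` and `$last_block` are made up for illustration, and the profile names are just the first three from the sample header:

```perl
use strict;
use warnings;

# A named array, plus a reference to it (the reference is illustrative only).
my @profile_names = qw( V1.4HFChip-1 V1.4HFChip-10 V1.4HFChip-100 );
my $names_ref     = \@profile_names;

# Last index of a named array: no braces or extra sigil needed.
my $last_direct = $#profile_names;

# Last index through a reference: short form and block form.
my $last_short = $#$names_ref;
my $last_block = $#{ $names_ref };

print "$last_direct $last_short $last_block\n";
```

All three give the same index; the block form only earns its keep when the reference comes from a larger expression.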
#!/usr/bin/perl
# Rank array data in accordance with their P-values
# Script author: AR, 2007.

use Data::Dumper;

my $pvalues_file    = $ARGV[0];
my $MAX_TOP_ENTRIES = 20;

#open PVAL, "< $pvalues_file" or die "Error: Can't open $pvalues_file: $!";

my %arrays;
my @profile_names;

while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ /^ARRAYS/) {
        my $hdr;
        ( $hdr, @profile_names ) = split( ' ', $line );
    }
    else {
        ## Split first on the pipes, leaving the values attached to the end of the last field
        my( $array_name, @pvalues ) = split( '\|', $line );

        ## Then split the last field on whitespace and overlay the fields you wish to discard
        ## Discarding the unwanted first two fields of the second split at the same time.
        @pvalues[ 1 .. 6 ] = ( split( '\s+', $pvalues[ 2 ] ) )[ 2 .. 7 ];

        for my $i ( 0 .. $#profile_names ) {
            $arrays{ $array_name }{ $profile_names[$i] } = $pvalues[$i];
        }
    }
}

print Dumper \%arrays;
exit;

foreach $array (keys %arrays) {
    my $top_count = 0;
    print "$array\t";
    %pvalues = %{$arrays{$array}};
    @profiles_sorted = sort { $pvalues{$a} <=> $pvalues{$b} } ( keys %pvalues );
    foreach $key (@profiles_sorted) {
        $top_count++;
        if ($top_count <= $MAX_TOP_ENTRIES) {
            print "$key:$pvalues{$key}\t";
        }
    }
    print "\n";
}

__DATA__
ARRAYS V1.4HFChip-1 V1.4HFChip-10 V1.4HFChip-100 V1.4HFChip-11 V1.4HFChip-12 V1.4HFChip-13
37133 | Tula virus | Bunyaviridae | V1.3_110017:22.1 0.539026 0.357762 0.801409 0.315076 0.207579 0.946322
263532 | Possum enterovirus W6 | Picornaviridae | V1.3_116027:82.1 0.242743 0.712059 0.474686 0.738211 0.26494 0.529945
271479 | Papaya leaf curl Guandong virus | Geminiviridae | V1.3_105649:75.8 0.291412 0.726736 0.277159 0.893388 0.24579 0.904211
12202 | Lettuce mosaic virus | Potyviridae | V1.3_118815:65.4 0.39146 0.567612 0.771404 0.671439 0.427434 0.855816
116056 | Pelargonium zonate spot virus | Bromoviridae | V1_111931:65.5 0.704965 0.750921 0.66365 0.835392 0.654149 0.04262
45709 | Sabia virus | Arenaviridae | V1_112261:16.8 0.392471 0.740175 0.584603 0.861441 0.434677 0.758832
130556 | Culex nigripalpus NPV | Baculoviridae | V1_112047:15.8 0.315955 0.882084 0.551393 0.909915 0.088346 0.745482
312349 | Procyon lotor papillomavirus type 1 | Papillomaviridae | V1.3_113827:83.8 0.652409 0.200222 0.65569 0.239118 0.537655 0.889673
243550 | Calicivirus isolate TCG | Caliciviridae | V1.3_115411:78.6 0.324359 0.820308 0.238306 0.88163 0.311354 0.741035
150285 | Garlic virus E | Flexiviridae | V1.3_103783:90.0 0.267302 0.809609 0.55432 0.908932 0.193653 0.718928
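If what you actually want is to rank only the numeric p-values for each row, here is a rough sketch of one way to keep the virus name out of the sort. This is a guess at the intent, not a verified fix. `parse_line` and `%pvalue_for` are hypothetical names, and it assumes the last pipe-delimited field always starts with a probe id followed by exactly one p-value per profile name:

```perl
use strict;
use warnings;

# Hypothetical helper: split one data line into its id, virus name and p-values.
sub parse_line {
    my ($line) = @_;

    # Split on the pipes, trimming the whitespace around each one.
    my ( $id, $virus, $family, $rest ) = split /\s*\|\s*/, $line;

    # The first token of the last field is the probe id; the rest are p-values.
    my ( $probe, @pvalues ) = split ' ', $rest;

    return ( $id, $virus, \@pvalues );
}

my @profile_names = qw( V1.4HFChip-1  V1.4HFChip-10 V1.4HFChip-100
                        V1.4HFChip-11 V1.4HFChip-12 V1.4HFChip-13 );
my $line = '37133 | Tula virus | Bunyaviridae | V1.3_110017:22.1 '
         . '0.539026 0.357762 0.801409 0.315076 0.207579 0.946322';

my ( $id, $virus, $pvalues ) = parse_line($line);

# Pair each profile name with its p-value using a hash slice.
my %pvalue_for;
@pvalue_for{@profile_names} = @$pvalues;

# Only numbers take part in the numeric sort, so warnings stay quiet.
my @ranked = sort { $pvalue_for{$a} <=> $pvalue_for{$b} } keys %pvalue_for;

my @report;
push @report, "$_:$pvalue_for{$_}" for @ranked;
print "$id ($virus)\t", join( "\t", @report ), "\n";
```

For the Tula virus row this ranks V1.4HFChip-12 (0.207579) first and V1.4HFChip-13 (0.946322) last. The virus name travels alongside the values instead of being stored among them, which is the main design choice here.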
In reply to Re^5: formating data input
by BrowserUk
in thread formating data input
by manu7495