
Hi Moritz and Limbic_Region,

I'm afraid your scripts don't work very well. They only generate two columns, and the entries within a column don't share the same prefix. In my scratchpad I have put the first seven rows (my .txt file has about 8000 rows). Note that each row starts with Cluster(\d+), i.e. the word "Cluster" followed by a number, e.g. 1, 2, 3, etc.
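Given that row format, each line can be recognised and split by its leading label. This is a minimal sketch that assumes entries are separated by whitespace (as the `split` in the post's own code also assumes). `parse_row` is a hypothetical name, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into its cluster number and entries.
# Returns an empty list for lines that do not start with "Cluster<digits>".
sub parse_row {
    my ($line) = @_;
    # split ' ' skips leading whitespace and drops the trailing newline
    my ($label, @entries) = split ' ', $line;
    return unless defined $label && $label =~ /^Cluster(\d+)$/;
    return ($1, @entries);
}
```

A caller can then skip anything that is not a cluster row with `my ($n, @entries) = parse_row($_) or next;`.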

The code I have come up with so far is:

#!/usr/bin/perl

use warnings;
use strict;
use List::Util 'max';

# Read in the file
my $FILENAME3 = "clusters3.txt";
open(DATA, $FILENAME3) or die "Cannot open \"$FILENAME3\": $!\n";

# create arrays and hashes to store stuff
my (%data, %all, @keys);
while (<DATA>) {
    # avoid \n on last field
    chomp;
    # split the data into chunks
    my @chunks = split;
    # the first chunk is the key
    my $key = shift @chunks;
    # store the keys in an array unless they already exist
    push @keys, $key unless exists $data{$key};
    foreach my $chunk (@chunks) {
        # count each chunk under its key
        $data{$key}{$chunk}++;
        # add all chunks to the hash '%all'
        $all{$chunk} = 1;
    }

    # now make a file for the output
    my $outputfile = "new_cluster.txt";
    if (! open(POS, ">>$outputfile") ) {
        print "Cannot open file \"$outputfile\" to write to!!\n\n";
        exit;
    }

    # sort the fields/columns keys and save them as an array
    #my @fields = sort {$a <=> $b} keys %all;
    my @fields = sort {lc($a) cmp lc($b)} keys %all; ##<--this sorting didn't work

    # find the longest entry in an array
    #my $longest = max map {length} @fields;
    my $longest = max map {scalar grep $_=~ /\(\d+\)\_\(\d+\)\_\(\d+\)\_/, @fields} @fields; #the line I think has a problem!

    # organise the data
    foreach my $key (@keys) {
        while (keys %{$data{$key}}) {
            print POS $key, " ";
            foreach my $field (@fields) {
                if ($data{$key}{$field}){
                    printf POS "%${longest}s ", $field;
                    delete $data{$key}{$field} unless --$data{$key}{$field};
                }
                else {
                    printf POS "%${longest}s ", "-";
                }
            }
            print POS "\n";
        }
    }
}

In the code, clusters3.txt is my .txt file, but the script spits out rubbish.
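Two things in the code above would plausibly produce rubbish. First, the final closing brace belongs to the `while (<DATA>)` loop, so the whole output section runs once per input line: new_cluster.txt is reopened in append mode and every key seen so far is printed again (and because of `>>`, output from earlier runs is kept too). Second, `$longest` counts regex matches rather than measuring string length, and since the pattern escapes the parentheses it looks for literal `(` and `)` characters, so it most likely matches nothing and the width comes out as 0. The sketch below keeps the original column-per-entry idea but reads all input first, takes the width from `length`, and writes once. `tabulate` is a hypothetical name; the filehandles are passed in so it can be tried on in-memory strings.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Sketch only: same column-per-entry layout as the code above, but all
# input is read before anything is written, and the column width is the
# length of the longest entry.
sub tabulate {
    my ($in, $out) = @_;
    my (%data, %all, @keys);

    # Pass 1: read the whole input.
    while (my $line = <$in>) {
        my ($key, @chunks) = split ' ', $line;   # ' ' also drops the newline
        next unless defined $key;                # skip blank lines
        unless ($data{$key}) {                   # remember first-seen order
            push @keys, $key;
            $data{$key} = {};
        }
        for my $chunk (@chunks) {
            $data{$key}{$chunk}++;               # count repeats per key
            $all{$chunk} = 1;
        }
    }

    # Pass 2: decide the columns and their width once.
    my @fields  = sort { lc($a) cmp lc($b) or $a cmp $b } keys %all;
    my $longest = max(1, map { length } @fields);

    # Pass 3: write every key; an entry repeated under one key yields an
    # extra row, as in the original loop.
    for my $key (@keys) {
        my $counts = $data{$key};
        while (%$counts) {
            my @cells;
            for my $field (@fields) {
                if ($counts->{$field}) {
                    push @cells, sprintf("%${longest}s", $field);
                    delete $counts->{$field} unless --$counts->{$field};
                }
                else {
                    push @cells, sprintf("%${longest}s", '-');
                }
            }
            print {$out} join(' ', $key, @cells), "\n";
        }
    }
}

# Example use with the file names from the post:
#   open my $in,  '<', 'clusters3.txt'   or die "clusters3.txt: $!";
#   open my $out, '>', 'new_cluster.txt' or die "new_cluster.txt: $!";
#   tabulate($in, $out);
```

Note that with about 8000 rows this still makes one column per distinct entry, which is probably far too wide; grouping columns by prefix is what the rules in the rest of the post describe.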

Is it possible to have each entry in each row arranged tidily in columns, so that entries with the same prefix line up in the same column?

Generally, for entries beginning with a letter, the prefix is separated from the rest by an underscore. The exceptions are spr, HMPREF and pseudoSPN23F (pseudoSPN23F is essentially the same as SPN23F and should go in the same column).

For entries beginning with digits, the prefix runs from the beginning to the last underscore, e.g. 3850_1_7_ and 3850_1_8_.
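Following the two prefix rules above, each entry can be mapped to a column name and the rows laid out one column per prefix. This is a sketch under assumptions: the post does not show the exact form of the spr, HMPREF and SPN23F IDs, so the patterns assume those strings appear literally at the start of an entry, and that other letter-led entries have their prefix before the first underscore. `column_for` and `layout_by_prefix` are hypothetical names. When a row has two entries with the same prefix they are joined with a comma; that is a design choice, not something the post specifies.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: the column (prefix) an entry belongs in.
# The spr/HMPREF/SPN23F patterns are assumptions about the real IDs.
sub column_for {
    my ($entry) = @_;
    # Digit-led: from the start up to and including the last underscore,
    # e.g. 3850_1_7_12 -> 3850_1_7_ (greedy .* finds the last "_").
    return $1 if $entry =~ /^(\d.*_)/;
    # pseudoSPN23F shares the SPN23F column.
    return 'SPN23F' if $entry =~ /^(?:pseudo)?SPN23F/;
    # Other named exceptions: the prefix is the literal name.
    return $1 if $entry =~ /^(spr|HMPREF)/;
    # Assumed general rule: a letter-led prefix ends at the first underscore.
    return $1 if $entry =~ /^([A-Za-z][^_]*)_/;
    return $entry;    # no recognisable prefix: the entry is its own column
}

# Hypothetical layout: a header line, then one line per input row,
# one column per prefix, '-' for empty cells, columns padded to fit.
sub layout_by_prefix {
    my @lines = @_;
    my (@rows, %is_col);
    for my $line (@lines) {
        my ($cluster, @entries) = split ' ', $line;
        next unless defined $cluster;
        my %cell;
        for my $e (@entries) {
            my $col = column_for($e);
            $cell{$col} = defined $cell{$col} ? "$cell{$col},$e" : $e;
            $is_col{$col} = 1;
        }
        push @rows, [ $cluster, \%cell ];
    }

    my @cols = sort keys %is_col;
    my %width;
    for my $c (@cols) {
        $width{$c} = max(length($c), map { length($_->[1]{$c} // '-') } @rows);
    }
    my $kw = max(length('cluster'), map { length($_->[0]) } @rows);

    my @out;
    my $header = [ 'cluster', { map { ($_ => $_) } @cols } ];
    for my $r ($header, @rows) {
        my $text = join ' ', sprintf('%-*s', $kw, $r->[0]),
            map { sprintf('%-*s', $width{$_}, $r->[1]{$_} // '-') } @cols;
        $text =~ s/ +$//;    # drop padding after the last column
        push @out, $text;
    }
    return @out;
}

# Example use:
#   open my $in, '<', 'clusters3.txt' or die "clusters3.txt: $!";
#   print "$_\n" for layout_by_prefix(<$in>);
```

Because every row is collected before any width is computed, all cells in a column start at the same character position, including the header.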

Thanks

$new_guy

In reply to Re^2: Re-organising entries by $new_guy
in thread Re-organising entries by $new_guy
