Re: Re-organising entries

Since some of your records contain spaces, a simple call to split with no regex doesn't do any good. You also need to special-case the number in the first column, since you want to repeat it on every output line.
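To make the difference concrete, here is a minimal sketch comparing a plain whitespace split with a split on a pattern. The sample line is modelled on the data in this thread, and it assumes the columns are separated by tabs.

```perl
use strict;
use warnings;

# Sample line modelled on the thread's data; columns are assumed to be tab-separated.
my $line = "1\tSP_85(IS33, qqq), SP_155(IS33eee)\tspr_111(ISyyy33, qqq), spr_171(IS33eee)";

# A plain split breaks on every run of whitespace, so records containing
# spaces are cut into fragments.
my @naive = split ' ', $line;

# Split on a tab, or on a comma plus whitespace that directly follows ')'.
# The lookbehind keeps the closing parenthesis attached to its record.
my ($id, @records) = split /\t|(?<=\)),\s+/, $line;

print scalar(@naive), " fragments from the plain split\n";
print "$id: $_\n" for @records;
```

The plain split gives seven fragments, while the pattern gives the id plus four intact records.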

Here's my approach (reading from DATA instead of a file handle for convenience):

use strict;
use warnings;

while (<DATA>) {
    chomp;
    my ($id, @records) = split /\t|(?<=\)),\s+/, $_;
    my (@left, @right);
    for my $r (@records)  {
        if ($r =~ /^SP_/) {
            push @left, $r;
        } else {
            push @right, $r;
        }
    }

    while (@left || @right) {
        print $id, "\t", (shift(@left) || ' - '),
                   ', ', (shift(@right) || ' - '),
                   "\n";

    }
}

__DATA__
1    SP_85(IS33, qqq), SP_155(IS33eee)    spr_111(ISyyy33, qqq), spr_171(IS33eee)
2    SP_83(S3 , jgjg), SP_32(IS33, jhdjdjd)    spr_113(Stty3 , jgjg), spr_1881(IS33, jhdjdjd)
3    SP_78(3jmdsjkdej), SP_66(IShbdhdhd33)    spr_115(3jmhhggggdsjkdej), spr_1551(IShbdhdjjjhd33), spr_88881(Iyt33ff), spr_145411(Iddd3ff)
4    SP_77(3jmdsjkdej), SP_1485(Idhd33ff)    spr_116(3jmdhhhhhsjkdej), spr_17781(Idhdhhtytyt33ff)

Perl 6 - second systems done right


Replies are listed 'Best First'.

Re^2: Re-organising entries
by $new_guy (Acolyte) on Feb 14, 2011 at 15:12 UTC

Hi Moritz and Limbic_Region,

I'm afraid your scripts don't work very well. They only generate 2 columns, and the entries in each column don't share the same prefix. If you look at my scratchpad, I have put the first seven rows there (my .txt file has about 8000 rows). Note that each row starts with Cluster(\d+), i.e. the word "Cluster" followed by a number, e.g. 1, 2, 3 etc.

The code I have so far come up with is:

#!/usr/bin/perl -w

use warnings;
use strict;
use List::Util 'max';

# Read in the file
my $FILENAME3 = "clusters3.txt";
open(DATA, $FILENAME3);

#create arrays and hashes to store stuff
my (%data, %all, @keys);
while (<DATA>) {
# avoid \n on last field
    chomp;
#split the data into chunks
    my @chunks = split;
#create keys for the chunks
    my $key = shift @chunks;
#store the keys in an array unless they already exist
    push @keys, $key unless exists $data{$key};
    foreach my $chunk (@chunks) {
#return references using hashes
        $data{$key}{$chunk}++;
#add all chunks to the hash '%all'
        $all{$chunk} = 1;
    }

#now make a file for the output
        my $outputfile = "new_cluster.txt";
           if (! open(POS, ">>$outputfile") ) {
            print "Cannot open file \"$outputfile\" to write to!!\n\n";
                exit;
        }

#sort the fields/columns keys and save them as an array
#my @fields = sort {$a <=> $b} keys %all;
my @fields = sort {lc($a) cmp lc($b)} keys %all; ##<--this sorting didn't work

#find the longest entry in an array
#my $longest = max map {length} @fields;
my $longest = max map {scalar grep $_=~ /\(\d+\)\_\(\d+\)\_\(\d+\)\_/, @fields} @fields; #the line I think has a problem!

#organise the data
foreach my $key (@keys) {
    while (keys %{$data{$key}}) {
        print POS $key, " "; 
        foreach my $field (@fields) {                
            if ($data{$key}{$field}){
                printf POS "%${longest}s ", $field;
                delete $data{$key}{$field} unless --$data{$key}{$field};
            }
            else {
               printf POS "%${longest}s ", "-";
            }
        }
        print POS"\n";
    }}}

In the code, clusters3.txt is my .txt file, but the script spits out rubbish.

Is it possible to have the entries in each row arranged tidily in columns?

Generally, for entries beginning with a letter, the prefix is separated from the rest by an underscore, except for 'spr', 'HMPREF' and 'pseudoSPN23F' (which is essentially the same as SPN23F and should be in the same column).

For entries beginning with digits, the prefix runs from the beginning to the last underscore, e.g. 3850_1_7_ and 3850_1_8_ .
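One possible reading of this prefix rule, as a rough sketch only. The helper name prefix_of and the sample IDs are made up for illustration, and the handling of the exceptions (matching spr and HMPREF as literal leading text, folding pseudoSPN23F into SPN23F) is an assumption about what the real data looks like.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the column prefix for one entry.
sub prefix_of {
    my ($entry) = @_;

    # Digit-led entries: everything up to and including the last underscore.
    return $1 if $entry =~ /^(\d.*_)/;

    # Assumption: pseudoSPN23F entries share a column with SPN23F.
    return 'SPN23F' if $entry =~ /^(?:pseudo)?SPN23F/;

    # Assumption: spr and HMPREF are recognised by their literal leading text.
    return $1 if $entry =~ /^(spr|HMPREF)/;

    # Letter-led entries: everything before the first underscore.
    return $1 if $entry =~ /^([A-Za-z][^_]*)_/;

    # No rule matched: treat the whole entry as its own prefix.
    return $entry;
}

# Made-up sample IDs, one per rule.
print "$_ => ", prefix_of($_), "\n"
    for qw(3850_1_7_0001 SP_85 spr_111 pseudoSPN23F_0100);
```

Entries with the same prefix could then be grouped into the same column by using the returned prefix as a hash key.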

Thanks

$new_guy
