Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am a newbie to Perl. I think this should be a simple task, but even so I am running into some problems :( I have a tab-delimited file similar to this:
Individual	301	302	303	304
2003	a b c d	a b c d	a b c d	a b c d
2004	a b c d	a b c d	a b c d	a b c d
2005	a b c d	a b c d	a b c d	a b c d
So the first row is tab-delimited, and each following row has, for every column of the first row, a tab-separated group of four values (separated by spaces). What I need to do is to get output similar to this:
Individual	301A	301B	302A	302B	303A	303B	304A	304B
2003	c	d	c	d	c	d	c	d
and similarly for the other rows as well.
I have split the file by newlines and have almost solved the rest, but in a very crude way. Now I get a tab at the end of each line. Below is my code. How can I get rid of the trailing tab? I also get an extra tab inserted in my second row after the first column (and in all the lines after that).
#!/usr/local/bin/perl
use strict;
use Getopt::Long;

my ($sfile, $lfile, $ofile);
my (@sampleids);
my($loga, $logb, $x, $y);

&GetOptions("sfile=s" =>\$sfile,
);

unless(open SFILE , $sfile ) {die "cannot open small file: $!\n"};

print "Individual\t";
while(<SFILE>){
    chomp;
    my $line = $_;
    my @columns = split(/\n/, $line);
    foreach my $col (@columns) {
        my @array = split (/\t/, $col);
        foreach my $a (@array) {
            next if $a =~ /^Individual/;
            if ($a =~ /^000\d+/){
                print "$a" . "A", "\t", "$a" . "B", "\t";
            }
        }
        print "\n";
        foreach my $b (@array) {
            next if $b =~ /^Individual/;
            next if $b =~ /^000\d+/;
            if($b =~ /^\w+/){
                #$b =~ s/^\t//;
                # $b =~ s/\t$//;
                print "$b";
            }
        }
        foreach my $c (@array) {
            next if $b =~ /^Individual/;
            next if $b =~ /^000\d+/;
            next if $b =~ /^\w+/ ;
            ($loga, $logb, $x, $y) = split (/ /,$c);
            #$x =~ s/^\t+//;
            #$x =~s/\t$//;
            print "$x\t$y\t";
        }
    }
}
Any suggestions or comments? Thanks, everyone!

Replies are listed 'Best First'.
Re: rearranging the file
by jethro (Monsignor) on Jun 12, 2009 at 12:44 UTC

    A few points:

    1) while (<SFILE>) reads in the file line by line. Since you chomp the line, even the \n at the end of the line is gone. So there is no \n left to split on in your first foreach-loop. That loop can be removed without any consequence to the result.
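
    To illustrate point 1, here is a minimal sketch. The sample line is made up and stands in for one line read from SFILE; it only shows that after chomp, a split on \n hands back the whole line as a single element.

```perl
use strict;
use warnings;

# Hypothetical sample line, as the while (<SFILE>) loop would read it.
my $line = "Individual\t301\t302\n";
chomp $line;                         # removes the trailing newline

my @columns = split /\n/, $line;     # no newline left, so nothing is split
print scalar(@columns), "\n";        # number of elements returned
print "unchanged\n" if $columns[0] eq $line;
```

    So looping over @columns just runs the loop body once with the original line.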

    2) You split on \t. That may be correct, but it will fail if there are spaces too. If your source file is machine-generated you might be able to guarantee that condition; if the file is edited by hand, splitting on /\s+/ will make more sense. It will split on any combination of tabs and spaces.
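
    A small sketch of the difference between the two patterns, using a made-up data row in the layout described in the question (a year, then tab-separated groups of four space-separated values). Note that with /\s+/ the groups are flattened into single values, so the code that follows has to pick the values out in steps of four.

```perl
use strict;
use warnings;

# Hypothetical data row: year, then two tab-separated groups.
my $row = "2003\ta b c d\ta b c d";

my @by_tab   = split /\t/,  $row;    # year plus one element per group
my @by_space = split /\s+/, $row;    # year plus every single value

print scalar(@by_tab),   " fields with /\\t/\n";
print scalar(@by_space), " fields with /\\s+/\n";
```

    Which one fits depends on whether the later code wants to see the groups or the individual values.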

    3) If you have unwanted tabs and spaces at the end of the lines of the input file, use $line=~s/\s+$//;. If you don't want a final tab in your output because of your print "$x\t$y\t"; then you might use the following instead:

    my @result;
    foreach my $c (@array) {
        next if $b =~ /^Individual/;
        next if $b =~ /^000\d+/;
        next if $b =~ /^\w+/ ;
        ($loga, $logb, $x, $y) = split (/ /,$c);
        push @result, $x, $y;
    }
    print join("\t",@result);
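
    The same idea as a self-contained sketch that can be run on its own. The contents of @array are made up and stand for the cells of one data row after the year has been removed; the point is only that join puts tabs between the fields and none after the last one.

```perl
use strict;
use warnings;

# Hypothetical cells of one data row, already split on tabs.
my @array = ("a b c d", "a b c d", "a b c d");

my @result;
foreach my $cell (@array) {
    # Keep only the third and fourth value of each group.
    my (undef, undef, $x, $y) = split / /, $cell;
    push @result, $x, $y;
}

my $out = join("\t", @result);       # tabs between fields only
print "$out\n";
```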

    4) You seem to check that if a number has 3 digits then it must be the number of an individual. And you used the wrong regex for it. if ($a =~ /^\d\d\d$/){ should work, but if any other data ever has 3 digits you would have a problem. As an alternative you could check whether each line begins with the string 'Individual' and, depending on that, go into different loops. Your program would have the following structure:

    while (<SFILE>) {
        ...
        if ($line=~/^Individual/) {
            my @array= split (/\s+/, $line);
            shift @array; #removes the 'Individual' string
            foreach (@array) {
                ... #process an 'Individual' line
            }
        } else {
            my @array= split (/\s+/, $line);
            my $year= shift @array;
            foreach (@array) {
                ... #process a data line
            }
        }
    }
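
    One way this structure could be filled in, as a sketch rather than a finished program. It assumes the layout described in the question, and it splits on tabs instead of /\s+/ so that each group of four values stays together. The helper name rearrange_line and the sample lines are invented for illustration; a real script would read the lines from the file instead.

```perl
use strict;
use warnings;

# Turn one input line into one output line.
# Assumes tab-separated cells; data cells hold four space-separated values.
sub rearrange_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\s+$//;                   # drop stray trailing whitespace
    my @cells = split /\t/, $line;
    my $first = shift @cells;            # 'Individual' or the year

    my @out;
    if ($first eq 'Individual') {
        # Header line: each ID becomes two columns (ID + A, ID + B).
        push @out, "${_}A", "${_}B" for @cells;
    }
    else {
        # Data line: keep the third and fourth value of every group.
        for my $cell (@cells) {
            my @values = split ' ', $cell;
            push @out, @values[2, 3];
        }
    }
    return join("\t", $first, @out);     # no tab after the last field
}

# Hypothetical sample input standing in for the real file.
my @sample = (
    "Individual\t301\t302\n",
    "2003\ta b c d\ta b c d\n",
);
print rearrange_line($_), "\n" for @sample;
```

    Because every line is built with join, neither the trailing tab nor the extra tab after the first column can appear. A group with fewer than four values would produce undefined fields, so real input may need a check there.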

    And a final general hint, if it happens that you don't know what your program is doing, insert meaningful print-lines, for example print "Starting loop 2, \$a=<$a>\n";

Re: rearranging the file
by Timjc86 (Novice) on Jun 12, 2009 at 13:51 UTC
    I have split the file by newlines and have almost solved the rest, but in a very crude way. Now I get a tab at the end of each line. Below is my code. How can I get rid of the trailing tab?

    This will be a rather incomplete response, as I'm new to Perl as well, but hopefully this may help you out a bit.

    A very useful function is chop() (similar to chomp(), but slightly different). Chop will remove the last character of a string, no matter what it is (*** so be careful ***). While it may be a bit tedious to run chop() on every single line (depending on how many you have), if you split the text into an array you can do chop(@array) to remove the last character of every element in @array. Not only is this potentially much less tedious, but it's also scalable, should you wish to run this script on inputs of different sizes.

    So from your quoted description, you would want to:

    1. Split the file into an array using the newline character as the delimiter.
    2. Use chop() on the array.
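
    A quick sketch of those two steps, with made-up lines that each end in an unwanted tab. For comparison it also shows a substitution that only removes the last character when it really is a tab, which may be the safer choice if some lines do not end in one.

```perl
use strict;
use warnings;

# Hypothetical output lines, each ending in an unwanted tab.
my @lines = ("301A\t301B\t", "c\td\t");

my @chopped = @lines;
chop(@chopped);                      # removes the last character of every element

my @trimmed = @lines;
s/\t$// for @trimmed;                # removes a trailing tab only if one is present

print join("|", @chopped), "\n";
print join("|", @trimmed), "\n";
```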

    I am not sure exactly how that fits into your entire program; considering jethro's much more thorough reply, it looks like there are some steps you could remove, so I'm not sure exactly where this may fit into your code after the changes. But it's at least another idea; good luck! ^_^