Re: tab delimited extraction, formatting the output

First and foremost, you'll probably make your life easier in the long run if you open your code with use strict; use warnings; since they catch a host of accidental mistakes. If you are working with tab-delimited fields, it's probably easier to use an existing parser like Text::CSV than to roll your own. Since the end-of-record markers are much rarer than tabs and newlines, you could test for them up front, before looking at the other fields. And if you are concerned with formatting, it's probably easier to use sprintf to get things looking nice. Here's some basic code that (mostly) replicates what you've done, to help you with the CSV prototype:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;


my $file = "fielded.txt";

my $csv = Text::CSV->new({sep_char => "\t"});     # create a new object
open my $fh, "<", $file or die "Unable to open $file: $!";

while (my $data_ref = $csv->getline($fh)) {
    my @data = @{$data_ref};
    if ($data[0] eq "'EOU'.") {
        # End of record code
    } elsif ($data[2] eq "u") {
        print "\n$data[3]"
    } elsif ($data[2] eq "p") {
        print "$data[3]\n"
    } else {
        #die "Unexpected line format encountered, $file, @data";
    }
}
close $fh;
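As a rough illustration of the sprintf suggestion, here is a minimal sketch that pads a first column to a common width. The rows and variable names are made up for the example and are not taken from fielded.txt; adapt the widths to your own data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows as [unit, phrase]; not taken from fielded.txt.
my @rows = (
    [ 'unit one', 'first phrase'  ],
    [ '',         'second phrase' ],
);

# Find the widest entry in the first column.
my $width = 0;
for my $row (@rows) {
    my $len = length $row->[0];
    $width = $len if $len > $width;
}

# %-*s left-justifies a string in a field whose width comes from the
# argument list, so the second column lines up on every row.
my @lines = map { sprintf("%-*s  %s", $width, @{$_}) } @rows;
print "$_\n" for @lines;
```

Padding with sprintf keeps the columns aligned regardless of how a terminal or editor renders tabs, which is one reason it may be preferable to printing "\t" directly.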

Update: I should point out you've made great strides since block extraction - congratulations.

Re^2: tab delimited extraction, formatting the output by zzgulu (Novice) on Feb 09, 2009 at 21:21 UTC
Thank you, Kenneth, for your great comment and direction. Apparently I didn't have the CSV package, so I learned how to download and install packages too. I am still looking into your script to understand the logic. In the meantime, I added this line at the end, but it messed up the output that I had been quite happy with: `elsif ($data[2] eq "mc") { print join("\t",@data[7,8,9,10,11])` How can I make the output of join appear exactly in front of its related phrase (p)? I also looked at the sprintf link you sent, but couldn't find anything relevant to what I want to do. Thanks again.
Re^3: tab delimited extraction, formatting the output by kennethk (Abbot) on Feb 09, 2009 at 22:04 UTC
Note that to get that formatting, you need to cache the previous string so you can determine the indentation.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file = "fielded.txt";

my $csv = Text::CSV->new({sep_char => "\t"});     # create a new object
open my $fh, "<", $file or die "Unable to open $file: $!";

# Cached values of the most recent u, p and mc fields
my($u_value, $p_value, $mc_value) = (undef) x 3;

while (my $data_ref = $csv->getline($fh)) {
    my @data = @{$data_ref};
    if ($data[0] eq "'EOU'.") {
        # End of record: reset the cache
        ($u_value, $p_value, $mc_value) = (undef) x 3;
        print "\n";
    } elsif ($data[2] eq "u") {
        $u_value = $data[3];
        print "\n$u_value";
        undef $p_value;
    } elsif ($data[2] eq "p") {
        # Not the first p for this u: start a new, indented line
        if ($p_value) {
            print "\n" . ' ' x length $u_value;
        }
        $p_value = $data[3];
        print "\t$p_value";
        undef $mc_value;
    } elsif ($data[2] eq "mc") {
        # Not the first mc for this p: start a new, indented line
        if ($mc_value) {
            print "\n" . ' ' x length $p_value;
        }
        $mc_value = join("\t",@data[7 .. 11]);
        print "\t$mc_value";
    } else {
        #die "Unexpected line format encountered, $file, @data";
    }
}
close $fh;
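To see the caching idea in isolation, here is a minimal sketch that needs neither Text::CSV nor an input file. The records, the $seen_p flag and the $out buffer are hypothetical names introduced only for this illustration; the point is that the width of the cached "u" string decides how far later "p" lines are indented.

```perl
use strict;
use warnings;

# Hypothetical records as [type, text]; not taken from fielded.txt.
my @records = (
    [ 'u', 'unit A'   ],
    [ 'p', 'phrase 1' ],
    [ 'p', 'phrase 2' ],
);

my $u_value;        # cached text of the most recent "u" record
my $seen_p = 0;     # true once a "p" has been printed for this unit
my $out    = '';    # output is collected here, then printed once

for my $rec (@records) {
    my ($type, $text) = @{$rec};
    if ($type eq 'u') {
        $u_value = $text;
        $seen_p  = 0;
        $out .= "\n" if length $out;
        $out .= $u_value;
    } elsif ($type eq 'p') {
        # Later phrases start on a new line, padded with one space
        # per character of the cached unit string.
        if ($seen_p) {
            $out .= "\n" . ' ' x length($u_value);
        }
        $out .= "\t$text";
        $seen_p = 1;
    }
}
print "$out\n";
```

Using a separate flag rather than testing the cached string itself avoids treating a legitimate value such as "0" as if nothing had been seen yet.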
Re^4: tab delimited extraction, formatting the output by Anonymous Monk on Feb 09, 2009 at 23:20 UTC
Thank you very much, Kenneth. I was looking for that word (cache) in the Perl language too. I read that "undef" tells Perl to remove all formatting indicators (like \n) from the text. So, what does this line do? my($u_value, $p_value, $mc_value) = (undef) x 3 Thanks again for your time and your great help.
Re^5: tab delimited extraction, formatting the output by kennethk (Abbot) on Feb 09, 2009 at 23:28 UTC
Re^6: tab delimited extraction, formatting the output by zzgulu (Novice) on Feb 12, 2009 at 15:47 UTC
Some notes below your chosen depth have not been shown here