sdyates has asked for the wisdom of the Perl Monks concerning the following question:

I can't believe I have to post for help on this issue, but after three days, I give up.

Here is a sample of the text I am trying to parse:

Berlinda Girl Newmarket Lord, army; shield 54 Soccer
Berlindis Girl Newmarket Lord, army; shield 32 Football

Well, I want to read a text file with this information, then place each tab-separated item into a separate element of an array so I can do some work, then dump it out without any extra tabs. Right now I get the text I need, but with a bunch of extra tabs and other information I would rather not have.

Here is my code:

    ProcessRedFile();

    sub ProcessRedFile {
        open (READ, "dump.txt");
        $Counter=();
        open (OUT, ">/Users/simondyates/Desktop/OutFile.txt");
        @File =<READ>;
        foreach $File(@File) {
            @Format = split(/ *\s\s */,$File);
            push (@Finish, @Format);
        }
    }

    $x=0;
    print OUT "\t";
    $x=0;
    while ($x<200) {
        @Test = split(/\t/,$Finish[$x]);
        push (@Test2, @Test);
        $x++;
    }
    @Finish = @Test2;
    while ($x<200) {
        if ($f eq 1) {
            print OUT "$Finish[$x]\t";
        }
        elsif ( $f eq 2) {
            print OUT "$Finish[$x]\t";
        }
        elsif ( $f eq 4) {
            print OUT "$Finish[$x]\t";
        }
        elsif ( $f eq 5) {
            print OUT "$Finish[$x]\t";
        }
        elsif ( $f eq 8) {
            print OUT "\n";
            $f=0;
        }
        $x=$x+1;
        $f++;
    }
    close READ;
    close OUT;

Re: Parsing simple text, made difficult
by liverpole (Monsignor) on May 27, 2007 at 15:04 UTC
    Hi sdyates,

    I've made an attempt to refactor your code somewhat, and have come up with the following:

    # Use strict and warnings -- ALWAYS!
    use strict;
    use warnings;

    # Declare things that might change at the top of the program
    my $input  = "dump.txt";
    my $output = "/Users/simondyates/Desktop/OutFile.txt";

    # Open the output OUTSIDE of the subroutine
    my @Finish = ProcessRedFile();

    # Check the return value to see if the file was opened
    open (OUT, ">", $output) or die "Can't write '$output' ($!)\n";
    print OUT "\t";

    my @Test2 = ( );
    my $x;
    for ($x = 0; $x < @Finish; $x++) {
        my @Test = split(/\t/, $Finish[$x]);
        push @Test2, @Test;
    }
    @Finish = @Test2;

    my $f = 0;
    while ($x < 200) {
        # Consolidate -- all branches for $f in {1, 2, 4, 5} did the same thing!
        if ($f == 1 or $f == 2 or $f == 4 or $f == 5) {
            print OUT "$Finish[$x]\t";
        }
        elsif ($f == 8) {
            print OUT "\n";
            $f=0;
        }
        $x = $x + 1;
        $f++;
    }
    close OUT;

    #
    # Consider choosing a better subroutine name -- I have no
    # idea what a "Red File" is ... ??
    #
    sub ProcessRedFile {
        open (READ, $input) or die "Can't read '$input' ($!)\n";
        my @File = <READ>;
        my @lines = ( );
        foreach my $File (@File) {
            # This regex doesn't make much sense -- you're trying
            # to split on 2 whitespace characters with optional leading
            # or trailing spaces?? Why not just split(/\s\s+/, $File)??
            my @Format = split(/ *\s\s */, $File);
            push @lines, @Format;
        }
        close READ;
        return @lines;
    }

    Note that it is now strict and warning safe.  However, I'm still not following all the logic, particularly with regard to the hardcoded while ($x < 200) {.

    Part of my confusion may be due to not having a copy of the data file you're using, but you should strive to make your program not rely on hardcoded numbers; rather, have it process just as much data as is available.
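One way to avoid the hardcoded 200 is to let the data decide when the loop ends. The following is only a minimal sketch, assuming each record is a fixed group of 8 fields (the count implied by the $f == 8 reset in the posted code). The helper name group_fields and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a flat list of fields into records of
# $size fields each, however many fields there happen to be.
sub group_fields {
    my ($size, @fields) = @_;
    my @records;
    # splice removes up to $size elements from the front of @fields,
    # so the loop ends by itself once the data runs out.
    while (@fields) {
        push @records, [ splice(@fields, 0, $size) ];
    }
    return @records;
}

# Sample data (an assumption; the real dump.txt was not posted).
my @flat = (
    'Berlinda',  'Girl', 'Newmarket', 'Lord,', 'army;', 'shield', '54', 'Soccer',
    'Berlindis', 'Girl', 'Newmarket', 'Lord,', 'army;', 'shield', '32', 'Football',
);

for my $record (group_fields(8, @flat)) {
    print join("\t", @$record), "\n";
}
```

If the final group is short, it simply comes back with fewer than 8 fields, so nothing past the end of the data is ever read.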

    You should try this code and see if it gets you any farther.  It opens the output file outside of the subroutine so that the filehandle stays in scope (and also because a subroutine for processing an input file shouldn't be doing anything with an output file).  I've also put comments in the code at places where I've changed things, or where there's some question as to the functionality.
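The separation of input and output described above could also be sketched with lexical filehandles, which are a variation and not what the refactored code uses. Here the caller opens both handles and the subroutine only reads. The name read_records is hypothetical, and in-memory strings stand in for dump.txt and OutFile.txt so that the sketch runs without external files.

```perl
use strict;
use warnings;

# Hypothetical reader: takes an already-open input handle and returns
# one array ref of tab-separated fields per line. It never touches
# the output.
sub read_records {
    my ($in) = @_;
    my @records;
    while (my $line = <$in>) {
        chomp $line;
        push @records, [ split /\t/, $line ];
    }
    return @records;
}

# In-memory "files" (assumed sample data); with real files, pass a
# path to open instead of a reference to a string.
my $input_text  = "Berlinda\tGirl\tNewmarket\nBerlindis\tGirl\tNewmarket\n";
my $output_text = '';

open(my $in,  '<', \$input_text)  or die "Can't read input ($!)\n";
open(my $out, '>', \$output_text) or die "Can't write output ($!)\n";

# The caller owns the output handle, so it is in scope right here.
print {$out} join("\t", @$_), "\n" for read_records($in);

close $in;
close $out;

print $output_text;
```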


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: Parsing simple text, made difficult
by derby (Abbot) on May 27, 2007 at 14:54 UTC

    I don't even know where to start - the problems with your write-up (no code tags), the problems with your code (not checking return values for open, unnecessary globals), or the problem with your problem - it's poorly defined.

    From what I can gather, you have a tab-separated file that you want to print out with the *columns* in a different order - just use Text::CSV_XS.
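As a rough sketch of that suggestion, assuming Text::CSV_XS is installed from CPAN and that the fields really are separated by single tabs: the helper name reorder_columns, the sample line, and the choice of columns are all made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# sep_char switches the parser from commas to tabs.
my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1 })
    or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag();

# Hypothetical helper: parse one tab-separated line and return a new
# line holding only the requested columns, in the requested order.
sub reorder_columns {
    my ($line, @wanted) = @_;
    $csv->parse($line) or die "Cannot parse line: " . $csv->error_diag();
    my @fields = $csv->fields();
    $csv->combine(@fields[@wanted]) or die "Cannot combine fields\n";
    return $csv->string();
}

# Assumed sample line; the column indexes below are arbitrary.
my $line = join "\t",
    'Berlinda', 'Girl', 'Newmarket', 'Lord,', 'army;', 'shield', '54', 'Soccer';

print reorder_columns($line, 7, 0, 1), "\n";
```

For whole files, the module's getline and print methods work directly on filehandles, which avoids reading everything into memory first.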

    -derby
Re: Parsing simple text, made difficult
by GrandFather (Saint) on May 27, 2007 at 20:54 UTC

    Maybe you are after something like:

    use strict;
    use warnings;

    my @lines;

    while (<DATA>) {
        chomp;
        my @fields = split /\t/;
        push @lines, [@fields];
    }

    # At this point @lines contains an array ref for each input line.
    # Each array ref is the list of field data for the line

    for (@lines) {
        # Do some stuff with the data
        print "$_->[0] is a $_->[2] $_->[1] who enjoys a good game of $_->[7]\n";
    }

    print "\n";

    # Now output the fields in a different order:
    for (@lines) {
        print "\t", join ("\t", @{$_}[0, 1, 3, 4, 7]), "\n";
    }

    # The fields in the data below are separated by single tab characters
    __DATA__
    Berlinda	Girl	Newmarket	Lord,	army;	shield	54	Soccer
    Berlindis	Girl	Newmarket	Lord,	army;	shield	32	Football

    Prints:

    Berlinda is a Newmarket Girl who enjoys a good game of Soccer
    Berlindis is a Newmarket Girl who enjoys a good game of Football

    	Berlinda	Girl	Lord,	army;	Soccer
    	Berlindis	Girl	Lord,	army;	Football

    Note that when posting questions it is much better to provide sample code that can be run as-is; relying on external files makes that harder. See "I know what I mean. Why don't you?".


    DWIM is Perl's answer to Gödel
Re: Parsing simple text, made difficult
by varian (Chaplain) on May 27, 2007 at 16:05 UTC
    To allow others to help you better, please use code tags around your program code, the data, and the output to make your post more readable. Also, it seems that the code you posted lost a couple of square brackets,
    e.g. $finish$x should probably read $finish[$x].

    Even more important: add "use strict; use warnings;" to your code so that Perl itself can help you! If you switch on warnings you will find a wealth of useful information to help you structure your program.
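To illustrate the kind of help this gives: in the posted code $f is compared with eq before it has ever been assigned a value. The sketch below reproduces that one situation on its own. Collecting the warnings in an array through $SIG{__WARN__} is purely an assumption made so that the sketch can print them; normally they go straight to STDERR.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of sending them to STDERR,
# only so that this sketch can display what Perl reports.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $f;                     # declared but never assigned, like $f in the post
my $is_one = ($f eq 1);    # comparing an undefined value triggers a warning

print "Caught: $_" for @warnings;
```

With strict switched on, leaving out the my declaration altogether would stop the program at compile time, which catches misspelled variable names as well.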

    Your program as is:

    • parses only the first 25 lines (= 200 fields); the remaining lines are ignored. Is that intentional?
    • adds a tab as the last character before the newline;
      you probably want to remove the \t at the line with: elsif( $f eq 8) { print OUT "\n"; $f=0;}
    What 'other information' did you get that you did not want?
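Both points in the list above can be addressed at once by joining the wanted fields with tabs and looping over whatever lines exist. This is only a sketch: format_line is a hypothetical helper, the sample lines are assumed, and the column indexes 1, 2, 4, 5 merely mirror the values tested for $f in the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the wanted columns of one input line
# and join them with single tabs, so no tab trails the last field.
sub format_line {
    my ($line, @wanted) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return join("\t", @fields[@wanted]) . "\n";
}

# Assumed sample data with 8 tab-separated fields per line.
my @lines = (
    "Berlinda\tGirl\tNewmarket\tLord,\tarmy;\tshield\t54\tSoccer\n",
    "Berlindis\tGirl\tNewmarket\tLord,\tarmy;\tshield\t32\tFootball\n",
);

# Loop over every line that exists; there is no fixed upper limit.
print format_line($_, 1, 2, 4, 5) for @lines;
```

Because join only places the separator between elements, the tab before the newline never appears in the first place and does not have to be removed afterwards.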

    Update: After posting I noticed that liverpole has already provided an excellent answer.