in reply to Re^2: Perl script help to convert .txt file to .csv
in thread Perl script help to convert .txt file to .csv

You are correct, this needs to be run often and I was hoping to use this code as a template for future applications. I am appreciative of your reply, but I am still new to one-liners.

Then you could cut&paste the Deparse output and use it as the basis of your own code.
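As a rough sketch of what that could look like, here is the Deparse output reshaped into a standalone script. The sub names record_to_csv and convert_handle are hypothetical, and the sample data is invented purely for illustration; a real script would open the input file instead of an in-memory string. One assumption differs from the one-liner: records with six or fewer fields are left as they are rather than padded out to six columns.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): convert one record, already
# stripped of its '>' terminator, into a CSV line. Returns undef if empty.
sub record_to_csv {
    my ($record) = @_;
    my @F = split ' ', $record;         # split on runs of whitespace
    return unless @F;                   # e.g. the text before the first '>'
    $F[1] =~ s/length=// if @F > 1;     # strip the 'length=' prefix from field 2
    if (@F > 6) {
        $F[5] = join '', @F[5 .. $#F];  # fold fields 6..end into field 6
        $#F = 5;                        # truncate the array to six fields
    }
    return '>' . join(',', @F);
}

# Hypothetical helper: read '>'-terminated records from a filehandle and
# return the list of CSV lines.
sub convert_handle {
    my ($in) = @_;
    local $/ = '>';                     # same effect as the -0x3e switch
    my @lines;
    while (defined(my $record = <$in>)) {
        chomp $record;                  # removes the trailing '>'
        my $csv = record_to_csv($record);
        push @lines, $csv if defined $csv;
    }
    return @lines;
}

# Invented sample data, read through an in-memory filehandle.
my $sample = ">id1 length=10 a b c d e f\n>id2 length=20 g h i j\n";
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
print "$_\n" for convert_handle($in);
close $in;
```

Keeping the per-record logic in its own sub makes it easier to reuse the script as a template: only record_to_csv needs to change for a different column layout.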


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^4: Perl script help to convert .txt file to .csv
by Seabass (Novice) on Dec 20, 2011 at 19:04 UTC

    Continued from before...Now I'm trying to understand the parsed version

    BEGIN { $/ = ">"; $\ = "\n"; }

    Not sure, but does this statement define the boundaries of each line? Start with a carat and end with a new line.

    LINE: while (defined($_ = <ARGV>)) { chomp $_;

    Setting up the while loop, what does $_ = <ARGV> mean?

    our(@F) = split(' ', $_, 0); next unless @F;

    This sets the array F and splits it at each whitespace, and two other terms I don't know. What does next unless @F mean?

    s/length=// foreach ($F[1]);

    This code clears the length= string from every first column in the array.

    @F[5] = join('', @F[5..99]);

    This joins any columns past 5-99 with column 5

    $#F = 5;

    Don't know? Does this turn the array into a scalar variable set to 5 columns?

    print '>', join(',', @F);

    This is the print statement, printing the carat. Then you join each column of the array by a comma?

      Oh. Where to begin. Okay, I'm going to (try to) respond to both your posts here.

      The command line: perl -n -e" print " yourFile causes perl to read (-n) yourFile one line at a time and execute the inline script (the -e" ... " bit) for each line, passing the line in the variable $_.

      $_ is the 'default variable' upon which many of Perl's built-in functions will operate if they are not given any arguments, e.g. print.

      So, in the above case, all that does is print each line to the terminal. The effect is the same as type yourFile or cat yourFile, depending upon your operating system. Not so useful in itself, but a simple example to start from.

      If you add the switch -MO=Deparse to the above command, Perl displays the full script equivalent that it runs on your behalf.

      In the case of the above, when you type the command: perl -n -e" print " yourFile, Perl actually executes:

      c:\test>perl -MO=Deparse -n -e" print " yourFile
      LINE: while (defined($_ = <ARGV>)) {
          print $_;
      }
      -e syntax OK

      What that tells you is that Perl opens yourFile as the filehandle ARGV, and then reads a line at a time from that filehandle and places it into $_. It then invokes the -e code, which is just print, which causes the line to be printed to the screen, from where it could be redirected to another file.

      In the case of the one-liner I posted, the output (with my annotations) is :

      ## Set the input separator to be '>'; and the output separator to be "\n"
      BEGIN { $/ = ">"; $\ = "\n"; }

      ## read each input record (terminated by the '>' character) into $_
      LINE: while (defined($_ = <ARGV>)) {

          ## Remove the record terminator ('>')
          chomp $_;

          ## split the record on whitespace into @F
          our(@F) = split(' ', $_, 0);

          ## If the record was empty (the first one always will be with your data),
          ## then skip to the next record.
          next unless @F;

          ## remove the 'length=' prefix from the second field
          s/length=// foreach ($F[1]);

          ## replace the 6th field in the array, with the concatenation of itself
          ## and all the fields that follow it
          @F[5] = join('', @F[5..99]);

          ## Then discard all the fields that follow it.
          ## Ie. truncate the array after the 6th field
          $#F = 5;

          ## And print out the fields joined with ','s and prefixed with '>'
          ## (and ending with a newline. See output separator above.)
          print '>', join(',', @F);
      }
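The slice-and-truncate step is the part that tends to puzzle newcomers, so here it is in isolation as a minimal sketch. The sub name fold_tail and the sample fields are made up for illustration; the point is only that $#F is the index of the last element, and that assigning to it shortens the array.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $keep fields, folding any extras
# into the last kept field (the same idea as the join followed by $#F = 5).
sub fold_tail {
    my ($keep, @F) = @_;
    my $last = $keep - 1;                       # index of the last kept field
    if ($#F > $last) {                          # $#F is the index of the last element
        $F[$last] = join '', @F[$last .. $#F];  # array slice from $last to the end
        $#F = $last;                            # assigning to $#F truncates the array
    }
    return @F;
}

my @fields = fold_tail(6, qw(a b c d e f g h));
print scalar(@fields), " fields: @fields\n";    # prints "6 fields: a b c d e fgh"
```

Using 5 .. $#F rather than the fixed 5..99 of the one-liner avoids joining undefined elements, which would raise warnings under use warnings.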

      Either you'll read perlrun, understand the options used on the one-liner and make sense of all that and it will help you. Or you won't and it was a waste of both our time.



      BEGIN { $/ = ">"; $\ = "\n"; }
      1. BEGIN: do what's inside the block (curly brackets) before getting involved with other parts of the script.
      2. $/ = ">"; set the input record separator -- more or less, "use the greater_than symbol as a marker for the start of a new section to be read"
      3. $\ = "\n"; set the output record separator to a newline (UNDERSTAND: Deparse spells these out as assignments; in the original one-liner they are not written explicitly but come from the -0x3e and -l switches).

      In other words, more or less what you thought, "define the boundaries" of what your script will eventually see as a single-section's-worth-of-data. But, ">" is the greater_than symbol, not a "carat" (nor "carrot" nor even what you probably intended, "caret").
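To see those boundaries for yourself, something along these lines can help. It is only a sketch: the sub name read_records and the input string are invented, and the text is read from an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw records read from a string when
# '>' is the input record separator.
sub read_records {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '>';              # each record ENDS at a '>'
    my @records;
    while (defined(my $rec = <$in>)) {
        chomp $rec;              # chomp removes the current $/, i.e. the '>'
        push @records, $rec;
    }
    close $in;
    return @records;
}

{
    local $\ = "\n";             # appended automatically to every print
    print "[$_]" for read_records('>one>two');
}
```

This prints [], [one] and [two] on separate lines. The empty first record is the text before the leading '>', which is why the one-liner needs its next unless @F.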

      ... etcetera.

      But most of the answers to your questions/guesses are available at your own terminal (or should be):

      perldoc perlvar (special vars)
      perldoc -f next
      perldoc -f join

      And you'll also find scads of help in the Tutorials section or by using Super Search or Dave Cross's specialized Perl Search (which, thanks to some rather neat hacks, outdoes big G when searching for punctuation-laden expressions).

Re^4: Perl script help to convert .txt file to .csv
by Seabass (Novice) on Dec 20, 2011 at 18:24 UTC

    I see what you're saying, but where would I paste and what parts do I change? What does the first line mean?

    I'm trying to break it down so, correct me where I'm wrong.

    c:\test>perl -MO=Deparse -l -0x3e -ane

    Is this the path for my in file?

    "@F||next; s/length=//for$F[1];

    @F is setting an array to F, so $F1 would equal the first column. Then you sub out the length= using the substitutor code.

    @F[5]=join'', @F[5..99];

    This means join all columns up to 5, and then for column 5 join any additional columns from 5..99.

    $#F=5; print'>',join',',@F" junk.dat

    Not sure on this one, what does the # mean? Then the print statement: print the carat, the join statement, and then the array F? What is junk.dat? Is this where I put the outfile name?

    I want to understand what is going on more than just a copy and paste job. This is getting long, but I'll work through the rest and post again.

Re^4: Perl script help to convert .txt file to .csv
by Seabass (Novice) on Dec 20, 2011 at 23:45 UTC

    Man you are shiz! Even though I'm still shaky with the one-liner protocol, your explanations of the code answered so many of my questions! That was very helpful, thanks for your time.