aztec has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I'm a student assistant who has been given the task of converting a text file into a CSV file. I have never written code in Perl before. Can anyone point me in the right direction? As of right now I'm looking into s/// as my means of accomplishing this task: basically searching file A for any and all whitespace/tabs/commas then replacing these occurrences with commas, and saving as file B. Any help would be much appreciated. -Dom

Replies are listed 'Best First'.
Re: convert txt file to csv file
by japhy (Canon) on Aug 03, 2005 at 00:06 UTC
    The s/// operator will do what you need, although you could also use the tr/// operator and get the same result. (Incidentally, you could forgo Perl altogether and just use the tr program.)

    Perl's command-line options will also help you concentrate only on the data transformation -- they can take care of opening files, reading their lines, and printing out the modified lines.

    The boilerplate for your task is as follows:

    perl -p -i.bak -e 's/[XYZ]/,/g' files*
    The -p flag wraps your code with while (<>) { ... } continue { print } and the -i flag backs up the files you read with a ".bak" suffix. The executed code (as I've written it) turns X's, Y's, and Z's into commas. You could also use tr/XYZ/,/.
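    For instance, assuming the goal really is to turn every run of spaces and tabs into a single comma, the one-liner might look like this (a sketch -- "input.txt" is a made-up filename, and you should first confirm the character class matches your actual delimiters):

    perl -p -i.bak -e 's/[ \t]+/,/g' input.txt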

    I suggest you read perlrun and perlop.


    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: convert txt file to csv file
by graff (Chancellor) on Aug 03, 2005 at 01:53 UTC
    I would second Japhy's suggestion to keep the perl coding simple and let the command line do more for you. But if you want to have a script stored in a file so it's always ready to repeat the operation for you whenever you need it, here's a nice balance point between scripting and command-line usage:
    #!/usr/bin/perl
    # simple process to convert spaces and tabs to commas
    while ( <> ) {
        tr/\t /,/;
        print;
    }
    So much for the perl script -- let's suppose you save that in a file called "space2comma.pl"; using the command line to handle all the file i/o for you involves a shell command like this:
    space2comma.pl input.file > output.file
    (I personally like making sure that the output file does not replace the input file, in case I find out that I need to fix the script or change my plan about what it's supposed to do.)

    I hope you're going to be careful about getting to know the input data, and about knowing what conditions the output data needs to meet. Maybe the output is supposed to have a fixed number of comma-separated fields? Maybe the input already contains some commas that should be treated as data, which means the fields that contain them need to be quoted in the CSV output? Maybe two "fields" separated by more than one space or tab character should be treated as a single field boundary (so add a final "s" modifier to the tr///), or maybe as multiple boundaries with empty fields between them (so don't use tr///s)?
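    To make that last question concrete, here is a minimal sketch of what the "s" modifier changes (the sample string is invented):

    my $data = "foo\t\tbar";
    (my $one_boundary = $data) =~ tr/\t /,/s;   # "foo,bar"  -- the run of tabs becomes one comma
    (my $empty_field  = $data) =~ tr/\t /,/;    # "foo,,bar" -- each tab becomes its own comma
    print "$one_boundary\n$empty_field\n";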

    Getting that sort of stuff wrong can be a real drag.

Re: convert txt file to csv file
by Cody Pendant (Prior) on Aug 03, 2005 at 02:00 UTC
    searching file A for any and all whitespace/tabs/commas then replacing these occurrences with commas

    That doesn't sound right to me! You're going to replace all the commas with commas?

    What if your file looks like:

    foo[tab]bar[tab]baz
    foo[tab]bar, or possibly baz[tab]quux

    You need to enclose the second item in the second line in quotes:

    foo,bar,baz
    foo,"bar, or possibly baz",quux

    or else your data isn't going to be right.



    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
Re: convert txt file to csv file
by CountZero (Bishop) on Aug 03, 2005 at 09:24 UTC
    A very typical task and one for which Perl is particularly well suited. Congratulations on choosing the right tool for the job.

    We do such tasks on a regular basis (extracting statistical and claims data out of text files and putting it into CSV format before loading it into a database).

    So from dire experience I can tell you that there are numerous ways that such a seemingly simple task can turn around and bite you.

    The way to solve it is by first analysing the text file and answering the question "How are the various records and fields separated from each other?" Are they delimited, i.e. is there a special (unique?) character that separates one field or record from another, such as a tab character, a newline, or a space? Are the fields put into columns of fixed length, or are the fields surrounded by some tags (XML-style)? Is this format applied consistently throughout the whole file, or is the file littered with non-data lines and items, such as headers and page numbers?

    Many times there is a combination of various methods, such as records separated by newlines while the fields are of fixed length.

    Based upon the above analysis you then devise a parsing strategy, e.g. read the file line by line if the records are newline-separated, and then split each record into its fields based upon the unique separating character (think of using the split function!). If the separating character is not unique, but can also be found in the text of the fields themselves (commas and spaces!), is it then somehow escaped or quoted? If the fields are fixed-length you will want to look into unpack to get at the data.
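    Here is a sketch of both strategies (the tab delimiter and the field widths are made-up assumptions; substitute whatever your analysis of the file turns up):

    while ( my $line = <> ) {
        chomp $line;

        # delimited fields: split on the (hopefully unique) separator
        my @by_delimiter = split /\t/, $line;

        # fixed-length fields: e.g. columns 10, 4 and 20 characters wide
        my @by_width = unpack 'A10 A4 A20', $line;
    }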

    Finally, once you have the fields extracted, you have to output them to the CSV file, and for that no better solution exists than to use existing CPAN modules such as Text::CSV::Simple or Text::CSV_XS.
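    As a sketch of the module route (the field values are invented; note how Text::CSV_XS quotes the field that contains a comma, which is exactly the problem Cody Pendant pointed out above):

    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new( { binary => 1, eol => "\n" } )
        or die "Cannot create Text::CSV_XS object";

    open my $out, '>', 'output.csv' or die "Can't write output.csv: $!";
    $csv->print( $out, [ 'foo', 'bar, or possibly baz', 'quux' ] );
    close $out;
    # output.csv now contains: foo,"bar, or possibly baz",quux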

    Something you might also want to look into is DBD::AnyData, which gives a standard database-like interface to various types of data files.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law