ellem has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to read in a file (.txt, .csv, or tab-delimited) and quote every word that doesn't already have quotes. An example of the text in question is:

"Smith, Robert",94 N. Orange Grove,# 25,West Hollywood,CA,90046,(323)555-1234,931

Part of the issue is that I don't know how the file will look, so I guess I would need to convert all files to CSV (so if there are tabs I'd need to convert them to commas).

What would I do to determine the delimiter, and how would I pull each word out to quote it?

Or where should I be looking for the answers to these questions?
--
lmoran@wtsgSPAM.com
There's More Than One Way To Do It...just not usually my way!

Replies are listed 'Best First'.
(jeffa) Re: Quoting words
by jeffa (Bishop) on Oct 22, 2001 at 23:18 UTC
    First off, use a CPAN module to parse the delimited files, be it Text::CSV, Text::CSV_XS, or tilly's ultra versatile Text::XSV.

    Quoting every word that is not already quoted is a rather tough problem. Here is a trivial solution:

    $word = qq("$word") if $word !~ /"/;
    but this only handles words, not digits. Also, what if only one quote is present? Or something like this:
    foo,hello""world,bar
    That gets rather sticky, which brings me to your next question: how to determine the delimiter being used? Well, there is no solution for that - partial solutions, maybe - but i would use my own eyes for that.

    Ever hear the one about the $600 pen developed for space? It was painstakingly and expensively designed to work in zero gravity and extreme temperatures. The Russians used a pencil! :D

    Lesson here, some things are quicker done by brute force.
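
    When eyeballing every file is not practical, one of those partial solutions might look like the sketch below. It is a heuristic only, not a real answer: it assumes the delimiter is either a comma or a tab, and that whichever of the two appears most often outside double quotes on a sample line is the delimiter. The name guess_delimiter is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the delimiter by counting candidate
# characters that fall outside double-quoted sections of a sample line.
sub guess_delimiter {
    my ($line, @candidates) = @_;
    @candidates = (",", "\t") unless @candidates;   # assumed candidates

    # Remove quoted sections so the comma in "Smith, Robert" is not counted.
    (my $bare = $line) =~ s/"[^"]*"//g;

    my ($best, $best_count) = (undef, 0);
    for my $cand (@candidates) {
        my $count = () = $bare =~ /\Q$cand\E/g;     # number of matches
        ($best, $best_count) = ($cand, $count) if $count > $best_count;
    }
    return $best;    # undef when no candidate appears at all
}

my $sample = qq{"Smith, Robert",94 N. Orange Grove,# 25,West Hollywood,CA,90046,(323)555-1234,931};
my $delim  = guess_delimiter($sample);
print defined $delim && $delim eq "," ? "comma\n" : "not comma\n";
```

    A file that mixes tabs and commas as delimiters, or a single-column file, will still fool it, which is why looking at the data yourself remains the safer check.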

    Last question, how to pull out each word? Read the docs for one of the CPAN modules mentioned above. There have been LOTS of questions asked on this site about parsing CSV files. Try Super Search on this site. i won't show you code, because i would just be copying the docs, but i will explain the process.

    Basically, you open the file and read it one line at a time or slurp it into an array and process the lines that way (hint: while loop). Then you pass each line to a method provided by one of the CPAN modules mentioned above, and you get back a list of parsed scalars. You process those one at a time (hint: for loop) and determine if you need to add quotes or not (hint: regex).
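
    To make the shape of that process concrete, here is a minimal sketch that uses only core Perl in place of the CPAN modules, so it handles far fewer edge cases than they do. The helpers split_fields, quote_field, and quote_line are hypothetical names, the delimiter is assumed to be known and a single character, and doubling embedded quotes is an assumed CSV convention rather than something the question asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line on a single-character delimiter,
# ignoring delimiters that sit inside double quotes. Quote characters
# are kept so the caller can tell which fields were already quoted.
sub split_fields {
    my ($line, $delim) = @_;
    my @fields    = ('');
    my $in_quotes = 0;
    for my $ch (split //, $line) {
        if ($ch eq $delim && !$in_quotes) {
            push @fields, '';              # delimiter: start the next field
            next;
        }
        $in_quotes = !$in_quotes if $ch eq '"';
        $fields[-1] .= $ch;
    }
    return @fields;
}

# Hypothetical helper: leave a well-formed quoted field alone, otherwise
# double any embedded quotes (assumed convention) and wrap the field.
sub quote_field {
    my ($field) = @_;
    return $field if $field =~ /\A"(?:[^"]|"")*"\z/;
    (my $escaped = $field) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Parse a line, quote each field (the for-loop step), and rejoin as CSV.
sub quote_line {
    my ($line, $delim) = @_;
    return join ',', map { quote_field($_) } split_fields($line, $delim);
}

# Demo input held in memory; a real script would open a file instead.
my $text = qq{"Smith, Robert",94 N. Orange Grove,# 25,West Hollywood,CA,90046,(323)555-1234,931} . "\n";
open my $in, '<', \$text or die "Couldn't open in-memory input: $!";
while (my $line = <$in>) {                 # one line at a time
    chomp $line;
    print quote_line($line, ','), "\n";
}
close $in;
```

    For the sample line this prints "Smith, Robert","94 N. Orange Grove","# 25","West Hollywood","CA","90046","(323)555-1234","931". Passing "\t" as the delimiter turns a tab-delimited line into the same comma-separated form.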

    What to do from there you never specified; i would imagine the most useful thing to do would be to save it back to another file.
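
    Saving to another file could be sketched like this. The helper name rewrite_file and the pass-through transform are placeholders for illustration; the real quoting logic would go in the code reference.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, passing each chomped
# line through the code reference $transform on the way.
sub rewrite_file {
    my ($in_path, $out_path, $transform) = @_;

    open my $in,  '<', $in_path  or die "Couldn't open $in_path: $!";
    open my $out, '>', $out_path or die "Couldn't open $out_path: $!";

    while (my $line = <$in>) {
        chomp $line;
        print {$out} $transform->($line), "\n";
    }

    close $in;
    close $out or die "Couldn't close $out_path: $!";
    return;
}

# Runs only when called as: script.pl INPUT OUTPUT
# The transform here returns each line unchanged (placeholder).
rewrite_file(@ARGV, sub { my ($line) = @_; return $line }) if @ARGV == 2;
```

    Writing to a second file leaves the original untouched, so a bad guess about the delimiter costs nothing.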

    jeffa

Re: Quoting words
by cLive ;-) (Prior) on Oct 22, 2001 at 23:11 UTC
    I'd use DBD::CSV - only because it doesn't look like Text::CSV lets you set the delimiter.

    To make it work, you'd need to prepend a line to the csv files of the form:

    0,1,2,3,4,5,6,7,8,9,10

    where "," is replaced by the relevant delimiter, then do a SELECT * FROM datafile and dump the results in quotes as needed.

    .02

    cLive ;-)

    ps - yes, the answer's deliberately fuzzy since I don't have time to double check my thoughts - perhaps someone else has a cleaner response?

Re: Quoting words
by Rich36 (Chaplain) on Oct 22, 2001 at 23:16 UTC
    Here's one way to do it... This is a little fragile given that the fields "last name, first name" and "city, state" need to be in the same place in the list of elements in the line. If there's a comma in the data in the fields between the name and "city, state" fields, the code won't work. If you know that those fields aren't going to contain commas, you should be fine.
    #!/usr/bin/perl -w
    use strict;

    my $file    = "question.lst";
    my $newfile = "question2.lst";
    my @newlines;

    open(FILE, "<$file") || die "$!\n";
    chomp(my @lines = <FILE>);
    close FILE;

    foreach (@lines) {
        s/\t/,/g;    # substitute commas for any tabs
                     # This keeps the field delimiters all the same
                     # If your data is going to have tabs in it,
                     # remove that line
        my @elements = split(/,/, $_);

        # since the split splits the last and first name,
        # and the "City, State" fields
        # join them back together and remove the unnecessary
        # elements
        $elements[0] = "$elements[0],$elements[1]";
        $elements[4] = "$elements[4], $elements[5]";
        splice(@elements, 1, 1);
        splice(@elements, 4, 1);

        # If the elements aren't already enclosed in quotes,
        # enclose them.
        foreach (@elements) {
            $_ = qq("$_") unless (m/\".*\"/);
        }

        # Join the elements of the line back together
        # and push them to a new array
        my $newline = join(",", @elements);
        push(@newlines, $newline);
    }

    # Print the information to a new file
    open(NEWFILE, ">$newfile") || die "$!\n";
    foreach (@newlines) { print NEWFILE "$_\n"; }
    close NEWFILE;

    Here's the input file (question.lst)...

    "Smith, Robert",94 N. Orange Grove,# 25,West Hollywood,CA,90046,(323)555-1234,931
    "Jones, Bob" 111 S. Orange Grove # 72 Silverlake,CA 90210,(323)555-5555,931

    Here's the output file (question2.lst)...

    "Smith, Robert","94 N. Orange Grove","# 25","West Hollywood, CA","90046","(323)555-1234","931"
    "Jones, Bob","111 S. Orange Grove","# 72","Silverlake, CA","90210","(323)555-5555","931"

    Hope that helps...

Re: Quoting words
by cfreak (Chaplain) on Oct 22, 2001 at 23:36 UTC
    If you don't know what type of file it is (commas or tabs), then you may have a problem: if you convert all the commas to tabs in a file, you could end up converting legitimate commas to tabs. I'm not certain whether the reverse is true, but I would assume so. The conversion itself is pretty easy:
    open(FILE, "+<tabs.txt") or die "Couldn't open tabs.txt: $!";
    my @file = <FILE>;
    seek(FILE, 0, 0);
    foreach (@file) {
        s/\t/,/g;
        print FILE $_;
    }
    close(FILE);

    Now for putting quotes around your values:

    open(FILE, "+<tabs.txt") or die "Couldn't open tabs.txt: $!";
    my @file = <FILE>;
    seek(FILE, 0, 0);
    foreach (@file) {
        chomp();
        my $line = "";
        # Note: a plain split on commas also splits quoted fields
        # that contain commas, such as "Smith, Robert"
        my (@values) = split(/\,/, $_);
        foreach my $value (@values) {
            $value = "\"$value\"" if $value !~ /^\".*?\"$/;
            $line .= "$value,";
        }
        chop($line);    # drop the trailing comma
        print FILE "$line\n";
    }
    close(FILE);

    Hope that helps