mlong has asked for the wisdom of the Perl Monks concerning the following question:

I'm not sure how many monks use perl on Win32, but I am
trying to convert data from an Access database to
something that can be used by MySQL on a Linux machine.
The problem I have is that I can't export the data
directly to a CSV because there are carriage returns,
and or line feeds in the data in one of the columns. So I
am using perl to read the data into memory from the
Access database (ODBC) and strip out all of the
offending characters that break the CSV output. Then I
plan to update that column in each record with the
changed data. I am, however, having a terrible time
getting it to work. I'm not sure if it's my perl code or
some other problem.

Here is what I am doing:

while( my @row = $sth->fetchrow_array() ) { my $descriz = $row[1]; # IDTitolo my $title_id = $row[0]; # Descriz $descriz =~ s/\^M//g; $descriz =~ s/\n//g; $descriz =~ s/^\s+//g; $descriz =~ s/\s+$//g; print $descriz, "\n"; # $dbh->do( "UPDATE Titoli SET Descriz = '$descriz' WHERE IDTitolo += '$title_id'" ); }

For some reason the information on the first line of the
data from the Descriz field gets completely truncated.
Here is what a sample record (the Descriz column) looks like.

RATED R  

In yet another bid to shed her girl-next-door image, Sandra 
Bullock tackles a difficult role in this riveting
psychological thriller directed by Barbet Schroeder
(Single White Female), and under his guidance...<truncated
for brevity>.

Notice specifically the line breaks. This is what seems
to be giving me trouble. But, like I said, after I've
run it through my substitutions, The first line "RATED
R" is gone completely.

Any help would be appreciated.

Thanks.

-Matt

Replies are listed 'Best First'.
Re: Perl with MS Access Weirdness
by Jenda (Abbot) on Jan 28, 2003 at 00:26 UTC
    1. You should use Text::CSV_XS to generate your CSV. No need to strip newlines then (assuming the other side will parse the CSV properly. Using Text::CSV_XS on both ends would seem easiest.)
    2. You should get used to using placeholders! Interpolating variables into SQL is dangerous! You should use something like
      $upd = $dbh->prepare( 'UPDATE Titoli SET Descriz = ? WHERE IDTitolo = +?' ); while (...) { ... $upd->execute($descriz, $title_id); }

    Jenda

      Indeed you need to use Text::CSV, but you also need to set CSV_XS to do binary interpretation. That will take care of embeded CR/LF's and other such things. Beware that some versions of Access will not export more than 65,000 rows as CSV, and there is NO warning. It just silently discards the remaining rows!
      use Text::CSV_XS; my $csv = Text::CSV_XS->new(binary=>1); ...
      Building SQL through interpolation is fine as long as you escape the values first. Most, if not all, DBD drivers provide a quote function that will escape strings for you. Binding placeholders actually does the quoting for you so most people find that easier. I prefer to escape stuff myself.
      $dbh->quote($some_string); my $sql = "update some_table where some_attr = '$some_string'";
      Take your pick.

      Finally a cheap plug: use PostgreSQL

        You are right. I forgot to mention the binary=>1

        If you insist on quoting you might like this:

        use Interpolation "'" => sub {"'".$db->quote($_)}; my $sql = "update some_table where some_attr = $'{$some_string}'";
        Actually your code should look like this:
        $some_string = $dbh->quote($some_string); my $sql = "update some_table where some_attr = '$some_string'";

        There are two reasons for using placeholders. First is the quoting, second is speed. If you execute the same SQL command, just with different values, it's much quicker if you prepare the SQL and then just execute it with the values. The ->prepare() allows the SQL server to parse the query/command, prepare the execution plan and so forth. So by using prepare()&execute() you keep the server from having to reparse&recompile the SQL each time.

        Jenda

        P.S.: Yes, there are most probably databases that do not really support prepare(). The DBD for those will just remember the SQL, and quote and interpolate variables each time. Then there will be no preformance difference between prepare()&execute() and do().

Re: Perl with MS Access Weirdness
by Mr. Muskrat (Canon) on Jan 27, 2003 at 23:13 UTC

    FYI... A proper comma seperated values file, one that has each of the fields enclosed in some type of quote character, can have embedded newlines and carriage returns.

    i.e. "blah","this\nis\nfour\nlines","this\nis\nnot","blah","blah"

      My original export didn't maintain quotes around strings. I'll try that again, though. Good point.

      -Matt
Re: Perl with MS Access Weirdness
by fokat (Deacon) on Jan 28, 2003 at 05:25 UTC

    I don't know why, but I don't like the s/\^M//g; bit. Why don't you change the two regexps dealing with line terminators to:

       $descriz =~ s/[\r\n]+//g;

    Best regards

    -lem, but some call me fokat

      I was just about to say that.

      No, seriously.

      If they're \rs and \ns, why are we replacing ^Ms?
      --
      “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D

Re: Perl with MS Access Weirdness
by Cody Pendant (Prior) on Jan 28, 2003 at 03:29 UTC
    I'm really confused by all the answers so far.

    mlong's got what appears to be a regex problem, and we're all trying to fix his CSV code.

    Very noble I'm sure, but the fact remains, doesn't it, that if you really have got what he says is in his sample record, and you do what he says are his regexes, then you shouldn't get that result? I mean, am I being dumb here?

    mlong, what happens if you print out, like:

    print "here's the current contents of \$descriz: $descriz";
    then check it again afterwards?

    The consensus seems to be that your data's coming out wrong, or going in wrong, so that would prove it, right?
    --
    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D

      I think that some of he other answers have merit because
      they provide possible other ways of doing it, but you're
      right they don't answer the specific question.

      In answer to your question, the output is very strange.
      The output looks just like the input except the RATED PG
      part and all of the line breaks (which appear to be
      CR/LFs) are gone. Now I do want to remove the line
      breaks, but need the "RATED PG" part. It gets stored in
      another part of my database.

      Meanwhile, though, I've set up a MySQL instance on my
      windows machine and I am opening a database handle to
      both Access and My SQL at the same time. This is far
      more tedious, but I am able to move the values from one
      database to the other using straight SQL. I can then
      export from the local MySQL instance and upload that to
      my Linux server. It's a pain, but I didn't have time to
      wait for an answer.

      Thanks for your help. If you have any other suggestions,
      I think that the input is actually like this if you were
      to see the actual CR/LFs:

      RATED PG-\r\n
      \r\n
      Former co-stars of the TV sitcom Head of the Class 
      teamed up to write the story and script for this teen
      comedy. When junior high school student Jason Shepherd
      (Frankie Muniz) realizes th <...snip...>\r\n \r\n

      Thanks again.

      -Matt