Perl with MS Access Weirdness

mlong has asked for the wisdom of the Perl Monks concerning the following question:

I'm not sure how many monks use perl on Win32, but I am
trying to convert data from an Access database to
something that can be used by MySQL on a Linux machine.
The problem I have is that I can't export the data
directly to a CSV because there are carriage returns,
and or line feeds in the data in one of the columns. So I
am using perl to read the data into memory from the
Access database (ODBC) and strip out all of the
offending characters that break the CSV output. Then I
plan to update that column in each record with the
changed data. I am, however, having a terrible time
getting it to work. I'm not sure if it's my perl code or
some other problem.

Here is what I am doing:

while( my @row = $sth->fetchrow_array() )
{
    my $descriz = $row[1];   # IDTitolo
    my $title_id = $row[0];  # Descriz

    $descriz =~ s/\^M//g;
    $descriz =~ s/\n//g;
    $descriz =~ s/^\s+//g;
    $descriz =~ s/\s+$//g;
        print $descriz, "\n";

#    $dbh->do( "UPDATE Titoli SET Descriz = '$descriz' WHERE IDTitolo 
+= '$title_id'" );
}
[download]

For some reason the information on the first line of the
data from the Descriz field gets completely truncated.
Here is what a sample record (the Descriz column) looks like.

RATED R  

In yet another bid to shed her girl-next-door image, Sandra 
Bullock tackles a difficult role in this riveting 
psychological thriller directed by Barbet Schroeder 
(Single White Female), and under his guidance...<truncated 
for brevity>.

Notice specifically the line breaks. This is what seems
to be giving me trouble. But, like I said, after I've
run it through my substitutions, The first line "RATED
R" is gone completely.

Any help would be appreciated.

Thanks.

-Matt

Comment on Perl with MS Access Weirdness Download Code

Replies are listed 'Best First'.
Re: Perl with MS Access Weirdness by Jenda (Abbot) on Jan 28, 2003 at 00:26 UTC
You should use Text::CSV_XS to generate your CSV. No need to strip newlines then (assuming the other side will parse the CSV properly. Using Text::CSV_XS on both ends would seem easiest.) You should get used to using placeholders! Interpolating variables into SQL is dangerous! You should use something like `$upd = $dbh->prepare( 'UPDATE Titoli SET Descriz = ? WHERE IDTitolo = +?' ); while (...) { ... $upd->execute($descriz, $title_id); }` [download] Jenda	[reply] [d/l]
Re: Re: Perl with MS Access Weirdness by Cabrion (Friar) on Jan 28, 2003 at 02:07 UTC
Indeed you need to use Text::CSV, but you also need to set CSV_XS to do binary interpretation. That will take care of embeded CR/LF's and other such things. Beware that some versions of Access will not export more than 65,000 rows as CSV, and there is NO warning. It just silently discards the remaining rows! `use Text::CSV_XS; my $csv = Text::CSV_XS->new(binary=>1); ...` [download] Building SQL through interpolation is fine as long as you escape the values first. Most, if not all, DBD drivers provide a quote function that will escape strings for you. Binding placeholders actually does the quoting for you so most people find that easier. I prefer to escape stuff myself. `$dbh->quote($some_string); my $sql = "update some_table where some_attr = '$some_string'";` [download] Take your pick. Finally a cheap plug: use PostgreSQL	[reply] [d/l] [select]
Re: Re: Re: Perl with MS Access Weirdness by Jenda (Abbot) on Jan 28, 2003 at 12:59 UTC
You are right. I forgot to mention the `binary=>1` If you insist on quoting you might like this: `use Interpolation "'" => sub {"'".$db->quote($_)}; my $sql = "update some_table where some_attr = $'{$some_string}'";` [download] Actually your code should look like this: `$some_string = $dbh->quote($some_string); my $sql = "update some_table where some_attr = '$some_string'";` [download] There are two reasons for using placeholders. First is the quoting, second is speed. If you execute the same SQL command, just with different values, it's much quicker if you prepare the SQL and then just execute it with the values. The ->prepare() allows the SQL server to parse the query/command, prepare the execution plan and so forth. So by using prepare()&execute() you keep the server from having to reparse&recompile the SQL each time. Jenda P.S.: Yes, there are most probably databases that do not really support prepare(). The DBD for those will just remember the SQL, and quote and interpolate variables each time. Then there will be no preformance difference between prepare()&execute() and do().	[reply] [d/l] [select]
Re: Perl with MS Access Weirdness by Mr. Muskrat (Canon) on Jan 27, 2003 at 23:13 UTC
FYI... A proper comma seperated values file, one that has each of the fields enclosed in some type of quote character, can have embedded newlines and carriage returns. i.e. `"blah","this\nis\nfour\nlines","this\nis\nnot","blah","blah"`	[reply] [d/l]
Re: Re: Perl with MS Access Weirdness by mlong (Sexton) on Jan 27, 2003 at 23:38 UTC
My original export didn't maintain quotes around strings. I'll try that again, though. Good point. -Matt	[reply]
Re: Perl with MS Access Weirdness by fokat (Deacon) on Jan 28, 2003 at 05:25 UTC
I don't know why, but I don't like the `s/\^M//g;` bit. Why don't you change the two regexps dealing with line terminators to: `$descriz =~ s/[\r\n]+//g;` [download] Best regards -lem, but some call me fokat	[reply] [d/l]
Re: Re: Perl with MS Access Weirdness by Cody Pendant (Prior) on Jan 28, 2003 at 05:41 UTC
I was just about to say that. No, seriously. If they're \rs and \ns, why are we replacing ^Ms? -- “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D	[reply]
Re: Perl with MS Access Weirdness by Cody Pendant (Prior) on Jan 28, 2003 at 03:29 UTC
I'm really confused by all the answers so far. mlong's got what appears to be a regex problem, and we're all trying to fix his CSV code. Very noble I'm sure, but the fact remains, doesn't it, that if you really have got what he says is in his sample record, and you do what he says are his regexes, then you shouldn't get that result? I mean, am I being dumb here? mlong, what happens if you print out, like: `print "here's the current contents of \$descriz: $descriz";` [download] then check it again afterwards? The consensus seems to be that your data's coming out wrong, or going in wrong, so that would prove it, right? -- “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D	[reply] [d/l]
Re: Re: Perl with MS Access Weirdness by mlong (Sexton) on Jan 28, 2003 at 04:31 UTC
I think that some of he other answers have merit because they provide possible other ways of doing it, but you're right they don't answer the specific question. In answer to your question, the output is very strange. The output looks just like the input except the RATED PG part and all of the line breaks (which appear to be CR/LFs) are gone. Now I do want to remove the line breaks, but need the "RATED PG" part. It gets stored in another part of my database. Meanwhile, though, I've set up a MySQL instance on my windows machine and I am opening a database handle to both Access and My SQL at the same time. This is far more tedious, but I am able to move the values from one database to the other using straight SQL. I can then export from the local MySQL instance and upload that to my Linux server. It's a pain, but I didn't have time to wait for an answer. Thanks for your help. If you have any other suggestions, I think that the input is actually like this if you were to see the actual CR/LFs: RATED PG-\r\n \r\n Former co-stars of the TV sitcom Head of the Class teamed up to write the story and script for this teen comedy. When junior high school student Jason Shepherd (Frankie Muniz) realizes th <...snip...>\r\n \r\n Thanks again. -Matt	[reply]