powerhouse has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to upload a customers files to his new Dedicated server, and I have some tab delimited files which contain some "characters" which I need to filter, or strip out, because they are causing mysql errors when I try to insert them into the database...

Such as this....
"something,"
The "'s and the , is causing a MySQL error.

In a few I'm getting a error because of a \ in the "field".

So I'm trying to "filter" those out of every "field".

I'm doing it like this, with no success. I guess I don't remember the correct syntax, so can you tell me the correct way to do it? Current syntax:
if ($_e_if =~ /(\/|\"|\,)/) { $_e_if = s|(\/|\"|\,)||g; # Get rid of the / and/or " and/or , } if ($_e_if =~ /\\/) { $_e_if = s|\\||g; # Get rid of the \ }
Doing the first one, I'm getting this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE

Is there a way to do all that easier, and in one "shot" instead of in several "methods".

Thank you!
Richard

Replies are listed 'Best First'.
Re: regex "filters", stripping characters
by diotalevi (Canon) on Jan 26, 2004 at 05:14 UTC
    Someone else already suggested that you use a different delimiter. That isn't the problem though it "corrects" the issue for the moment. The real problem is that you are aren't using DBI's placeholders. Until you switch you are (most likely) going to be vunerable to a variety of SQL insertion attacks. This is a security issue and you should address this before anything else.

      Thanks for the info. I am aware of that. However, this is on a dedicated server, and is a file I am executing only from shell. It won't run from a browser, and it's not in a public directory. I'm only using it to parse these 15 files, with each containing 100,000 of his customers information(name, address, subscribe date, subscribed from IP, and phone number only). There are 1,500,000 of them, out of 20,000,000. These are customers that bought something from their company in the last 3 years.

      I'm going to delete the file after I'm done with these 15 files. I'll re-upload it when he sends me the rest to upload into the db for him.

      Thanks again for the concern :o)

      thx,
      Richard
        Apart from the security concern, it's much easier, faster and shorter to use placeholders instead of trying to mangle the data into submission yourself. Probably there will be data in one of the files in the next batch which your regex won't catch. Why risk corrupt data or staring at the regex again when there is a perfectly good solution to your problem?

        Arjen

        You could allow traffic from an IP to your mysql box and run it from any shell in the world. As long as it was internal to the network. Unless this is at some I/ASP :)

        Play that funky music white boy..
Re: regex "filters", stripping characters
by Paladin (Vicar) on Jan 26, 2004 at 04:49 UTC
    The problem is in the second line:
    $_e_if = s|(\/|\"|\,)||g; # Get rid of the / and/or " and/or ,
    You are using | for the delimiter, and have a | in the regex. Either use a different delimiter, or better yet, use tr///d to remove single characters from a string. IE:
    $_e_if =~ tr{/",}{}d;
    Note: You are also missing a ~ in the s/// lines. = should be =~.
      Wow, THANKS! I cannot believe I did not see that!

      Thanks. I guess I need to call it a night :o)

      Thank you so very much...

      Richard
Re: regex "filters", stripping characters
by Mr. Muskrat (Canon) on Jan 26, 2004 at 05:12 UTC
    It's sounds like you could stand to use placeholders in your MySQL query or the DBI quote method.
Re: regex "filters", stripping characters
by Vaati (Sexton) on Jan 26, 2004 at 14:37 UTC
    Hmm I would think that $dbh->quote() could take care of those situations for you in one shot.