brozzis has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I've a simple problem I can't solve by myself, I apologize for that.

I want to clean up a mysql database, derived by ms access, with spaces where they shouldn't be, parsing/reading a mysqldump and converting spaces into underscores...

for example:

create table `table name`( `field name1` type1 not null default 'x', `field name2` type2 default null, ... primary key (`pk name`), key `key name` (`field name1`, `field name2`) ) engine=other things here;
I would like this converted to:
create table `table_name`( `field_name1` type1 not null default 'x', `field_name2` type2 default null, ... primary key (`pk_name`), key `key_name` (`field_name1`, `field_name2`) ) engine=other things here;
this means that everything in the format of:
m/`\w+\s+\w+[\s+\w+]+`/ # backticks

should be transformed into a string with spaces converted to underscores (another option could be the camel case: something like `field name` into: `fieldName`)

I understand there must be a very efficient and compact solution, but I'm losing myself in a sql parsing without success.

Please enlighten my way...

Replies are listed 'Best First'.
Re: Regexp in mysqldump
by moritz (Cardinal) on Apr 15, 2010 at 20:39 UTC
    If you have no other backticks in the dump, you can try something like
    sub space_subst { my $s = shift; $s =~ s/ /_/g; return $s; } s/`(\w+\s+[\w\s]+)`/space_subst($1)/eg;
    Perl 6 - links to (nearly) everything that is Perl 6.
      s/`(\w+\s+[\w\s]+)`/space_subst($1)/eg;

      That gets rid of the backticks, which I think the OPer wants to preserve.

      >perl -wMstrict -le "my $s = q{foo `bar baz` quux}; my $t = $s; $t =~ s{ ` (\w+ \s+ [\w\s]+) ` } { (my $r = $1) =~ s{\s}{_}xmsg; $r }xmsge; print $t; $t = $s; $t =~ s{ (` \w+ \s+ [\w\s]+ `) } { (my $r = $1) =~ s{\s}{_}xmsg; $r }xmsge; print $t; " foo bar_baz quux foo `bar_baz` quux
      You may get into trouble if the default values for fields contain spaces, or if the dump includes database population (insert) statements with values containing spaces.

      Bottom line is that the substitution needs to be SQL syntax aware. I feel too lazy to write a SQL parser - there is probably one already on CPAN.

           Syntactic sugar causes cancer of the semicolon.        --Alan Perlis

        I was creating a parser, and I already created one for another project, but then I discovered that mysqldump uses backticks (`) for table name and field names, and standard ticks (') for values in insert and update statements.

        For this reason a standard regexp does the job.

        I've tried the first solution, and works perfectly at a first glance... I'll try hopefully tonight the second one...
Re: Regexp in mysqldump
by clueless newbie (Curate) on Apr 16, 2010 at 12:47 UTC
    May I offer
    #! use Smart::Comments; use strict; use warnings; my $sql=<<"__SQL__"; /* comment! */ create table `table name`( # another comment! `field name1` type1 not null default 'x has a blank or two', `field name2` type2 default null, # ... primary key (`pk name`), key `key name` (`field name1`, `field name2`) ) engine=other things here; __SQL__ ### $sql my $RegExp_s=qr{(?: (`(?:[^`]|\`)*?`) # backticked string |(?:\/\*.*?\*\/) # /* */ comments |(?:'(?:[^']|\')*?') # single quoted strings |(?:"(?:[^"]|\")*?") # qouble quoted strings |(?m:\#.+?$) # single line # trailing comments )}sx; #' # . includes \n while ($sql =~ m{$RegExp_s}g) { if (defined $-[1]) { # backticked string substr($sql,$-[1],$+[1]-$-[1])=~ s/ /_/g; }; }; ### $sql __END__
    which yields
    ### $sql: '/* comment! */ create table `table name`( # another comment! `field name1` type1 not null default \'x has a blank or two\', `field name2` type2 default null, # ... primary key (`pk name`), key `key name` (`field name1`, `field name2`) ) engine=other things here; ' ### $sql: '/* comment! */ create table `table_name`( # another comment! `field_name1` type1 not null default \'x has a blank or two\', `field_name2` type2 default null, # ... primary key (`pk_name`), key `key_name` (`field_name1`, `field_name2`) ) engine=other things here; '
    Commented the regex.
      Ignore the above! Try the following instead
      #! use Smart::Comments; use strict; use warnings; my $sql=<<"__SQL__"; /* comment! */ create table `table name`( # another comment! `field name1` type1 not null default 'x has a blank or two', `field name2` type2 default null, # ... primary key (`pk name`), key `key name` (`field name1`, `field name2`) ) engine=other things here; __SQL__ my $RegExp_s=qr{( (`(?:[^`]|\`)*?`) # backticked string |(?:\/\*.*?\*\/) # /* */ comments |(?:'(?:[^']|\')*?') # single quoted strings |(?:"(?:[^"]|\")*?") # qouble quoted strings |(?m:\#.+?$) # single line # trailing comments )}sx; #' # . includes \n ### $sql $sql =~ s{$RegExp_s}{defined $2 ? TreatBackTickedString($2) : $1}eg; ### $sql sub TreatBackTickedString { # Jest Fur Phun: return substr($_[0],0,-1).substr($_[0],1); }; __END__