in reply to Re: Masking data in a MySQL dump file. Ideas?
in thread Masking data in a MySQL dump file. Ideas?

Hi Laurent,

Only the SQL statements are delimited from each other by the new line characters. All the rows of data in the INSERT statement look like this:

INSERT INTO sample_table (first_name,last_name) VALUES ('John','Doe'),('John','Smith'),('Foo','O\'Bar'),('Baz','D\'Qux (II)'),('(and so on, ','and so forth);');

No new line character starting from the beginning of the INSERT word and until the very terminating ; character.

The values inside quotes, of course, can contain embedded special characters like apostrophe, comma, opening and closing parentheses, semicolon.

Thank you

Replies are listed 'Best First'.
Re^3: Masking data in a MySQL dump file. Ideas?
by clueless newbie (Curate) on Oct 13, 2016 at 13:22 UTC
    Hi, Doesn't look too bad. First you want something along the lines of
    my $balpar_re=qr{ ( \( # opening ( (?:[^'"()]++ # Not a ', ", ( or ) - no + backtracking # incorrect |(?:'(?:[^']|\')*?') #' a single qu +ote string # incorrect |(?:"(?:[^"]|\")*?") #" a double qu +ote string |(?:'(?:\\'|[^'])*?') #' a single quote string |(?:"(?:\\"|[^"])*?") #" a double quote string |(?-1) # must be a ( or ) so re +curse )*+ # zero or more time - no +backtracking \) # closing ) ) }x;
    so you can get a "(" with its matching ")". Then you want a substitution regex that grabs the contents of each of the sets of balanced parenthesis and feeds them to a function that processes them (one set at a time). Note the above regex does deal with quoted strings.
    #!/usr/bin/env perl use Data::Dumper; use strict; use warnings; my $string=q{INSERT INTO sample_table (first_name,last_name) VALUES (' +John','Doe'),('John','Smith'),('Foo','O\'Bar'),('Baz','D\'Qux (II)'), +('(and so on, ','and so forth);');}; my $balpar_re=qr{ ( \( # opening ( (?:[^'"()]++ # Not a ', ", ( or ) - no + backtracking |(?:'(?:\\'|[^'])*?') #' a single quote string |(?:"(?:\\"|[^"])*?") #" a double quote string |(?-1) # must be a ( or ) so rec +urse )*+ # zero or more time - no +backtracking \) # closing ) ) }x; $string=~ s{$balpar_re}{mung($1)}ge; exit; sub mung { warn Data::Dumper->Dump([\@_],[qw(*_)]),' '; }
    gives
    @_ = ( '(first_name,last_name)' ); at perlmonks.pl line 25. @_ = ( '(\'John\',\'Doe\')' ); at perlmonks.pl line 25. @_ = ( '(\'John\',\'Smith\')' ); at perlmonks.pl line 25. @_ = ( '(\'Foo\',\'O\\\'Bar\')' ); at perlmonks.pl line 25. @_ = ( '(\'Baz\',\'D\\\'Qux (II)\')' ); at perlmonks.pl line 25. @_ = ( '(\'(and so on, \',\'and so forth);\')' ); at perlmonks.pl line 25.

      Hi,

      The dump files are huge to be loaded into a variable entirely. Could you please suggest a way to load and process the content by parts of limited size?

      Your regular expression is interesting. Thank you for this example