in reply to Masking data in a MySQL dump file. Ideas?

Please provide a (mock) sample of your input data.

Even if there are no new line characters in your data, I would guess there must be some form of separator that tells MySQL where one record ends and the next one starts. If so, you should presumably be able to use that form of separator to read your input row by row as if it were a text file with end-of-line characters (may be by redefining locally $/, the input record separator).

  • Comment on Re: Masking data in a MySQL dump file. Ideas?

Replies are listed 'Best First'.
Re^2: Masking data in a MySQL dump file. Ideas?
by dliakh (Acolyte) on Oct 13, 2016 at 11:49 UTC

    Hi Laurent,

    Only the SQL statements are delimited from each other by the new line characters. All the rows of data in the INSERT statement look like this:

    INSERT INTO sample_table (first_name,last_name) VALUES ('John','Doe'),('John','Smith'),('Foo','O\'Bar'),('Baz','D\'Qux (II)'),('(and so on, ','and so forth);');

    No new line character starting from the beginning of the INSERT word and until the very terminating ; character.

    The values inside quotes, of course, can contain embedded special characters like apostrophe, comma, opening and closing parentheses, semicolon.

    Thank you

      Hi, Doesn't look too bad. First you want something along the lines of
      my $balpar_re=qr{ ( \( # opening ( (?:[^'"()]++ # Not a ', ", ( or ) - no + backtracking # incorrect |(?:'(?:[^']|\')*?') #' a single qu +ote string # incorrect |(?:"(?:[^"]|\")*?") #" a double qu +ote string |(?:'(?:\\'|[^'])*?') #' a single quote string |(?:"(?:\\"|[^"])*?") #" a double quote string |(?-1) # must be a ( or ) so re +curse )*+ # zero or more time - no +backtracking \) # closing ) ) }x;
      so you can get a "(" with its matching ")". Then you want a substitution regex that grabs the contents of each of the sets of balanced parenthesis and feeds them to a function that processes them (one set at a time). Note the above regex does deal with quoted strings.
      #!/usr/bin/env perl use Data::Dumper; use strict; use warnings; my $string=q{INSERT INTO sample_table (first_name,last_name) VALUES (' +John','Doe'),('John','Smith'),('Foo','O\'Bar'),('Baz','D\'Qux (II)'), +('(and so on, ','and so forth);');}; my $balpar_re=qr{ ( \( # opening ( (?:[^'"()]++ # Not a ', ", ( or ) - no + backtracking |(?:'(?:\\'|[^'])*?') #' a single quote string |(?:"(?:\\"|[^"])*?") #" a double quote string |(?-1) # must be a ( or ) so rec +urse )*+ # zero or more time - no +backtracking \) # closing ) ) }x; $string=~ s{$balpar_re}{mung($1)}ge; exit; sub mung { warn Data::Dumper->Dump([\@_],[qw(*_)]),' '; }
      gives
      @_ = ( '(first_name,last_name)' ); at perlmonks.pl line 25. @_ = ( '(\'John\',\'Doe\')' ); at perlmonks.pl line 25. @_ = ( '(\'John\',\'Smith\')' ); at perlmonks.pl line 25. @_ = ( '(\'Foo\',\'O\\\'Bar\')' ); at perlmonks.pl line 25. @_ = ( '(\'Baz\',\'D\\\'Qux (II)\')' ); at perlmonks.pl line 25. @_ = ( '(\'(and so on, \',\'and so forth);\')' ); at perlmonks.pl line 25.

        Hi,

        The dump files are huge to be loaded into a variable entirely. Could you please suggest a way to load and process the content by parts of limited size?

        Your regular expression is interesting. Thank you for this example