noodleish has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise ones. I have a script that generates a module, which is then used by another script to do a lot of regexp matches. I went that route because the generation is quite heavy. The module is basically a holder for a mixed hash structure with the information needed. In this module a regexp is stored as

$myhash{...}{REGEXP} = qr/^something$/;

But when I write it to a file and run it, it looks something like

$myhash{...}{REGEXP} =  qr/(?-xism:(?-xism:^something$))

Does that mean it gets compiled twice? I'm wondering how to store these regexps in a better or more proper way. TIA.
Noodle

Re: Generating and storing regexp
by ELISHEVA (Prior) on Feb 20, 2011 at 14:37 UTC

    What you are seeing is the normal string representation of a regular expression.

    Since you want to load it back as a regular expression, a better way to meet your goal would be to dump the generated hash to a YAML string using YAML::Dump and store that string in a file. Then when you want to use the hash, retrieve the file contents in a string and pass it to YAML::Load - the hash will be restored exactly in the same state that you dumped it. YAML can handle dumping and reloading of regular expressions, hashes, blessed objects, network graphs, circular references and much more. By using YAML (rather than Storable which is binary), the dump file will also be portable across machines and human readable so you can eyeball it for a sanity check - see YAML.

    However, I have to ask - why are you dumping this hash to a file rather than generating it on the fly when you need it and passing it to a subroutine when it comes time to apply it to some data?

      Thank you for your reply. YAML sounds really great; I'll try it out immediately. I've written a daemon which uses the hash. Since generating the hash is very heavy, and because of some other factors (the generation will probably be done on another server), I've chosen to split the two up, to make things simpler and easier to check for errors.

Re: Generating and storing regexp
by Eliya (Vicar) on Feb 20, 2011 at 15:12 UTC
    Though when I write to a file and run it looks like something ...

    How are you writing the hash?  Data::Dumper?   How have you determined the regex looks like qr/(?-xism:(?-xism:^something$))?

    I think what you see is just an artifact of regex stringification.  I.e., when the regex is stored to the file, it is stringified with the flags being added:

    use Data::Dumper;
    my %myhash;
    $myhash{REGEXP} = qr/^something$/;
    print Dumper \%myhash;
    __END__
    $VAR1 = {
              'REGEXP' => qr/(?-xism:^something$)/
            };

    When you then load it again, qr/(?-xism:^something$)/ is being eval'ed.  And if you stringify the resulting regex again, another set of flags is being added...  Or put differently:

    $myhash{REGEXP} = qr/^something$/;
    print $myhash{REGEXP};   # --> (?-xism:^something$)

    $myhash{REGEXP} = qr/(?-xism:^something$)/;
    print $myhash{REGEXP};   # --> (?-xism:(?-xism:^something$))

    $myhash{REGEXP} = qr/(?-xism:(?-xism:^something$))/;
    print $myhash{REGEXP};   # --> (?-xism:(?-xism:(?-xism:^something$)))
    ...

    The multiple flags are certainly redundant, but I'm not sure this is something to worry about.
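The nesting described above can be reproduced without any file I/O. The following is a minimal sketch, assuming only core Perl; the loop and the `@forms` array are illustrative names, not part of the original poster's code. It repeats the stringify-then-recompile cycle three times. The exact prefix depends on the Perl version (older Perls print `(?-xism:`, Perl 5.14 and later print `(?^:`), so the sketch does not rely on either form.

```perl
use strict;
use warnings;

# Start from the regex used in the thread.
my $re = qr/^something$/;
my @forms;    # stringified form seen at each round (illustrative name)

for my $round (1 .. 3) {
    my $str = "$re";      # stringify, as a dumper would when writing to a file
    push @forms, $str;
    $re = qr/$str/;       # recompile from the string, as a loader would
}

# Each round wraps the previous pattern in one more flag group.
print "$_\n" for @forms;

# The extra groups are redundant but do not change what matches.
print "still matches\n" if 'something' =~ $re;
```

Each printed line is longer than the one before it, while the final regex still matches exactly the same input as the original.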

    Update: BTW, YAML also stores the regex stringified (i.e. including flags)

    ---
    REGEXP: !!perl/regexp (?-xism:^something$)

    but when you look at the class YAML::Type::regexp (in YAML/Types.pm) which is responsible for loading regexes, you'll see that YAML::Type::regexp::yaml_load() does quite a bit more work than just eval'ing the regex:

    package YAML::Type::regexp;
    ...
    use constant _QR_TYPES => {
        '' => sub { qr{$_[0]} },
        x => sub { qr{$_[0]}x },
        i => sub { qr{$_[0]}i },
        ...
        msix => sub { qr{$_[0]}msix },
    };

    sub yaml_load {
        my $self = shift;
        my ($node, $class) = @_;
        return qr{$node} unless $node =~ /^\(\?([\-xism]*):(.*)\)\z/s;
        my ($flags, $re) = ($1, $2);
        $flags =~ s/-.*//;
        my $sub = _QR_TYPES->{$flags} || sub { qr{$_[0]} };
        my $qr = &$sub($re);
        bless $qr, $class if length $class;
        return $qr;
    }

    In other words, it extracts any flags from the stringified regex, and puts them back "outside of" qr{}, i.e.

    REGEXP: !!perl/regexp (?i-xsm:^something$)
                |
                v
            qr{^something$}i

    The respective flag combinations are stored as subs in a lookup table _QR_TYPES.   This way, the above problem of doubling the flags is avoided.
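The same idea (keep the bare pattern and the flags apart, then put the flags back outside the `qr{}`) can be sketched without YAML. This is not how YAML does it internally: the sketch below assumes the core `re` module's `regexp_pattern` function (available since Perl 5.10) instead of parsing the stringified form, and the helper names `regex_to_pair` and `pair_to_regex` are hypothetical.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# Split a compiled regex into its bare pattern and its modifier letters.
# (Hypothetical helper name, for illustration only.)
sub regex_to_pair {
    my ($re) = @_;
    my ($pat, $mods) = regexp_pattern($re);
    return [ $pat, $mods ];
}

# Rebuild a regex with the modifiers placed outside qr{}, so no extra
# flag group is wrapped around the pattern. (Hypothetical helper name.)
sub pair_to_regex {
    my ($pair) = @_;
    my ($pat, $mods) = @$pair;

    # Only plain modifier letters may reach the string eval below.
    die "unexpected modifiers '$mods'" unless $mods =~ /\A[a-z]*\z/;

    # $pat is interpolated when the generated qr{} runs, not pasted into the code.
    my $re = eval "qr{\$pat}$mods";
    die "could not rebuild regex: $@" unless defined $re;
    return $re;
}

my $original = qr/^Something$/i;
my $pair     = regex_to_pair($original);   # e.g. write these two strings to a file
my $rebuilt  = pair_to_regex($pair);       # and rebuild after reading them back

print "pattern: $pair->[0]\n";
print "flags:   $pair->[1]\n";
print "same stringification\n" if "$rebuilt" eq "$original";
```

Because the pattern and flags are stored separately, repeating the round trip any number of times yields the same regex rather than an ever-growing nest of flag groups.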

      Thank you for your reply. I did implement YAML, and while generating I store two hashes: one without regexps and one with regexps. After storing each one I load it again and compare the two copies; they are the same, so everything looks fine. BUT when I load them in my main program, I get a segmentation fault on the one with regexps, while the other gets read without any problem. They are loaded with the same function, but not at the same time. What could be wrong?
      I made a test to rule out errors in my own code, and still get the same thing:

      #!/usr/bin/perl
      use warnings;
      use strict;
      use YAML::XS qw(LoadFile Dump);

      my $ref_LOG = LoadFile("modules/patterns.data");
      print Dump($ref_LOG);

      Output

      --- ~
      Segmentation fault


      The data file looks sane, and it was verified...

        Segfaults are the result of programming bugs, or sometimes caused by not correctly handling unexpected input (which is essentially also a bug).

        Unfortunately, it's next to impossible to debug such problems without being able to reproduce them.

        In other words, it would help if you could reduce the input to a minimal case that reproduces the issue, and then submit a bug report to YAML::XS, as the author of the module can probably help better here.

        Until then, you might want to try the non-XS version of YAML instead.

Re: Generating and storing regexp
by educated_foo (Vicar) on Feb 20, 2011 at 15:32 UTC
    If you're trying to generate a data structure to be loaded and used later, you might want to use Storable, which is designed for storing and loading data structures:

    perl -MStorable -le 'store { x => qr/x/ }, "/tmp/x"; $x = retrieve "/tmp/x"; print $x->{x}'

      AFAICT, Storable can't store regexes. At least I do get "Can't store REGEXP items at ..." when I try to run your code.  (Also listed under BUGS in the docs)

      (Storable 2.22 that comes with Perl 5.12.2)

        Interesting...
        $ perl -MStorable -le 'print $Storable::VERSION, " ", $^V'
        2.25 v5.10.0
        $ perl -MStorable -le 'store { x => qr/x/ }, "/tmp/x"; $x = retrieve "/tmp/x"; print $x->{x}'
        Regexp=SCALAR(0x10082e9f0)
        $ perl -MStorable -le 'store { x => qr/x/ }, "/tmp/x"; $x = retrieve "/tmp/x"; print qq{!${$x->{x}}!}'
        !!
        So the latest Storable *pretends* to store it, but in fact gives you garbage. Cute.

        I can confirm that. Found out the hard way... :)
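Given the Storable behaviour reported above, one possible workaround (not suggested in the thread itself) is to hand Storable plain strings instead of compiled regexes and recompile them after loading. The following is a minimal sketch under that assumption; it uses Storable's in-memory `freeze`/`thaw` so it needs no file path, and the hash names are illustrative.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Compiled regexes as they exist in the generating script (illustrative data).
my %patterns = ( REGEXP => qr/^something$/i );

# Stringify every regex so that only plain strings are serialized.
my %plain = map { ( $_ => "$patterns{$_}" ) } keys %patterns;

my $frozen   = freeze( \%plain );   # store() to a file would work the same way
my $restored = thaw($frozen);       # ... as would retrieve() on the reading side

# Recompile once after loading. This adds one redundant flag group per regex,
# which is harmless as long as the strings are not stored again from here.
my %compiled;
for my $key ( keys %$restored ) {
    my $source = $restored->{$key};
    $compiled{$key} = qr/$source/;
}

print "match\n" if 'SOMETHING' =~ $compiled{REGEXP};
```

The flags survive because they are embedded in the stringified form, so the case-insensitive match still works after the round trip.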

Re: Generating and storing regexp
by Khen1950fx (Canon) on Feb 21, 2011 at 07:23 UTC
    You might be able to be more productive by using Regexp::Assemble. I just started using it, and it does come in handy.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dump qw(dump);
    use Regexp::Assemble;

    my $ra = Regexp::Assemble->new(debug => 8);
    $ra->add( qr/^something$/ );
    print "REGEXP => ", dump($ra->re), "\n";