in reply to Generating and storing regexp

Though when I write to a file and run it looks like something ...

How are you writing the hash? Data::Dumper? How have you determined that the regex looks like qr/(?-xism:(?-xism:^something$))/?

I think what you see is just an artifact of regex stringification.  I.e., when the regex is stored to the file, it is stringified with the flags being added:

    use Data::Dumper;

    my %myhash;
    $myhash{REGEXP} = qr/^something$/;
    print Dumper \%myhash;

    __END__
    $VAR1 = {
              'REGEXP' => qr/(?-xism:^something$)/
            };

When you then load it again, qr/(?-xism:^something$)/ is being eval'ed.  And if you stringify the resulting regex again, another set of flags is being added...  Or put differently:

    $myhash{REGEXP} = qr/^something$/;
    print $myhash{REGEXP};   # --> (?-xism:^something$)

    $myhash{REGEXP} = qr/(?-xism:^something$)/;
    print $myhash{REGEXP};   # --> (?-xism:(?-xism:^something$))

    $myhash{REGEXP} = qr/(?-xism:(?-xism:^something$))/;
    print $myhash{REGEXP};   # --> (?-xism:(?-xism:(?-xism:^something$)))
    ...

The multiple flags are certainly redundant, but I'm not sure this is something to worry about.
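If you want to convince yourself that the extra wrappers are harmless, here is a minimal sketch that simulates the dump/reload cycle by stringifying the regex and recompiling the resulting string. The sample inputs and variable names are made up for illustration. Note that Perl 5.14 and later print the wrapper as (?^:...) instead of (?-xism:...), so the printed form depends on your Perl version; the nesting effect is the same.

```perl
use strict;
use warnings;

my $re = qr/^something$/;

# Hypothetical sample inputs, chosen only for this demonstration.
my @inputs   = ("something", "something\n", "something else", "Something");
my @expected = map { $_ =~ $re ? 1 : 0 } @inputs;

for my $round (1 .. 3) {
    my $source = "$re";      # stringify, as a dumper would
    $re = qr/$source/;       # recompile the stringified form

    # The wrapper grows by one level per round, but matching must not change.
    my @got = map { $_ =~ $re ? 1 : 0 } @inputs;
    print "round $round: $re\n";
    die "match results changed in round $round"
        unless "@got" eq "@expected";
}
```

Each round adds one more wrapper to the printed regex, while the match results stay identical to those of the original.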

Update: BTW, YAML also stores the regex stringified (i.e. including flags)

    ---
    REGEXP: !!perl/regexp (?-xism:^something$)

but when you look at the class YAML::Type::regexp (in YAML/Types.pm) which is responsible for loading regexes, you'll see that YAML::Type::regexp::yaml_load() does quite a bit more work than just eval'ing the regex:

    package YAML::Type::regexp;
    ...
    use constant _QR_TYPES => {
        ''   => sub { qr{$_[0]} },
        x    => sub { qr{$_[0]}x },
        i    => sub { qr{$_[0]}i },
        ...
        msix => sub { qr{$_[0]}msix },
    };

    sub yaml_load {
        my $self = shift;
        my ($node, $class) = @_;
        return qr{$node} unless $node =~ /^\(\?([\-xism]*):(.*)\)\z/s;
        my ($flags, $re) = ($1, $2);
        $flags =~ s/-.*//;
        my $sub = _QR_TYPES->{$flags} || sub { qr{$_[0]} };
        my $qr = &$sub($re);
        bless $qr, $class if length $class;
        return $qr;
    }

In other words, it extracts any flags from the stringified regex, and puts them back "outside of" qr{}, i.e.

    REGEXP: !!perl/regexp (?i-xsm:^something$)
                                |
                                v
                         qr{^something$}i

The respective flag combinations are stored as subs in a lookup table _QR_TYPES.   This way, the above problem of doubling the flags is avoided.
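If you would rather not depend on YAML for this, the same idea can be sketched by hand. The following is only an illustration, not how YAML or Data::Dumper do it: regex_to_pair(), pair_to_regex() and %QR_FOR are hypothetical names, and the sketch assumes Perl 5.10 or later, where re::regexp_pattern() returns the pattern and its flags separately. It only supports the i, m, s and x flags and dies on anything else.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);    # available since Perl 5.10

# One constructor per combination of the i, m, s and x flags,
# e.g. $QR_FOR{im} behaves like sub { qr{$src}im }.
# The string eval only ever sees these fixed flag letters.
my %QR_FOR;
for my $bits (0 .. 15) {
    my $flags = join '',
        grep { $bits & (1 << index('imsx', $_)) } qw(i m s x);
    $QR_FOR{$flags} = eval "sub { my (\$src) = \@_; return qr{\$src}$flags }"
        or die $@;
}

# Hypothetical helper: split a compiled regex into [pattern, flags],
# which is what you would write to the file instead of the stringified regex.
sub regex_to_pair {
    my ($qr) = @_;
    my ($pattern, $flags) = regexp_pattern($qr);
    return [ $pattern, $flags ];
}

# Hypothetical helper: rebuild the regex with the flags outside qr{}.
sub pair_to_regex {
    my ($pattern, $flags) = @{ $_[0] };
    my $key  = join '', sort split //, $flags;    # normalise flag order
    my $make = $QR_FOR{$key}
        or die "unsupported regex flags '$flags'";
    return $make->($pattern);
}

my $re = qr/^something$/i;
$re = pair_to_regex(regex_to_pair($re)) for 1 .. 3;

# Pattern and flags should be unchanged after three round trips.
my ($pattern, $flags) = regexp_pattern($re);
print "pattern: $pattern, flags: $flags\n";
```

Because the pattern and the flags are stored separately, no wrapper is ever written to the file, so there is nothing that could pile up across round trips.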

Re^2: Generating and storing regexp
by noodleish (Novice) on Feb 20, 2011 at 17:19 UTC

    Thank you for your reply. I did implement YAML, and while generating the data I store two hashes: one without regexps and one with regexps. After storing them I load them again and compare the two hashes, and they are the same, so everything is fine. BUT when I load them in my main program, I get a segmentation fault on the one with regexps, while the other one is read without any problem. They are loaded with the same function, but not at the same time. What could be wrong?
    I made a test to rule out other errors, and I still get the same thing:

        #!/usr/bin/perl
        use warnings;
        use strict;
        use YAML::XS qw(LoadFile Dump);

        my $ref_LOG = LoadFile("modules/patterns.data");
        print Dump($ref_LOG);

    Output:

        --- ~
        Segmentation fault


    The data file looks sane, and it was verified...

      Segfaults are the result of programming bugs, or sometimes caused by not correctly handling unexpected input (which is essentially also a bug).

      Unfortunately, it's next to impossible to debug such problems without being able to reproduce them.

      In other words, it would help if you could reduce the input to a minimal case that reproduces the issue, and then submit a bug report for YAML::XS, as the author of the module can probably help better here.

      Until then, you might want to try the non-XS version of YAML instead.

        Reverted to YAML::Syck and it worked. I tried to boil the problem with YAML::XS down: I worked through all the input data, and when I attached one particular part of the data to a working YAML file, it stopped working. The thing is, if I use just that part, or any other part, on its own, it works, so the data can't be wrong. I can't see anything special about that part (it's about 2MB and each block works separately; the size could be the cause, since the other parts I used were smaller). With the same data, but generated with Syck, it works like a charm. Syck seems to ignore the CompressSeries flag, so I bet that could be the problem. Or at least my data was generated without it. Thanks again for pointing me to this. I'll never use Dumper again ;)