davee123 has asked for the wisdom of the Perl Monks concerning the following question:

We've got a Perl script that reliably segfaults. I put debug output in, and presto! Problem solved. Run it in the debugger? Works fine.

We've narrowed down the problem as follows. Our code:

...
eval {
    $DT = DateTime->new( year => $year, time_zone => $tz );
};
return if $@;
...

Specifically, the above code executes normally several dozen times, and then segfaults when passed a $year of "2011", and a $tz of "Europe/London". BUT! Here's the catch, and here's where I'm getting out of my league:

JUST before that, it calls the same code with $year = "2010" and $tz = "Europe/London". THAT call fails and is successfully caught by the wrapped eval, which actually returns us an error message like this:

The 'name' parameter (undef) to DateTime::TimeZone::new was an 'undef', which is not one of the allowed types: scalar
 at /idcom/weblive/extperl/lib/perl5/site_perl/5.8.5/DateTime/TimeZone.pm line 34
    DateTime::TimeZone::new('undef', 'name', 'undef') called at /idcom/weblive/extperl/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi/DateTime.pm line 192
    DateTime::new('undef', 'year', 2010, 'time_zone', 'Europe/London') called at TZInfo.pm line 33
    TZInfo::get_dst_changes(2010, 'Europe/London') called at ./nz.pl line 1039

The NEXT call (for 2011, same time zone), causes the segfault, and isn't caught by the wrapped eval.

So. One thing I noticed is that it reports that I called both DateTime::new and DateTime::TimeZone::new with a leading "undef" parameter! Shouldn't that be the package name?

Beyond that, I'm at a loss. I can't replicate it in a smaller, standalone test, and I have no idea why it should actually segfault. And looking at DateTime::new, it looks like it's possible that the Params::Validate module is what's processing the arguments and turning the name field into an undef-- but it seems wholly unlikely to me that there would be a bug in either DateTime, DateTime::TimeZone, or Params::Validate.

Any ideas?

DaveE

UPDATE

Well, not much to say apart from the fact that I've quasi-confirmed that Params::Validate isn't doing what it's supposed to do (probably thanks to some error state that we've created, but I have no idea how). Within DateTime.pm:
...
sub new {
    my $class = shift;
    my %p = validate( @_, $NewValidate );
...

I added in debug (that still contained the failure) to print out @_, %p, and $NewValidate both before and after the call to Params::Validate. Result?

@_ set to:
$VAR1 = [ 'year', 2010, 'time_zone', 'Europe/London' ];
%p set to:
$VAR1 = {};

So, it properly shifted off the $class (it was being set correctly it seems), but when attempting to pass off to Params::Validate::validate, the resultant hash comes back empty! Trying to hack into Params::Validate now, but I have a feeling I'm barking up the wrong tree-- maybe something's screwed up the stack and comes unstuck at the return from Params::Validate or something, but again, I have doubts that the library itself has a problem.

Re: Tracking down a segfault
by zgpmax (Initiate) on Nov 19, 2020 at 17:55 UTC

    Many Params-Validate versions before 1.20 have a bug in the XS code that could cause the corruption described. (Versions before 0.50 did not have an XS implementation and so would not suffer from this problem.)

    If you are using an older version, you can check if you are suffering from this bug by persuading your program to use the pure perl implementation. (Read the source of Params/Validate.pm on your system to find out how to do this.) If the problem goes away under the pure perl implementation, then you probably are suffering from the bug in the XS implementation.
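    As a rough sketch of the check described above: the script below asks for the pure Perl implementation and then reports which Params::Validate files were actually loaded. The two environment variable names are assumptions on my part (PARAMS_VALIDATE_IMPLEMENTATION for newer releases, PV_TEST_PERL for older ones), so confirm them against the Params/Validate.pm installed on your system. The helper name pv_report is made up for this example.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Set before Params::Validate is loaded, hence the BEGIN block.
    # Both variable names are assumptions; check your Params/Validate.pm.
    BEGIN {
        $ENV{PARAMS_VALIDATE_IMPLEMENTATION} = 'PP';   # assumed: newer releases
        $ENV{PV_TEST_PERL}                   = 1;      # assumed: older releases
    }

    # Hypothetical helper: describe which Params::Validate files got loaded.
    sub pv_report {
        my $loaded = eval { require Params::Validate; 1 };
        return "Params::Validate is not installed here\n" unless $loaded;

        my $report = "Params::Validate version: " . Params::Validate->VERSION . "\n";
        $report .= "Loaded: $_\n"
            for sort grep { m{^Params/Validate} } keys %INC;
        return $report;
    }

    print pv_report();
    ```

    If the list of loaded files shows only a PP file and the segfault disappears in your real program under the same setting, the XS implementation is the likely culprit.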

    The bug was addressed in versions 1.20 and 1.23.

    1.23    2016-03-26
     
    - Fixed some Perl stack corruption bugs. Based on a proposed PR from Tony Cook
      plus some additional changes. GH #8.
     
    - Fixed tests with Carp 1.01 (shipped with Perl 5.8.3). Patch by Andreas
      Koenig. RT #113318.
    1.20    2015-06-28
     
    - Fixed a bug with stack handling in the XS code. If a callback sub caused
      Perl to reallocate the stack this could trigger weird errors of the form
      "Bizarre copy of ARRAY" from Perl itself. Fixed by Noel Maddy. GH #5.

    See also https://rt.cpan.org/Public/Bug/Display.html?id=86811

Re: Tracking down a segfault
by runrig (Abbot) on Oct 24, 2011 at 19:17 UTC
    I don't know what the problem is (hard to tell with XS modules, good luck, and who knows, sometimes making a seemingly innocuous change like the one below can seem to mysteriously 'fix' the problem), but I would write it as:
    $DT = eval { DateTime->new( year => $year, time_zone => $tz ) };
    return unless $DT;
    Don't depend on $@ to signal an exception.
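    The reason behind that advice: on Perls older than 5.14, $@ could be reset during unwinding (for example by an eval inside a destructor), so a failed eval could leave $@ empty. A minimal sketch of the usual workaround is to test the return value of the eval itself. Here make_thing is a hypothetical stand-in for DateTime->new, and try_make_thing is likewise a made-up name.

    ```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for DateTime->new: dies on bad input.
    sub make_thing {
        my (%args) = @_;
        die "year is required\n" unless defined $args{year};
        return \%args;
    }

    # Returns the object, or undef if construction died.
    sub try_make_thing {
        my (%args) = @_;
        my $obj;
        # The block's last statement is 1, so eval is true only when nothing died.
        my $ok = eval { $obj = make_thing(%args); 1 };
        return $ok ? $obj : undef;
    }

    print defined try_make_thing( year => 2011 ) ? "ok\n" : "failed\n";
    print defined try_make_thing()               ? "ok\n" : "failed\n";
    ```

    This only protects against exceptions; a real segfault kills the process and no form of eval will catch it.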
Re: Tracking down a segfault
by bluescreen (Friar) on Oct 24, 2011 at 21:39 UTC

    For problems like these I'd go for GDB. The thing is that it can be overwhelming, since you'll also be dealing with Perl's internals, so I'd highly recommend you have a solid reproduction case before you start; it will save you tons of time.

Re: Tracking down a segfault
by Anonymous Monk on Oct 25, 2011 at 12:44 UTC

    The XS version is most likely calling into the kernel/system, particularly, the local Time Zone database, hence the reason you're getting a segfault from passing bad parameters.

    Given the recent legal debacle with the default time zone database used on most systems, if you have any kind of "auto update" running on your system, the culprit may not be your perl code. The Linux folks may have reacted to the copyright infringement lawsuit by modifying their TZ database in some strange, and incompatible way.

    I noticed your perl version is ancient (5.8.5, going by the paths in your error message), so if you're using a newer module with an old perl, this might be the cause.

    Personally, I see no reason for you to be running the code inside an eval? Do you have some specific reason for doing this? --Note: I can't see much of the code, so there may be some hidden reason.

    If you want to remove your 'self' from harms way, validate the input. It would also be wise to validate the inputs you send to Params::Validate, specifically, make sure @_ is non-empty, and make sure $NewValidate is a HASHREF and is non-empty.

    sub new {
        my $class = shift;
        # Report whether we were called on an object or on a class name.
        print ref($class) ? "Got Self\n" : "Selfless\n";
        # Check the inputs before handing them to Params::Validate.
        if ( @_ && ref($NewValidate) eq 'HASH' && keys %$NewValidate ) {
            my %p = validate( @_, $NewValidate );
            die("failed to validate!") unless keys %p;
        }
        else {
            die("a horrible death!");
        }
        ...

    Hope this helps

      Well, the oddity to me is that we're not passing bad parameters. Or, at least, that's what everything seems to say. The Perl error expressly tells us that we're providing the necessary params correctly, and printing them out immediately before the call shows the same thing.

      The error is caused by the fact that DateTime.pm somehow loses the parameters that we passed it. We pass them, and I hacked a print line into DateTime.pm to prove that it was indeed receiving the params correctly in @_. However, DateTime.pm then processes the arguments using Params::Validate. So, when Params::Validate returns the object, it's empty, with no error!

      As for why it's in an eval? I think that's because it's important that the code doesn't die (I didn't actually write it, so I'm guessing here). The library is used in a few places, one of them processing human input, which occasionally is input with crap. So I believe without the eval, junk input can result in an exit from DateTime, which shouldn't cause the program to fail.

      We're suspicious of the time zone code because of the recent problem with the Olson DB, which might have caused this, but it... doesn't seem like the problem. At least not offhand. We re-processed the Olson data a few times (using different versions), but nothing seems to solve the error.

      Our guess for the time being is that the error is further upstream somewhere, and perhaps something in the stack is being screwed up, and it's only getting to the breaking point when it gets down to the TimeZone level.

      The whole environment is ... interesting. As noted, it's a very old Perl version (we've been denied in trying to have them upgrade it, and some libraries we maintain ourselves, not maintained by the sysadmins). But I'm still at a loss, since, well, you're not supposed to get actual segfaults!

      DaveE

        It might be something trivial, but I noticed one thing in what you've said:

        I added in debug (that still contained the failure) to print out @_, %p, and $NewValidate both before and after the call to Params::Validate. Result?

        The output you posted doesn't show anything for $NewValidate?

        You may have just skipped posting it for the sake of brevity, but I did notice the omission of a dump for $NewValidate.

        What versions of Params::Validate and DateTime are you using?

        http://search.cpan.org/dist/Params-Validate/
        From the newest v1.00 docs/source of Params::Validate::validate(), the first argument is @_ and the second is a HASH or HASHREF. The oldest perl installation I have over here is v5.10.1 along with Params::Validate v0.91 --It's as close as I can get to what you're using over there. At least with the oldest stuff I have here, when the second argument to validate() is an empty HASHREF (validation spec), I get a fatal error (carp).

        #!/usr/bin/perl -w
        use strict;
        use diagnostics;
        use warnings FATAL => 'all';
        use Data::Dumper;
        $Data::Dumper::Useperl = 1;
        $Data::Dumper::Indent = 1;
        $Data::Dumper::Sortkeys = 1;
        $Data::Dumper::Useqq = 1;
        $Data::Dumper::Deparse = 1;
        use Params::Validate qw(:all);

        sub foo {
            validate(
                @_, {
                    'bar' => 1, # mandatory
                    'baz' => 0, # optional
                }
            );
            print "Hello Nurse!\n";
        }
        foo('bar' => "arg1");

        sub qux {
            validate( @_, {} );
            print "Empty Hash\n";
        }
        qux('test');
        Output:
        $ perl crap2.pl
        Hello Nurse!
        Uncaught exception from user code:
                Odd number of parameters in call to main::qux when named parameters were expected
         at crap2.pl line 25
                main::qux('test') called at crap2.pl line 31
         at crap2.pl line 25
                main::qux('test') called at crap2.pl line 31

        I haven't actually used Params::Validate (or its sister Config::Validate) recently, but I (quickly) read all of the code for the newest version two days ago. Purely for fun, I've been looking into validation in a general sense, so they're on my "must know" list. I haven't read DateTime in a *very* long time, but I'll look at it for ya.

        As for the use of eval to prevent termination or warnings, one really *must* have a Darn Good Reason (DGR) to use it, and the smart thing to do is to use a local() $SIG{__DIE__} and/or $SIG{__WARN__} to handle the expected issues. Since you've posted the previous failure (e.g. error on 2010, not the 2011 segfault), it seems you're doing some kind of signal trapping but where and how is a mystery. You can augment your eval with local signal traps to give you more info:

        ...
        eval {
            local $SIG{__WARN__} = sub {
                print STDERR "HOOKED __WARN__\n";
                print STDERR Carp::longmess();
                return();
            };
            local $SIG{__DIE__} = sub {
                print STDERR "HOOKED __DIE__\n";
                print STDERR Carp::longmess();
                exit(1); # or in your case return()
            };
            # Examples - message not printed due to hooks.
            # CORE::warn("Warn Message:\n", @_, "\n");
            # CORE::die("Died Message:\n", @_, "\n");
            $DT = DateTime->new( year => $year, time_zone => $tz );
        };
        return if $@;
        ...

        It seems my guess was correct about DateTime::TimeZone using the local system, particularly the local timezone database. I suspect this is the root cause of the segfault.

        http://search.cpan.org/~drolsky/DateTime-0.70/lib/DateTime.pm#Time_Zone_Warnings
        Determining the local time zone for a system can be slow. If $ENV{TZ} is not set, it may involve reading a number of files in /etc or elsewhere. If you know that the local time zone won't change while your code is running, and you need to make many objects for the local time zone, it is strongly recommended that you retrieve the local time zone once and cache it

        ...

        http://search.cpan.org/~drolsky/DateTime-0.70/lib/DateTime.pm#Constructors
        The time_zone parameter can be either a scalar or a DateTime::TimeZone object. A string will simply be passed to the DateTime::TimeZone->new method as its "name" parameter. This string may be an Olson DB time zone name ("America/Chicago"), an offset string ("+0630"), or the words "floating" or "local".

        From the above you have some choices to test while keeping the exact same functionality, while potentially avoiding a system-based timezone problem.

        1. check /etc/localtime (BSD) -I'm not sure of linux equivalent?
        2. Set LC_TIME for your locale (checking your locale might help)
        3. Set TZ in your environment (prevent system lookup)
        4. Set $ENV{TZ} in your code (prevent system lookup)
        5. Cache the time zone (details in first link above)
        6. Try using a DateTime::TimeZone object (rather than a string) in your call DateTime->new() --I'd put this in your eval as below.

        ...
        eval {
            $tz = DateTime::TimeZone->new( name => 'Europe/London' );
            $DT = DateTime->new( year => $year, time_zone => $tz );
        };
        return if $@;
        ...

        I'm not familiar with timezone handling on linux, but on BSD, I'd check to make sure there's a link from /etc/localtime to the appropriate time zone file in the TZ database. A modification to the local TZ database without fixing /etc/localtime can cause a real mess.

        The root cause of the segfault is most likely in XS code not playing nicely with your system code. On earlier versions of Params::Validate (like the v0.91 I've got), there's code to exclude XS usage on earlier perls, but newer versions (like current) have changed this. More likely, something in the XS code of DateTime or DateTime::TimeZone is making bad calls into your system time code, and/or timezone code/database. The first problem you have on 2010, where you get an error, could be corrupting things, and the second on 2011 could be the last straw. If none of the above listed stuff fixes your problem, I'd try upgrading just the modules: DateTime, DateTime::TimeZone, Params::Validate.

        As for why this bug has surfaced on "previously running" code, it could be related to recent changes made in your linux distro time zone database (or code) due to the recent lawsuit.
        http://en.wikipedia.org/wiki/Tz_database#2011_lawsuit
        I'm betting you're in a "managed" environment with automated system updates at work changing your TIME/TZ stuff out from under you.