Re: Small Corrupted Storable Retrieve Dies With an Invalid "Out of memory!"

Well I don't have the full answer but I can fill in the picture. As it turns out it is not a bug in the READING of the file per se (although this is when it crashes), rather it is almost certainly a bug with the WRITING of the storable file. Basically the problem actually occurs in the RLEN macro, immediately above the KBUFCHK. This RLEN macro reads an integer from the Storable file that specifies the size of the next data chunk to be read. The problem occurs because instead of some reasonable figure like N bytes being read corruption of the file leads to pseudorandom 4 byte int being read. This value could be anything from 0-4 billion. If the numerical value of this exceeds the default 128 byte buffer KBUFCHK tries to malloc the required space. Although you have __plenty__ of memory you don't have 4GB, which is what it is trying to malloc. Thus the problem is not really with the reading side per se, the data was actually written corruptly. At least that is how it seems. The main debugging problem is that the precise original data has been lost.

I applied the following debugging patch:

94,95d93
< #define DEBUGME
<
447d444
<               TRACEME(("** extending kbuf to %u bytes (had %u)", (un
+signed int)x+1, ksiz)); \
896d892
<       TRACEME(("1 RLEN x=%u", (unsigned int) x));             \
898,899c894
<       TRACEME(("2 RLEN x=%u", (unsigned int) x));             \
<       if (!cxt->fio)  {                                       \
---
>       if (!cxt->fio)                                          \
901,902d895
<               TRACEME(("3 RLEN x=%u", (unsigned int) x));           
+  \
<       } \
905d897
<       TRACEME(("4 RLEN x=%u", (unsigned int) x));             \
908d899
<       TRACEME(("5 RLEN x=%u", (unsigned int) x));             \
4815d4805
<               if ( (unsigned)size > 1000000 ) Perl_croak(aTHX_ "Bugg
+er!\n");
[download]

Note you can catch that Perl_croak BTW but it will leak leak memory as I am not freeing. The number is not negative either (it is a large unsigned int). More on croaking in a second.

With the patch applied the end of the error trace now looks like:

4 RLEN x=6
5 RLEN x=6
(#0) key 'latest'
(#1) value
retrieve
retrieve type = 8
retrieve_byte (#244)
small integer read as 131
aseen(#244) = 0x80fb63c (refcnt=1)
byte 3
ok (retrieve_byte at 0x80fb63c)
ok (retrieved 0x80fb63c, refcnt=1, SCALAR)
1 RLEN x=6
2 RLEN x=6
4 RLEN x=5
5 RLEN x=5
(#1) key 'count'
(#2) value
retrieve
retrieve type = 5
retrieve_undef
aseen(#245) = 0x80fb648 (refcnt=1)
ok (retrieved 0x80fb648, refcnt=1, SCALAR)
1 RLEN x=5
2 RLEN x=5
4 RLEN x=5
5 RLEN x=5
(#2) key 'first'
ok (retrieve_hash at 0x80fb624)
ok (retrieved 0x80fb624, refcnt=1, HASH)
ok (retrieve_ref at 0x80fb618)
ok (retrieved 0x80fb618, refcnt=1, REF)
1 RLEN x=2
2 RLEN x=2
4 RLEN x=7
5 RLEN x=7
(#8) key 'session'
(#9) value
retrieve
retrieve type = 6
retrieve_integer (#246)
aseen(#246) = 0x80fb654 (refcnt=1)
integer 1769174005
ok (retrieve_integer at 0x80fb654)
ok (retrieved 0x80fb654, refcnt=1, SCALAR)
1 RLEN x=7
2 RLEN x=7
4 RLEN x=4110839407
5 RLEN x=4110839407
Bugger!, at ./test.pl line 4
[download]

OK so all that basically shows is that RLEN is supposed to return the length of the next bit of data and that at the point it chokes it returns a rather large integer. It dies out of memory because it tries to malloc 4GB. Anyway the integer 4110839407 is 0 x F5 06 6E 6F

Looking at the datafile with a hex editor we see:

00000FE0:    05 00 00 00 63 6F 75 6E 74 05 05 00 00 00 66 69  ....coun
+t.....fi
00000FF0:    72 73 74 07 00 00 00 73 65 73 73 69 6F 6E 06 F5  rst....s
+ession..
                                  s  e  s  s  i  o^ n^  ^  ^
00001000:    73 73 69 6F 6E 06 F5 00 00 00 0D 00 00 00 73 65  ssion...
+......se
             s  s  i  o  n   ^  ^ 
00001010:    73 73 69 6F 6E 5F 63 6F 75 6E 74 0A 01 4C 04 00  ssion_co
+unt..L..

NOTE >>>>> Offset of the error is 4096 bytes.
[download]

The data 6F 6E 06 F5 we see is no great suprise as the RLEN macro is essentially just doing an fread (#define PerlIO_read(x,y,z) fread(y,1,z,x) ) due to the authors insane love of macros. Anyway these 4 bytes of data are where our 4 billion comes from.

If you look at the previous keys you see we should be reading a small integer ie 0X 00 00 00 (we are little endian on x86 so you read it R->L ). You will also note that problem occurs at an offset of 4096 bytes. You can also note the ..ssion which looks supiciously like an offset error. Even more so when you note that following ..ssion we have 06 F5 00 00 which is a far more typical pattern. Also we have the same two bytes 06 F5 after session and ..ession. In fact if you get delete the 7 ..ssion bytes you don't error out there anymore. You can also delete the 7 bytes of session. It crashes again another K or so into the file but the data it is extracting all looks valid.

So that is my hint. The problem is that it will probably be a very specific data set that casues what I expect is a buffer issue.

If you apply the suggested single line Perl_croak patch (between RLEN and KBUFCHK calls around line 4810 in Storable.xs)

    RLEN(size);               /* Get key size */
+   if ( (unsigned)size > 1000000 ) Perl_croak(aTHX_ "Bugger!\n");
    KBUFCHK((STRLEN)size);    /* Grow hash key read pool if needed */
[download]

then in your application you could (after you have stored the data) try a retrieve. You can catch that Perl_croak with an eval so it won't kill your mod_perl child. If it does croak you can capture the raw data structure that causes the problem. This immediately identifies a problem profile that needs fixing but more importantly it will give you a test case data structure. It may be possible to debug it without this test case but it would be a lot easier if you have at least one problem data structure to validate the patch against.

store \%profile, $file;
eval{ retrieve($file) };
if ( $@ ) {
    require Data::Dumper;
    warn "\n\n\nStorable Choked\n\n\n";
    warn Data::Dumper::Dumper(\%profile);
    print "Content-Type: text/html\n\nBugger!\n";   # ;-)
  exit 1;
}
[download]

cheers

tachyon

Comment on Re: Small Corrupted Storable Retrieve Dies With an Invalid "Out of memory!" Select or Download Code

Replies are listed 'Best First'.
Re^2: Small Corrupted Storable Retrieve Dies With an Invalid "Out of memory!" by jbisbee (Pilgrim) on Oct 27, 2004 at 21:35 UTC
In your "patch" you used the following code `if ( (unsigned)size > 1000000 ) Perl_croak(aTHX_ "Bugger!\n")` [download] I noticed that the author had a CROAK macro and wondered why you opted for the Perl_croak and what the difference was. `/ Storable.xs 1.0.13 / #define CROAK(x) do { cxt->s_dirty = 1; croak x; } while (0)` [download] `/ Storable.xs 2.13 / #define CROAK(x) STMT_START { cxt->s_dirty = 1; croak x; } STMT +_END` [download] I understand that the cxt->s_dirty is used to let Storable reuse the memory next time around, just wondering what the difference between croak and Perl_croak is. -biz-	[reply] [d/l] [select]
Re^3:Small Corrupted Storable Retrieve Dies With an Invalid "Out of memory!" by tachyon (Chancellor) on Oct 27, 2004 at 22:50 UTC
I was too lazy to look at the syntax of yet another macro. Storable is just one big macro! It is probably better to do: `if ( (unsigned)size > 1000000 ) CROAK(("Bugger!\n"));` [download] Note that you need double parenths as the call to croak does not have any! croak is a Perl alias for Perl_croak_nocontext so the call is near identical but you are setting the context to dirty, looking at the source this is caught and cleaned so should fix the leak. cheers tachyon	[reply] [d/l]