downer has asked for the wisdom of the Perl Monks concerning the following question:

I have been struggling with this for the past few days. I have been trying to debug with GDB, but without much success. I'll get straight to the point. My Perl looks something like this:
$page = contents of some web page
$url = url of that web page
$len = #bytes of that web page
$parsed = MyParser($url, $page, $len)
where MyParser is really just a small wrapper that passes data along to some other code that a colleague wrote. MyParser looks like:
char* MyParser(char* url, char* page, int len)
{
    char *pool;
    int ret;

    pool = (char*)malloc(2*len+1);

    // parsing page
    ret = parser(url, page, pool, 2*len+1);

    if(ret > 0) {
        return pool;
    } else {
        return '0';
    }
}
I know that parser works fine. However, the first time I call this, I get errors like this from GDB:
#0 0x000000398d6a22fd in Perl_sv_setpv () from /usr/lib64/perl5/5.8.6/x86_64-linux-thread-multi/CORE/libperl.so
#1 0x00002aaaae180ded in XS_main_MyParser (my_perl=0x505010, cv=0xb361b0) at getandParseWithC_pl_fa1e.c:400
#2 0x000000398d69b67e in Perl_pp_entersub () from /usr/lib64/perl5/5.8.6/x86_64-linux-thread-multi/CORE/libperl.so
#3 0x000000398d67f3cd in Perl_runops_debug () from /usr/lib64/perl5/5.8.6/x86_64-linux-thread-multi/CORE/libperl.so
#4 0x000000398d639dbe in perl_run () from /usr/lib64/perl5/5.8.6/x86_64-linux-thread-multi/CORE/libperl.so
#5 0x0000000000401a01 in main ()
I don't know why I get errors regarding Perl_sv_setpv, since I don't make any use (to my knowledge!) of the Perl stack here. Can anyone please offer some advice? I throw myself on the mercy of the Perl Monks!

Replies are listed 'Best First'.
Re: some help with inline C
by almut (Canon) on Oct 17, 2007 at 16:54 UTC

    Just an idea: you probably want return "0"; instead of return '0';. A single char (as opposed to char*) is not the appropriate input for sv_setpv(), which is used behind the scenes to convert the return value of MyParser() into the SV that the Perl side wants.
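
To make the difference concrete: in C, '0' is a character constant (an integer), while "0" is a string literal (a pointer to a NUL-terminated array). The sketch below is not from the original post; the function names are made up for illustration, and the value 48 assumes an ASCII platform.

```c
#include <assert.h>
#include <string.h>

/* '0' is a character constant: an int holding the code of the digit zero
 * (48 on ASCII platforms, which is an assumption of this sketch). */
int char_constant_value(void)
{
    return '0';
}

/* "0" is a string literal: a pointer to the two bytes '0' and '\0'. */
const char *failure_marker(void)
{
    return "0";
}

/* Hypothetical helper: true if s is exactly the one-character string "0". */
int is_failure_marker(const char *s)
{
    assert(s != NULL);
    return strcmp(s, failure_marker()) == 0;
}
```

So a function declared to return char* that executes return '0'; typically hands its caller the integer 48 converted to an address, not a pointer to a string, which fits the crash inside Perl_sv_setpv.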

    With that change, I can run the example just fine (replacing your parser() routine with a simple printf(), that is — as I of course don't have your parser lib...). Otherwise (with the '0'), I do get a segfault when that code branch is executed.

    If that doesn't fix it, it would help to know what the actual error is, not only where it occurs. Also, bonus points for a minimal self-contained snippet of code that we can actually run in order to reproduce the error :)

      almut, thanks. That did help a lot; I don't get segfaults right away anymore :) I can now run the code for a little while before it segfaults, now with a different error message. I don't know much about GDB, but this is what backtrace tells me. Please tell me how I can provide you with more information.
      (gdb) backtrace
      #0 0x00002aaaae17f1f9 in tag_parser (tag=0x2 <Address 0x2 out of bounds>, len=-1216936334, back_tag=0x7fffffc91ac7 "") at getandParseWithC_pl_05fd.xs:52
      #1 0x00002aaaae17f762 in parser (url=0x4cd9780 "openpolytechnic.ac.nz/ftp/linux/sunsite/docs/faqs/ftp-faq", doc=0x2aaab7770010 "HTTP/1.1 200 OK\r\nDate: Wed, 12 Oct 2005 01:16:33 GMT\r\nServer: Apache/2.0.54 (Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g DAV/2\r\nLast-Modified: Sun, 29 May 2005 11:11:07 GMT\r\nETag: \"193b68-1ead-d3e728c0\"\r\nAcce"..., buf=0x153efac0 "openpolytechnic U\nac U\nnz U\nftp U\nlinux U\nsunsite U\ndocs U\nfaqs U\nftp U\nfaq U\nftp P\nhowto P\nftp P\nfile P\ntransfer P\nprotocol P\nis P\na P\nclient P\nserver P\ntcp P\nprotocol P\nthat P\nallows P\na P\nuser P\nto"..., blen=16321) at getandParseWithC_pl_05fd.xs:176
      #2 0x00002aaaae17fb4a in MyParser (url=0x4cd9780 "openpolytechnic.ac.nz/ftp/linux/sunsite/docs/faqs/ftp-faq", page=0x2aaab7770010 "HTTP/1.1 200 OK\r\nDate: Wed, 12 Oct 2005 01:16:33 GMT\r\nServer: Apache/2.0.54 (Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g DAV/2\r\nLast-Modified: Sun, 29 May 2005 11:11:07 GMT\r\nETag: \"193b68-1ead-d3e728c0\"\r\nAcce"..., len=8160) at getandParseWithC_pl_05fd.xs:310
      #3 0x00002aaaae180dcb in XS_main_MyParser (my_perl=0x505010, cv=0xb36240) at getandParseWithC_pl_05fd.c:400
      #4 0x000000398d69b67e in Perl_pp_entersub () from /usr/lib64/perl5/5.8.6/x86_64-linux-thread-multi/CORE/libperl.so
      #5 0x000000398d67f3cd in Perl_runops_debug () from /usr/lib64/perl5/5.8.6/x86_64-linux-thread-multi/CORE/libperl.so
      #6 0x000000398d639dbe in perl_run () from /usr/lib64/perl5/5.8.6/x86_64-linux-thread-multi/CORE/libperl.so
      #7 0x0000000000401a01 in main ()
      As I said before, parser is code that my research group has been using for some time without problems (though everyone else is a C guru, not a Perl one). I suspect there is an issue with how my code is interacting with parser (or with how the Perl vars are passed into this C code). I will include the parser code for completeness:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #define PTAG_B 1
      #define PTAG_I 2
      #define PTAG_H 3
      #define PTAG_TITLE 4
      #define PTAG_SCRIPT 5

      #define _TITLE_TAG 0x0001
      #define _B_TAG 0x0004
      #define _H_TAG 0x0008
      #define _I_TAG 0x0010

      #define xl_isdigit(c) (((c) >= '0') && ((c) <= '9'))
      #define xl_islower(c) (((c) >= 'a') && ((c) <= 'z'))
      #define xl_isupper(c) (((c) >= 'A') && ((c) <= 'Z'))
      #define xl_isindexable(c) (xl_isdigit(c) || xl_islower(c) || xl_isupper(c))
      #define xl_tolower(c) ((c) += 'a' - 'A')

      char* parser_init(char* doc)
      {
          char *p;

          if (strncasecmp(doc, "HTTP/", 5))
              return NULL;

          for (p = doc; (*p != ' ')&&(*p); p++);
          if (*p == '\0')
              return NULL;

          if (atoi(p) != 200)
              return NULL;

          p = strstr(p, "\\r\\n\\r\\n");
          if (p == NULL)
              return NULL;

          return p+4;
      }

      int tag_parser(char* tag, int len, char* back_tag)
      {
          int i = 0;

          if (tag[0] == '/') {
              *back_tag = 1;
              i++;
          } else
              *back_tag = 0;

          switch (tag[i]) {
          case 'b':
          case 'B':
          case 'i':
          case 'I':
              if (!isspace(tag[i+1]))
                  return 0;
              if ((tag[i] == 'b') || (tag[i] == 'B'))
                  return PTAG_B;
              return PTAG_I;

          case 'e':
          case 'E':
              i++;
              if (((tag[i]=='m')||(tag[i]=='M')) && (isspace(tag[i+1])))
                  return PTAG_I;
              return 0;

          case 'h':
          case 'H':
              i++;
              if (((tag[i]>='1')&&(tag[i]<='6')) && (isspace(tag[i+1])))
                  return PTAG_H;
              return 0;

          case 't':
          case 'T':
              i++;
              if ((0==strncasecmp(tag+i, "itle", 4)) && (isspace(tag[i+4])))
                  return PTAG_TITLE;
              return 0;

          case 's':
          case 'S':
              i++;
              if ((0==strncasecmp(tag+i, "trong", 5)) && (isspace(tag[i+5])))
                  return PTAG_B;
              if ((0==strncasecmp(tag+i, "cript", 5)) && (isspace(tag[i+5])))
                  return PTAG_SCRIPT;
              return 0;

          default:
              break;
          }

          return 0;
      }

      #define xlbit_set(__b1, __b2) ((__b1) |= (__b2))
      #define xlbit_unset(__b1, __b2) ((__b1) &= ~(__b2))
      #define xlbit_check(__b1, __b2) ((__b1)&(__b2))

      char* parser(char* url, char* doc, char* buf, int blen)
      {
          char *p, *purl, *word, *ptag, *pbuf;
          char ch, back_tag, intag, inscript;
          unsigned tag_flag;
          int ret;

          p = parser_init(doc);
          if (p == NULL)
              return 0;
          pbuf = buf;

          /* parsing URL */
          purl = url;
          while (*purl != '\0') {
              if (!xl_isindexable(*purl)) {
                  purl++;
                  continue;
              }

              word = purl;
              while (xl_isindexable(*purl)) {
                  if (xl_isupper(*purl))
                      xl_tolower(*purl);
                  purl++;
              }

              ch = *purl;
              *purl = '\0';

              if (pbuf-buf+purl-word+3 > blen-1)
                  return -1;
              sprintf(pbuf, "%s U\\n", word);
              pbuf += (purl-word)+3;

              *purl = ch;
          }

          /* parsing page */
          tag_flag = 0;
          intag = 0;
          inscript = 0;

          while (*p != '\0') {
              if (!xl_isindexable(*p)) {
                  if (*p != '>') {
                      if (*p == '<') {
                          ptag = p;
                          intag = 1;
                      }
                      p++;
                      continue;
                  }

                  *p = ' ';
                  ret = tag_parser(ptag+1, p-ptag, &back_tag);
                  switch (ret) {
                  case PTAG_B:
                      if (back_tag == 0)
                          xlbit_set(tag_flag, _B_TAG);
                      else
                          xlbit_unset(tag_flag, _B_TAG);
                      break;

                  case PTAG_I:
                      if (back_tag == 0)
                          xlbit_set(tag_flag, _I_TAG);
                      else
                          xlbit_unset(tag_flag, _I_TAG);
                      break;

                  case PTAG_H:
                      if (back_tag == 0)
                          xlbit_set(tag_flag, _H_TAG);
                      else
                          xlbit_unset(tag_flag, _H_TAG);
                      break;

                  case PTAG_TITLE:
                      if (back_tag == 0)
                          xlbit_set(tag_flag, _TITLE_TAG);
                      else
                          xlbit_unset(tag_flag, _TITLE_TAG);
                      break;

                  case PTAG_SCRIPT:
                      if (back_tag == 0)
                          inscript = 1;
                      else
                          inscript = 0;

                  default:
                      break;
                  }

                  intag = 0;
                  p++;
                  continue;
              }

              if (inscript || intag) {
                  p++;
                  continue;
              }

              word = p;
              while (xl_isindexable(*p)) {
                  if (xl_isupper(*p))
                      xl_tolower(*p);
                  p++;
              }

              ch = *p;
              *p = '\0';

              if (pbuf-buf+p-word+1 > blen-1)
                  return -1;
              sprintf(pbuf, "%s ", word);
              pbuf += (p-word)+1;

              if (xlbit_check(tag_flag, _B_TAG)) {
                  if (pbuf-buf+1> blen-1)
                      return -1;
                  *pbuf = 'B';
                  pbuf++;
              }

              if (xlbit_check(tag_flag, _H_TAG)) {
                  if (pbuf-buf+1> blen-1)
                      return -1;
                  *pbuf = 'H';
                  pbuf++;
              }

              if (xlbit_check(tag_flag, _I_TAG)) {
                  if (pbuf-buf+1> blen-1)
                      return -1;
                  *pbuf = 'I';
                  pbuf++;
              }

              if (xlbit_check(tag_flag, _TITLE_TAG)) {
                  if (pbuf-buf+1> blen-1)
                      return -1;
                  *pbuf = 'T';
                  pbuf++;
              }

              if (tag_flag == 0) {
                  if (pbuf-buf+1> blen-1)
                      return -1;
                  *pbuf = 'P';
                  pbuf++;
              }

              if (pbuf-buf+1> blen-1)
                  return -1;
              *pbuf = '\\n';
              pbuf++;

              *p = ch;
          }

          *pbuf = '\0';
          return pbuf-buf;
      }
        #0 0x00002aaaae17f1f9 in tag_parser (tag=0x2 <Address 0x2 out of bounds>, len=-1216936334, back_tag=0x7fffffc91ac7 "") at getandParseWithC_pl_05fd.xs:52

        shows that tag_parser is called with 2 for the first argument. That's obviously a bad pointer. The second argument is also obviously wrong.

        One possibility is that the input is something like foo>bar, which would cause tag_parser to be called before ptag is ever initialized. Did you try dumping what parser_init returns and finding out to which char p points?
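
A minimal sketch of the guard this suggests, assuming the fix is to start with a NULL tag pointer and only hand it to the tag parser once a '<' has actually been seen. count_tags is a hypothetical stand-in for the scanning loop in parser(), not the real code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts '<' ... '>' pairs in s. A '>' that shows up
 * before any '<' is skipped instead of being treated as the end of a tag. */
int count_tags(const char *s)
{
    assert(s != NULL);
    const char *ptag = NULL;   /* start of the tag being scanned, if any */
    int tags = 0;

    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '<') {
            ptag = p;
        } else if (*p == '>' && ptag != NULL) {
            tags++;            /* the real loop would call tag_parser(ptag + 1, ...) here */
            ptag = NULL;       /* forget the tag so a later stray '>' is ignored */
        }
    }
    return tags;
}
```

With this shape, an input such as foo>bar never reaches the tag-handling branch with an uninitialized pointer.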

        You're asking us to debug a C problem, you didn't provide any inputs, and you didn't provide any runnable code. The underlying cause could be a Perl problem, but you haven't gotten that far yet. Do your homework, then come back to us if it's a Perl problem.

        Update: Added some details.

        Hi downer,

        When I run gcc -c parser.c I get a number of warnings "return makes pointer from integer without a cast". I think these occur because (the integer) -1 is being returned, even though the function is supposed to return a char*. I don't know if that causes any problems, but it's a bit agricultural, to say the least.
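
For what it's worth, parser() is declared as returning char* but every return statement yields an integer (0, -1, or pbuf-buf), and MyParser stores the result in an int. A sketch of the more conventional shape, with the function declared int and returning either a byte count or -1, might look like this. append_word is a made-up helper for illustration, not part of parser.c:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: writes word followed by a newline into buf.
 * Returns the number of bytes written (not counting the terminating NUL),
 * or -1 if the result would not fit in blen bytes. */
int append_word(char *buf, int blen, const char *word)
{
    assert(buf != NULL && word != NULL);
    int need = (int)strlen(word) + 1;      /* the word plus '\n' */

    if (need + 1 > blen)                   /* plus the terminating NUL */
        return -1;

    memcpy(buf, word, (size_t)need - 1);
    buf[need - 1] = '\n';
    buf[need] = '\0';
    return need;
}
```

Declaring the return type as int keeps the "length or -1" convention without any pointer/integer conversion, so the compiler has nothing to warn about.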

        I also got:
        parser.c:287:11: warning: multi-character character constant
        parser.c:287: warning: overflow in implicit constant conversion
        Those warnings appear to me to be a little more sinister. They can be removed by changing line 287 from:
        *pbuf = '\\n';
        to:
        *pbuf = '\n';
        Does that change help your cause at all ?
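
A small sketch of why the doubled backslash matters when the file is compiled directly as C. In a C literal, \\ is one backslash character, so "\\n" is a backslash followed by the letter n, whereas "\n" is a single newline. The function names are made up for illustration, and the newline value 10 assumes ASCII:

```c
#include <assert.h>
#include <string.h>

/* "\\n" is two characters: a backslash and the letter 'n'. */
size_t doubled_backslash_len(void)
{
    return strlen("\\n");
}

/* "\n" is one character: the newline. */
size_t newline_escape_len(void)
{
    return strlen("\n");
}

/* Hypothetical helper: the first byte of a non-empty string, as an int. */
int first_byte(const char *s)
{
    assert(s != NULL && *s != '\0');
    return (unsigned char)s[0];
}
```

The same reasoning applies to the character constant: '\\n' holds two characters, which is what the "multi-character character constant" warning is about, and its value does not fit in a single char.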

        Given that we now have libparser.a (or, at least, have the capacity to build it), if you'd like to give us a (complete) sample of the Inline::C script that you're using, we might be able to make some real progress :-)

        Cheers,
        Rob
Re: some help with inline C
by ikegami (Patriarch) on Oct 17, 2007 at 15:39 UTC
    I don't know why you are getting an error — I have very limited guts experience — but Perl_sv_setpv is used to create or initialize an SV from the char* your function returns.
      if that's the case, what is the solution?
Re: some help with inline C
by thenetfreaker (Friar) on Oct 17, 2007 at 17:06 UTC
    Maybe just try translating MyParser to perl code...

    sub MyParser {
        my ($url, $page, $len) = @_;

        # translate it because I don't quite understand it
        my $pool = (char*)malloc(2*len+1);

        # parsing page
        # translate it because I don't quite understand what "parser" is
        my $ret = parser(url, page, pool, 2*len+1);

        if ($ret > 0) {
            return $pool;
        } else {
            return '0';
        }
    }

    and don't forget to use strict and warnings :)