jepri has asked for the wisdom of the Perl Monks concerning the following question:

I have been writing a network app (in fact, a Gnutella client) that reads a character stream and processes it. Fine and dandy. I've got my routine breaking the stream into packets and acting on them. But every so often the program loses sync with the stream.

Finally, after many hours of debugging, I discovered that it's my fault: for some reason, my while loop was exiting before it reached the end of the string. It turns out that there is a magic character in the stream that causes Perl to end the string too soon. But here's the confusing bit: *only some commands are vulnerable*. I can print out the entire string, but I can't substr it. Then when I try to print it again, it is foreshortened.

The evil character appears near (before, I think) the character '0', that is, the digit 0. Naturally I guessed the unknown character was an end-of-string character for Perl, but it's not chr(0); those come through fine.

while (!(recv $connection,$z,1000,0)){};
#The following line prints the correct number of packets
#fetched
print "\nLength fetched: ",length($z)," string ",$z if length($z);
my $g=0;
#$z=quotemeta $z;
#This while exits *before* it has finished the string
while (my $a=substr $z,$g,1) {
    $g++;
    #Process characters in this here

Does anyone know what's going on? I'd appreciate any help or pointers to info that you know of. I read part of the perlguts manpage, but I couldn't figure out anything useful from it.
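The symptom can be reproduced without any networking. This is a minimal sketch, assuming the stream contains the ASCII digit '0' (chr(48)); the sample string is hypothetical, and $z and $g simply mirror the names in the snippet above. The first loop tests the truth of each one-character substring and stops at the '0'; the second tests its length and visits every character.

```perl
use strict;
use warnings;

# Hypothetical sample data: the ASCII digit "0" sits in the middle.
my $z = "ab0cd";

# Loop 1: tests the *truth* of each one-character substring.
# The string "0" is false in Perl, so this stops at offset 2.
my @seen_truth;
my $g = 0;
while (my $c = substr $z, $g, 1) {
    push @seen_truth, $c;
    $g++;
}

# Loop 2: tests the *length* of each substring instead.
# Only the empty string returned at the end of $z has length 0.
my @seen_length;
$g = 0;
while (length(my $c = substr $z, $g, 1)) {
    push @seen_length, $c;
    $g++;
}

print "truth test saw:  @seen_truth\n";   # truth test saw:  a b
print "length test saw: @seen_length\n";  # length test saw: a b 0 c d
```

So nothing is truncating the string itself; the loop condition is what gives up early.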

____________________
Jeremy
I didn't believe in evil until I dated it.

Re: Bad Char in string
by jeroenes (Priest) on Apr 19, 2001 at 14:19 UTC
    I tried to confirm that behavior with:
    $str="asdkjhasdkjh".chr(0)."666";
    print "\n$str\n";
    print length($str)."\n";
    print "yes" if $a = substr($str,12,1);
    print "\n$a\n";
    But chr(0) just results in a true value. If you mean the character '0', i.e. chr(48), that one-character string is one of Perl's false values, and so it ends your loop.
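To make the distinction concrete, here is a small sketch that tests a handful of strings (the labels are only for printing). Among strings, only the empty string and the exact string "0" are false; chr(0), "00", "0.0" and a single space are all true.

```perl
use strict;
use warnings;

# Candidate values with a printable label for each.
my @cases = (
    [ 'empty string', ''      ],
    [ 'chr(48) "0"',  chr(48) ],
    [ 'chr(0) NUL',   chr(0)  ],
    [ '"00"',         '00'    ],
    [ '"0.0"',        '0.0'   ],
    [ 'space',        ' '     ],
);

my %truth;
for my $case (@cases) {
    my ($label, $value) = @$case;
    # Boolean context: the same test a while() condition applies.
    $truth{$label} = $value ? 'true' : 'false';
    printf "%-14s length=%d  %s\n", $label, length($value), $truth{$label};
}
```

Every one of these strings except the empty one has length 1 or more, which is why testing length() instead of truth sidesteps the problem.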

    Prevent that by checking the length of the string instead of its true-ness ;)

    while( length my $a =substr $z,$g,1) {
    should do it

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)
    Update: Ponder how this runs:

    $str="asdkjhasdkjh".chr(0)."or 0 and I am 666";
    print "\n$str\n";
    print length($str)."\n";
    my $g=0;
    while( length( my $a= substr $str, $g, 1)){
        print "Character nr $g reads $a\n";
        $g++;
    }
    print "\n\n\tSecond run, without length\n";
    $g=0;
    while( my $a= substr $str, $g, 1){
        print "Character nr $g reads $a\n";
        $g++;
    }
    and I think that makes sense ;-}
      I gave it a go, but perl wouldn't run until I changed your line to:

      while( length ( my $a =substr $z,$g,1)) {

      I'm afraid it didn't work^H^H^H^H fix the problem.

      Jeremy
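The reason the unparenthesized line wouldn't run is precedence: length is a named unary operator, which binds more tightly than assignment, so "length my $a = substr ..." is parsed as an attempt to assign to "length my $a". A small sketch that compiles both forms with a string eval (the snippets and sample string are hypothetical) shows the difference:

```perl
use strict;
use warnings;

# Unparenthesized: parsed as (length my $c) = substr ..., which Perl
# rejects at compile time (an error along the lines of "Can't modify length").
my $bad = 'my $z = "a0b"; my $g = 0;
           while (length my $c = substr $z, $g, 1) { $g++ } $g';

# Parenthesized: the assignment happens first, then length() tests the result.
my $good = 'my $z = "a0b"; my $g = 0;
            while (length(my $c = substr $z, $g, 1)) { $g++ } $g';

my $bad_result  = eval $bad;    # undef: the string fails to compile
my $bad_error   = $@;
my $good_result = eval $good;   # the final $g, i.e. characters visited

print "bad form error: $bad_error";
print "good form visited $good_result characters\n";
```

Note that the parenthesized form also walks past the "0", so once it compiles it does address the truth problem; any remaining truncation has to come from somewhere else in the program.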

Re: Bad Char in string
by mr.nick (Chaplain) on Apr 19, 2001 at 16:53 UTC
    I believe your problems stem from the while (my $a=substr $z,$g,1) { line. Since the substr can return "0", that gets evaluated as false, and the while exits.

    You probably want something like

    my $a;
    while (defined($a=substr($z,$g,1))) {
    It's worth finding a good discussion of the difference between "if val" and "if defined val".
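One caveat with the defined() test, sketched below on a hypothetical sample string: substr at an offset equal to the string's length returns an empty but defined string, so the loop body runs once more with "" before substr goes past the end and returns undef (with a "substr outside of string" warning under use warnings). Bounding the index by length($z) avoids both the truth problem and that extra pass.

```perl
use strict;
use warnings;

my $z = "ab0";   # hypothetical sample containing the digit "0"

# defined() test: runs for "a", "b", "0" and then once more for "".
# The warning from the final out-of-range substr is silenced here.
my @defined_seen;
{
    no warnings 'substr';
    my $g = 0;
    while (defined(my $c = substr($z, $g, 1))) {
        push @defined_seen, $c;
        $g++;
    }
}

# Index-bounded loop: stops exactly at the last character.
my @bounded_seen;
for my $g (0 .. length($z) - 1) {
    push @bounded_seen, substr($z, $g, 1);
}

printf "defined loop ran %d times, bounded loop ran %d times\n",
    scalar @defined_seen, scalar @bounded_seen;
```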
    But you say you are writing a Gnutella client? Well, the packet size is well defined for each type: 23 bytes for the header, plus (whatever) for the payload. You should be able to use something like
    my $buff;
    sysread($socket,$buff,23);
    # Gnutella header integers are little-endian, so "V" rather than native-order "L"
    my ($did,$payloadid,$ttl,$hops,$length)=unpack("a16CCCV",$buff);
    my $payload;
    sysread($socket,$payload,$length);

    (no error checking in example).
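A slightly fuller sketch of the same idea, with its assumptions spelled out: read_fully and read_descriptor are hypothetical helpers, read_fully loops because a single sysread on a socket may return fewer bytes than requested, and the payload length is unpacked with "V" (32-bit little-endian) on the assumption that Gnutella 0.4 header integers are little-endian. The demo feeds a made-up descriptor through a pipe instead of a real socket.

```perl
use strict;
use warnings;

# Hypothetical helper: keep calling sysread until $want bytes arrive,
# because one sysread on a socket or pipe may return a short read.
sub read_fully {
    my ($fh, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        # 4-arg sysread appends at offset length($buf).
        my $got = sysread($fh, $buf, $want - length($buf), length($buf));
        die "sysread failed: $!" unless defined $got;
        die "connection closed after " . length($buf) . " bytes" if $got == 0;
    }
    return $buf;
}

# Hypothetical parser for one 23-byte header plus its payload.
# Assumption: payload length is 32-bit little-endian, hence "V".
sub read_descriptor {
    my ($fh) = @_;
    my $header = read_fully($fh, 23);
    my ($did, $payloadid, $ttl, $hops, $length) = unpack("a16CCCV", $header);
    my $payload = $length ? read_fully($fh, $length) : '';
    return ($did, $payloadid, $ttl, $hops, $payload);
}

# Demo: write a made-up descriptor into a pipe, then parse it back.
pipe(my $reader, my $writer) or die "pipe failed: $!";
binmode $reader;
binmode $writer;
my $body = "hello";
syswrite($writer, pack("a16CCCV", "x" x 16, 0x80, 7, 0, length $body) . $body);
close $writer;

my ($did, $type, $ttl, $hops, $payload) = read_descriptor($reader);
printf "type=0x%02x ttl=%d hops=%d payload=%s\n", $type, $ttl, $hops, $payload;
```

Because the payload is read as raw bytes and never tested for truth, a '0' anywhere in it cannot end processing early.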

Errata
by jepri (Parson) on Apr 19, 2001 at 13:34 UTC

    Update:

    The hunt is off. It was my bad. I made a mistake a little further on in the program where I was indeed evaluating the character in a numeric context.
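The offending line isn't shown, so the following is only a hypothetical illustration of how a numeric comparison misbehaves on characters: == converts both sides to numbers, so every non-digit character compares equal to 0 (with an "isn't numeric" warning), while eq compares the strings themselves.

```perl
use strict;
use warnings;

my @chars = ('0', 'a', chr(0), '5');   # hypothetical sample characters

my (@numeric_match, @string_match);
{
    # Silence the "Argument ... isn't numeric" warnings for the demo.
    no warnings 'numeric';
    for my $c (@chars) {
        # Numeric context: 'a' and chr(0) both become 0.
        push @numeric_match, ord $c if $c == 0;
    }
}
for my $c (@chars) {
    # String comparison: only the digit '0' matches.
    push @string_match, ord $c if $c eq '0';
}

print "== 0 matched ords: @numeric_match\n";   # == 0 matched ords: 48 97 0
print "eq '0' matched ords: @string_match\n";  # eq '0' matched ords: 48
```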

    Thanks heaps to physi and jeroenes.

    Jeremy

Re: Bad Char in string
by physi (Friar) on Apr 19, 2001 at 13:55 UTC
    I can't find an error in your script, but have you tried this one:
    for my $a (split //, $z){
        #Process characters in this here
    }
    This should also walk through the string char by char.
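A quick check, on a hypothetical sample string, that the split form visits the '0' as well: foreach iterates over the list that split returns, so the truth of each element never controls the loop.

```perl
use strict;
use warnings;

my $z = "ab0" . chr(0) . "cd";   # hypothetical sample with "0" and a NUL

my @seen;
for my $c (split //, $z) {
    # Every element is visited, including "0" and chr(0).
    push @seen, ord $c;
}

print join(",", @seen), "\n";   # 97,98,48,0,99,100
```

The trade-off is that split builds the whole list of characters at once, whereas a substr loop walks the buffer in place.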

    Maybe this will help?

    -----------------------------------
    --the good, the bad and the physi--
    -----------------------------------
      Thanks. I tried that also, and the problem remains. On the up side, it is definitely something to do with that '0' character.

      Jeremy