URAvgDeveloper101 has asked for the wisdom of the Perl Monks concerning the following question:

I am posting again about a problem I didn't quite get an answer to. I seem to have upset someone on this board, I guess because I am not entirely sure myself how to define the unexpected output I am experiencing, and our discussions spun off in many directions. I do know that this is Perl-specific. Here is the GENERAL overview of what I am trying to do:
OVERVIEW

1) C program calls a Perl script with "popen"
2) Said Perl script sends an HTTP GET request to a webserver
3) Webserver returns a plain text file to Perl script
4) Perl script prints contents returned from webserver to STDOUT
5) Calling C program captures the output from STDOUT

THINGS I DO KNOW:
a) The webserver is functioning properly and I have explicitly set the content type to "text/plain". I have confirmed this by printing out the response headers in Perl.
b) The C program is properly calling the script. I tested this with a simple print command from within the Perl script: when I call the C program, it is able to capture the print output from the script (e.g. if I issue "print $var" from Perl, I am able to get that INTO my C program).

PROBLEM
My C program cannot capture the output from my Perl script WHEN I am using an HTTP GET request. In other words, if I put the following lines in my Perl script:
my $variable = "Hello";
print $variable;

My C program will capture "Hello" and print it out if I want. But if I use the following lines, assuming I set the contents returned to "Hello" from the webserver:
my $ua = new LWP::UserAgent;
my $response = $ua->get('http://SOMEURL');
my $decodedContent = $response->decoded_content(charset=>'none');
print $decodedContent;

It prints nothing/blank. I confirmed this by even doing the following:
my $ua = new LWP::UserAgent;
my $response = $ua->get('http://SOMEURL');
my $decodedContent = $response->decoded_content(charset=>'none');
print "HELLO-$decodedContent-HELLO";


...and my C program prints out "HELLO--HELLO" with nothing in between as HAS BEEN CODED. I tried printing out the above $decodedContent variable to a file. With that I see it, except it obviously is encoded UTF16BE because of the "@^H@^e@^l@^l@^l@^o" characters I see. I don't get this if I print a regular string variable that I created, to file.

I am assuming this is some kind of character encoding problem. HELP!!!

Replies are listed 'Best First'.
Re: Unexpected output from my PERL program. WHAT is my problem???
by moritz (Cardinal) on Nov 04, 2009 at 16:55 UTC
    There are two possibilities: your C program could have problems with the 0-bytes, or your perl program doesn't work in the environment you're putting it in.

    You can check your perl program by printing the output both to STDOUT and to a file while running inside the C program. If that writes the expected output to the file, it's not the different environment of the perl program, but rather the C program that's to blame.

    You can change the character encoding of the output with

    binmode STDOUT, ":encoding(UTF-8)";

    Maybe an output with fewer 0 bytes is less confusing for your C program :-)
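
A minimal sketch of the check described above, writing the same output to STDOUT and to a file so the two can be compared after running under the C program. The string "Hello" is a stand-in for the downloaded content, and the temporary file name is made up for illustration; neither comes from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for $response->decoded_content (assumption: plain "Hello").
my $content = "Hello";

# Hypothetical log file in the system temp directory; kept after exit.
my ($log_fh, $log_path) = tempfile('perl_out_XXXX', TMPDIR => 1, UNLINK => 0);

# Use the same output encoding on both handles so the copies are comparable.
binmode $log_fh, ':encoding(UTF-8)';
binmode STDOUT,  ':encoding(UTF-8)';

# Write identical data to the file and to STDOUT.
for my $fh ($log_fh, \*STDOUT) {
    print {$fh} "HELLO-$content-HELLO\n";
}
close $log_fh or die "close failed: $!";

# Report the file location on STDERR so it does not pollute the captured STDOUT.
print STDERR "copy written to $log_path\n";
```

If the file holds the expected text while the C program still sees something else, the Perl side is probably fine and the reading code in C is the place to look.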

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Unexpected output from my PERL program. WHAT is my problem???
by ikegami (Patriarch) on Nov 04, 2009 at 16:06 UTC

    my C program prints out "HELLO--HELLO" with nothing in between

    You seem to be implying that print extracted the first 6 characters and the last 6 characters of the string it was given. That's nonsense.

    If the string passed to print, and therefore to the parent process, contains nothing between the dashes, it's because $decodedContent is empty, which means the response's body was empty for that request.

    You're going to insist the response wasn't empty, that it contained the UTF-16be document discussed elsewhere. And I agree. That means the test you ran is different than the one you said you ran. The problem is that you are treating the downloaded file as a NUL-terminated string. That's a bug in your C program.
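
To illustrate the NUL-terminated string point, here is a small sketch that assumes the body is the text "Hello" encoded as UTF-16BE (the thread suggests this, but the real body is not shown). It prints the bytes in hex and the length a C string routine would see if it stopped at the first NUL byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the server body is "Hello" encoded as UTF-16BE (no BOM).
my $bytes = encode('UTF-16BE', 'Hello');

# Show each byte in hex; every ASCII character is preceded by a 00 byte.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $bytes;
print "bytes: $hex\n";

# Emulate a C routine that stops at the first NUL byte (strlen, printf "%s").
my $first_nul = index($bytes, "\0");
my $c_view    = substr($bytes, 0, $first_nul);

printf "length in Perl: %d, length as a C string: %d\n",
    length($bytes), length($c_view);
```

Under that assumption the very first byte is NUL, so code that treats the data as a C string sees nothing at all, even though ten bytes were sent. Reading a byte count on the C side (for example with fread) avoids depending on a terminator.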

    I don't get this if I print a regular string variable that I created, to file.

    Good, that means there's nothing messing with your file handles. You are saving what you get.

    With that I see it, except it obviously is encoded UTF16BE because of the "@^H@^e@^l@^l@^l@^o" characters I see.

    I believe you mean UTF-16be. The case doesn't matter, but you missed the dash.

    So the only problem is that you want to extract some data from the file you are downloading. For starters, we need to know in what format is the file you are downloading, and in what encoding do you want the extracted data to be.

    Update: Updated response to first quote.

      Then the server returned no content in its response.

      Then how come if I call the Perl script by itself (on the command line and NOT from my C program), it prints "HELLO-Hello-HELLO" with no blank to stdout?

      So the only problem is that you want to extract some data from the file you are downloading. For starters, we need to know in what format is the file you are downloading, and in what encoding do you want the extracted data to be.

      I set the content-type explicitly in the webserver to return content as "text/plain". I am just trying to print out simple text and nothing else. I want the format to be latin1/iso-8859-1. I want to get the same behavior as if I simply issued a "print $var" command as in the original posting example. I'm getting two different behaviors that I don't think I should expect to get.

        Then how come if I call the Perl script by itself, it prints "HELLO-Hello-HELLO" with no blank to stdout?

        I've already updated my earlier post with an answer to that. ("You're going to insist ...")

        I am just trying to print out simple text and nothing else. I want the format to be latin1/iso-8859-1

        binmode(STDOUT, ':encoding(iso-8859-1)');
        print $response->decoded_content(default_charset => 'UTF-16be');
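
The two steps in that suggestion (decode the body from UTF-16be into characters, then encode the characters as iso-8859-1 on output) can be tried without a web server. This is only a sketch: the byte string below is a hand-written stand-in for the response body, and it uses the core Encode module directly rather than LWP.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for the raw response body (assumption: "Hello" in UTF-16BE, no BOM).
my $raw = "\0H\0e\0l\0l\0o";

# Step 1: bytes -> Perl character string.
my $text = decode('UTF-16BE', $raw);

# Step 2: character string -> Latin-1 bytes, the same conversion an
# ':encoding(iso-8859-1)' layer on STDOUT performs when printing.
my $latin1 = encode('iso-8859-1', $text);

printf "%d bytes in, %d characters, %d bytes out\n",
    length($raw), length($text), length($latin1);
print "no NUL bytes left in the output\n" if index($latin1, "\0") < 0;
```

With the NUL bytes gone, the output is safe to handle as an ordinary C string on the receiving side, as long as the text itself contains only characters that exist in Latin-1.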