in reply to Re^6: Cannot access HTTP::Response content properly
in thread Cannot access HTTP::Response content properly

There are two kinds of encoding at play here: Transfer Encoding and Character Encoding.

Transfer encoding allows the content to be compressed during transit, among other things. (In HTTP, this is what the Content-Encoding header describes, e.g. gzip.) You want the actual content, not the temporary version used for transit, which is why you want to use ->decoded_content. ->content returns the representation of the content as it was during transit, which is useless on its own.

Character encoding is what allows characters to be represented as bytes. For example, the character encoding US-ASCII associates byte 0x41 with character LATIN CAPITAL LETTER A. The same character is associated with bytes 00 41 using encoding UTF-16be.
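To make that concrete, here is a minimal sketch using the core Encode module. The helper hex_bytes is a name made up for this illustration; it only formats bytes for display.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of Encode): render a byte string
# as space-separated two-digit hex values.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

# LATIN CAPITAL LETTER A as a one-character string.
my $char = "A";

# The same character encoded under two different character encodings.
print "US-ASCII: ", hex_bytes(encode('US-ASCII', $char)), "\n";   # 41
print "UTF-16BE: ", hex_bytes(encode('UTF-16BE', $char)), "\n";   # 00 41
```

The character is the same in both cases; only the bytes used to represent it differ.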

In files, characters can only appear in their encoded form. Internally, the same is true for memory, but Perl allows you to work with characters directly instead of the underlying bytes that form them.

That means that you can decode 00 41 to A, but you need to encode A back into bytes if you want to save it to disk. (print expects bytes, not characters, unless you've told it what to do with characters by using binmode with an :encoding layer.)
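A rough sketch of that round trip, again with the core Encode module. The choice of UTF-8 as the output encoding is only an assumption for the example; use whatever the consumer of your output expects.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from a file or socket: "A" in UTF-16BE.
my $bytes = "\x00\x41";

# Decode the bytes into a string of characters.
my $chars = decode('UTF-16BE', $bytes);
print "bytes: ", length($bytes), ", characters: ", length($chars), "\n";

# Option 1: encode explicitly before handing the data to print.
my $out_bytes = encode('UTF-8', $chars);

# Option 2: tell the handle how to encode characters, then print them.
# (UTF-8 is an assumption here, not something the thread specifies.)
binmode(STDOUT, ':encoding(UTF-8)');
print $chars, "\n";
```

Either option works; the point is that something has to turn the characters back into bytes before they leave the program.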

->decoded_content will also decode the character encoding for you if the web server specifies that the content is some kind of text (including HTML and XML) and specifies its character encoding. That can actually be bad, so you can disable that feature by specifying charset => 'none'.
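The difference can be sketched without touching the network. The response below is built by hand to stand in for what $ua->get(...) would return; its headers and two-byte body are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built stand-in for a real response (illustrative values only):
# text content declared as UTF-16BE, with a body holding the letter A.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=UTF-16BE' ],
    "\x00\x41",
);

# Default: the character encoding is decoded too, giving characters.
my $chars = $res->decoded_content;

# charset => 'none': the character encoding is left alone, giving bytes.
my $bytes = $res->decoded_content(charset => 'none');

printf "default:      length %d\n", length $chars;
printf "charset none: length %d\n", length $bytes;
```

With charset => 'none' you get the document's bytes exactly as the server encoded them, which is what you want if you intend to save the document unchanged.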


You seem to have assumed that the content being encoded using UTF-16be is a problem. It's not necessarily, and trying to "fix" it could actually break it. For example, if the file is an XML document, it's not safe to change its encoding, since the document's encoding is specified in the document itself. The same is often the case for HTML documents as well.

Some background info on what you are trying to do would be useful.

Re^8: Cannot access HTTP::Response content properly
by URAvgDeveloper101 (Novice) on Nov 03, 2009 at 19:48 UTC

    Here is the background info you requested. I am basically trying to call a Perl script from within a C program. However, I don't want to just make a system() call; I want to be able to capture the output (stdout) from the Perl script. If I run the Perl script on its own, the output is displayed on stdout. If I call it from the C program, it is not. At first I thought this was a problem with my C program, but I am pretty sure it is not. Why? Because if I assign an arbitrary string (e.g. "Hello World") to a variable in that same Perl script and print it, THEN my C program is able to capture the output from stdout. But if I use "print $res->decoded_content", it does not. Just for clarification, here is a portion of the C program:

    char command[] = "/home/user/scripts/PerlScript.pl";
    .
    .
    fp = popen(command, "r");
    buffer = (char *)malloc(sizeof(char) * bufSize);
    while( fgets(buffer, bufSize, fp) != NULL)
        fputs(buffer, stdout);
    pclose(fp);
    free(buffer);

    The above program calls the Perl script I have written that is in question. Again, if I do the following it prints when called from the C program:

    my $randomVar = "Hello World\n";
    print $randomVar;

    But if I do this, it will NOT print from the C program (unless I call the Perl script by itself from the command line):

    my $res = $ua->get('http://SOMEURLHERE');
    print $res->decoded_content;

      What are you talking about??? We were talking about downloading and encoding problems. Did you post in the wrong place?

        Sorry for the confusion. If you go back to the very beginning of this thread, you will see that my last explanation is why we are at this point. I was trying to properly see the output of the Perl script. I couldn't, so I tested it by writing the output to a file from within the Perl script. I thought it didn't work at first. However, I discovered that it did, except that the output was full of character encoding oddities. I guess what is happening is that the encoding issue you are talking about is somehow interfering with my C program's ability to capture and/or display the output from the Perl script that I am calling. In a nutshell, I want to:
        1) call my Perl script from my C program,
        2) have the Perl script issue an HTTP GET request and print out the result,
        3) have the C program capture the output that was printed to stdout.

        I have steps 1 and 2 working. However, thanks to you, I have discovered that I have some kind of hurdle with the character encoding that is preventing step 3, and I don't know what to do now.