sam313 has asked for the wisdom of the Perl Monks concerning the following question:

I am very new to HTTP inside Perl. I have written a script to download a CSV file from my shopping cart website using an HTTP::Request GET. Everything works fine, but there is a strange character at the end of each line that I cannot identify. I cannot just use chop on each line because it does not show up on every line. Here is my code; maybe someone can help me understand what this character is or how I can get rid of it.
$h = new HTTP::Headers Content_Base => '$url';
$h->authorization_basic($uname, $password);
$request = HTTP::Request->new( 'GET', $url, $h );
$ua = LWP::UserAgent->new;
$count = 0;
$response = $ua->request($request);
while (!$response->is_success) {
    writelog("Failed to GET '$url': ".$response->status_line);
    sleep(30);
    $count++;
    $count < 20 or die "Failed to GET '$url': ", $response->status_line;
    $response = $ua->request($request);
}
open (FH, '>orders.csv');
print FH $response->decoded_content;
close FH;
I tried to identify the strange character with the following debug code:
open(FH,'orders.csv') or die "$!";
while($line = <FH>) {
    $character = chop($line);
    print "!!!! $character ****\n";
}
The output from the above debug code was:
****
****
What kind of character would cause the bangs to not be displayed?

Re: strange character with HTTP::Request GET
by GrandFather (Saint) on May 01, 2008 at 04:45 UTC

    Generally chop should be avoided and chomp used instead.

    ord is a better way to "see" strange characters:

    open(FH,'orders.csv') or die "$!";
    while($line = <FH>) {
        my $character = chop($line);
        print ord ($character), "\n";
    }

    Most likely you have fallen foul of network / local OS line end differences or code that doesn't appreciate such differences. The new line character is being treated as a line end character where really there is a cr/lf pair and the cr (carriage return) is being passed back as the last character on the line.


    Perl is environmentally friendly - it saves trees
      I'm sorry, this is as simple as using chomp instead of chop isn't it? Sorry for being brain dead tonight and thank you for your help!
        $/ is set to LF, even on Windows, so you'd need to change it to "\x0D\x0A" for chomp to work here.
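
To illustrate the point about chomp and the input record separator `$/`, here is a minimal sketch. The sample line is made up for the demonstration and is not taken from the poster's file; it only assumes the line ends in CR LF.

```perl
use strict;
use warnings;

# Hypothetical CSV line as it might arrive with a CR LF line ending.
my $line = "1001,widget\x0D\x0A";

# With the default $/ ("\n", i.e. LF), chomp removes only the LF.
my $default = $line;
chomp $default;

# With $/ localized to CR LF, chomp removes the whole pair.
my $crlf = $line;
{
    local $/ = "\x0D\x0A";
    chomp $crlf;
}

# Print the code of the last remaining byte in each case.
printf "default chomp leaves last byte %d\n", ord substr($default, -1);
printf "CR LF chomp leaves last byte %d\n",   ord substr($crlf, -1);
```

After the default chomp the last byte is still 13 (CR), which is the character that sends the cursor back to the start of the line in the debug output.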
      It would seem you are right. Thanks!! Is there a simple way to drop the CR?
Re: strange character with HTTP::Request GET
by ikegami (Patriarch) on May 01, 2008 at 04:27 UTC
    A CR could move the cursor to the start of the line. Why don't you redirect the output to a file and use a hex dumper (od on unix) to id the char?
      Thanks for the suggestion! I did a hex dump and found "0D 0D 0A" at the end of each line or like you said, "CR CR LF". I am not 100% sure how to do it off the top of my head, but I guess I can drop these extra characters.

        It's probably only CR LF. The LF was converted to CR LF by Perl since you didn't binmode(STDOUT).
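
The LF to CR LF translation can be reproduced on any platform by applying the `:crlf` PerlIO layer explicitly. The sketch below is only an illustration under stated assumptions: the sample data and the helper names `bytes_written_with` and `hex_of` are invented here, and `:crlf` stands in for what text mode does by default on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical data as it might arrive over HTTP: lines already end in CR LF.
my $data = "id,total\x0D\x0A1,9.99\x0D\x0A";

# Write $data to a temp file through the given PerlIO layer and
# return the bytes that actually ended up on disk.
sub bytes_written_with {
    my ($layer) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open my $out, ">$layer", $path or die "open for write: $!";
    print {$out} $data;
    close $out or die "close: $!";

    open my $in, '<:raw', $path or die "open for read: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Render a byte string as space-separated hex pairs.
sub hex_of { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $text   = bytes_written_with(':crlf');  # text mode: each LF becomes CR LF
my $binary = bytes_written_with(':raw');   # binary mode: bytes written as-is

print "crlf: ", hex_of($text),   "\n";
print "raw:  ", hex_of($binary), "\n";
```

In the `:crlf` case every line ends in 0D 0D 0A, while the `:raw` case keeps the original 0D 0A. Opening the output file with `:raw` (or calling binmode on the handle before printing) avoids adding the extra CR in the first place.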

        You could do:

        while (<$fh>) { s/\x0D\x0A\z//; ... }

        Or change $/:

        local $/ = "\x0D\x0A"; while (<$fh>) { chomp; ... }

        Or if you don't care about whitespace at the end of the line, the following works with CR CR LF and CR LF:

        while (<$fh>) { s/\s+$//; ... }
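
The three approaches above behave differently when a line ends in CR CR LF rather than CR LF. A small sketch that applies each one to both endings; the sample strings and the sub names are hypothetical, chosen only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one ending in CR LF, one in CR CR LF.
my @samples = ("a,b\x0D\x0A", "c,d\x0D\x0D\x0A");

# Remove exactly one CR LF pair from the end of the string.
sub strip_crlf {
    my ($s) = @_;
    $s =~ s/\x0D\x0A\z//;
    return $s;
}

# chomp with $/ temporarily set to CR LF.
sub chomp_crlf {
    my ($s) = @_;
    local $/ = "\x0D\x0A";
    chomp $s;
    return $s;
}

# Remove all trailing whitespace, however many CRs and LFs there are.
sub strip_ws {
    my ($s) = @_;
    $s =~ s/\s+$//;
    return $s;
}

# Show the result of each approach as a hex string.
for my $s (@samples) {
    printf "%-12s regex=%s chomp=%s ws=%s\n",
        unpack('H*', $s),
        unpack('H*', strip_crlf($s)),
        unpack('H*', chomp_crlf($s)),
        unpack('H*', strip_ws($s));
}
```

On the CR CR LF line the first two approaches leave one CR behind, and only the trailing-whitespace substitution removes everything. The trade-off is that it also removes trailing spaces and tabs that may be part of the data.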
Re: strange character with HTTP::Request GET
by Anonymous Monk on May 01, 2008 at 19:49 UTC
    Thanks for all of the help!! Identifying the CR was key to solving this one. All I did was add "$line =~ s/\r|\n//g;" and my problem is solved. Thanks!