in reply to Fetching files (downloading) from the Internet (extra characters, file handles, file::fetch)

One of two things is happening here. One could be a very simple fix. I didn't read closely enough to tell for certain which is your problem, as you're going about this in a way which suggests you want to code and debug it yourself rather than use existing tools anyway.

If you're on Windows or another OS with line-ending translation for text files, use binmode on non-text files. Writing binary files in text mode on OSes that distinguish between the two is an easy and common mistake, and it is easily fixed.
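As a minimal sketch of the binmode fix (the helper name save_binary, the file name example.bin, and the sample payload are made up for illustration, not taken from the original question):

```perl
use strict;
use warnings;

# Hypothetical helper: write raw bytes to a file with no newline translation.
sub save_binary {
    my ($path, $bytes) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!";
    binmode $out;    # turn off CRLF translation; harmless on Unix-like systems
    print {$out} $bytes or die "Cannot write $path: $!";
    close $out or die "Cannot close $path: $!";
    return;
}

# Sample payload with LF and CR bytes that text mode on Windows could alter.
my $payload = "GIF89a\x0A\x0D\x0A\x00\xFF";
save_binary('example.bin', $payload);
```

Without the binmode call, a Windows perl would write each "\x0A" as a CR/LF pair, which is one way extra characters end up in a downloaded file. The same applies to the reading side: call binmode on any handle you read binary data from.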

Alternatively, it could be your handling of HTTP. HTTP (as with most other text-based application-level Internet protocols) specifies line endings as ASCII carriage-return/linefeed (CRLF) pairs for the protocol elements themselves. See RFC 1945 section 2 paragraph 2. Whether the body also uses CRLF varies with its media type; HTML is one media type that wants CR/LF. The authors of the Perl modules that deal with these things, such as LWP, LWP::Simple, and WWW::Mechanize, know this and handle it in their code.
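To illustrate the point about protocol line endings, here is a rough sketch of separating the header block from the body when you read a response by hand. The function name split_http_response and the sample response are hypothetical, and the parsing is deliberately simplistic (no folded headers, no chunked encoding, no repeated header names):

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw HTTP response into status line,
# a hash of headers, and the untouched body bytes.
sub split_http_response {
    my ($raw) = @_;
    # The header block ends at the first empty line, i.e. CRLF CRLF.
    my ($head, $body) = split /\x0D\x0A\x0D\x0A/, $raw, 2;
    $body = '' unless defined $body;
    # Protocol elements are CRLF-terminated, so split on the exact bytes.
    my ($status_line, @header_lines) = split /\x0D\x0A/, $head;
    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value;
    }
    return ($status_line, \%headers, $body);
}

# Made-up response whose 4-byte body itself starts with CR LF.
my $raw = "HTTP/1.0 200 OK\x0D\x0A"
        . "Content-Type: image/gif\x0D\x0A"
        . "Content-Length: 4\x0D\x0A"
        . "\x0D\x0A"
        . "\x0D\x0A\x00\xFF";
my ($status, $headers, $body) = split_http_response($raw);
```

The explicit "\x0D\x0A" bytes are used instead of "\r\n" or "\n" so the code does not depend on what the platform thinks a newline is. Only the headers are split on CRLF; the body is passed through as-is and should be written out through a binmode'd handle.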

If you're going to roll your own solution for standardized protocols, you're going to have to do your own standards research besides your own coding and testing. If you still want to roll your own, that's great. If not, use what's provided. Either is a valid decision, but you should probably have a good reason for reinventing existing wheels. Just don't blame the tools because you didn't do the reading.

Re^2: Fetching files (downloading) from the Internet (extra characters, file handles, file::fetch)
by Anonymous Monk on Nov 25, 2008 at 16:14 UTC
    Hey, thanks, I AM working on a Windows machine, and the binmode function fixed both my implementation of an HTTP downloader and LWP. So it was a file handle problem. You are right about reinventing the wheel, but I wanted to spend some time doing that to learn. After about a day of doing it myself I feel that I have a fair understanding of what is happening in HTTP, and I am moving on to LWP. I looked for WWW::Mechanize but could not find that exact module in the PPM window. Are there two different versions of that module (one for Unix and one for Windows)?
      I don't think there are two different WWW::Mechanize packages for Unix and Windows. There may not be a PPM for it for several reasons, though. LWP and LWP::Simple can do most of the same things, and Win32::IE::Mechanize does the same things as WWW::Mechanize using the IE engine, so it may be a lower priority for the PPM repositories.

      ActivePerl may be able to load it through CPAN instead. It might be available in the newer repositories ActiveState just announced with more packages. Strawberry Perl may be able to use it from CPAN if ActiveState can't. There are passing reports for WWW::Mechanize tests on Windows, so someone has it working on that platform in some fashion. Perhaps someone who does more work on the Windows platform could answer more authoritatively.