I have been trying to find a way to download files from the Internet given a URL, and I have run into two odd (at least to me) Perl quirks along the way. I was hoping that a monk might be able to enlighten me as to what is going on here.

First, I tried to use the Socket module to connect to the server manually and start an HTTP session. I then issued a GET request and read the response from what appears to be a file handle. From there I put the file into a scalar variable as a string and finally printed the string to a file through a second file handle.

The problem was that I found extra characters in the downloaded file. They seem to be hex 0D characters, which are ASCII carriage returns. After some troubleshooting that did not get very far, I ran some code as an experiment to determine whether the problem was something I did not know about HTTP or my use of the file handles. The code is below:

open (INDAT, "<pic.jpg") || die "Error - unable to open input file $!";
open (OUTDAT, ">copypic.jpg") || die "Error - unable to open output file $!";
while (<INDAT>){
    print OUTDAT $_;
}
close(INDAT);
close(OUTDAT);

In this code, I basically open a JPEG file, read it from one file handle, and write it to a second, just as I had done in my downloading code. This seems to introduce extra characters just as my downloading code did. So, what is going on? Does it have anything to do with the non-text characters in the JPEG? I also tried slurping the file, but still had extra characters in the output file. At this point, I am wondering whether I really understand what is going on when reading and writing through file handles.
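
One variant of the experiment is sketched below. It assumes the script runs on Windows, where a file handle in text mode turns each 0A byte into 0D 0A on output, which would account for the extra carriage returns. Both handles are put into binary mode with binmode, and the data is read in fixed-size chunks instead of lines. The sub name copy_binary and the chunk size are illustrative choices, not part of the original code.

```perl
use strict;
use warnings;

# Copy a file byte-for-byte. binmode() turns off newline translation,
# which on Windows would otherwise write each 0A byte as 0D 0A.
sub copy_binary {
    my ($src, $dst) = @_;
    open(my $in,  '<', $src) or die "Error - unable to open input file $src: $!";
    open(my $out, '>', $dst) or die "Error - unable to open output file $dst: $!";
    binmode($in);
    binmode($out);

    # Read fixed-size chunks; "lines" have no meaning in binary data.
    my $buf;
    while (1) {
        my $n = read($in, $buf, 65536);
        die "Error - read failed on $src: $!" unless defined $n;
        last if $n == 0;
        print {$out} $buf or die "Error - write failed on $dst: $!";
    }
    close($in);
    close($out) or die "Error - unable to close $dst: $!";
    return -s $dst;    # size of the copy in bytes
}

# File names taken from the experiment above; skipped if pic.jpg is absent.
if (-e 'pic.jpg') {
    my $bytes = copy_binary('pic.jpg', 'copypic.jpg');
    print "copied $bytes bytes\n";
}
```

If the two files then have the same size, the extra characters came from text-mode translation and not from the JPEG data itself. The same binmode call would apply to a socket handle and to the output handle in the downloading code.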

Second, after not having complete success with this approach, I moved on to the LWP package and the File::Fetch package. I had similar problems with LWP and finally had success downloading with File::Fetch. The only problem with the File::Fetch solution came when I tried to retrieve a file from the root of a domain.

ex. http://www.somesite.com/

This should return the index.html page, but it does not. File::Fetch returns an error and will not get anything. To fool it, I added a space to the end of the string used to create the File::Fetch object. This actually works, but when I write out the file using the $ff->fetch method, I get a file in the desired directory that has just a dash and a number for a name, and the code crashes when the File::Fetch object tries to rename it to a space (as it should).

So I was wondering: is there a way to get the File::Fetch object to fetch the default HTML file from the / directory of the domain without tricking it, and to have it write the file to a filename that I specify rather than 'space'? Or can I access the file as a string and write it to disk myself? (This would also be useful so that I could analyze the file without putting it on disk at all.) I have already tried specifying a path that includes a file name, but this just creates a folder with that name and puts the file in it.
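
For comparison, here is a minimal sketch of the "get it as a string, then write it myself" idea. It swaps in HTTP::Tiny (bundled with Perl since 5.14) in place of File::Fetch or LWP, so it says nothing about how File::Fetch handles a trailing slash. The helper names local_name_for, fetch_to_string and save_as are hypothetical, and the fallback name index.html is an assumption about what the saved file should be called locally, not something the server reports.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: pick a local file name for a URL, falling back
# to index.html when the URL path is empty or ends in a slash.
sub local_name_for {
    my ($url) = @_;
    my ($path) = $url =~ m{^[a-z][a-z0-9+.-]*://[^/?#]*([^?#]*)}i;
    $path = '' unless defined $path;
    my ($name) = $path =~ m{([^/]+)\z};
    return defined $name ? $name : 'index.html';
}

# Fetch a URL and return the body as a string, without touching the disk.
sub fetch_to_string {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "Fetch of $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Write the body under a name of our choosing, in binary mode.
sub save_as {
    my ($content, $file) = @_;
    open(my $out, '>', $file) or die "Error - unable to open $file: $!";
    binmode($out);
    print {$out} $content or die "Error - write failed on $file: $!";
    close($out) or die "Error - unable to close $file: $!";
    return $file;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url  = $ARGV[0];
    my $body = fetch_to_string($url);
    my $file = save_as($body, local_name_for($url));
    print "saved ", length($body), " bytes to $file\n";
}
```

Run with a URL as the only argument, for example http://www.somesite.com/. The body is available in $body for analysis before, or instead of, being written to disk. LWP::Simple offers a similar split between get, which returns the content as a string, and getstore, which writes to a named file.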

Thank you for any help that you can give.


In reply to Fetching files (downloading) from the Internet (extra characters, file handles, file::fetch) by Anonymous Monk
