Traku has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to write a program which will retrieve a file of any format from a website. I can retrieve it, but the file will not open properly (error: Can't determine type).

Apparently I get this error because the retrieved file still starts with the HTTP header. I found the pattern that marks where the header ends, but I am not sure how to express it in code.

For example, when dealing with a GIF, I need to copy everything starting at GIF89a. I am not sure how to tell Perl to ignore everything before it.
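A minimal sketch of the "ignore everything before GIF89a" idea, assuming the whole response has already been read into one string. It uses the built-ins `index` and `substr`; the sub name `strip_before_signature` and the sample data are hypothetical. This only works when you already know the file type (older GIFs start with GIF87a instead), so it is not a general answer for "a file of any format".

```perl
use strict;
use warnings;

# Hypothetical helper: return everything from the first occurrence of
# $signature onward, or undef if the signature does not occur.
sub strip_before_signature {
    my ($data, $signature) = @_;
    my $pos = index($data, $signature);    # -1 when not found
    return undef if $pos < 0;
    return substr($data, $pos);            # from $pos to the end
}

# Made-up response: headers, blank line, then 8 bytes of "GIF" data.
my $raw = "HTTP/1.1 200 OK\r\nContent-Type: image/gif\r\n\r\nGIF89a\x01\x00";
my $gif = strip_before_signature($raw, 'GIF89a');
print length($gif), "\n";    # prints 8
```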

BTW, if you can't tell, heh, I just started using Perl two days ago.

Thanks for the help in advance!

PS: This is what I have for code so far:

$socket->send("GET $1 HTTP/1.1\r\n");
$socket->send("HOST: www.dilbert.com\r\n");
$socket->send("\r\n");

# get response
#$request = <$socket>;

open(FILE1, ">gift_test.gif") or die "Cannot open gift_test.gif: $!";
binmode(FILE1);    # the image data is binary

# write to file
while (defined($request = <$socket>) && $request =~ /PATTERN/)    # whatever regex goes here
{
    print FILE1 $request;
}

close(FILE1);
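For the request side, a minimal sketch of building the whole GET request as one string. HTTP specifies CRLF (`"\r\n"`) line endings, and HTTP/1.1 requires a Host header. The `Connection: close` line is an addition not in the original code: it asks the server to close the connection after the response, so a loop that reads until end-of-file actually ends instead of waiting on a kept-alive connection. The sub name `build_get_request` and the path `/example.gif` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a complete HTTP/1.1 GET request.
# Every line ends in CRLF, and an empty CRLF line ends the request.
sub build_get_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n"    # server closes the socket when done
         . "\r\n";
}

my $request = build_get_request('www.dilbert.com', '/example.gif');
print $request;
```

With an IO::Socket::INET handle, the resulting string can be passed to `$socket->send(...)` in a single call.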

Re: Help with Header stripping
by matija (Priest) on Apr 01, 2004 at 18:20 UTC
    You didn't say what method you are using to retrieve the file so far.

    Had you used LWP::Simple, you would get exactly the content you require like this:

    use LWP::Simple;

    my $val = get "http://some/url/somewhere";
    die "Could not fetch the URL\n" unless defined $val;
    open(OUT, ">some_file_name") || die "Could not save to some_file_name: $!\n";
    binmode(OUT); # you only really need this sometimes. BStS.
    print OUT $val;
    close(OUT);
    If you need more complex queries, you might need to use LWP::UserAgent. The exact value you require (i.e. the content without the headers) is returned by the content method of the HTTP::Response object you get back.
      Unfortunately I cannot use LWP, as the professor asked us not to. But I'll keep that in mind for next time!
        Ah, if it's homework, you should have told us that, and the parameters of that homework.

        Here is a hint for you: when you look at the communication with your web server, you will first see the echo of your request (that may only happen with telnet; you need to check), terminated by a blank line.

        After that you see the headers sent by the server, terminated by a blank line.

        You do not need to know how the wanted content starts. All you need to know is how the unwanted content ends.

        There, that should be enough of a hint :-)
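Following that hint, a minimal sketch that reads a raw response from a filehandle, skips lines until the first empty one, and copies everything after it unchanged. It assumes no echo of the request precedes the response (the usual case when reading from a socket rather than watching a telnet session) and that the body is not sent with chunked transfer encoding; sending the request as HTTP/1.0 avoids chunked responses. The sub name `copy_body` is hypothetical, and the demo uses in-memory filehandles in place of a real socket.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the body of a raw HTTP response from $in to $out.
# The header block ends at the first empty line ("\r\n" or bare "\n").
sub copy_body {
    my ($in, $out) = @_;
    binmode($in);
    binmode($out);

    # Skip the status line and headers, one line at a time.
    while (my $line = <$in>) {
        last if $line =~ /^\r?\n\z/;    # empty line: end of headers
    }

    # Everything after the empty line is the body; copy it in chunks.
    my $buf;
    while (read($in, $buf, 8192)) {
        print {$out} $buf;
    }
}

# Demo: a made-up response held in a string instead of a socket.
my $raw = "HTTP/1.1 200 OK\r\nContent-Type: image/gif\r\n\r\nGIF89a\x00\x01";
open(my $in, '<', \$raw) or die "Cannot open input: $!";
my $body = '';
open(my $out, '>', \$body) or die "Cannot open output: $!";
copy_body($in, $out);
close($out);
print length($body), "\n";    # prints 8
```

Reading the body with `read` rather than line by line means binary data containing no newlines (or many of them) is copied byte for byte.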