Arsenal has asked for the wisdom of the Perl Monks concerning the following question:

This has me puzzled. I'm cleaning up a subroutine a co-worker wrote some time ago. When I turned on use strict and the -w flag, I got this warning:
Use of uninitialized value at ./gethtml.pl line 16, <my_socket> chunk 1.
This is the offending subroutine (spaced out a bit to make it more readable):
#/usr/bin/perl -w
#gethtml.pl : HTML fetch and return. 28 MAR 2000 rev 1
use Socket;
use strict;

sub gethtml
{
    my $my_url = $_[0];
    my ($my_host, $my_request, $my_html_return) = '';
    ($my_host, $my_url) = ($my_url =~ m#(.*?)/(.*)#);
    $my_request = "GET \/$my_url HTTP/1.0\nAccept: */*\nHost:$my_host\nUser-Agent: WebMangle/1.0\n\n";
    socket(my_socket, PF_INET, SOCK_STREAM, getprotobyname('tcp')) || return "Socket Error: $!";
    connect(my_socket, sockaddr_in(80,inet_aton($my_host))) || return "Connect Error: $!";
    send(my_socket, $my_request, 0x0);
    while(<my_socket>) {$my_html_return = $my_html_return . $_;}
    #above line is the offending one
    close my_socket || return "Close Error: $!";
    return $my_html_return;
}
1;
Admittedly this is trivial, but I like my code warning-free. Is it telling me that the my_socket filehandle isn't initialized? How can I initialize this? (Maybe I need more caffeine... :P)

Re: Use of unitialized variable?
by athomason (Curate) on Aug 29, 2000 at 00:50 UTC
    I see a few things that concern me, though the perl interpreter is generally more forgiving of questionable syntax than Monks are. As a stylistic point, your filehandles should at least start with a capital, if not be all caps.

    As for the warning you're getting, be sure that your pattern match is actually succeeding; otherwise $my_host and $my_url won't contain anything and so will be uninitialized. If the URL passed in is malformed, all sorts of badness will drop out. Also, are you aware that your variable declaration my ($my_host, $my_request, $my_html_return) = ''; only sets $my_host? If you want to initialize all the variables to a zero-length string (which is how perl treats them anyway, if you ignore the warning), you'll need to specify a full list of empty strings (i.e. ('', '', '')). As for the match itself, take a look at Ovid's Death to Dot Star! about why your match isn't all it could and should be.

    Finally, be aware that your routine is duplicating the functionality of the LWP module, which retrieves web pages quite reliably and has many more features than you would want to implement yourself. And it correctly handles real URLs :-).
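To make the list-assignment and pattern-match points concrete, here is a minimal sketch. The helper name split_url, the example host, and the tighter pattern are illustrative assumptions and not part of the original code; like the original, the sketch assumes the input looks like "host/path" with no scheme in front.

```perl
use strict;
use warnings;

# A single scalar on the right-hand side only fills the first variable.
my ($first, $second, $third) = '';
my @state = map { defined $_ ? 'defined' : 'undef' } ($first, $second, $third);
print "one value:    @state\n";

# A full list (here built with the repetition operator) initializes all three.
my ($host, $request, $html) = ('') x 3;
my @state_all = map { defined $_ ? 'defined' : 'undef' } ($host, $request, $html);
print "three values: @state_all\n";

# Hypothetical helper: split "host/path" and report a failed match
# instead of silently carrying on with undefined values.
sub split_url {
    my ($url) = @_;
    # [^/]+ stops at the first slash without leaning on .*? backtracking
    if (defined $url && $url =~ m{^([^/]+)/(.*)$}s) {
        return ($1, $2);
    }
    return;    # an empty list signals "no match"
}

my @ok  = split_url('www.example.com/index.html');
my @bad = split_url('no-slash-here');
print "parsed: @ok\n";
print "malformed input rejected\n" unless @bad;
```

Returning an empty list on failure lets the caller write something like my ($h, $p) = split_url($url) or return "Bad URL"; before any socket work starts.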

    Update

    I actually ran your code, and ncw is correct: the warning is resulting from the lack of initialization for $my_html_return. You can either explicitly declare $my_html_return with the list of empty strings as above, or replace while(<my_socket>) {$my_html_return = $my_html_return . $_;} with while(<my_socket>) {$my_html_return .= $_;} as ncw suggests. But keep in mind you should generally initialize your variables just as a good coding practice.
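The difference between the two loop bodies can be checked without a socket. The sketch below is only an illustration: the @lines array stands in for data read from the socket, and the warning handler exists just to count warnings.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Stand-in for the lines read from the socket (assumption for the demo).
my @lines = ("first\n", "second\n");

# Reading an undefined variable on the right-hand side warns on the first pass.
my $explicit;
$explicit = $explicit . $_ for @lines;
my $after_explicit = @warnings;

# The .= operator treats an undefined left-hand side as '' without warning.
my $appended;
$appended .= $_ for @lines;
my $after_append = @warnings - $after_explicit;

print "explicit concatenation: $after_explicit warning(s)\n";
print "append operator:        $after_append warning(s)\n";
```

Both loops build the same string; only the explicit form trips the uninitialized-value check, and only on the first iteration.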

Re: Use of unitialized variable?
by ncw (Friar) on Aug 29, 2000 at 00:49 UTC
    Actually it is complaining about $my_html_return - you didn't initialise this variable. Replace the assignment inside the while loop with
    $my_html_return .= $_;
    and you'll lose that warning. The "<my_socket> chunk 1" at the end of the message "Use of uninitialized value at ./gethtml.pl line 16, <my_socket> chunk 1." is just Perl telling you which filehandle was read last and how far into it you were (in this case the socket), just in case that helps you debug the problem (which it does sometimes).

    The error is nothing to do with the socket.
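A small sketch of where the filehandle part of the message comes from. The in-memory filehandle is a stand-in for the socket and is an assumption made so the example runs on its own; newer perls typically print "line" rather than "chunk" when the record separator is a newline.

```perl
use strict;
use warnings;

# Capture the warning text so it can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# An in-memory filehandle stands in for the socket (assumption for the demo).
my $data = "line one\nline two\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

my $undefined;
my $line   = <$fh>;                 # the handle has now been read once
my $joined = $undefined . $line;    # the warning is about $undefined, not $fh
close($fh);

# The message names the last-read handle and its position purely as context.
print $warnings[0];
```

The handle shows up in the message only because it was the most recently read one; the undefined value itself never came from it.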

Re: Use of unitialized variable?
by Arsenal (Novice) on Aug 29, 2000 at 02:47 UTC
    (note: I didn't write the initial code.) Thanks for the suggestions, and the fix for the warning. :) I'll probably re-write it all, the entire tool set (someday), which is used to index web pages, check broken links, etc. But that's too big of a project for the here and now. :( Just trying to fix the bugs, and memory leaks(!) at the moment. Unless someone knows of a web site crawler that will catch things like missing mouseovers, example:
    <script language=javascript><!--
    if (document.images) {
    navigation = new makeimgarray(9)
    navigation_mo = new makeimgarray(9);
    navigation[1].src = "/pool/images/psyche//butt_index.gif";
    navigation_mo[1].src = "/pool/images/psyche//butt_index.mo.gif";
    ...
    </script>
    No link checker I've found will check for the /pool/images/psyche//butt_index.gif image, except for the spaghetti I'm having to work with right now. :) Anyhow, here are the changes I made to the gethtml subroutine for your critique:
    #/usr/bin/perl -w
    #gethtml.pl : HTML fetch and return. 28 AUG 2000
    use Socket;
    use strict;

    sub gethtml
    {
        my $my_url = ($_[0]);
        my $my_host='';
        my $my_request='';
        my $my_html_return='';
        ($my_host, $my_url) = ($my_url =~ m#(.*?)/(.*)#);
        $my_request = "GET \/$my_url HTTP/1.0\nAccept: */*\nHost: $my_host\nUser-Agent: WebMangle/1.0\n\n";
        socket(MY_SOCKET, PF_INET, SOCK_STREAM, getprotobyname('tcp')) || return "Error: $!";
        connect(MY_SOCKET, sockaddr_in(80,inet_aton($my_host))) || return "Error: $!";
        send(MY_SOCKET, $my_request, 0x0);
        while(<MY_SOCKET>) {$my_html_return .= $_;}
        close MY_SOCKET || return "Error: $!";
        return $my_html_return;
    }
    1;
      Actually, you will save yourself a lot of trouble by using LWP and HTTP::Request, like so:
      use strict;
      use LWP::UserAgent;
      use HTTP::Request::Common;

      print &getHTML('http://www.perlmonks.org');

      sub getHTML($) { # one $ means one scalar arg - for us humans
          my $url = shift;

          #create a new agent and name it
          my $ua = new LWP::UserAgent;
          $ua->agent("Shmozilla/0.1 " . $ua->agent);

          #create new request
          my $req = new HTTP::Request GET => $url;

          #send the request and try to get a response
          my $resp = $ua->request($req);
          return ($resp->is_success) ? $resp->content : "Error\n";
      }