PerlMonks  

Re: Fetching HTML Pages with Sockets

by zentara (Archbishop)
on Sep 20, 2004 at 13:29 UTC ( [id://392351]=note )


in reply to Fetching HTML Pages with Sockets

Here is one, which is easy to understand. (I didn't write it, but it works fine).
#!/usr/bin/perl
# Very simple client program to fetch pages from
# specified Web sites over a raw socket.
#
require 5.002;
use strict;
use Socket;

# Perl 5 technique for declaring local variables.
my ( $host, $in_addr, $proto, $port, $addr );
my ( $page, $file );

# Set up some URLs in an array
my @pages = ( "zentara.net/~zentara/poems.html", "zentara.net" );

foreach $page (@pages) {
    ( $host, $file ) = split /\//, $page, 2;
    $file = '' unless defined $file;    # no path given: request "/"

    # Form the HTTP server address from the host
    # name and port number
    $in_addr = ( gethostbyname($host) )[4]
        or die "gethostbyname $host: lookup failed";
    $port  = 80;
    $addr  = sockaddr_in( $port, $in_addr );
    $proto = getprotobyname('tcp');

    # Create an Internet protocol socket.
    socket( S, AF_INET, SOCK_STREAM, $proto ) or die "socket:$!";

    # Connect our socket to the server socket.
    connect( S, $addr ) or die "connect:$!";

    # For fflush on socket file handle after every
    # write.
    select(S);
    $| = 1;
    select(STDOUT);

    # Send get request to server. HTTP lines end in CRLF; the Host
    # header lets name-based virtual hosts pick the right site.
    print S "GET /$file HTTP/1.0\015\012Host: $host\015\012\015\012";
    print "===================$page===========================\n";

    # Print the returned headers and HTML, one line at a time.
    while (<S>) {
        print;
    }
    close(S);
}
exit;
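The script prints the raw response, so the status line and headers come out along with the HTML. If you only want the page itself, something like the following might do. It is only a sketch: split_response is a hypothetical helper that is not part of the script above, and it assumes the whole response has already been read into one string.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original script):
# split a raw HTTP response into status code, header hash, and body.
sub split_response {
    my ($raw) = @_;
    return unless defined $raw && length $raw;

    # Headers end at the first blank line; tolerate bare LF as well as CRLF.
    my ( $head, $body ) = split /\r?\n\r?\n/, $raw, 2;
    return unless defined $head && length $head;
    $body = '' unless defined $body;

    # First line is the status line, the rest are "Name: value" headers.
    my ( $status_line, @lines ) = split /\r?\n/, $head;
    my ($code) = $status_line =~ m{^HTTP/\d\.\d\s+(\d{3})};

    my %headers;
    for my $line (@lines) {
        my ( $name, $value ) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $headers{ lc $name } = $value;    # header names are case-insensitive
    }
    return ( $code, \%headers, $body );
}
```

In the script you would collect the lines from the socket into a string (for example with join '', <S>) and pass that string in, then print only the body.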

I'm not really a human, but I play one on earth. flash japh

Replies are listed 'Best First'.
Re^2: Fetching HTML Pages with Sockets
by melora (Scribe) on Sep 20, 2004 at 14:24 UTC
    Thanks for posting that script. I've been experimenting with sockets lately, but strictly within our LAN. I had to give permission to the firewall to let me through, but once I did, this worked nicely. Question: are there any security issues involved in fetching a page in this way? Just want to know whether I'm playing with fire, or just scrabbling in the dirt as I usually do.
      I can't think of any security issues that would arise from pulling files down using a socket and HTTP directives, but keep in mind that if the sockets are not set up properly, you may leave ports open, so closing the sockets explicitly is always a good measure.

      Also be sure to run perl in taint mode (the -T switch) if you plan on using the output from a remote location as the input to your script.
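      A minimal sketch of what that looks like in practice, assuming the script is started as perl -T script.pl. Under taint mode, data read from a socket is tainted, and capturing it with a regex is the usual way to untaint it. The helper name clean_number and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a line of remote data only if it is a
# plain decimal number. Under -T, the text captured in $1 is untainted,
# so the pattern should be as strict as possible.
sub clean_number {
    my ($line) = @_;
    return unless defined $line;
    if ( $line =~ /^\s*(\d{1,9})\s*$/ ) {
        return $1;
    }
    return;    # anything else is rejected
}

# Sample input standing in for lines read from a socket.
for my $line ( "42\r\n", "42; rm -rf /\n" ) {
    my $n = clean_number($line);
    print defined $n ? "accepted: $n\n" : "rejected\n";
}
```

      The point is to whitelist what you expect rather than try to strip out what you fear.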

      amt
      "are there any security issues involved in fetching a page in this way?"

      It shouldn't be any more of a security issue than retrieving it with Mozilla, or any other browser. As a matter of fact, I would worry more about Mozilla than Perl.

      You have to learn how your firewall works. There is a difference between opening up a server on a port to listen for connections and using a port to receive data on a connection which YOU initiated. The latter is called an 'established' connection: you initiate it, and a local port is opened as part of that established connection. FTP works this way too. The next time you fetch a file through HTTP with a conventional browser, type "socklist" (as root) and look at the sockets and ports opened up to receive it.
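      To see the difference between a listening port and the port on your end of an outgoing connection without leaving your own machine, here is a rough sketch using IO::Socket::INET on the loopback address. The throwaway server on 127.0.0.1 is just an assumption to keep the demo self-contained; it stands in for the remote web server.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# A listening socket: this is what "opening a server on a port" means.
# LocalPort 0 asks the OS for any free port; loopback only, for the demo.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "listen: $@";

# An outgoing connection, as the fetch script makes to port 80.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "connect: $@";

my $accepted = $server->accept or die "accept: $!";

my $listen_port = $server->sockport;      # port the server listens on
my $client_port = $client->sockport;      # local port the OS picked for us
my $seen_port   = $accepted->peerport;    # our port as the server sees it

print "server listens on port $listen_port\n";
print "client's local port is $client_port\n";
print "server sees the client coming from port $seen_port\n";

close($_) for $client, $accepted, $server;
```

      The client's local port exists only as one end of that established connection. Nothing is listening on it, so nobody else can connect to it.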


      I'm not really a human, but I play one on earth. flash japh
