Hello, and thank you for your time. Being new to Perl, I may ask some off-topic or basic questions here; I only hope you will be patient :]

I've been toying around with building a basic web crawler as a learning exercise. Here's where I'm stuck. First, I need to load the Socket module.

Now I stumbled across two ways to do so:

use IO::Socket;
use Socket;

What would be the difference?
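For comparison, here is a small sketch of both styles side by side, assuming a Unix-like system with the core Socket and IO::Socket modules. Socket exports the low-level constants and address-packing helpers; IO::Socket provides object-oriented classes (such as IO::Socket::INET for TCP/UDP) built on top of it. The listener on 127.0.0.1 exists only so the example runs without outside network access.

```perl
use strict;
use warnings;

# Socket: thin, procedural interface to the C sockets API
# (constants such as PF_INET and helpers for packing addresses).
use Socket qw(PF_INET SOCK_STREAM inet_aton inet_ntoa
              pack_sockaddr_in unpack_sockaddr_in);

# IO::Socket: object-oriented wrappers; IO::Socket::INET is its TCP/UDP subclass.
use IO::Socket;
use IO::Socket::INET;

# Procedural style: you build and take apart binary address structures yourself.
my $packed = pack_sockaddr_in(80, inet_aton('127.0.0.1'));
my ($port, $ip) = unpack_sockaddr_in($packed);
printf "procedural: %s:%d\n", inet_ntoa($ip), $port;

# OO style: the constructor hides socket()/bind()/listen() behind named options.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    Listen    => 5,
    Proto     => 'tcp',
) or die "listen: $@";
printf "object-oriented: %s:%d (a %s)\n",
    $listener->sockhost, $listener->sockport, ref $listener;
```

Both lines in the question are legitimate: Socket when you want the raw socket()/connect() calls, IO::Socket (or IO::Socket::INET directly) when you prefer a constructor with named options.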

Next, I'd need to create an actual socket:

use Socket;
socket(SOCKET, PF_INET, SOCK_STREAM, getprotobyname('tcp')) or die "socket: $!";

Now, the function's parameters have me at a loss:

SOCKET - Does this argument refer to an already existing socket? Huh?

PF_INET - Could also be AF_INET: either "address family" or "protocol family". What does that mean?

SOCK_STREAM - Going through some existing crawler code, I couldn't even find where this constant is defined. Is it available by default?

getprotobyname('tcp') - Selects TCP, as opposed to UDP or a pure datagram socket? I guess I get this one.
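The four arguments can be annotated directly in code. This is a sketch assuming an IPv4 TCP socket; the fallback to IPPROTO_TCP is an assumption added for systems (such as minimal containers) where getprotobyname() cannot find a protocol database. Note that getprotobyname('tcp') only supplies the protocol number; the stream vs. datagram choice comes from the SOCK_STREAM argument.

```perl
use strict;
use warnings;
use Socket qw(PF_INET AF_INET SOCK_STREAM IPPROTO_TCP);

# TCP's protocol number (6). getprotobyname() looks it up in the system's
# protocol database (/etc/protocols on Unix); fall back to the Socket
# constant in case that database is missing.
my $proto = getprotobyname('tcp') // IPPROTO_TCP;

# 1st argument: a filehandle that socket() CREATES and fills in.
#   It is an output argument, not a pre-existing socket. A bareword such
#   as SOCKET works too; a lexical (my $sock) is the more modern style.
# 2nd: the protocol family. PF_INET means IPv4; it has the same numeric
#   value as AF_INET on common platforms, which is why code mixes them.
# 3rd: the socket type. SOCK_STREAM is a reliable byte stream (TCP),
#   as opposed to SOCK_DGRAM (UDP datagrams). Both are exported by Socket.
# 4th: the specific protocol within that family and type.
socket(my $sock, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";

my $fd = fileno $sock;    # the OS-level descriptor socket() just opened
printf "PF_INET=%d AF_INET=%d SOCK_STREAM=%d tcp=%d fd=%d\n",
    PF_INET, AF_INET, SOCK_STREAM, $proto, $fd;
close $sock;
```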

Then, in some places I've read that I need to call bind(SOCKET, ADDRESS) to assign an IP address to the socket (my own IP, I guess), but the example I'm working with doesn't include it. Where is the source IP assigned, then?
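A minimal sketch of why the client side can skip bind(), assuming a local listener on 127.0.0.1 stands in for the remote web server: calling connect() on an unbound socket makes the kernel choose the source address and an ephemeral source port, which getsockname() then reports. The server, by contrast, does call bind() so that clients know which port to reach.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in inet_ntoa);

my $proto = getprotobyname('tcp') // IPPROTO_TCP;

# Stand-in "remote" server on 127.0.0.1. A server binds explicitly;
# port 0 asks the kernel for any free port.
socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($server, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($server, 5)                                  or die "listen: $!";
my ($server_port) = unpack_sockaddr_in(getsockname($server));

# Client: note that there is no bind() call here.
socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect($client, pack_sockaddr_in($server_port, INADDR_LOOPBACK))
    or die "connect: $!";

# connect() bound the client implicitly: the kernel picked the source IP
# (based on the route to the destination) and an ephemeral source port.
my ($local_port, $local_ip) = unpack_sockaddr_in(getsockname($client));
printf "kernel-assigned source: %s:%d\n", inet_ntoa($local_ip), $local_port;

close $client;
close $server;
```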

Sometimes I see a socket created differently in Perl, with use IO::Socket::INET;. What are these different socket types/modules used for? A socket created with the ::INET module also accepts a peer address (PeerAddr). Why is it implemented differently?
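For contrast, here is the same kind of client connection written with IO::Socket::INET, again assuming a throwaway listener on 127.0.0.1 in place of a real web server. PeerAddr and PeerPort let the constructor perform the socket() and connect() steps itself, which is why the object interface takes the peer address up front.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Throwaway listener on 127.0.0.1 standing in for a real web server;
# LocalPort => 0 asks the kernel for any free port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
    Proto     => 'tcp',
) or die "listen: $@";
my $server_port = $server->sockport;

# One constructor call replaces socket() + address packing + connect().
# The TCP connection is already established when new() returns.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server_port,
    Proto    => 'tcp',
) or die "connect: $@";

my ($peer_host, $peer_port) = ($client->peerhost, $client->peerport);
printf "connected %s:%d -> %s:%d\n",
    $client->sockhost, $client->sockport, $peer_host, $peer_port;

close $client;
close $server;
```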

Now we want to send data through our socket using send (is it called a function or a sub?):

send(SOCKET, "GET / HTTP/1.0\r\nHost: google.com\r\n\r\n", 0);

I'm left wondering: what about the TCP handshake? It seems we can just skip it and ask the server for the resource right off the bat. Is that always the case?
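The sketch below illustrates where the handshake goes: it happens inside connect(), a step the snippet in the question does not show. It assumes a local server on 127.0.0.1 standing in for google.com, sends an HTTP/1.0 request with CRLF line endings and a Host header, and reads the reply from the same socket.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP INADDR_LOOPBACK pack_sockaddr_in);

# Local stand-in for a web server, so no outside network is needed.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0, Listen => 5, Proto => 'tcp',
) or die "listen: $@";

socket(my $client, PF_INET, SOCK_STREAM, getprotobyname('tcp') // IPPROTO_TCP)
    or die "socket: $!";

# The TCP three-way handshake (SYN, SYN-ACK, ACK) happens inside connect():
# once connect() returns true, the connection is established.
connect($client, pack_sockaddr_in($server->sockport, INADDR_LOOPBACK))
    or die "connect: $!";

# HTTP lines end in CRLF; an empty line ends the request headers.
# send() takes a FLAGS argument (0 = none) and returns bytes sent or undef.
my $request = "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n";
defined send($client, $request, 0) or die "send: $!";

# Server side: accept the queued connection, read the whole request, answer.
my $conn = $server->accept or die "accept: $!";
my $request_line = <$conn>;
while (my $line = <$conn>) { last if $line eq "\r\n" }   # skip headers
print {$conn} "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n";
close $conn;

# Client side: read until the server closes the connection.
my $response = do { local $/; <$client> };
close $client;
print $response;
```

Because the request uses HTTP/1.0 without keep-alive, the server closes the connection after responding, so reading until end-of-file yields the whole response.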

And the final question: as I understand it, to receive a response we would need a server socket running in listen mode, with accept in a while(1) loop. Also, how do I implement code that efficiently stores the web page so that I can scan it for links to other sites?
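A sketch of the receiving side, assuming the response arrives on the same connected socket the request went out on (a client does not need listen() or accept()). The helper names read_all and extract_links are hypothetical; a socketpair stands in for the network connection so the example is self-contained, and the regex-based link matching is a deliberate simplification of real HTML parsing (a parser such as HTML::LinkExtor is more robust).

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: read everything the peer sends until it closes
# its end (EOF), appending fixed-size chunks to a single scalar.
sub read_all {
    my ($sock) = @_;
    my $buf = '';
    while (1) {
        my $n = sysread($sock, $buf, 8192, length $buf);  # append at end
        die "read: $!" unless defined $n;
        last if $n == 0;                                  # 0 bytes = EOF
    }
    return $buf;
}

# Hypothetical helper: crude link finder for href="..." or href='...'
# inside <a> tags, returning each distinct link once, in page order.
sub extract_links {
    my ($html) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
        $html =~ /<a\s[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
}

# A connected socket pair stands in for the client's TCP connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $page = "HTTP/1.0 200 OK\r\n\r\n"
         . qq{<a href="http://example.com/">x</a> <A class="n" HREF='/about'>y</A>\n};
defined syswrite($writer, $page) or die "write: $!";
close $writer;                                 # reader will now see EOF

my $response = read_all($reader);
my ($headers, $body) = split /\r\n\r\n/, $response, 2;
my @links = extract_links($body);
print "$_\n" for @links;
```

Relative links such as /about would still need to be resolved against the page's URL before a crawler could follow them.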

Waiting eagerly for your guidance and tips,
Alex

Hey guys! Thanks for your replies. I will definitely take a look at Stevens' network programming book. As for LWP, I do want to understand socket programming before I move on to higher-level modules. So, if anyone here is a sockets pro, I would appreciate it if you could go through this step by step with me. Thanks!

In reply to some SOCKET action by Sary
