in reply to Re^2: Question: Fast way to validate 600K websites
in thread Question: Fast way to validate 600K websites

That sounds plausible at first, but the time taken to read the response to (most) HEAD requests pales into insignificance beside the time taken to make the connection and transmit the response in the first place. That is, all you save by stopping reading early is the transfer of data from the local TCP/IP stack's buffers into your own process memory.

The full content has already been transmitted. Your local system has already had to respond to the device interrupts, and the local TCP/IP buffers have already been allocated to accommodate it. Even if the remote server actually wrote the 200 OK as a separate write to the outgoing socket, the TCP/IP layer at that end will probably delay its transmission until it has enough to fill a standard transmission buffer (1536 bytes or some such?).

So no, I seriously doubt that you'd save much time doing it this way, except in the rare instances where the HTTP server is running on the same box, or the response to the HEAD request is on the order of hundreds of kilobytes.

Besides which, the major delays when doing this task serially come when the DNS lookup fails, or the server doesn't exist and you have to wait out TCP timeouts before moving on. Skipping the read of a few bytes will be neither here nor there in comparison with network delays and timeouts.
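
To make that last point concrete, here is a minimal sketch (in Python rather than Perl, and not anything posted in this thread) of overlapping those waits instead of paying for them one after another. The names probe and probe_all, the 3 second timeout and the pool of 50 workers are all my own assumptions for illustration. It only checks that a TCP connection can be made; it does not send a HEAD request.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def probe(host, port=80, timeout=3.0):
    """Try one TCP connect; return (host, port, status).

    Hypothetical helper: status is "ok" or a short failure description.
    """
    try:
        # create_connection resolves the name and then connects.
        # The timeout bounds each connect attempt; as far as I know the
        # name lookup itself is not bounded by it.
        with socket.create_connection((host, port), timeout=timeout):
            return host, port, "ok"
    except socket.gaierror:
        # Name resolution failed.
        return host, port, "dns failure"
    except socket.timeout:
        # Nothing answered within the timeout.
        return host, port, "timeout"
    except OSError as exc:
        # Refused, unreachable, and so on.
        return host, port, "error: " + type(exc).__name__


def probe_all(targets, workers=50, timeout=3.0):
    """Probe (host, port) pairs concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: probe(t[0], t[1], timeout), targets))
```

With the waits overlapped, one dead host costs one worker a few seconds rather than stalling the whole run, which is where I'd expect the real savings to be.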


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."