PerlMonks  

Re: how to resolve IP's in an HTTPd that doesn't resolve them?

by stevieb (Canon)
on Jun 13, 2018 at 19:46 UTC


in reply to how to resolve IP's in an HTTPd that doesn't resolve them?

Why not share the names of the culprit software(s)? I mean, if they are open source and written in a language somebody else knows, perhaps they can see where a couple of C-type calls could be made to do what you want.

Otherwise, doing what you're doing will be fine; just run a DNS cache alongside your application/module, check/cache the result, then tee off to a custom log file as you said.

That, or write a log parser instead, that either rewrites the log file when you want to read it, or one that reads line-by-line and displays to the consumer after the transformation has occurred.
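A minimal sketch of the line-by-line variant, in Python rather than Perl, assuming the client address is the first space-separated field of every log line. The names `reverse_lookup`, `resolve_line` and `main` are made up for illustration, and there is no caching here at all, so it is only a starting point:

```python
import socket
import sys


def reverse_lookup(ip):
    """Return the PTR name for ip, or ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (OSError, UnicodeError):
        # No PTR record, resolver error, or the field was not an address.
        return ip


def resolve_line(line, lookup=reverse_lookup):
    """Replace the first space-separated field of a log line with its host name."""
    ip, sep, rest = line.partition(" ")
    return lookup(ip) + sep + rest


def main():
    # Act as a filter: log lines in on stdin, transformed lines out on stdout.
    for line in sys.stdin:
        sys.stdout.write(resolve_line(line))
        sys.stdout.flush()
```

With a final `main()` call added, this could sit at the end of a pipe or be run over an existing log file. The `lookup` parameter is only there so a cache (or a fake resolver for testing) can be swapped in.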

Perhaps stating your overall objective would be handy here to get more appropriate feedback.

Re^2: how to resolve IP's in an HTTPd that doesn't resolve them?
by taint (Chaplain) on Jun 13, 2018 at 20:12 UTC
    "Perhaps stating your overall objective would be handy here to get more appropriate feedback"

    Hmm... judging by your answer, I do indeed need to better define my objective -- Sorry. :-(

    What I'm ultimately hoping to achieve is to have the HTTPd's existing logging resolve the connecting IP addresses it currently dumps to the log(s) into host names. Maybe an example would be prudent here:

    66.249.69.38 my.web.host - [12/Jun/2018:12:32:26 -0700] "GET / HTTP/1.1" 200 3306 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
    What I'd really like to see is the following:
    crawl-66-249-69-38.googlebot.com my.web.host - [12/Jun/2018:12:32:26 -0700] "GET / HTTP/1.1" 200 3306 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
    Seems that this should be possible. I could simply:
    #!/bin/sh -
    cat /var/log/my.web.host-access-log | awk '{ print $1; }' | ...
    or some such to feed the logs to a resolver. But ideally I'm looking for a way to process the log(s) (connections) in "real time", so that the logs have the correct access times. I can imagine filtering or piping it, but I'm not sure those are the only/best solutions. So here I ask. :-)

    Thanks, stevieb, for taking the time to respond!

    Evil is good, for without it, Good would have no value
    λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH

      I could simply:

      #!/bin/sh -
      cat /var/log/my.web.host-access-log | awk '{ print $1; }' | ...

      or some such to feed the logs to a resolver. But I'm ideally looking for a way to process the log(s) (connections) in "real time". So that the logs have the correct access times. I can imagine filtering , or piping it.

      <beancounting>You don't need cat in that pipe, just let awk read directly from the logfile.</beancounting>

      Back on topic: Name resolution takes time, causes some extra load, and can fail. Hence web servers generally prefer not to resolve the remote address, for performance reasons. However, you could simply log to a pipe instead of logging to a file. Apache comes with logresolve, which is intended to run offline, but you could also use it "live". It's a simple filter. It might be a little bit too simple-minded, though. Its documentation says:

      To minimize impact on your nameserver, logresolve has its very own internal hash-table cache. This means that each IP number will only be looked up the first time it is found in the log file.

      In other words: logresolve completely ignores any TTLs, so after running for a while your live log will contain stale, possibly wrong host names. It's not a bug, as logresolve is intended to run offline and only for a short time.
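To illustrate the difference, here is a rough Python sketch of a lookup cache that forgets its answers after a while. It is not how logresolve or dnsfilter work internally. `ExpiringResolver` and its parameters are hypothetical names, and because `socket.gethostbyaddr` does not expose the real DNS TTL, a fixed `max_age` (an assumption, 300 seconds here) stands in for it:

```python
import socket
import time


class ExpiringResolver:
    """Cache reverse lookups, but forget each answer after max_age seconds."""

    def __init__(self, lookup=None, max_age=300.0, clock=time.monotonic):
        self._lookup = lookup or self._system_lookup
        self._max_age = max_age  # stand-in for the real TTL, which is not visible here
        self._clock = clock
        self._cache = {}  # ip -> (name, time the answer was stored)

    @staticmethod
    def _system_lookup(ip):
        try:
            return socket.gethostbyaddr(ip)[0]
        except (OSError, UnicodeError):
            return ip  # keep the bare address when there is no usable PTR record

    def resolve(self, ip):
        now = self._clock()
        hit = self._cache.get(ip)
        if hit is not None and now - hit[1] < self._max_age:
            return hit[0]  # still fresh, no DNS query needed
        name = self._lookup(ip)  # first sighting, or the cached answer expired
        self._cache[ip] = (name, now)
        return name
```

A long-running filter would call `ExpiringResolver().resolve(ip)` for the first field of each line. A local caching nameserver that honours the real TTLs does the same job more correctly.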

      Have a look at daemontools. At the very least, multilog is usable: it takes care of logging reliably, including rotating log files. There is no IP-resolving program in daemontools, but djb also published djbdns, a modular DNS resolver. It contains dnsfilter, which should do pretty much exactly what you want: resolve an IP address at the start of a line to a host name. You should perhaps also install a local cache on the web server. That way, DNS responses are cached by djb's dnscache, dnsfilter gets most of its answers from the local cache, and DNS requests become a lot less expensive.

      To recap: Install a local DNS cache. Then log to a pipe that writes into dnsfilter. dnsfilter then logs into multilog, which creates a nice set of log files.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Thanks for your elaborate reply, afoken!

        Timely host resolution is not a problem on my servers. In fact, I wrote (finally finished) a little resolver in about 160 lines of C source. It'll turn a file of 255 IP addresses into host name(s) in under ~1 second on standard CPE, and even faster if given a fatter "pipe". It does so accurately. It is slower, of course, on slower connections, or on bad/unmanaged addresses. I could add a time threshold to the resolver, but I haven't bothered, as I only use it for post-processing.

        So it would seem from your response that you'd recommend using a pipe if my intent is to process the (connecting) IP addresses in real time. I had hoped to avoid that, but I guess I'm not terribly surprised.

        Speaking of the Apache HTTPd: it's interesting that Apache doesn't have to, or doesn't choose to, use a pipe, as it happily logs resolved IP addresses to its log(s), in all my experience with it.

        Maybe I'd do well to look over its source for possible clues.

        Thanks again, afoken, for taking the time to reply!

        Evil is good, for without it, Good would have no value
        λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH