PerlMonks  

Re^2: how to resolve IP's in an HTTPd that doesn't resolve them?

by taint (Chaplain)
on Jun 13, 2018 at 20:12 UTC


in reply to Re: how to resolve IP's in an HTTPd that doesn't resolve them?
in thread how to resolve IP's in an HTTPd that doesn't resolve them?

"Perhaps stating your overall objective would be handy here to get more appropriate feedback"

Hmm... judging by your answer, I do indeed need to define my objective better. Sorry. :-(

What I'm ultimately hoping to achieve is to have the HTTPd's existing logging resolve the connecting IP addresses it currently dumps to the log(s) into host names. Maybe an example would be prudent here:

66.249.69.38 my.web.host - [12/Jun/2018:12:32:26 -0700] "GET / HTTP/1.1" 200 3306 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
What I'd really like to see is the following:
crawl-66-249-69-38.googlebot.com my.web.host - [12/Jun/2018:12:32:26 -0700] "GET / HTTP/1.1" 200 3306 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
Seems that this should be possible. I could simply:
#!/bin/sh -
cat /var/log/my.web.host-access-log | awk '{ print $1; }' | ...
or some such to feed the logs to a resolver. But I'm ideally looking for a way to process the log(s) (connections) in "real time", so that the logs have the correct access times. I can imagine filtering, or piping it. But I am not sure whether those are the only or best solutions. So here I ask. :-)
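
To make the goal concrete, here is a minimal Python sketch of a filter that swaps a leading IPv4 address for its reverse-DNS name and leaves the rest of the line alone. The names `resolve_first_field`, `resolve_lines` and `reverse_lookup` are made up for illustration, the pattern only handles IPv4, and a failed lookup simply keeps the address. It is one possible shape for such a filter, not anything the HTTPd itself provides.

```python
import re
import socket

# Matches a dotted-quad IPv4 address at the very start of a log line.
LEADING_IPV4 = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})(?=\s)")


def reverse_lookup(ip):
    """Return the reverse-DNS name for ip, or ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def resolve_first_field(line, lookup=reverse_lookup):
    """Replace a leading IPv4 address with whatever lookup() returns."""
    match = LEADING_IPV4.match(line)
    if match is None:
        return line  # no leading address: pass the line through untouched
    return lookup(match.group(1)) + line[match.end():]


def resolve_lines(lines, lookup=reverse_lookup):
    """Lazily rewrite each line of an iterable (a file, sys.stdin, ...)."""
    for line in lines:
        yield resolve_first_field(line, lookup)
```

Because `resolve_lines` is lazy, it could be fed from standard input and each rewritten line written out as soon as it arrives, which keeps the original timestamps in the log line intact.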

Thanks, stevieb, for taking the time to respond!

Evil is good, for without it, Good would have no value
¡λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH

Re^3: how to resolve IP's in an HTTPd that doesn't resolve them?
by afoken (Chancellor) on Jun 13, 2018 at 20:58 UTC

    I could simply:

    #!/bin/sh -
    cat /var/log/my.web.host-access-log | awk '{ print $1; }' | ...

    or some such to feed the logs to a resolver. But I'm ideally looking for a way to process the log(s) (connections) in "real time", so that the logs have the correct access times. I can imagine filtering, or piping it.

    <beancounting>You don't need cat in that pipe, just let awk read directly from the logfile.</beancounting>

    Back on topic: Name resolving takes time, causes some extra load, and can fail. Hence web servers generally prefer not to resolve the remote address for performance reasons. However, you could simply log to a pipe instead of logging into a file. Apache comes with logresolve, which is intended to run offline, but you could also use it "live". It's a simple filter. It might be a little bit too simple-minded:

    To minimize impact on your nameserver, logresolve has its very own internal hash-table cache. This means that each IP number will only be looked up the first time it is found in the log file.

    In other words: logresolve completely ignores any TTLs and so your live log will contain nonsense after running for a while. It's not a bug, as logresolve is intended to run offline and only for a short time.
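
The difference between a cache that never forgets and one that does can be sketched in a few lines of Python. This is not how logresolve or dnscache work internally. `ExpiringLookupCache` is a made-up name, and because `socket.gethostbyaddr` does not expose the record's real TTL, a fixed `max_age` stands in for it as an assumption.

```python
import socket
import time


def reverse_lookup(ip):
    """Return the reverse-DNS name for ip, or ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


class ExpiringLookupCache:
    """Cache lookups, but forget each answer after max_age seconds."""

    def __init__(self, lookup=reverse_lookup, max_age=300.0, clock=time.monotonic):
        self._lookup = lookup
        self._max_age = max_age  # stand-in for a real DNS TTL (assumption)
        self._clock = clock
        self._entries = {}  # ip -> (expiry time, name)

    def get(self, ip):
        now = self._clock()
        hit = self._entries.get(ip)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no lookup needed
        name = self._lookup(ip)  # missing or stale: ask again
        self._entries[ip] = (now + self._max_age, name)
        return name
```

With `max_age` set to infinity this degenerates into the look-up-once behaviour described above. A finite value bounds how long a stale name can linger in a long-running live log.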

    Have a look at the daemontools. At least multilog is usable; it takes care of reliable logging, including rotating log files. There is no IP resolving program in daemontools, but djb also published djbdns, a modular DNS resolver. It contains dnsfilter, which should do pretty much exactly what you want: resolve an IP address at line start to a host name. You should perhaps install a local cache on the webserver. That way, DNS requests are cached by djb's dnscache, dnsfilter reads most responses from the local cache, and so DNS requests become a lot less expensive.

    To recap: Install a local DNS cache. Then log to a pipe that writes into dnsfilter. dnsfilter then logs into multilog, which creates a nice set of log files.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Thanks for your elaborate reply, afoken!

      Timely (host) resolution is not a problem on my servers. In fact, I wrote (finally finished) a little resolver in about 160 lines of C source. It'll turn a file of 255 IP addresses into host name(s) in under about 1 second on standard CPE, and even faster if given a fatter "pipe". It does so accurately. It is slower, of course, on slower connections, or on bad / unmanaged addresses. I could add a time threshold to the resolver, but I haven't bothered, as I only use it for post-processing.
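
For illustration only, a time threshold of that kind might look like the following Python sketch. It is not the C resolver mentioned here. `lookup_with_deadline` and the pool size are hypothetical, and the approach (run the lookup in a worker thread and stop waiting after a deadline) is just one way to bound a blocking call.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout

# A small shared pool; the size is an arbitrary choice for this sketch.
_pool = ThreadPoolExecutor(max_workers=8)


def reverse_lookup(ip):
    """Return the reverse-DNS name for ip, or ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def lookup_with_deadline(ip, seconds=1.0, lookup=reverse_lookup, pool=_pool):
    """Return lookup(ip), or ip unchanged if no answer arrives in time."""
    future = pool.submit(lookup, ip)
    try:
        return future.result(timeout=seconds)
    except LookupTimeout:
        # The worker keeps running in the background; we only stop waiting.
        return ip
```

A slow or unmanaged address then costs at most `seconds` per log line, at the price of the raw IP staying in the output for that line.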

      So it would seem from your response that you'd recommend using a pipe if my intent is to process the (connecting) IP addresses in real time. I had hoped to avoid that, but I guess I'm not terribly surprised.

      Speaking of the Apache HTTPd: it's interesting that Apache doesn't have, or choose the use of a pipe, as it happily logs resolved IP addresses to its log(s), in all my experience with it.

      Maybe I'd do well to look over its source for possible clues.

      Thanks again, afoken, for taking the time to reply!


        it's interesting that Apache doesn't have, or choose the use of a pipe

        I expect Apache to simply open the log file in append mode. That should also work with a named pipe (a.k.a. FIFO). mknod /var/log/httpd/access.log p should be sufficient. Apache writes to that pipe, and a resolver program reads from the pipe.
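
The reading side of such a FIFO could be sketched in Python roughly as follows. `ensure_fifo` and `read_fifo_lines` are made-up helper names, `os.mkfifo` is used in place of the `mknod ... p` command, and whatever path is passed in is up to the caller. This is a sketch for POSIX systems, not a hardened logger.

```python
import os
import stat


def ensure_fifo(path):
    """Create a named pipe at path unless one is already there."""
    try:
        os.mkfifo(path, 0o640)
    except FileExistsError:
        # Something already exists: only accept it if it really is a FIFO.
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise


def read_fifo_lines(path):
    """Yield lines written to the FIFO until every writer has closed it."""
    with open(path, "r") as fifo:  # blocks until a writer opens the pipe
        for line in fifo:
            yield line
```

Note that the loop ends when the last writer closes the pipe, for example when the web server restarts, so a long-running reader would have to reopen the FIFO and carry on.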

        But Apache can do even better, see piped logs:

        CustomLog "|/usr/local/bin/name-resolver foo bar baz" common

        The shell can also be invoked, that should allow creating a second pipe for a rotating logger:

        CustomLog "|$/usr/local/bin/name-resolver foo bar | /usr/local/bin/multilog t s1000000 /var/multilog/apache" common

        Alexander

