design advice, please.

SlushPump has asked for the wisdom of the Perl Monks concerning the following question:

I need to write something to assist when we have a problem on the network such as devices with a virus that need to be found and quarantined. We need to (quickly) determine (usually starting with an IP):

hostname(s) as recorded in several different DNS's
owner (from several flat files or .csv spreadsheet dumps of historical info)
status (is it pingable)
nbtstat -a info
location (from a merge with a qip subnet dump to a .csv spreadsheet)
hits for this ip address from a number of other files with historical info, etc.

Then if I have 30 ip addresses, I'll loop through the above to provide firefighting info to the troops that need it.

Perl seems to be the way to do this for me and I'm working on learning it now. (I am an experienced ksh scripter already, but a perl newb.)

I have several design questions and seek advice from the Monks before I proceed to shoot myself in the foot (overly).

For example:

For nslookups, should I use Net::DNS, Net::Patricia, nslookup, gethostbyaddr, or something else?
I need to merge "dot-quad" notation variables with regular "mask" notation data to lookup phyical locations. Should I use Netaddr::IP (or NetAddr::IP:Lite) to convert between notations? (platform is WinXP+Cygwin)
Should I attempt to merge using the Net::CIDR::Lite technique in Merge CIDRs, grepp grepp -- Perl version of grep, or an external (ksh) grep?
I need to merge by grepping/awking through other flat files- same question- grepp within perl, call ksh grep, other? The bigger question here is- what rules of thumb should I use to determine this info for myself?
Am I reinventing a wheel here? I've googled, supersearched, and CPANned and don't see anything obvious that would do these kinds of things.

If there are other RTFM's I should concentrate on, please advise, as I'm searching for enlightenment and willing to read first.

My guess is that, to an experienced Perl Person (PerlMonkey? PerlDiver?) this would be a straightforward thing... but at my stage of wisdumb I'd appreciate some guidance in staying on an appropriate path. (as I do need to get this working in days, not months!) Thanks in advance....

Comment on design advice, please.

Replies are listed 'Best First'.
Re: design advice, please. by shmem (Chancellor) on Apr 10, 2007 at 17:15 UTC
Some thoughts.. maximum hubris: do it all the perl way, use modules, and as portable as anybody could wish. But see next rule. maximum impatience: get it done. That means, if you are familiar with nslookup or dig, stick to them, parse their output - hey, there's nothing bad calling an external program specialized for the task. Mark code that needs (or could need) a revision in a transition towards the goal of the previous rule. maximum laziness: let others do the job. Grab appropriate modules for all tasks. How do you know they are appropriate? Eeek, you have to read their documentation, how bad. For maximum laziness, look at the modules' dependencies. Loading `Net::DNS` means loading 37 modules. While most are core, some have to be installed (and updated eventually!). Umm... work. But that's how it goes - to be lazy, you need time to spend. So, contradictions. These can be put in-line looking at other constraints. How much time have you got? You need to get familiar with perl, and with the modules. To get familiar with Net::DNS you need not read the 37 manual pages for the 37 modules loaded, but the main ones of the module. At the end, having read all of them is a big win. But just now? Then, Is portability required, and to what extent? Are you willing to upgrade / improve your script? Remember, as Aristotle says - and I agree - makeshifts last the longest, and you might write stinkin' crap instead of loading "a module or two" (heh.. 37 to INF :-) which sits around as a symbolic link: "that's the guy that wrote me! blame him!" As for using awk, grep, sed, ksh etc from inside perl - if you don't know how to do your tasks without them in pure perl, come back with the specific issue and ask; but using them from inside perl is mostly...silly :-) --shmem _($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}	[reply] [d/l]
Re: design advice, please. by f00li5h (Chaplain) on Apr 10, 2007 at 16:39 UTC
We are all gurus, some are just not as far along as others. All the things you ask are easy if tackled one at a time, for example."location (from a merge with a qip subnet dump to a .csv spreadsheet)" Read more... (481 Bytes) If you are going to perform the lookup more than once per run of the script, you will want to stash the lookup info in a hash or similar... But once you've read the file, you could insert the data into a database, and slowly phase out support for the csv in favour of some small CGI scripts or a maypole application to allow folk to keep the data up to data. If you slowly port each data source (where you can) into a table in your database, and attach a simple frontend like maypole (which will nearly automatically do the whole Create/Review/Update/Delete thing for free), you will end up with a single point to go to when you want to find out about things. You are asking for many small things, so you can just solve each of them, one at a time, it's just like Tetris, really [] [][] [] `@_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;`	[reply] [d/l] [select]
Re: design advice, please. by Moron (Curate) on Apr 10, 2007 at 16:44 UTC
Some additional suggestions: nmap::scanner and nmap::parser look useful for security scanning given IP address. The stat() function returns info on a file. Hits by IP address can be best maintained in a storable, to maintain a persistant hash based on address => hits. Geo::IP2Location to locate an IP address geographically. -M Free your mind	[reply]
Re^2: design advice, please. by SlushPump (Initiate) on Apr 11, 2007 at 18:59 UTC
Thanks for your suggestions. Although Geo::IP2Location sounds interesting, the segments I need are internal to our Company, not public ip address spaces.... hence the merging with a segment inventory dump. We do use nmap, etc., and also Foundstone and a number of other tools. I'll rtfm further on storables and your other suggestions.... I appreciate the help.	[reply]
Re: design advice, please. by graff (Chancellor) on Apr 11, 2007 at 03:41 UTC
I'm not familiar with a lot of the data sources that you're referring to, but in my own experience, when the task is primarily a matter of integrating pieces of information from a diverse set of sources, the aspect of Perl that seems to help most and make things easiest (once it's understood) is data structures. Start with the perldsc manual: Perl Data Structures Cookbook. Get well-enough acquainted with the concept of references (scalar values used as "pointers" to other things) and the somewhat grotty syntax involved, then start playing with arrays of hashes, hashes of arrays, etc. Don't be shy -- you really can manage some pretty complex stuff, and once you get the hang of it, it can be surprisingly simple. The important thing to keep in mind as you go down that path is: know what sort of structure will make the most sense in terms of the desired output, and have that structure in mind (in fact, document that structure) before you start writing any code. Then look at each of the inputs you'll be using to populate the structure. As mentioned in previous replies, address each data source as its own little task (taken by itself, it will tend to be pretty simple), and the goal of each task is simply to grab the relevant chunks of data for your structure and stick them in. Once you've gone through all the inputs, reporting the final results will be astonishingly simple, because your data structure will have been designed specifically for that purpose! For example, if the goal is to list IP addresses that may have become "belligerent" from infection, your first input would be something that provides the list of active (potentially suspect) addresses; let's suppose these become hash keys. Then you might need two or three other forms of basic information for each address (hostname, location, etc) -- these become second-level hash keys in an HoH structure. Maybe there will be current and recent "snapshot" data (packets sent per unit time or whatever) -- that could be an array, so one of your sub-hash-keys holds an array (making it HoHoA where the second-level hash key is "packetcounts" or whatever). Finally, when all the information is in, all that's left to do is decide which hosts to flag as belligerent, based on assessments of the various inputs. Writing the decision logic here should be fairly easy, because all the data you need is organized into a coherent and appropriate structure. Along the way, you'll also appreciate getting familiar with the perl debugger (running the script with "perl -d"), and with Data::Dumper. BTW "grepp" was designed as a command-line utility to do just a few things that the standard unix "grep" doesn't do well or at all (the many nifty extras in perl regexes, anything other than "\n" as the input record separator, and flexible handling of non-ASCII characters in patterns and data). If you think you need that extra nifty-ness and flexibility, just work it into your own perl code -- don't use grepp directly -- and save yourself the expense (and probable grief) of launching an unnecessary sub-shell.	[reply]