Poetic Justice has asked for the wisdom of the Perl Monks concerning the following question:

I'm on my home network looking for the latest copy of my resume, and in the process I realize that I've done this maybe 4 or 5 times in the last six weeks. Aha! I have a task that needs automation. Just a few minutes prior, I went to Google and searched for the latest release of a particular utility I like to use; I found the answer in just a few seconds. Finding my latest resume took a little less than a minute. My home network is primarily Win32, with 1 Linux machine I use for programming. I've got the tools I need to build a file indexing system that can help me find duplicate files, delete the duplicates, and copy important files to a backup directory for archiving. It's a possibly big project that is ultimately suited to Perl. So before I go out and start re-inventing the wheel: has anyone in the Monastery done a similar project? I'm hitting the books right now looking for possible approaches. Any suggestions? Thanks Poetic Justice
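
For the duplicate-file piece, here's a rough sketch of the sort of thing I'm picturing, using Digest::MD5 to flag byte-for-byte identical files (the starting directory is just a placeholder):

    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    # Walk a tree and report files whose content is identical,
    # keyed on the MD5 digest of the file contents.
    my %seen;
    find(sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;   # skip unreadable files
        binmode $fh;
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        if (exists $seen{$digest}) {
            print "duplicate: $File::Find::name (same as $seen{$digest})\n";
        } else {
            $seen{$digest} = $File::Find::name;
        }
    }, shift(@ARGV) || '.');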

Replies are listed 'Best First'.
RE (tilly) 1: Searching my network
by tilly (Archbishop) on Nov 15, 2000 at 17:00 UTC
    On Linux I would use the "locate" command for this.

    I believe that you will get this as part of Cygwin.

    A lot of people are going to recommend File::Find. What it provides is a way to program the same brute-force search that you already found too slow. If you roll your own on top of it, it is not going to be very fast. However, if you have a program run regularly, search your drive, and save the results as a simple text file that you grep, you will see a massive speed increase. You can get even better speed by using well-structured indexing, tying hashes to a dbm, etc., but simple text is a much easier place to start and is probably fast enough.

    Even if you don't have grep, rolling your own is easy:

    perl -ne "print if /hello/i" *.txt
    (On Linux switch the type of quote.)

    Or take a look at the PPT project.

    EDIT
    I realized that my comment about File::Find may be confusing. What I meant is that you don't want to run it interactively. However, you may want to run it every day or so and then interactively read its output. (This is how the locate tool I mentioned above works.)
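
    Something along these lines is all it takes. This is only a sketch, and the index location is a placeholder you would adjust:

        use strict;
        use warnings;
        use File::Find;

        # Run this from cron or the Task Scheduler: walk the drive
        # once and save every file name to a plain text index.
        my $index = 'C:/temp/file-index.txt';   # placeholder path
        open my $out, '>', $index or die "can't write $index: $!";
        find(sub { print $out "$File::Find::name\n" if -f $_ }, 'C:/');
        close $out;

    An interactive search is then just a grep over the index:

        perl -ne "print if /resume/i" C:/temp/file-index.txt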

(jcwren) RE: Searching my network
by jcwren (Prior) on Nov 15, 2000 at 18:47 UTC
    While not applicable for general file searches and static files, projects or documents that you author are best stored in a version control system, such as CVS or SourceSafe (no flames!). That way, not only do you have structured storage of the document, you also have revision control. If you leave two copies lying around, you don't have to spend time worrying about which one is the later version (date/time stamps are not *always* indicative).

    And, a resume is a good place to start to learn how to use those tools, rather than in the middle of a large project that just got out of hand.

    I spent many, many days importing projects written 10 years ago into SourceSafe. It can be difficult to get into the habit of using version control, but once you start, it's pretty addictive. I typically use SourceSafe, so I can't comment on this for CVS, but SourceSafe projects are stored in a directory separate from the current project, which makes it very easy to cut them to CD on a regular basis.

    --Chris

    e-mail jcwren
      I'm looking right now at buying SourceSafe. I haven't found another CVS alternative available for Win32. I've used it at different companies for source code management, and only recently (in the last 2 years) have I accumulated enough documents and source on my home system to need it. Thanks for the recommendation.

      Greg
        Actually, CVS has been ported to Win32... I think you can find it at this page.
RE: Searching my network
by curtisb (Monk) on Nov 15, 2000 at 12:30 UTC
    You may want to use the File::Find module. There are plenty of examples by merlyn and Fastolfe. Then there is the long way of doing this as well, but I don't really recommend that....
    curtisb -- "Use the Wisdom of Others!"
RE: Searching my network
by elwarren (Priest) on Nov 15, 2000 at 20:03 UTC
    You could just keep your resume in the same spot and look for it there when you need it :-)

    But really, a locate command that spans multiple machines is a good idea. It would come in handy for me. It's a pretty simple idea: take the locate program and add an additional field to show which machine the file lives on. The hard part is going to be deciding how you want to store that info. The searches will need to run on multiple machines, so you'll either need a daemon running on each, or else code a way to log in and scan, or maybe scan only the available shares.

    Then you have to decide whether you want to propagate that info to each machine so you can run a search from any machine, or store all the results in a single database on one machine. Maybe put a CGI front end on it so you could use a browser on any machine to query for your file.
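
    The per-machine piece could be as small as this sketch (Sys::Hostname is a core module; run it on each box and append the output to one central file, or feed it to whatever database you settle on):

        use strict;
        use warnings;
        use File::Find;
        use Sys::Hostname;

        # Emit "host<TAB>path" lines; concatenating the output from
        # every machine gives a network-wide index you can grep.
        my $host = hostname();
        find(sub { print "$host\t$File::Find::name\n" if -f $_ },
             @ARGV ? @ARGV : '.');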

    Now you're going to a known location to find something. It may just be easier to put your resume in one location and go back to look for it there when you need it :-)
RE: Searching my network
by Albannach (Monsignor) on Nov 15, 2000 at 19:59 UTC
    For another option, you might look into ICE, which is 100% Perl. I used it with very minor mods to index our Novell LAN here a few months back for a specific purpose, but I'm not running it often and there may well be better options...

    Update: I should clarify that ICE doesn't just find files; it builds a comprehensive keyword index (subject to configuration options). That is necessary in my case (a large consulting firm), as there are literally hundreds of similar resumes, proposals and associated documents, many almost alike... Otherwise, I think elwarren gets the prize for simplest solution!

RE: Searching my network
by AgentM (Curate) on Nov 15, 2000 at 21:49 UTC
    A decent modern filesystem already supports journaling (shame on you, ext2). Why not try ReiserFS? Note that the locate command will only return matching files OLDER than the last updatedb run (updatedb being pretty much a File::Find-style walk whose results get hashed into a database). So, in terms of efficiency, using locate ONCE is just the same as using File::Find; repeating the same search is the only use it optimizes.

    In your case, searching with locate is not really all that much more efficient than File::Find, since updatedb (that's what it's named) runs by default every day (at midnight or so), and it is a File::Find-type directory tree walker anyway that hashes filenames into a DB. In this light, I strongly recommend ReiserFS, not just to Poetic Justice who needs a search every week, but to everybody. In this case File::Find becomes obsolete! (There is a filesystem-level call for file searching.) ReiserFS arranges the filenames in a tree which is easily searched. You can only benefit from this upgrade. This simple and logical FS tweak makes even such things as open(FILE, '</dir/dir2/dir3/file'); potentially faster! As soon as I hear the words "file search", I must say: REISERFS!
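
    To make that picture concrete: updatedb is essentially a File::Find walk whose results get stored in a DBM-style database. Here is a minimal sketch of the same idea with a tied hash (SDBM_File and Fcntl are core modules; note that SDBM limits each value to roughly 1 kB, so a serious index would use something sturdier, or tilly's plain text file):

        use strict;
        use warnings;
        use File::Find;
        use SDBM_File;
        use Fcntl;

        # Map each basename to the first full path seen, so later
        # lookups by exact file name are a single hash probe.
        tie my %idx, 'SDBM_File', 'fileindex', O_RDWR | O_CREAT, 0644
            or die "tie failed: $!";
        find(sub { $idx{$_} ||= $File::Find::name if -f $_ },
             @ARGV ? @ARGV : '.');
        untie %idx;

        # Later: tie 'fileindex' again read-only and look a name up
        # directly, e.g. print $idx{'resume.doc'};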

    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
RE: Searching my network
by Poetic Justice (Monk) on Nov 16, 2000 at 02:22 UTC
    Thanks to everyone for their input. I'm at the office right now, so I'll just say that I'll keep you abreast of what I decide to do.

    This is why I love the Monastery.
    Poetic Justice