Poetic Justice has asked for the wisdom of the Perl Monks concerning the following question:

I'm on my home network looking for the latest copy of my resume, and in the process I realize that I've done this maybe 4 or 5 times in the last six weeks. Aha! I have a task that needs automation. Just a few minutes prior, I went to Google and searched for the latest release of a particular utility I like to use; I found the answer in just a few seconds. Finding my latest resume took a little less than a minute. My home network is primarily Win32, with 1 Linux machine I use for programming. I've got the tools I need to build a file indexing system that can help me find duplicate files, delete the duplicates, and copy important files to a backup directory for archiving. It's a possibly big project that is ultimately suited to Perl. So before I go out and start re-inventing the wheel: has anyone in the Monastery done a similar project? I'm hitting the books right now looking for possible approaches. Any suggestions? Thanks Poetic Justice
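
For the duplicate-file piece, here's a rough sketch of the sort of thing I'm picturing, using Digest::MD5 to flag byte-for-byte identical files (the starting directory is just a placeholder):

    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    # Walk a tree and report files whose content is identical,
    # keyed on the MD5 digest of the file contents.
    my %seen;
    find(sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;   # skip unreadable files
        binmode $fh;
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        if (exists $seen{$digest}) {
            print "duplicate: $File::Find::name (same as $seen{$digest})\n";
        } else {
            $seen{$digest} = $File::Find::name;
        }
    }, shift(@ARGV) || '.');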

Replies are listed 'Best First'.
RE (tilly) 1: Searching my network
by tilly (Archbishop) on Nov 15, 2000 at 17:00 UTC
    On Linux I would use the "locate" command for this.

    I believe that you will get this as part of Cygwin.

    A lot of people are going to recommend File::Find. What it provides is a way to program the same brute-force search that you already found too slow. If you roll your own on top of it, it is not going to be very fast. However, if you have a program run regularly, search your drive, and save the results as a simple text file that you grep, you will see a massive speed increase. You can get even better speed by using well-structured indexing, tying hashes to a dbm, etc., but simple text is a much easier place to start and is probably fast enough.

    Even if you don't have grep, rolling your own is easy:

    perl -ne "print if /hello/i" *.txt
    (On Linux switch the type of quote.)

    Or take a look at the PPT project.

    EDIT
    I realized that my comment about File::Find may be confusing. What I meant is that you don't want to run it interactively. However, you may want to run it every day or so and then interactively read its output. (This is how the locate tool I mentioned above works.)
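
    Something along these lines is all it takes. This is only a sketch, and the index location is a placeholder you would adjust:

        use strict;
        use warnings;
        use File::Find;

        # Run this from cron or the Task Scheduler: walk the drive
        # once and save every file name to a plain text index.
        my $index = 'C:/temp/file-index.txt';   # placeholder path
        open my $out, '>', $index or die "can't write $index: $!";
        find(sub { print $out "$File::Find::name\n" if -f $_ }, 'C:/');
        close $out;

    An interactive search is then just a grep over the index:

        perl -ne "print if /resume/i" C:/temp/file-index.txt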

(jcwren) RE: Searching my network
by jcwren (Prior) on Nov 15, 2000 at 18:47 UTC
    While not applicable for general file searches and static files, projects or documents that you author are best stored in a version control system, such as CVS or SourceSafe (no flames!). That way, not only do you have structured storage of the document, you also have revision control. If you leave two copies lying around, you don't have to spend time worrying about which one is the later version (date/time stamps are not *always* indicative).

    And, a resume is a good place to start to learn how to use those tools, rather than in the middle of a large project that just got out of hand.

    I spent many, many days importing projects written 10 years ago into SourceSafe. It can be difficult to get into the habit of using version control, but once you start, it's pretty addictive. I typically use SourceSafe, so I can't comment on this for CVS, but SourceSafe projects are stored in a directory separate from the current project, which makes it very easy to cut them to CD on a regular basis.

    --Chris

    e-mail jcwren
      I'm looking right now at buying SourceSafe. I haven't found another CVS alternative available for Win32. I've used it at different companies for source code management, and only recently (in the last 2 years) have I accumulated enough documents and source on my home system to need it. Thanks for the recommendation.

      Greg
        Actually, CVS has been ported to Win32... I think you can find it at this page.
RE: Searching my network
by curtisb (Monk) on Nov 15, 2000 at 12:30 UTC
    You may want to use the File::Find module. There are plenty of examples by merlyn and Fastolfe. Then there is the long way of doing this as well, but I don't really recommend that....
    curtisb -- "Use the Wisdom of Others!"
RE: Searching my network
by elwarren (Priest) on Nov 15, 2000 at 20:03 UTC
    You could just keep your resume in the same spot and look for it there when you need it :-)

    But really, a locate command that spans multiple machines is a good idea. It would come in handy for me. It's a pretty simple idea: take the locate program and add an additional field to show which machine the file lives on. The hard part is going to be deciding how you want to store that info. The searches will need to run on multiple machines, so you'll either need a daemon running on each, or else code a way to log in and scan, or maybe scan only the available shares.

    Then you have to decide whether you want to propagate that info to each machine so you can run a search from any machine, or store all the results in a single database on one machine. Maybe put a CGI front end on it so you could use a browser on any machine to query for your file.
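
    The per-machine piece could be as small as this sketch (Sys::Hostname is a core module; run it on each box and append the output to one central file, or feed it to whatever database you settle on):

        use strict;
        use warnings;
        use File::Find;
        use Sys::Hostname;

        # Emit "host<TAB>path" lines; concatenating the output from
        # every machine gives a network-wide index you can grep.
        my $host = hostname();
        find(sub { print "$host\t$File::Find::name\n" if -f $_ },
             @ARGV ? @ARGV : '.');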

    Now you're going to a known location to find something. It may just be easier to put your resume in one location and go back to look for it there when you need it :-)
RE: Searching my network
by Albannach (Monsignor) on Nov 15, 2000 at 19:59 UTC
    For another option, you might look into ICE, which is 100% Perl. I used it with very minor mods to index our Novell LAN here a few months back for a specific purpose, but I'm not running it often and there may well be better options...

    Update: I should clarify that ICE doesn't just find files; it builds a comprehensive keyword index (subject to configuration options). That is necessary in my case (a large consulting firm), as there are literally hundreds of similar resumes, proposals and associated documents, many almost alike... Otherwise, I think elwarren gets the prize for simplest solution!

RE: Searching my network
by AgentM (Curate) on Nov 15, 2000 at 21:49 UTC
    A decent modern filesystem already supports journaling (shame on you, ext2). Why not try ReiserFS? Note that the locate command will only return matching files OLDER than the last updatedb run (updatedb being pretty much a File::Find-style walk whose results get hashed into a database). So, in terms of efficiency, using locate ONCE is just the same as using File::Find; repeating the same search is the only use it optimizes.

    In your case, searching with locate is not really all that much more efficient than File::Find, since updatedb (that's what it's named) runs by default every day (at midnight or so), and it is a File::Find-type directory tree walker anyway that hashes filenames into a DB. In this light, I strongly recommend ReiserFS, not just to Poetic Justice who needs a search every week, but to everybody. In this case File::Find becomes obsolete! (There is a filesystem-level call for file searching.) ReiserFS arranges the filenames in a tree which is easily searched. You can only benefit from this upgrade. This simple and logical FS tweak makes even such things as open(FILE, '</dir/dir2/dir3/file'); potentially faster! As soon as I hear the words "file search", I must say: REISERFS!
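
    To make that picture concrete: updatedb is essentially a File::Find walk whose results get stored in a DBM-style database. Here is a minimal sketch of the same idea with a tied hash (SDBM_File and Fcntl are core modules; note that SDBM limits each value to roughly 1 kB, so a serious index would use something sturdier, or tilly's plain text file):

        use strict;
        use warnings;
        use File::Find;
        use SDBM_File;
        use Fcntl;

        # Map each basename to the first full path seen, so later
        # lookups by exact file name are a single hash probe.
        tie my %idx, 'SDBM_File', 'fileindex', O_RDWR | O_CREAT, 0644
            or die "tie failed: $!";
        find(sub { $idx{$_} ||= $File::Find::name if -f $_ },
             @ARGV ? @ARGV : '.');
        untie %idx;

        # Later: tie 'fileindex' again read-only and look a name up
        # directly, e.g. print $idx{'resume.doc'};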

    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
RE: Searching my network
by Poetic Justice (Monk) on Nov 16, 2000 at 02:22 UTC
    Thanks to everyone for their input. I'm at the office right now, so I'll just say that I'll keep you abreast of what I decide to do.

    This is why I love the Monastery.
    Poetic Justice