Foggy Bottoms has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks! I'd like to develop a really quick and useful search tool capable of finding files that contain specific content (much like the Windows search tool; the script would have to run on Windows-based systems). Hence I came up with 2 options. Thanks for your time and patience...
Happy he who, like Ulysses, has made a fine voyage,
Or like that man who won the Fleece,
And then came home, full of experience and wisdom,
To live among his kin the rest of his days!

J. du Bellay, Angevin poet

Replies are listed 'Best First'.
Re: Search tool
by tachyon (Chancellor) on Jul 02, 2003 at 13:54 UTC

    If you want FAST, you don't want to automate the native Windows search function. It is, in a word, woeful. First, it recurses the directory tree for every search, and second, you can only (AFAIK) pass it a drive or list of drives to search; thus if your target is C:\something\stuff_here\ you will search everything else on C:\ for no good reason.

    You can get a recursive search in a couple of lines with File::Find, but if you want SPEED you recurse the tree periodically, store the results in a database structure, and search your DB to find your files. All you need to do is update the database periodically. This is the *nix approach of excellent tools like locate.
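
    For the "couple of lines with File::Find" part, here is a minimal sketch of recursing a tree and grepping file contents. The subroutine name and the sample path in the comment are illustrative, not from any particular module:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Find;

        # Walk a directory tree and return the full paths of files whose
        # contents match a pattern. (find_in_files is a made-up name.)
        sub find_in_files {
            my ($dir, $pattern) = @_;
            my @hits;
            find(sub {
                return unless -f $_;              # skip directories etc.
                open my $fh, '<', $_ or return;   # unreadable? move on
                while (my $line = <$fh>) {
                    if ($line =~ /$pattern/) {
                        push @hits, $File::Find::name;
                        last;                     # one hit per file is enough
                    }
                }
                close $fh;
            }, $dir);
            return @hits;
        }

        # e.g. my @files = find_in_files('C:\\something\\stuff_here', qr/foo/);
        #      print "$_\n" for @files;

    This is the slow path tachyon is warning about, of course: it re-reads every file on every search.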

    Locate lives in the GNU findutils package, and you can get a Win32 port of it from here, amongst other places. For blinding speed you won't do a lot better. To be frank, you will never use Win32 native search again. Get a port of grep while you are at it, and then all you need to do is:

        # update the locate DB
        C:\>locate -u
        # find whatever something you want....
        C:\>locate some | grep thing
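
    The locate idea ("recurse periodically, search the DB") is easy to sketch in pure Perl, too. A minimal version using a flat file of paths as the "database" (subroutine and file names are illustrative):

        use strict;
        use warnings;
        use File::Find;

        # Rebuild the index: walk the tree once, dump every path to a file.
        # Run this periodically (e.g. from a scheduled task), not per search.
        sub update_index {
            my ($root, $index) = @_;
            open my $out, '>', $index or die "can't write $index: $!";
            find(sub { print $out "$File::Find::name\n" }, $root);
            close $out;
        }

        # Search the index instead of re-walking the tree -- much faster.
        sub locate_in_index {
            my ($index, $pattern) = @_;
            open my $in, '<', $index or die "can't read $index: $!";
            my @hits;
            while (my $line = <$in>) {
                chomp $line;
                push @hits, $line if $line =~ /$pattern/;
            }
            close $in;
            return @hits;
        }

    Like locate itself, this only indexes file names; the trade-off is that results can be stale until the next rebuild.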

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Although I love locate too, I don't see it being very helpful to the OP, as it only searches an index of file names, not the contents of the files.

      Keeping a searchable index of file contents isn't trivial (searching it is even less so) and the index itself could easily grow huge. Depending on how often files are updated, keeping the index from growing stale may be an issue, especially if searches must be able to find new data as soon as it is available.

      Recursing and searching file contents might really be best in his case. Getting a port of grep, as you suggested, might well help though.

      -sauoq
      "My two cents aren't worth a dime.";
      

        If he wants to search content, swish-e is hard to pass up. Native C indexing, stemming, and all the goodies, with a Perl API wrapper to format the results to boot, so you can make it look like whatever you want using the language we love. There is a Win32 port for this too....

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print