I use this script when I recursively web-suck a site (i.e., with wget(1)) and end up with hundreds (sometimes thousands, as in the case of RFCs or recipes or such things) of files named things like "1001.html", "1002.html", and so on. This little snippet takes every file in the current directory and writes it into a hash as a scalar. You can then take that hash and run searches on it (such as looking up RFCs, or finding a recipe that contains tuna, mustard, and noodles). Useful, tiny, and even better, it works!

The frozen hash is written to a file called 'brick'. Move that to wherever you're working on your neato searching script.

note: the warnings pragma is a 5.6ism.

#!/usr/bin/perl
use warnings;
use strict;
use Carp;
use Storable qw{ freeze };
use File::Slurp;

my %html_files;
opendir TD, '.' or croak $!;
foreach my $file (readdir TD) {
    $html_files{$file} = read_file($file);
}

my $brick = freeze( \%html_files );

open OUT, ">brick" or croak $!;
print OUT $brick;
close OUT or carp $!;

exit 0; # 'good' exit for the shell
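And for the searching side, a minimal sketch of the "neato searching script" (assuming the 'brick' written above; the tuna/mustard/noodles patterns are only placeholders for whatever you're hunting):

#!/usr/bin/perl
use warnings;
use strict;
use Storable qw{ thaw };
use File::Slurp;

# slurp the frozen hash back in and thaw it
my $html_files = thaw( scalar read_file('brick') );

# print the name of every file whose contents match all the terms
foreach my $file ( sort keys %$html_files ) {
    print "$file\n"
        if $html_files->{$file} =~ /tuna/i
        and $html_files->{$file} =~ /mustard/i
        and $html_files->{$file} =~ /noodles/i;
}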

Re: All files in dir to Storable.pm data
by TheoPetersen (Priest) on May 13, 2001 at 03:39 UTC
    If you're going to use File::Slurp, then why not get the directory entries with read_dir and write the file with write_file?
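    Something like this, perhaps (just a sketch: read_dir skips . and .. for you, and the grep keeps subdirectories out of the hash):

    use strict;
    use warnings;
    use Storable qw{ freeze };
    use File::Slurp;

    # read_dir omits '.' and '..'; keep plain files only
    my %html_files = map { $_ => scalar read_file($_) }
                     grep { -f } read_dir('.');

    # write_file replaces the open/print/close dance
    write_file( 'brick', freeze(\%html_files) );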
Re: All files in dir to Storable.pm data
by jepri (Parson) on May 13, 2001 at 16:27 UTC
    That's definitely a cute hack, but probably only fully useful on systems that lack an rgrep (or a grep -rl) to do the same thing.

    Update: I've been asked to post the syntax for rgrep so here it is:

    rgrep "the word I'm looking for" * grep -rl "the word I'm looking for" *

    Where * is the starting directory/filenames to search through. Executing those commands will search every directory below the one you are currently in.

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

      Or for people without rgrep:

      find . -type f | xargs grep "whatever you want"

      Which I alias (using bash) to:

      alias superfind='find . -type f | xargs grep'
      For that mod 1970's feel.

      -ben

        You won't get the filename prefix if xargs happens to hand grep only one item though. Always throw an extra /dev/null in there, which will be opened and bypassed, but is enough to ensure more than one filename at all times:
        find . -type f -print | xargs grep "whatever you want" /dev/null

        -- Randal L. Schwartz, Perl hacker

        But to make sure you don't get screwed up by funny filenames, use:

        find . -type f -print0 | xargs -0r -iXX grep "word" XX

        I don't know how to work that into an alias though.

        ____________________
        Jeremy
        I didn't believe in evil until I dated it.

Re: All files in dir to Storable.pm data
by converter (Priest) on May 13, 2001 at 18:57 UTC

    If you want to use Perl 5.6.0isms and make sure that perl won't attempt to run the code with an earlier version, you can always require version number; (see perlfunc's require entry for more information).

    The problem with the code you've listed is that it uses a pragmatic module that isn't available in versions prior to 5.6.0. If someone tries to run this code with an earlier version, perl will try to locate and evaluate warnings.pm before it tries to evaluate the version requirement, and will tell you it "can't locate warnings.pm in @INC..." then die.

    If you place the version requirement in a BEGIN block it will be evaluated before any attempt to find warnings.pm and perl will print a message explaining that at least 5.006 is required and then die.

    # note that versions prior to 5.6.0 require the numeric
    # representation of the version number and don't recognize
    # v5.6.0 or 5.6.0, for example.
    BEGIN { require 5.006_00 }

    Example:

    $ perl553 -e 'BEGIN{require 5.006_00}; use warnings;'
    Perl 5.006 required--this is only version 5.00503, stopped at -e line 1.
    BEGIN failed--compilation aborted at -e line 1.

    Update:

    require happens at run-time, after compilation, while use happens at compile-time. This is why perl tries to use the pragmatic module warnings.pm before require version number; is evaluated.

    Placing the require expression in a BEGIN block is one solution, but not the best. use module counts as a BEGIN block and is evaluated at compile-time, so the best solution here is probably:

    use 5.006_00;

Re: All files in dir to Storable.pm data
by gbarr (Monk) on Oct 03, 2001 at 01:56 UTC
    You call readdir, but you do not check that the name you get back is indeed a file. More specifically, you want to skip . and ..

    read_dir from File::Slurp will do that for you. Also, Storable has a sub, store, to write the data directly to a file.

    use strict;
    use Storable;
    use File::Slurp;

    my %html_files = map { ($_, scalar read_file($_)) } read_dir(".");
    store(\%html_files, "brick");
Re: All files in dir to Storable.pm data
by Anonymous Monk on Oct 03, 2001 at 00:58 UTC
    I'm using Storable.pm as a kind of homegrown database for phone numbers and small bits of data. It works great and it's very portable, which I like. But I'd like to store entire text files in a hash too. I've been reluctant to do this because I thought it would be more efficient (in terms of memory) to save the PATH to a file, rather than the contents of the file. This:
    $essays{'Emerson'} = "/usr/foo/american_lit/emerson/";
    instead of this:
    $essays{'Emerson'} = "The American Scholar from Addresses, published as part of Nature; Addresses and Lectures by Ralph Waldo Emerson An Oration delivered before the Phi Beta Kappa Society, at Cambridge, +August 31, 1837 Mr. President and Gentlemen, I greet you on the re-commencement of our literary year. Our anniversa +ry is one of hope, and, perhaps, not enough of labor. We do not meet for games o +f strength or skill, for the recitation of histories, tragedies, and ode +s, like the ancient Greeks; for parliaments of love and poesy, like the Trouba +dours; nor for the advancement of science, like our contemporaries in the Bri +tish and European capitals. Thus far, our holiday has been simply a friendly si +gn of the survival of the love of letters amongst a people too busy to give to l +etters any more. As such, it is precious as the sign of an indestructible ins +tinct. announce, shall one day be the pole-star for a thousand years?... etc. +";
    This goes on for several pages. Obviously, the second way creates a much bigger hash, but it's also much more portable: I just have to copy the file created by Storable to have ALL the essays I've stored. This seems to be what you did in your post, and you mentioned putting the data from THOUSANDS of files into a stored hash. Can I do this with lock_store and lock_retrieve? Doesn't that mean that when you lock_retrieve that stored file you'll be reading EVERYTHING into memory? That can't be good, can it? The problem I'm having with the path solution is portability across different machines.
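    For reference, here's the sort of thing I mean by the lock_* route (just a sketch; 'essays.db' is a made-up filename):

    use strict;
    use warnings;
    use Storable qw{ lock_store lock_retrieve };

    my %essays = ( Emerson => 'The American Scholar ...' );

    # lock_store takes an exclusive flock while it writes the file
    lock_store( \%essays, 'essays.db' );

    # lock_retrieve takes a shared lock while it reads -- but it
    # still deserializes the ENTIRE hash into memory in one go
    my $essays = lock_retrieve('essays.db');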