Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

A previous employee set up our company address book as a text file, and I have been asked to add a search feature that can match on any and all criteria. I usually work with simple databases such as DB_File, but I have never used a plain text file for something like this. I need to build a search that scans the text file and pulls back every record that matches. For example, if I searched for the string "robertj", it would match the email address below and pull back that person's entire record (or several records, if more than one matches). This includes name, number, etc.

I have no idea where to get started; again, I've never searched text files like this before. All entries are separated by %%%, which definitely helps determine which data fits where.

Can anyone offer advice on how to do this? All I need to do is set it up so they can search for ANYTHING and have every matching record pulled back to the screen. It sounds simple, and they refuse to let me recode the system for them :(

Name: Robert Johnson
Phone: (555)555-1111
Fax: (555)555-1112
Address: 12345 Anywhere St.
Email: robertjohnson@someisp.com
Info: more info
%%%
Name: Robert Goor
Phone: (555)555-1113
Fax: (555)555-1114
Address: 12345 Somewhere St.
Email: robert@cs.someu.edu
Info: Some more info
%%%

Re: Scanning a text file
by Roy Johnson (Monsignor) on Apr 09, 2004 at 15:58 UTC
    You're going to want to read each complete record in so you can search and return it. Something like
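    my $searchfor = shift;
    $/ = "%%%\n";    # read one whole record per <IN>
    open(IN, 'datafile.txt') or die "Could not open: $!\n";
    while (<IN>) {
        # \Q...\E treats the search term as literal text, /i ignores case
        print if /\Q$searchfor\E/i;
    }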

    The PerlMonk tr/// Advocate
      my $searchfor = shift; $/="%%%\n";
      Works EXACTLY how I need. With a few modifications this will run perfectly, thank you! Do you think you could explain what $/="%%%\n"; is doing? I've never seen or used $/ before.
        In case the many replies using this approach have not made it clear, you can get more information about $/ by running "perldoc perlvar" and scanning down until you see it described (under the long name "$INPUT_RECORD_SEPARATOR"). By default, $/ = "\n". Inside Perl that is a single character: on MS-DOS/Windows systems the text-mode I/O layer translates the "\r\n" on disk to "\n", on "Classic" Macs "\n" means "\r", and on Unixes it is a plain line feed.

        It's the string that is removed from the end of a value when you "chomp" it, and it is the string (a literal string, not a regex) that the diamond operator looks for when reading data from a file handle into a scalar, to know when to stop. If set to undef, it causes a single read operation to absorb the rest of the file and assign it all to a single scalar.
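
        As a rough illustration of both behaviours, here is a minimal sketch. It assumes records end in "%%%\n" as in the original post; the sample data and the names read_records and slurp are made up for the example, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Made-up sample data in the thread's format.
my $data = "Name: A\nEmail: a\@example.com\n%%%\nName: B\nEmail: b\@example.com\n%%%\n";

# Read the text one record at a time.
sub read_records {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Could not open in-memory file: $!\n";
    local $/ = "%%%\n";          # each read now returns one whole record
    my @records;
    while (my $record = <$fh>) {
        chomp $record;           # chomp strips the current $/, i.e. "%%%\n"
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Read the whole text in a single operation.
sub slurp {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Could not open in-memory file: $!\n";
    local $/;                    # undef: one read returns everything
    my $all = <$fh>;
    close $fh;
    return $all;
}

my @records = read_records($data);
my $whole   = slurp($data);
print scalar(@records), " records\n";    # prints "2 records"
```

        Using "local" keeps the change to $/ confined to the enclosing block, so other reads in the program still see the default.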

Re: Scanning a text file
by davido (Cardinal) on Apr 09, 2004 at 15:59 UTC
    You might be able to start by setting the input record separator, $/, to "%%%\n". Then you're at least working with one record at a time. From that point, split each record into fields, and subsequently split the fields into keys and values. You can split into fields by splitting on /\n/, it seems. You can split into keys and values by splitting on /:\s/.

    Then check your search criteria against the appropriate fields. Your next step would be to push the entire record into an array if it matches all of the search criteria.
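
    One way those steps might fit together is sketched below. It assumes the "Label: value" layout from the original post; parse_record and search_records are hypothetical names, the sample data is abbreviated, and the split uses a limit of 2 so that only the first colon on a line separates key from value.

```perl
use strict;
use warnings;

# Split one record into a hash of field name => value.
sub parse_record {
    my ($record) = @_;
    my %fields;
    for my $line (split /\n/, $record) {
        next unless $line =~ /\S/;                 # skip blank lines
        my ($key, $value) = split /:\s*/, $line, 2;
        $fields{$key} = defined $value ? $value : '';
    }
    return \%fields;
}

# Return every record whose named fields all contain the wanted text
# (literal, case-insensitive match).
sub search_records {
    my ($fh, %criteria) = @_;
    local $/ = "%%%\n";                            # one record per read
    my @matches;
    RECORD: while (my $record = <$fh>) {
        chomp $record;
        my $fields = parse_record($record);
        next unless %$fields;                      # ignore empty records
        for my $key (keys %criteria) {
            my $value = $fields->{$key};
            next RECORD
                unless defined $value && $value =~ /\Q$criteria{$key}\E/i;
        }
        push @matches, $fields;
    }
    return @matches;
}

my $sample = <<'END';
Name: Robert Johnson
Phone: (555)555-1111
Email: robertjohnson@someisp.com
%%%
Name: Robert Goor
Phone: (555)555-1113
Email: robert@cs.someu.edu
%%%
END

open(my $fh, '<', \$sample) or die "Could not open sample data: $!\n";
my @found = search_records($fh, Name => 'robert', Email => 'someisp');
close $fh;
print "$_->{Name} <$_->{Email}>\n" for @found;
```

    Both sample records match "robert" in Name, but only the first also matches "someisp" in Email, so only that record is returned.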

    This, of course, is just one possibility. If you run into specific issues while implementing something like this, follow up here with code-related questions that we can sink our teeth into.


    Dave

Re: Scanning a text file
by jZed (Prior) on Apr 09, 2004 at 20:12 UTC
    One option is to use DBD::AnyData to access the file with DBI and SQL. The script below works using data from the DATA section of a file, but if you substitute in the name of a file containing similar data, it will work just as well.
    #!perl -w
    use strict;
    use DBI;

    my @cols  = qw( Name Phone Fax Address Email Info );
    my @table = (\@cols);

    # Split the data into records on the %%% separator lines
    for my $record (split /\s*%%%\n*/, join '', <DATA>) {
        my @new_rec;
        my @fields = split /\n/, $record;
        for my $field (@fields) {
            $field =~ s/^.*?:\s*//;    # strip the "Label: " prefix only
            push @new_rec, $field;
        }
        push @table, \@new_rec;
    }

    my $dbh = DBI->connect('dbi:AnyData:');
    $dbh->ad_import('t', 'ARRAY', \@table);
    my $sth = $dbh->prepare("
        SELECT name,email
        FROM   t
        WHERE  address LIKE '%Anywhere%'
    ");
    $sth->execute;
    $sth->dump_results;    # dump_results prints the rows itself
    __DATA__
    Name: Robert Johnson
    Phone: (555)555-1111
    Fax: (555)555-1112
    Address: 12345 Anywhere St.
    Email: robertjohnson@someisp.com
    Info: more info
    %%%
    Name: Robert Goor
    Phone: (555)555-1113
    Fax: (555)555-1114
    Address: 12345 Somewhere St.
    Email: robert@cs.someu.edu
    Info: Some more info
    %%%

Re: Scanning a text file
by hardburn (Abbot) on Apr 09, 2004 at 16:20 UTC

    This might not work as I expect it to, but . . .

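    local $/ = '%%%';

    # Assume 'FH' is the input file, opened elsewhere
    my @records;
    while (my $line = <FH>) {
        $line =~ s/%%%//;

        # Remove any beginning or trailing whitespace
        # NOTE: Using \A and \z instead of ^ and $ is critical,
        # and /s lets . match the newlines inside a record
        $line =~ s/\A \s* (.*?) \s* \z/$1/xs;
        next unless length $line;    # skip the empty chunk after the last %%%

        # Split on "colon plus spaces" or newline, so values carry no leading space
        my %rec = split /:[ \t]*|\n/, $line;
        push @records, \%rec;
    }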

    @records should now hold an array of hashes containing the name/value pairs of your data. This may fail for some data, for instance if a ':' appears anywhere other than between a name and its value. It depends on the fact that the flattened list coming out of split can be coerced into a hash.
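
    To make the ':' caveat concrete, here is a small sketch using a made-up record whose Info value contains a colon. It compares the split on every ':' or newline with an alternative that splits each line only once, at the first colon. The record and variable names are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical record: the Info value itself contains a colon.
my $record = "Name: Robert Johnson\nInfo: Hours: 9-5\n";

# Splitting on every ':' or newline gives an odd number of pieces,
# so keys and values no longer pair up when coerced into a hash.
my @naive = split /[:\n]/, $record;

# Splitting each line once, at the first colon, keeps the value whole.
my %rec;
for my $line (split /\n/, $record) {
    my ($key, $value) = split /:\s*/, $line, 2;
    next unless defined $value;    # ignore lines without a colon
    $rec{$key} = $value;
}

print scalar(@naive), " pieces from the naive split\n";    # prints 5
print "Info => $rec{Info}\n";                              # prints "Info => Hours: 9-5"
```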

    ----
    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

Re: Scanning a text file
by Steve_p (Priest) on Apr 09, 2004 at 16:13 UTC

    This is a problem that is screaming to be solved by a simple search engine. Check out Plucene and Plucene::Simple. There is also a nice tutorial on it at Perl.com. Yes, the article talks about a web-based search engine, but putting it into a script should be just as easy.

    In your situation, you'll need an initial parse on the file to create the indexes, but after that the lookups should be quick. It will also return the full record or records related to your search.

Re: Scanning a text file
by matija (Priest) on Apr 09, 2004 at 16:02 UTC
    This just cries for MySQL's fulltext search, but I suppose you won't be allowed to do that.

    How big is the file? If it's small enough (less than a megabyte, say), you could just slurp it in, make each whole record into a single array element, and simply search over them with

    @hits=grep(/$term/,@directory);
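
    A minimal sketch of that slurp-and-grep idea follows, assuming records end in "%%%\n" as in the original post. load_directory and search_directory are hypothetical names and the sample data is abbreviated. The term is wrapped in \Q...\E so that input such as "(555)" is matched as literal text instead of being read as a regex.

```perl
use strict;
use warnings;

# Read every record into a list, one element per record.
sub load_directory {
    my ($fh) = @_;
    local $/ = "%%%\n";              # record separator
    my @directory = <$fh>;           # list context: all records at once
    chomp @directory;                # strip the trailing "%%%\n" from each
    return grep { /\S/ } @directory; # drop any empty records
}

# Return the records containing the term (literal, case-insensitive).
sub search_directory {
    my ($term, @directory) = @_;
    return grep { /\Q$term\E/i } @directory;
}

my $sample = "Name: Robert Johnson\nPhone: (555)555-1111\nEmail: robertjohnson\@someisp.com\n%%%\n"
           . "Name: Robert Goor\nPhone: (555)555-1113\nEmail: robert\@cs.someu.edu\n%%%\n";

open(my $fh, '<', \$sample) or die "Could not open sample data: $!\n";
my @directory = load_directory($fh);
close $fh;

my @hits       = search_directory('robertj', @directory);
my @phone_hits = search_directory('(555)555-1113', @directory);
print @hits;
```

    Each hit is the full text of a record, so printing the hits shows everything stored for the matching people.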

    And if it's more than a megabyte, do it with MySQL and just don't tell them :-)

    Update: Aagh, what was I thinking? Of course there is no need to slurp the whole file into memory, unless you wanted to make a search server :-)