in reply to Searching data file

I have rewritten your code with a simpler algorithm.
use strict;
use Data::Dumper;

my %records;    # hash to store book info based on author
my $data = '';  # buffer for the record currently being read

while (<DATA>) {
    chomp;
    $data .= $_;
    if ($_ eq '</ref>') {
        # process what's in the buffer when we see the end tag
        my $rec = process_record($data);
        $records{$rec->{author}} = $rec;
        $data = '';
    }
}

print Dumper(\%records);

# Pull the fields out of one buffered <ref>...</ref> record.
# Each value runs from its opening tag up to the next '<'.
sub process_record {
    my $rec = shift;
    my %col;
    ($col{author}) = $rec =~ m/<author>\s*([^<]*)(?=<)/g;
    ($col{year})   = $rec =~ m/<year>\s*([^<]*)(?=<)/g;
    ($col{source}) = $rec =~ m/<source>\s*([^<]*)(?=<)/g;
    ($col{id})     = $rec =~ m/<id>\s*([^<]*)(?=<)/g;
    ($col{title})  = $rec =~ m/<title>\s*([^<]*)(?=<)/g;
    my @keywords   = $rec =~ m/<key>\s*([^<]*)(?=<)/g;
    $col{keywords} = \@keywords;
    return \%col;
}

__DATA__
<ref>
<provnc>
<aulist>
<author> Bin Laden
</aulist>
<year>1990
<source> Cambridge University Press, Cambridge UK, 1st edition 
<id>1
<keywords>
<key>terrorism
<key>whatever
</keywords>
</provnc>
<title> Terrorism
</ref>
<ref>
<provnc>
<aulist>
<author> Sydney
</aulist>
<year>1990
<source> Cambridge University Press, Cambridge UK, 1st edition 
<id>1
<keywords>
<key>nothing
<key>whatever
</keywords>
</provnc>
<title> Terrorism
</ref>
And the output is as expected -
$VAR1 = {
    'Bin Laden' => {
        'title' => 'Terrorism',
        'author' => 'Bin Laden',
        'keywords' => [
            'terrorism',
            'whatever'
        ],
        'id' => '1',
        'year' => '1990',
        'source' => 'Cambridge University Press, Cambridge UK, 1st edition '
    },
    'Sydney' => {
        'title' => 'Terrorism',
        'author' => 'Sydney',
        'keywords' => [
            'nothing',
            'whatever'
        ],
        'id' => '1',
        'year' => '1990',
        'source' => 'Cambridge University Press, Cambridge UK, 1st edition '
    }
};

Re: Re: Searching data file
by graff (Chancellor) on Nov 03, 2003 at 02:24 UTC
    Actually, I think there's a slight problem with this design. The markup structure makes it clear that it is meant to handle refs with multiple authors, and when there is such a ref entry, your "process_record" sub will only return the first author -- then this single author will be the basis for testing if the record matches the given search. So if the name being searched for happens to be the second author in a record, that record won't be returned.

    You would need the hash element for "author" to be a reference to an array, and then search over the elements of that array. That makes it a lot more complicated than reading a whole <ref>...</ref> element at each iteration (by setting $/ as I suggested above) and looking for $search anywhere within the <aulist> element.
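    A minimal sketch of that $/ approach follows. The helper name refs_by_author, the in-memory filehandle and the sample records are assumptions made up for illustration; they are not taken from the original script or data.

```perl
use strict;
use warnings;

# Hypothetical helper: return every <ref>...</ref> record whose <aulist>
# block contains $search. One whole record is read per iteration.
sub refs_by_author {
    my ($fh, $search) = @_;
    local $/ = "</ref>";    # record separator is the end tag, not newline
    my @hits;
    while (my $ref = <$fh>) {
        next unless $ref =~ /<ref>/;    # skip the trailing whitespace chunk
        my ($aulist) = $ref =~ m{<aulist>(.*?)</aulist>}s;
        next unless defined $aulist;
        push @hits, $ref if index($aulist, $search) >= 0;
    }
    return @hits;
}

# Sample data invented for illustration: the first ref has two authors.
my $sample = <<'END';
<ref>
<aulist>
<author> Smith
<author> Jones
</aulist>
<title> First
</ref>
<ref>
<aulist>
<author> Brown
</aulist>
<title> Second
</ref>
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @found = refs_by_author($fh, 'Jones');
print scalar(@found), " matching record(s)\n";
```

    Because the match is made against the whole author list, a second author such as "Jones" is found without building any per-author data structure first.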

      Yes, you are right: my code does not pick out multiple authors. I left them out because I was being lazy.

      Fixing the code is simple, though. Just modify it slightly to read multiple authors, the same way it already reads multiple keys:
      my @author = $rec =~ m/<author>\s*([^<]*)(?=<)/g;
      $col{author} = \@author;
      How the hash returned by the subroutine gets stored needs to be revised too, since with multiple authors the author field can no longer serve as a single hash key. That should be a simple exercise.
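      One possible way to do that exercise is sketched below: file each record under every one of its authors, so that each author name maps to a list of records. The helper name index_record and the sample record are hypothetical, and the sketch assumes the author field is an array reference.

```perl
use strict;
use warnings;

my %records;    # author name => array ref of records by that author

# Hypothetical helper: file one parsed record under each of its authors.
# Assumes $rec->{author} is an array reference.
sub index_record {
    my ($records, $rec) = @_;
    push @{ $records->{$_} }, $rec for @{ $rec->{author} };
    return;
}

# Record invented for illustration.
my $rec = { author => [ 'Smith', 'Jones' ], title => 'First', year => '1990' };
index_record(\%records, $rec);

print $records{Jones}[0]{title}, "\n";
```

      Every author entry points at the same record hash, so nothing is copied, and an author with several refs simply gets a longer list.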

        If you wanted to use the hash again in another program (i.e. save the data as a hash in a file for future use), how do you get the data 'loaded' in the new program so that you could use a statement like the one below?
        $myinfo = $records{'Sydney'}{'title'};
        I looked on the web and everyone is using the 'eval' function, but there are no good examples.
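        One way to do this without eval, sketched here as an assumption rather than as anything proposed in the thread, is the core Storable module: store() writes a data structure to a file and retrieve() reads it back as a reference. The file name and the sample data below are made up for illustration, and both halves are shown in one script only to keep the example self-contained.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Data invented for illustration, shaped like the %records hash.
my %records = (
    Sydney => {
        title    => 'Terrorism',
        year     => '1990',
        keywords => [ 'nothing', 'whatever' ],
    },
);

# Hypothetical file name inside a scratch directory.
my $file = tempdir(CLEANUP => 1) . '/records.sto';

# First program: write the whole structure to disk.
store(\%records, $file) or die "Cannot store to $file";

# Second program: read it back; retrieve returns a hash reference.
my $loaded = retrieve($file);
my $myinfo = $loaded->{Sydney}{title};
print "$myinfo\n";
```

        To get a plain hash back instead of a reference, the second program can dereference it with my %records = %{ retrieve($file) }; after that, $records{'Sydney'}{'title'} works as in the question.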