explodec14 has asked for the wisdom of the Perl Monks concerning the following question:

Hi - Does anyone know how to get the matching strings that result from a PLucene search? I've started using PLucene and so far I have managed to get only the doc IDs, but I also need the occurrences of the search string in the document. Any ideas?

Re: Get matching strings by searching with PLucene
by GrandFather (Saint) on Sep 05, 2010 at 23:28 UTC

    Any code? A small self-contained test script would help us understand where you are having trouble.

    True laziness is hard work
      The code:
      #!/usr/bin/perl
      use warnings;
      #use strict;

      use Plucene::Document;
      use Plucene::Document::Field;
      use Plucene::Index::Writer;
      use Plucene::Analysis::SimpleAnalyzer;
      use Plucene::QueryParser;
      use Plucene::Search::IndexSearcher;
      use Plucene::Search::HitCollector;   # loaded explicitly; used below
      use Data::Dumper;

      # Build one document from the DATA section
      # (the __DATA__ section itself was not included in the post)
      my $content = join("", <DATA>);
      my $doc = Plucene::Document->new;
      $doc->add(Plucene::Document::Field->Text("content", $content));

      # Create the index (third argument 1 = create) and add the document
      my $writer = Plucene::Index::Writer->new("my_index",
          Plucene::Analysis::SimpleAnalyzer->new(), 1);
      $writer->add_document($doc);
      undef $writer; # close

      my @docs;
      my $parser = Plucene::QueryParser->new({
          analyzer => Plucene::Analysis::SimpleAnalyzer->new(),
          default  => "text" # Default field for non-specified queries
      });
      my $query = $parser->parse('content:Craigslist appears to have surrendered');

      # Collect every matching document
      my $searcher = Plucene::Search::IndexSearcher->new("my_index");
      my $hc = Plucene::Search::HitCollector->new(collect => sub {
          my ($self, $doc, $score) = @_;
          push @docs, $searcher->doc($doc);
      });
      $searcher->search_hc($query, $hc);

      print Dumper @docs;

      As you can see, I print the @docs array, but it doesn't contain the matching strings.

        Having glanced at the Java documentation mentioned in the module documentation (note that I haven't actually used the module), I don't think it likely that there is any such thing as a "matching string".

        So far as I can see, the documents are indexed and the match happens against the index, not against strings as such. In fact, even the match text you provide is manipulated before it is used, so what is used for matching isn't actually the text you provide. There may be no exact match between the match text you provide and any of the text anywhere in the document. To that extent it doesn't make sense to return a matched string: there is none.

        True laziness is hard work
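
If the search can only say which documents matched, one possible workaround is to locate the occurrences yourself in the original text once you know a document matched. The following is only a rough sketch in plain Perl and does not use any Plucene API. The helpers simple_terms and find_occurrences are hypothetical names invented for this illustration, and the tokenizing rule (lowercase, split on anything that is not an ASCII letter) is an assumption about what a simple analyzer does, not something taken from the Plucene source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: approximate a simple analyzer by taking runs of
# ASCII letters and lowercasing them. This is an assumption, not Plucene code.
sub simple_terms {
    my ($text) = @_;
    return map { lc } $text =~ /([A-Za-z]+)/g;
}

# Hypothetical helper: for each word in $content that equals one of the
# query terms (case-insensitively), return [term, offset, original word].
sub find_occurrences {
    my ($content, $query) = @_;
    my %wanted = map { $_ => 1 } simple_terms($query);
    my @hits;
    while ($content =~ /([A-Za-z]+)/g) {
        my $word = $1;
        next unless $wanted{ lc $word };
        # $-[1] is the start offset of the first capture group
        push @hits, [ lc $word, $-[1], $word ];
    }
    return @hits;
}

# Sample text made up for the illustration
my $content = "Craigslist appears to have surrendered. CRAIGSLIST said so.";
for my $hit (find_occurrences($content, 'Craigslist surrendered')) {
    printf "%-12s offset %3d  (%s)\n", @$hit;
}
```

Note that the query is passed here as bare words. A field prefix such as content: would have to be stripped first, otherwise this sketch would treat "content" as just another term to look for. It also matches individual terms only, so phrase or proximity matching would need extra work.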