Apache solr vs Apache Lucy

by ppant (Acolyte)
on Sep 16, 2015 at 09:54 UTC ( [id://1142181]=perlquestion )

ppant has asked for the wisdom of the Perl Monks concerning the following question:

I have a web application written in Perl. I need to index documents, either from a file system or generated on the fly, in formats such as HTML, MS Office, and PDF, and then run full-text searches over them. I have already tried Apache Solr, and it works fine with sample data. I have now learned about Apache Lucy and am wondering whether it is the right candidate for my Perl-based application. My concern is that there has been no Lucy release on CPAN since Dec 2014, so I am not sure whether it is actively maintained, and in particular how far along any integration with Lucene 5.3 is. I would appreciate suggestions on the following points:
- Is Apache Lucy similar to Apache Lucene, APIs and all?
- Is Apache Lucy production ready?
- Is there any tentative plan for a new release of Apache Lucy?

Replies are listed 'Best First'.
Re: Apache solr vs Apache Lucy
by dmitri (Priest) on Sep 16, 2015 at 15:44 UTC
    I've used Apache Lucy and its predecessor, KinoSearch, for over 10 years and at three different @jobs with great success. I recommend it with all my heart!

    • Fast
    • Stable
    • Flexible
    • Authors reply to emails for help

    Use Apache Lucy and be happy!

      Thanks, dmitri, for your valuable response. I just learned that Lucy provides only a subset of the features Lucene provides. Do you know of any critical features that Lucy lacks? I think my requirements are not complex: I need to index a file system periodically and run full-text searches over HTML, DOC, XLS, PDF, and similar file types. Thanks again.
        I've never used Lucene, so I cannot compare the two. I use Lucy to index PDF, HTML, DOC, and several other document types. Converting them into text indexable by Lucy has to be done separately.

        I've graduated from reindexing once every few hours using a cron job to using Linux::Inotify2 to provide practically instant updates to the index. Surely impressed my $boss...
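
For the cron-style periodic reindex mentioned above, one possible first step is to work out which files changed since the last run. This is only a rough sketch using the core File::Find module; `changed_files` is a hypothetical helper name (not part of Lucy), and converting each file to text and handing it to the indexer is deliberately left out.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper (not a Lucy API): return the files under $root that
# have one of the given extensions and were modified after $since
# (epoch seconds), sorted by path.
sub changed_files {
    my ($root, $since, @exts) = @_;
    my %want = map { lc($_) => 1 } @exts;    # case-insensitive extension set
    my @found;
    find(
        {
            no_chdir => 1,                   # keep full paths in $File::Find::name
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;      # plain files only
                my ($ext) = $path =~ /\.([^.\/\\]+)\z/;
                return unless defined $ext && $want{ lc $ext };
                my $mtime = (stat $path)[9];
                push @found, $path if $mtime > $since;
            },
        },
        $root,
    );
    my @sorted = sort @found;
    return @sorted;
}
```

A cron job could then call something like `changed_files('/srv/docs', $last_run, qw(pdf html doc xls))`, where the path and `$last_run` are placeholders, and pass each returned file through whatever text converter is in use before adding it to the index.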

Re: Apache solr vs Apache Lucy
by Corion (Patriarch) on Sep 16, 2015 at 10:26 UTC
Re: Apache solr vs Apache Lucy
by hippo (Bishop) on Sep 16, 2015 at 12:44 UTC
    My concern with Apache Lucy this is that there is no update on CPAN after Dec 2014.

    Not necessarily a bad thing - indicates stability and/or caution on new releases. It also probably puts it in the top 20% or so of most recently released dists (guessing).

    Anyway, the repository shows updates within the last week, so it is clearly being worked on.

      Thanks. Indeed, the GitHub source repo shows recent activity.
Re: Apache solr vs Apache Lucy
by Your Mother (Archbishop) on Sep 16, 2015 at 16:02 UTC

    What dmitri said. I've used KinoSearch/Lucy for almost as long and can testify that the devs are the best and I know that if you find a real bug, it will be addressed quickly because it happened to me. I've never used Search::Elasticsearch but it looks like a good thing to try as well and possibly easier to work with than Lucy (its API seems a bit higher level) but I can't imagine it's as fast.

      Thanks for your feedback. I have done some trials with Search::Elasticsearch too. The module works fine and looks good, but I had difficulty building the JSON for the attachment mapper (for HTML, PDF, and DOC files stored in a file system) through the module, and I could not find enough documentation. Probably I am missing something? I am using Elasticsearch 1.7, but the upcoming version 2 brings a lot of changes, mainly in how a file system is indexed (rivers are deprecated in 2.0), so the Perl module will probably get some updates too, along with its documentation.
Re: Apache solr vs Apache Lucy
by Your Mother (Archbishop) on Oct 16, 2017 at 22:04 UTC

    Since this got revived I'll chime in. I'm still using KinoSearch, Lucy's dad, on 5.8 no less, because I'm stuck in upgrade Hell and don't have a cc/gcc new enough to compile Lucy yet. It's still working great, in production at hundreds of customer sites in front of tens of thousands of users, as it has for 6 or 7 years.

Re: Apache solr vs Apache Lucy
by isync (Hermit) on Oct 16, 2017 at 17:33 UTC
    I've used (and am still using) Lucy for many years, since the KinoSearch days, and can testify that it's one of the gems on CPAN. At least since its rename to Lucy it's stable and perfectly production ready. Getting it to work is a bit tedious, as you have to fine-tune and design every bit of your application, but once it's working, it's fast and consistently performant. I never did benchmarks, but I once had a multi-million-document project where many components had difficulties, except Lucy, which plowed through like a champ.
Re: Apache solr vs Apache Lucy
by Anonymous Monk on Sep 16, 2015 at 10:25 UTC
    Regarding release plans, it's best to ask the devs, in their preferred mailing list/IRC/...
      Thanks. Will do so.
