in reply to PDF Indexing / Search
I'd probably set up some type of conversion process (pdftotext) and then just grep those converted files.
The way-too-simplistic approach:
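    #!/usr/bin/perl
    use strict;
    use warnings;

    die "usage: $0 <pdffile> <searchterm>\n" unless @ARGV == 2;
    my ( $pdf, $term ) = @ARGV;

    # List-form pipe open: the file name never passes through a shell.
    open( my $cmd, '-|', '/usr/bin/pdftotext', $pdf, '-' )
        || die "cannot run pdftotext on $pdf: $!";

    # The search term is used as a regular expression, as in grep.
    while ( <$cmd> ) {
        print if /$term/;
    }
    close( $cmd ) || die "pdftotext failed on $pdf\n";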
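For the "convert once, then grep" variant mentioned at the top, something like the following might do as a rough sketch. The sub names `convert_pdfs` and `search_text`, the two-directory layout, and the literal, case-insensitive matching are all my own assumptions for illustration. It also assumes `pdftotext` is on the PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert every *.pdf in $pdf_dir to a .txt file in $txt_dir.
# Assumes pdftotext is on the PATH (hypothetical setup).
sub convert_pdfs {
    my ( $pdf_dir, $txt_dir ) = @_;
    opendir( my $dh, $pdf_dir ) or die "cannot open $pdf_dir: $!";
    for my $name ( sort grep { /\.pdf\z/i } readdir $dh ) {
        ( my $base = $name ) =~ s/\.pdf\z//i;
        my ( $pdf, $txt ) = ( "$pdf_dir/$name", "$txt_dir/$base.txt" );
        # Skip PDFs whose text file is already at least as new.
        next if -e $txt && -M $txt <= -M $pdf;
        # List-form system() keeps file names away from the shell.
        system( 'pdftotext', $pdf, $txt ) == 0
            or warn "pdftotext failed for $pdf\n";
    }
    closedir $dh;
}

# Return "file:line:text" hits for a literal, case-insensitive term.
sub search_text {
    my ( $txt_dir, $term ) = @_;
    my @hits;
    opendir( my $dh, $txt_dir ) or die "cannot open $txt_dir: $!";
    for my $name ( sort grep { /\.txt\z/ } readdir $dh ) {
        open( my $fh, '<', "$txt_dir/$name" )
            or die "cannot open $txt_dir/$name: $!";
        while ( my $line = <$fh> ) {
            chomp $line;
            push @hits, "$name:$.:$line" if $line =~ /\Q$term\E/i;
        }
        close $fh;
    }
    closedir $dh;
    return @hits;
}

# Run only when called as a script with three arguments.
if ( !caller() && @ARGV == 3 ) {
    my ( $pdf_dir, $txt_dir, $term ) = @ARGV;
    convert_pdfs( $pdf_dir, $txt_dir );
    print "$_\n" for search_text( $txt_dir, $term );
}
```

Called as `script.pl <pdfdir> <txtdir> <searchterm>`, it only reconverts PDFs that changed since the last run, so repeated searches stay cheap. Once the text files exist, plain `grep -ri` over the text directory would work just as well.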
Update: If I were going to do this for real and it needed to be available over the web, I would probably use Solr.