"...Once you have the document text, there are various open-source search frameworks to use..."

Elasticsearch has companion tools such as FSCrawler (a standalone crawler rather than a plugin proper) that deal with all that for you.

Re^3: All These Files - Am I Thinking About This Right?
by bliako (Abbot) on Apr 04, 2019 at 13:34 UTC

    Cool then. It uses Tesseract too. Though I do not know how effective an automated solution will be compared with manually tuning or re-training Tesseract for old, scanned documents.

      Nothing to stop you tweaking Tesseract to your heart's content.
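
For anyone wondering what "tweaking" might look like in practice, here is a minimal sketch that calls the Tesseract command line directly with an explicit page segmentation mode (--psm) and engine mode (--oem). The helper names build_tesseract_cmd and ocr_page, the file names, and the default values are assumptions made for illustration; nothing here comes from the thread or from FSCrawler, and the right settings for old scans have to be found by experiment.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, out_base, lang="eng", psm=6, oem=1):
    """Return the argument list for one Tesseract CLI run.

    The defaults are illustrative guesses, not recommendations:
    psm=6 treats the page as a single uniform block of text,
    oem=1 selects the LSTM engine (Tesseract 4 and later).
    """
    return [
        "tesseract", str(image), str(out_base),
        "-l", lang,
        "--psm", str(psm),
        "--oem", str(oem),
    ]


def ocr_page(image, out_base, **opts):
    """Run Tesseract on one image and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    subprocess.run(build_tesseract_cmd(image, out_base, **opts), check=True)
    # Tesseract appends .txt to the output base name by default.
    return Path(f"{out_base}.txt")
```

Running the same scan through a few different psm values and comparing the resulting text files is a cheap way to see whether hand tuning beats the automated defaults before investing in re-training.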