I'm indexing PDFs to provide a quick search across all of our PDF documentation. I will be putting each document's words into a single column, and I hope none of them will come close to the 5000 max, because I'm doing all this "elimination" of common words and duplicate words. I may eventually even limit it to the first x words of each document, since I feel that if you are looking for a specific document about, say, apples, the word "apples" is going to appear within the first couple of paragraphs at least.
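To make that concrete, here is a minimal sketch of the elimination step, written in Python purely for illustration. The stop-word list, the function name `build_keywords`, and the reading of "5000 max" as a character limit are all assumptions, and the text is assumed to have already been extracted from the PDF.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on", "it"}

# Assumption: the "5000 max" is a limit on characters in the column.
MAX_COLUMN_CHARS = 5000


def build_keywords(text, max_words=None, max_chars=MAX_COLUMN_CHARS):
    """Return unique, non-common words from text as one space-separated string.

    Words keep their first-seen order. If max_words is given, only the
    first max_words words of the text are considered.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if max_words is not None:
        words = words[:max_words]  # only look at the start of the document

    seen = set()
    kept = []
    length = 0
    for word in words:
        if word in STOP_WORDS or word in seen:
            continue  # drop common words and duplicates
        extra = len(word) + (1 if kept else 0)  # +1 for the separating space
        if length + extra > max_chars:
            break  # stop before the column limit is exceeded
        seen.add(word)
        kept.append(word)
        length += extra
    return " ".join(kept)


print(build_keywords("The apple and the Apple pie is in the oven"))
```

The result of `build_keywords` is what would go into the column for each document. Passing `max_words` covers the "first x words" idea without changing anything else.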
Do you have any other suggestions rather than going this route?
Ultimately I'm just indexing the PDFs so that I can point back to them later. PDF is a good format for storing massive amounts of documentation; I'm just providing the ability to search all of them at once.
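As a rough illustration of the lookup side, the sketch below (again Python, for illustration only) searches a mapping of PDF path to stored keyword string and returns the paths to point back to. The `search` function, the in-memory dictionary standing in for the database column, and the file paths are all hypothetical.

```python
def search(index, query):
    """Return the paths whose keyword string contains every word in query.

    index maps a PDF path to its space-separated keyword string.
    """
    terms = set(query.lower().split())
    if not terms:
        return []  # an empty query matches nothing
    return sorted(
        path
        for path, keywords in index.items()
        if terms <= set(keywords.split())  # every query term must be present
    )


# Hypothetical data standing in for rows in the table.
index = {
    "docs/fruit.pdf": "apple pie oven",
    "docs/cars.pdf": "engine oil filter",
}
print(search(index, "apple"))
```

Because only unique words are stored, a search like this can say whether a word occurs in a document but not how often, so ranking by frequency would need the counts kept separately.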

In reply to Re^2: Creating Metadata from Text File by Trihedralguy
in thread Creating Metadata from Text File by Trihedralguy
