I'm indexing PDFs to provide a quick search across all of our PDF documentation. I'll be putting the extracted text into one column, and I hope none of the rows come close to the 5000-character max, since I'm eliminating common words and duplicate words. I may eventually just limit it to the first x number of words, since if you're looking for a specific document about, say, apples, the word "apples" is going to appear within the first couple of paragraphs at least.
Do you have any other suggestions rather than going this route?
Ultimately I'm just indexing the PDFs so that I can point back to them later. PDF is a good format for storing massive amounts of documentation; I'm just providing the ability to search all of them at once.
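The elimination step described above could be sketched roughly like this (Python used for illustration, since I don't know your toolchain; the stop-word list and the 200-word cutoff are made-up examples, not values from this thread):

```python
# Sketch of the "brute force" indexing step: strip common ("stop") words,
# drop duplicate words, and keep only the first N words that survive.

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def index_text(text, max_words=200):
    """Return a space-joined string of unique, non-stop words."""
    seen = set()
    kept = []
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")  # crude punctuation cleanup
        if word and word not in STOP_WORDS and word not in seen:
            seen.add(word)
            kept.append(word)
        if len(kept) >= max_words:
            break
    return " ".join(kept)
```

With a cutoff like this, a document whose subject word appears early (as "apples" would) still makes it into the stored column while the column stays well under the size limit.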
| [reply] |
Yours sounds like an adequate "brute force" method, but if you have the time, you should take a look at RDF (Resource Description Framework), which is the standard for metadata about documents and other things a library might consider a "resource". It's being extended to encompass other things as well, like code and databases, but it started right where you are now.
I suggest it because there are tools to search RDF for matching resources, based on subject and meaning, rather than just the appearance of certain words.
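For a taste of what such metadata looks like, here is a minimal RDF record in Turtle syntax using the standard Dublin Core vocabulary (the file name, title, and subject terms are invented for illustration):

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<file:///docs/apple-handling.pdf>
    dc:title   "Apple Handling Procedures" ;
    dc:subject "apples", "produce storage" ;
    dc:format  "application/pdf" .
```

A query against records like these matches on the declared subject of a document, not on whichever words happen to occur in its first few paragraphs.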
| [reply] |
While I haven't gone looking quite yet, do you know if these RDF solutions are Perl-driven?
I'm trying the "brute force" method because we need something quick, easy, and completely automatable. I will at least have to look into this RDF you speak of.
| [reply] |