I have reached a point in the development of an application where I would like to bounce a few ideas off my fellow monks. The application builds an index of certain web pages. I have tentatively decided on Berkeley DB 3.x as the basis for my data store (interfaced specifically via BerkeleyDB::Hash) and am now trying to decide which aspect of the indexed web pages is best stored as the hash key.
The most direct method would of course be to use the escaped URL of the web page as the key (most likely as generated by URI::Escape), but I wonder whether there is a cleaner and more expansive (read: ordered) way to index such pages. I have also considered using an MD5 hash of either the URL or the page itself as the key, but this seems like overkill given the time needed to regenerate the MD5 hash for every lookup. Ease and speed matter less for the indexing than for the subsequent matching and lookup of the data. Note that each lookup will again be derived from the location URL.
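To make the options concrete, here is a minimal sketch of the key derivations I am weighing. It uses only core modules: escaped_key() is a hand-rolled stand-in for URI::Escape's uri_escape, and ordered_key() (reversed host labels plus path) is purely a hypothetical illustration of what an "ordered" key might look like, not something I have settled on. All three sub names are my own invention.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Stand-in for URI::Escape::uri_escape: percent-encode every byte
# outside the RFC 3986 unreserved set. Assumes a byte string.
sub escaped_key {
    my ($url) = @_;
    (my $key = $url) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $key;
}

# Fixed-width key: always 32 hex characters, whatever the URL length.
sub md5_key {
    my ($url) = @_;
    return md5_hex($url);
}

# Hypothetical "ordered" key: reversed host labels followed by the
# rest of the URL, so pages from one site sort next to each other.
# Ports and userinfo are not treated specially in this sketch.
sub ordered_key {
    my ($url) = @_;
    my ($host, $rest) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/?#]+)(.*)$}i
        or return $url;    # not an absolute URL: fall back to the raw string
    return join('.', reverse split /\./, lc $host) . $rest;
}

my $url = 'http://www.example.com/docs/index.html';
print escaped_key($url), "\n";
print md5_key($url),     "\n";
print ordered_key($url), "\n";
```

As far as I understand, key order only has an effect in a sorted store such as BerkeleyDB::Btree. BerkeleyDB::Hash does not keep keys in order, so with it the ordered form would buy nothing beyond readability.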
Should I stick with the idea of an escaped URL as the hash key or do other monks here have a more ordered approach that I can use to index this data?
Ooohhh, Rob no beer function well without!