Perrin writes:
But ultimately, my advice would be to change the Swish-e index so that it can tell you not only what document the word is in but where in the document it was found. Then you can avoid doing this expensive parsing at request time.

Perrin,

That's really hard, though. First, swish does keep track of word positions, for phrase matches. But all sorts of things bump the position counter: special chars, some HTML tags, and so on. Trying to match swish-e's position data with what I could parse myself would be hard. It's hard enough just matching up the text. So if swish told me to highlight word 243, I'd be very lucky to know which word that was.

The other problem is the volume of data that might be returned for a wildcard search like s*: tens of thousands of word positions for a few hundred results.

But probably my solution, if it's possible, is to have swish store the source document and, with each word, its character offset. Then for each word hit it could return the character offsets. Argh. I can see where phrases would be tough, too.
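
To make the character-offset idea concrete, here is a rough sketch of what the request-time side could look like. It assumes the index hands back [start, length] pairs for each hit, which swish-e does not do today; the function name and the data layout are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each hit in highlight tags, given [start, length]
# pairs of the kind an offset-aware index might return.
sub highlight_by_offsets {
    my ($text, $hits, $open, $close) = @_;
    $open  = '<b>'  unless defined $open;
    $close = '</b>' unless defined $close;

    # Process hits from the end of the text backwards, so inserting tags
    # never shifts the offsets of hits that are still to be processed.
    for my $hit (sort { $b->[0] <=> $a->[0] } @$hits) {
        my ($start, $len) = @$hit;
        next if $start < 0 || $start + $len > length $text;   # skip bad offsets
        my $word = substr($text, $start, $len);
        substr($text, $start, $len, $open . $word . $close);
    }
    return $text;
}

my $doc  = 'Swish-e indexes words; swish finds them fast.';
my $hits = [ [0, 5], [23, 5] ];   # made-up offsets for the two "swish" hits
print highlight_by_offsets($doc, $hits), "\n";
```

No parsing of the document happens at request time, only substr calls. The sketch assumes hits do not overlap, and it does nothing about phrases or about the volume of offsets a wildcard search would produce.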

Right about /o in the regexp. See my comments (and I guess confusion) in my example code...
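
On the /o point: with /o the pattern is compiled once and never re-interpolated, so in a persistent process the first query's terms would stick around for later requests. A minimal sketch of one common alternative, compiling the terms with qr// once per query, follows. This is not the example code mentioned above, and make_highlighter is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: build a highlighter for one query's search terms.
sub make_highlighter {
    my @terms = @_;
    return sub { $_[0] } unless @terms;   # nothing to highlight

    # quotemeta keeps user input from being read as regex syntax;
    # longer terms go first so they win over shorter alternatives.
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length $b <=> length $a } @terms;

    # Compiled once here, then reused for every document in the result set.
    my $re = qr/\b($alt)\b/i;

    return sub {
        my ($text) = @_;
        $text =~ s{$re}{<b>$1</b>}g;
        return $text;
    };
}

my $first  = make_highlighter('swish');
my $second = make_highlighter('perl');   # a later query gets its own pattern
print $first->('Swish and Perl'), "\n";
print $second->('Swish and Perl'), "\n";
```

Each query pays for one compile, and nothing leaks from one query to the next.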

thanks,


In reply to Re: Re: Context search term highlighting - Perl is too slow by moseley
in thread Context search term highlighting - Perl is too slow by moseley
