I've never seen anyone write this before, though that may not mean much. Mostly I don't think this is the sort of thing a Domino developer would care about, so there isn't likely to be much code around for extracting document contents via the HTTP service. Actually, now that I think about it further, there's a better way.

Use the normal Domino APIs to extract the data without bothering with the web service at all. You're taking a very twisty route when you could just get a RichText object, translate it to text (with optional formatting) and index that. Another idea is to just use the normal FullText index. It sounds like you are trying to avoid using Domino's pre-existing indexing service. Why? You'd have to be crazy to give it up - it already works really darn well.


As for your current problem - I'd transform the HTML to XML with XML::LibXML and just use some XPath to fetch the expansion URLs. The in-place editing might be somewhat ugly, so now I wonder whether providing a more expansive URL command like ExpandAll might be a better idea anyway.
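Something like the following is roughly what I have in mind for the XPath part. It is only a sketch: the sample markup is made up, the sub name expansion_urls is mine, and I'm assuming the twisty links can be recognised by an "Expand=" argument in the href. Check what your server actually emits before trusting the expression.

```perl
use strict;
use warnings;
use XML::LibXML;

# Return the href of every link that looks like a twisty expansion link.
# ASSUMPTION: expansion links carry "Expand=" in their query string.
sub expansion_urls {
    my ($html) = @_;

    # Parse in HTML mode; recover => 2 tolerates tag soup silently.
    my $doc = XML::LibXML->load_html(
        string          => $html,
        recover         => 2,
        suppress_errors => 1,
    );

    # Select the href attributes themselves and return their values.
    return map { $_->value }
        $doc->findnodes('//a[contains(@href, "Expand=")]/@href');
}

# Hypothetical page fragment; real Domino output will differ.
my $sample_html = <<'HTML';
<html><body><table>
<tr><td><a href="/db.nsf/view?OpenView&amp;Expand=1"><img src="/icons/expand.gif"></a>Category A</td></tr>
<tr><td><a href="/db.nsf/view?OpenView&amp;Collapse=2"><img src="/icons/collapse.gif"></a>Category B</td></tr>
<tr><td><a href="/db.nsf/view/abc123?OpenDocument">A document</a></td></tr>
</table></body></html>
HTML

print "$_\n" for expansion_urls($sample_html);
```

The proxy would then fetch each returned URL and splice the result into the page, which is the in-place editing part I expect to be ugly.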


In reply to Re: Domino Twisty Expansion using Lightweight Proxy by diotalevi
in thread Domino Twisty Expansion using Lightweight Proxy by inman
