Hello,

I have been piecing together a Perl/JavaScript UI for Text::Aspell (my question concerns only the Perl side of things). I am given a complete HTML document and tasked with spell checking all of its words (effectively, anything between > and <).

My problem is finding only the words (nothing inside <>'s) and their byte positions in the document. Once I have these, I can relatively easily perform my JS visual transformations on the HTML and then post back the appropriate info to do the actual replacement in Perl.

What I'm struggling with is the regex to use. I've mucked around with $-[0] and $+[0], but am now leaning towards a single s/.../function()/eg substitution, where the function does the dirty work of building the HTML that replaces each spell-checkable word (just some nonsense).
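To make the s///eg idea concrete, here is a minimal sketch, not a finished solution. It assumes tags never contain a literal > inside attribute values, that entities such as &amp; should be skipped, and that a "word" is a run of ASCII letters with optional internal apostrophes. The helper names mark_word and mark_words, the span class, and the data-pos attribute are all made up for illustration. Comments, script and style contents are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one word in markup the JS side could hook into.
# The class name and data-pos attribute are invented for this sketch.
sub mark_word {
    my ($word, $pos) = @_;
    return qq{<span class="sp" data-pos="$pos">$word</span>};
}

# Alternation: group 1 is a whole tag or entity (passed through untouched),
# group 2 is a word. Inside the replacement, $-[2] is the start offset of
# group 2 in the ORIGINAL string (bytes if $html holds undecoded bytes,
# characters if it has been decoded).
sub mark_words {
    my ($html) = @_;
    $html =~ s{(<[^>]*>|&#?\w+;)|([A-Za-z]+(?:'[A-Za-z]+)*)}
              {defined $1 ? $1 : mark_word($2, $-[2])}ge;
    return $html;
}

print mark_words(q{<p class="x">Helo wrld &amp; more</p>}), "\n";
```

Matching tags in the same alternation as words is what keeps tag names and attribute values from being treated as text: the regex engine consumes each tag whole before it can try the word branch inside it.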

The same regex needs to be used on both ends (display and final editing before saving in the database).

I'm really out of ideas for where to start on this, as I have been through many iterations of regexes and logic. I'm not even sure whether I should be using a regex at all, rather than substring operations in a while loop... any hints, advice, or explicit examples would be much appreciated.
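For comparison, a rough sketch of the while-loop variant: collect each word with its start and end offsets via @- and @+, then apply corrections later with substr. It works under the same assumptions as any simple tag-skipping regex (no > inside attribute values, ASCII words, no special handling of comments or script blocks). The names $TOKEN, find_words and apply_fixes are hypothetical. Keeping the pattern in one qr// object is one way to use the same regex on both the display and the saving side.

```perl
use strict;
use warnings;

# Shared pattern: group 1 = tag or entity (skipped), group 2 = word.
my $TOKEN = qr{(<[^>]*>|&#?\w+;)|([A-Za-z]+(?:'[A-Za-z]+)*)};

# Return a list of [word, start, end] for every word outside tags.
sub find_words {
    my ($html) = @_;
    my @found;
    while ($html =~ /$TOKEN/g) {
        next if defined $1;                  # tag or entity: ignore
        push @found, [ $2, $-[2], $+[2] ];   # offsets of group 2
    }
    return @found;
}

# Apply corrections given as [start, end, replacement]. Working from the
# end of the string backwards keeps the earlier offsets valid.
sub apply_fixes {
    my ($html, @fixes) = @_;
    for my $fix (sort { $b->[0] <=> $a->[0] } @fixes) {
        my ($start, $end, $new) = @$fix;
        substr($html, $start, $end - $start, $new);
    }
    return $html;
}

my $doc = q{<b>Helo</b> wrld};
printf "%s at %d-%d\n", @$_ for find_words($doc);
print apply_fixes($doc, [3, 7, 'Hello'], [12, 16, 'world']), "\n";
```

The offsets only stay meaningful if the document posted back is byte-for-byte the one that was scanned, so the replacement step would need to run against the stored original rather than the JS-decorated copy.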

Thanks,
Justin


In reply to regex for search and replace of words in HTML by jqcoffey
