Dear Monks,

My brother is building a Turkish website about jazz in PHP. A few weeks ago we implemented a search mechanism for it. Then we decided to highlight the results: for example, if you search for Herbie Hancock, you get the results, and when you view a matching page the keywords Herbie and Hancock appear in a different color to attract attention and make browsing easier.

To keep things simple and help my brother, I decided to use Perl Compatible RegEx in PHP, and it worked OK. All it did was search for the keywords one by one, case-insensitively, and replace each match with the same text wrapped in FONT tags that set the color. That was fine until I realized that it also matched the very same keywords inside URLs and inside tags and attributes such as IMG and ALT. For example, if you search for Herbie Hancock and then visit this page, you'll see the problem: Herbie Hancock is in a different color, but so is the image URL! (This is why you don't see a nice image of Hancock; try deleting the keywords parameter from the URL ;-) E.g.
<img src="http://www.cazci.com/images/articles/<font color=yellow style="background-color: green">herbie</font>.jpg" align=right alt="<font color=yellow style="background-color: green">Herbie</font> <font color=yellow style="background-color: green">Hancock</font>">
The same thing applies to strings like
"... < a href=http://www.herbiehancock.com ... etc..."
which is converted into something like
"< a href=http://<font ... >herbie </font><font ...>hancock</font>>..."
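The behaviour described above can be reproduced with a minimal sketch. It is written in Python rather than PHP purely for illustration; the function name `naive_highlight` and the sample markup are hypothetical, and the site's real code is not shown here. The point is only that a plain keyword-by-keyword replacement cannot tell page text from tag contents.

```python
import re

# FONT wrapper taken from the example output quoted in this post.
FONT_OPEN = '<font color=yellow style="background-color: green">'
FONT_CLOSE = '</font>'

def naive_highlight(html, keywords):
    """Wrap every case-insensitive keyword match in FONT tags.

    Hypothetical helper: it makes no distinction between visible
    text and text inside tags, so URLs get wrapped as well.
    """
    for word in keywords:
        pattern = re.compile(re.escape(word), re.IGNORECASE)
        html = pattern.sub(
            lambda m: FONT_OPEN + m.group(0) + FONT_CLOSE, html)
    return html

sample = '<a href=http://www.herbiehancock.com>Herbie Hancock</a>'
print(naive_highlight(sample, ['herbie', 'hancock']))
```

Running this on the sample link wraps the keywords inside the href value too, which is the breakage shown above.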
To solve the problem I colored everything first and then searched for the problematic parts and removed the surrounding FONT tags from them. That worked for the URLs, and now I have to do the same for IMG, ALT, etc. As you can imagine, this makes me a little uneasy.

Is there any way to tell the PCRE engine (the Perl Compatible RegEx library in PHP) this: only color the keywords that appear in the visible text, and leave alone the ones inside HTML tags and attribute values such as a href, IMG, ALT, etc.? I'm not very experienced with lookahead and lookbehind assertions in RegEx, and I'm also not sure whether PCRE supports them.
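One way this kind of request is often expressed is with a negative lookahead: match a keyword only when the next angle bracket after it is not a closing `>`, which would mean the keyword sits inside a tag. The sketch below is an assumption-laden illustration in Python, not a tested PHP solution; the function name `highlight_outside_tags` is hypothetical. The `(?!...)` lookahead syntax is part of PCRE as well, so the same pattern should carry over to `preg_replace` with the `i` modifier.

```python
import re

FONT_OPEN = '<font color=yellow style="background-color: green">'
FONT_CLOSE = '</font>'

def highlight_outside_tags(html, keywords):
    """Wrap keywords in FONT tags unless they appear inside a tag.

    Assumption: the markup never contains a literal '<' or '>' in
    text or attribute values; if it does, this heuristic can misfire.
    """
    alternation = '|'.join(re.escape(word) for word in keywords)
    # (?![^<>]*>) fails when a '>' comes before any other angle
    # bracket, i.e. when the match lies between '<' and '>'.
    pattern = re.compile(
        '(?:' + alternation + ')(?![^<>]*>)', re.IGNORECASE)
    return pattern.sub(
        lambda m: FONT_OPEN + m.group(0) + FONT_CLOSE, html)

sample = ('<img src="http://www.cazci.com/images/articles/herbie.jpg" '
          'align=right alt="Herbie Hancock"> Herbie Hancock plays')
print(highlight_outside_tags(sample, ['herbie', 'hancock']))
```

All keywords are handled in a single pass here, so the FONT tags being inserted are never rescanned. A proper HTML parser would be more robust than any regex for real pages; this only shows the lookahead idea.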

Any ideas that can lead to a better solution will be appreciated. I'm sorry for this PHP question, but since it is about Perl-compatible RegEx and since the grandmasters of RegEx live here, I thought I could get help :)

In reply to Perl RegEx in PHP by YAFZ
