PerlMonks  

Perl RegEx in PHP

by YAFZ (Pilgrim)
on Aug 11, 2003 at 14:00 UTC

YAFZ has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

My brother is programming a Turkish website about jazz, and he does it in PHP. A few weeks ago we implemented a search mechanism for it. Then we decided to color the results. For example, if you search for Herbie Hancock, you get the results, and when you view a related page, the keywords Herbie and Hancock are shown in a different color to attract attention and make browsing easier.

To get a simple result and help my brother, I decided to use Perl Compatible RegEx in PHP, and it worked OK. All it did was search for the keywords one by one, case-insensitively, and replace each match with the same text wrapped in font and color tags. That was fine until I realized that it also matched the very same keywords inside URLs and inside IMG tags, ALT attributes, etc. For example, if you search for Herbie Hancock and then visit this page, you'll see the problem: Herbie Hancock is shown in a different color, but so is the image URL! (This is why you don't see a nice image of Hancock; try deleting the keywords parameter from the URL ;-) Here is the resulting tag:
<img src="http://www.cazci.com/images/articles/<font color=yellow style="background-color: green">herbie</font>.jpg" align=right alt="<font color=yellow style="background-color: green">Herbie</font> <font color=yellow style="background-color: green">Hancock</font>">
The same thing applies to strings like
"... <a href=http://www.herbiehancock.com ... etc..."
which are converted into something like
"<a href=http://<font ...>herbie</font><font ...>hancock</font>>..."
To work around this, I now do the coloring first, then search for the problematic parts and remove the surrounding FONT tags. That works for the URLs, and now I have to do the same for IMG, ALT, etc. As you can imagine, this makes me a little uneasy.
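For reference, here is a minimal sketch of the one-keyword-at-a-time replacement described above, written in Python rather than PHP (Python's `re` uses Perl-style syntax close to PCRE's). The function name `naive_highlight` and the sample page are made up for illustration; the point is only that the pattern cannot tell whether a match sits in page text or inside a tag.

```python
import re

# Hypothetical reconstruction of the naive approach: wrap every
# case-insensitive match of each keyword, one keyword at a time.
HIGHLIGHT = r'<font color=yellow style="background-color: green">\1</font>'

def naive_highlight(html, keywords):
    for kw in keywords:
        # re.escape keeps characters like "." or "+" in a keyword literal.
        html = re.sub("(" + re.escape(kw) + ")", HIGHLIGHT, html,
                      flags=re.IGNORECASE)
    return html

page = '<img src="images/herbie.jpg" alt="Herbie Hancock"> Herbie Hancock plays piano.'
print(naive_highlight(page, ["herbie", "hancock"]))
# The src and alt attributes get <font> tags too, which breaks the <img> tag.
```

Because each keyword gets its own pass, a later keyword can also match inside the <font> tags added by an earlier pass (a search that includes "green" or "font" would do it).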

Is there any way to tell the PCRE engine (the Perl Compatible RegEx library in PHP) this: only color the keywords that appear in the text between HTML tags, not the ones inside tags such as A HREF and IMG or attributes such as ALT? I'm not very experienced with lookahead and lookbehind in regexes, and I'm also not sure whether PCRE supports them.
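PCRE does support lookahead and lookbehind, but a lookbehind must be fixed-length, so "not somewhere inside <...>" is awkward to express that way. A common alternative is one pattern with two alternatives: one that swallows a whole tag, and one that captures a keyword, plus a callback that hands tags back untouched. Below is a Python sketch of that idea; PHP's `preg_replace_callback()` takes a callback in the same style, but no PHP version is shown or tested here. The names `highlight_outside_tags`, `OPEN` and `CLOSE` are illustrative, and the sketch assumes no `>` appears inside attribute values, comments or scripts, so it is not bulletproof.

```python
import re

OPEN = '<font color=yellow style="background-color: green">'
CLOSE = "</font>"

def highlight_outside_tags(html, keywords):
    # Longest keywords first, so a phrase like "herbie hancock" is tried before "herbie".
    words = sorted((w for w in keywords if w), key=len, reverse=True)
    if not words:
        return html  # an empty alternation would match the empty string everywhere
    # Alternative 1 consumes a whole tag, attributes included.
    # Alternative 2 captures a keyword found in ordinary text.
    pattern = re.compile(
        r"(<[^>]*>)|(" + "|".join(re.escape(w) for w in words) + ")",
        re.IGNORECASE,
    )

    def repl(m):
        if m.group(1) is not None:
            return m.group(1)  # a tag: copy it through unchanged
        return OPEN + m.group(2) + CLOSE

    return pattern.sub(repl, html)

page = '<a href="http://www.herbiehancock.com"><img src="herbie.jpg" alt="Herbie Hancock"></a> Herbie Hancock'
print(highlight_outside_tags(page, ["herbie", "hancock"]))
```

All keywords are handled in a single pass, so the inserted <font> tags are never rescanned, which also removes the need to clean up afterwards.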

Any ideas that lead to a better solution will be appreciated. I'm sorry for asking a PHP question, but since it is Perl-compatible regex stuff and the grandmasters of RegEx live here, I thought I could get help :)

Re: Perl RegEx in PHP
by cfreak (Chaplain) on Aug 11, 2003 at 14:27 UTC

    Had it been written in Perl, you would have the same problem. The first way around it would be a complex regex that checks whether the match comes after an open '<' that hasn't been closed yet; however, that is not going to be 100% effective. Really, you should be using some kind of HTML parser and only apply your regex when you know you have text. HTML::TokeParser is what I use in Perl... unfortunately I had to roll my own parser for something similar in PHP, so YMMV.
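    The parser approach, sketched with Python's standard `html.parser.HTMLParser` standing in for Perl's HTML::TokeParser (this is not the hand-rolled PHP parser mentioned above, which isn't shown). Tags, comments and entities are copied through, and only text content is run through the keyword regex; text inside <script> and <style> is left alone. The class `Highlighter` and helper `highlight_html` are hypothetical names. Since the output is reassembled from parser events, details such as end-tag capitalisation may differ from the input.

```python
import re
from html.parser import HTMLParser

OPEN = '<font color=yellow style="background-color: green">'
CLOSE = "</font>"

class Highlighter(HTMLParser):
    """Re-emits the document, highlighting keywords only in text content."""

    def __init__(self, keywords):
        # convert_charrefs=False sends entities like &amp; to
        # handle_entityref/handle_charref instead of decoding them into text.
        super().__init__(convert_charrefs=False)
        words = sorted((w for w in keywords if w), key=len, reverse=True)
        self.pattern = (re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
                        if words else None)
        self.out = []
        self.raw_depth = 0  # > 0 inside <script>/<style>, whose text is left alone

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, attributes untouched
        if tag in ("script", "style"):
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)
        if tag in ("script", "style") and self.raw_depth:
            self.raw_depth -= 1

    def handle_data(self, data):
        if self.pattern and not self.raw_depth:
            data = self.pattern.sub(lambda m: OPEN + m.group(0) + CLOSE, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

def highlight_html(html, keywords):
    parser = Highlighter(keywords)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

    Usage: highlight_html(page, ["herbie", "hancock"]) leaves the href, src and alt values alone and wraps only the words in the visible text.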

    Lobster Aliens Are attacking the world!

Node Type: perlquestion [id://282877]
Approved by sgifford