parsing acute accent marks from HTML

tbone has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks
I have been using Perl for about 6 months, so I'm sorry for the simplicity of this question, but I'm stumped. I am parsing an HTML table using HTML::TableExtract. When I display some of the rows, they contain á, which I want to substitute. I viewed the source of the HTML, and the value that creates this accent mark is (&)nbsp; but without the parentheses. I have tried both the value seen in the source and the literal value (á) as the values in my s///, but neither works. This acute accent mark is found both at the beginning and in the middle of some of the rows, so I can't just drop the first letter of the string. I would appreciate any insight. Thanks

Re: parsing acute accent marks from HTML by glivings (Scribe) on Mar 19, 2003 at 16:56 UTC
You might want to have a look at HTML::Entities for a general solution for dealing with extended characters. Perhaps encode the strings in question, and then do a search and replace on their entity representation.
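A minimal sketch of that suggestion, assuming the cell text already contains the decoded character (the sample string in `$cell` is made up for illustration, and HTML::Entities must be installed; it ships with HTML::Parser). Encoding first turns the invisible character into a named entity that is easy to match:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Hypothetical cell text: \xA0 is the character that &nbsp; decodes to.
my $cell = "\xA0Total\xA0due";

# Encode so the stray character becomes a visible, matchable entity.
my $encoded = encode_entities($cell);

# Replace the entity with a plain space, then decode anything else back.
(my $clean = $encoded) =~ s/&nbsp;/ /g;
$clean = decode_entities($clean);

# Trim the leading/trailing whitespace left behind.
$clean =~ s/^\s+|\s+$//g;

print "$encoded\n$clean\n";
```

Printing `$encoded` is also a handy way to find out which character is really in the string before deciding what to substitute.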
Re: parsing acute accent marks from HTML by Thelonius (Priest) on Mar 19, 2003 at 18:00 UTC
An nbsp is a non-breaking space. Unless something went terribly wrong it wouldn't display as an a with an acute accent. That would be &aacute; or &#225;
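One possible explanation, offered as a guess about the poster's setup: a decoded &nbsp; is character 0xA0, and a console using code page 437 or 850 draws byte 0xA0 as á. If that is what is happening, the thing to match is `\xA0`, not the entity text or a literal á. A rough sketch using only core Perl (the `@row` data is invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical row as a parser might return it after decoding &nbsp;.
my @row = ("\xA0Widget", "12\xA0units");

for my $cell (@row) {
    $cell =~ tr/\xA0/ /;        # non-breaking space -> ordinary space
    $cell =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
}

print join('|', @row), "\n";
```

This handles the character at the beginning and in the middle of a cell alike, so there is no need to drop the first letter of the string.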