I think the proper way to control this sort of operation, and keep it from screwing up other entity references (like &egrave; and &lt;, etc.), is something like this:
s/&(?!\w+;)/&amp;/g;
This just assumes that every valid entity reference that might exist in the original text has only alphanumerics and underscores between the initial ampersand and the final semicolon, which is probably a safe enough assumption.
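Here is a minimal sketch of that substitution wrapped in a sub (both sub names are hypothetical). It uses the /g modifier so every bare ampersand in the string is escaped, not only the first one. It also shows one gap in the \w+ assumption: numeric character references such as &#233; contain a "#", which \w does not match, so the plain version escapes them too. The second sub, with #? added to the lookahead, is an assumed tweak that leaves them alone.

```perl
use strict;
use warnings;

# Escape every "&" that is not already the start of something shaped
# like a named entity reference (&name;). \w matches letters, digits
# and underscore, so &#233; does NOT count as a reference here.
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!\w+;)/&amp;/g;
    return $text;
}

# Hypothetical variant: an optional "#" in the lookahead also protects
# numeric character references such as &#233; and &#xE9;.
sub escape_bare_ampersands_numeric {
    my ($text) = @_;
    $text =~ s/&(?!#?\w+;)/&amp;/g;
    return $text;
}

my $sample = 'Fish & chips &egrave; &lt; &#233;';
print escape_bare_ampersands($sample), "\n";
print escape_bare_ampersands_numeric($sample), "\n";
```

Because &amp; itself matches the lookahead, running either sub a second time leaves already-escaped text unchanged.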
But keep a backup of the original. If the data still causes parse errors after this simple edit, they might be different problems you haven't fixed yet, or they might be problems created by this simple edit. Careful diagnosis would be needed in that case.
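One way to follow the backup advice is a small script like this sketch. The helper name is hypothetical, and it assumes the file fits in memory. It copies the file to "name.bak" with the core File::Copy module before rewriting it, and returns how many ampersands it changed, which is a starting point if parse errors remain.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: save an untouched copy of $path as "$path.bak",
# then escape bare ampersands in $path itself.
# Returns the number of ampersands that were rewritten.
sub escape_file_with_backup {
    my ($path) = @_;
    my $backup = "$path.bak";

    # Keep the original so any new parse errors can be compared against it.
    copy($path, $backup) or die "cannot back up $path to $backup: $!";

    open my $in, '<:raw', $path or die "cannot read $path: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close $in;

    # s///g returns the number of substitutions (false if there were none).
    my $count = ($text =~ s/&(?!\w+;)/&amp;/g) || 0;

    open my $out, '>:raw', $path or die "cannot write $path: $!";
    print {$out} $text;
    close $out or die "cannot close $path: $!";

    return $count;
}

# Process any file names given on the command line.
for my $path (@ARGV) {
    my $n = escape_file_with_backup($path);
    print "$path: escaped $n bare ampersand(s); original saved as $path.bak\n";
}
```

Perl's -i switch does the same backup-and-edit in one line, for example perl -i.bak -pe 's/&(?!\w+;)/&amp;/g' data.xml (the file name is just an example).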