
If I understand you correctly, the important difference is the semicolon: you want to replace &#38, but not when it is followed by a semicolon (i.e. you don't want to replace &#38;). The poor formatting in your post made that difficult to understand.

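The easy solution is to use a negative look-ahead, as already suggested in other posts, but I doubt that sed supports look-ahead assertions: the basic and extended regular expressions that sed normally uses have no such construct (though this may depend on which version you have).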

Besides, even for a 200 MB file, this should not be a problem in Perl. Last time I compared the performance of Perl and sed, I did not find a really significant performance difference between them, but, again, this may depend on the implementation of the sed version you're using.

Re^4: Need a regex to replace incomplete html entities
by Chris Daniel (Novice) on Nov 20, 2016 at 14:45 UTC
    You got it right, Laurent. Thanks for the update and the look-ahead assertion.

    The reason I am focusing on a sed command is that I want to parse an XML file which contains multiple similar <Remarks> tags.

    But since the file contains incomplete HTML entities, the parser is not able to parse it.
    Hence I was planning to use a sed command to replace those entities first and then parse the file.