in reply to Need a regex to replace incomplete html entities

I basically want to replace the string &,&#,&#3,&#38 to blank, but it should not replace &#38;
This seems to be contradictory: you want to replace &#38, but don't want to do it.

Please explain.

Update: added code tags because some characters were dropped from &#38, rendering the post difficult to understand.

Re^2: Need a regex to replace incomplete html entities
by Chris Daniel (Novice) on Nov 20, 2016 at 11:16 UTC
    What I am looking for: if & is not followed by #38; then replace the & with blank.
    If a line contains & or &# or &#3 or &#38, it should be replaced with blank, but &#38; should not be affected.
    Note: The file is 200+ MB, so I am thinking of using a sed command.
      If I understand you correctly, the important difference is the semi-colon: you want to replace &#38, but not if it is followed by a semi-colon (i.e. you don't want to replace &#38;). The poor formatting in your post made it difficult to understand that.

      The easy solution is to use a negative look-ahead, as already suggested in other posts, but I doubt that sed supports look-ahead assertions (it may depend on the version).

      Besides, even for a 200 MB file, this should not be a problem in Perl. Last time I compared the performance of Perl and sed, I did not find a really significant performance difference between them, but, again, this may depend on the implementation of the sed version you're using.

        You got it right, Laurent. Thanks for the update and the look-ahead assertion.

        The reason I am focusing on the sed command is that I want to parse an XML file that contains multiple similar <Remarks> tags.

        But since the file contains incomplete HTML entities, the parser is not able to parse it.
        Hence I was planning to use a sed command to replace those entities and then parse the file.