Hi all, I am writing a Perl script that strips comments from an HTML file containing embedded JavaScript. It should remove both the HTML comments and the JavaScript comments. The file looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<!-- testing test-->
"<!-- test -->"
<body>
<script type="text/javascript">
document.write('<h2>This is a header</h2>');"/* testing */"
document.write('<p>/*hello*/This is a paragraph</p>');
/* sdkfjhsdfhsdfhsdjkfhsjd fhsjdh fdjs sdfdh sfjh sdfhsd jhsdf hsdf*/
/* testing*/
// hello this is a comment line
/* CHEC This too */
"/*test /*test*/test*//*hello*/"
alert("//hello");
'// This is for testing'
alert("hello"); // This is for testing'
"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl */"
'"/* gdjkfghdf gdflkg jdfk6lgjdfkjgdfkl */'
/* hello this is multiline
multiline comment */
</script>
<!-- fjghfdj ghjfdghjhg fgdfgdfgklfj klfg klfd flkgjhfd jkghf fgfdlkgjdfg -->
<div align="center">
This is for testing.<br>
Welcome to INDIA<br>
<p>
"<!-- hai comment -->"
HI TESTING
</p>
<strike>this for testing<br>
</strike>
<center><!-- adasdasdasdasdas -->
"<!-- aksdjasdjaskdjaks"djaksdj"askd aksdjak -->"
centralizing the string</center>
<input type=button name='but' value='check'/>
</body>
</html>
Desired Output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
""
<body>
<script type="text/javascript">
document.write('<h2>This is a header</h2>');"/* testing */"
document.write('<p>/*hello*/This is a paragraph</p>');
"/*test /*test*/test*//*hello*/"
alert("//hello");
'// This is for testing'
alert("hello");
"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl gjkdfjgdkfgjdkfgjdfjgdfg dfg fdg */"
'"/* gdjkfghdf gdflkg jdfklgjdfkjgdfkl gjkdfjgdkfgjdkfgjdfjgdfg dfg fdg */'
</script>
<div align="center">
This is for testing.<br>
Welcome to INDIA<br>
<p>
""
HI TESTING
</p>
<strike>this for testing<br>
</strike>
<center>
""
centralizing the string</center>
<input type=button name='but' value='check'/>
</body>
</html>
Can anyone give me a regular expression that does this, or suggest another way to do it? Thanks in advance.
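For illustration, here is a minimal sketch of one regex-based way to get the output above. It is not a definitive solution. The sub names `strip_comments` and `strip_js_comments` are made up for this example. It assumes that `<script>` elements are not nested, that the text `</script>` never appears inside a JavaScript string, and that the JavaScript contains no regex literals that look like comments. Based on the desired output, HTML comments are removed even when they sit inside quotes, while JavaScript comments are removed only when they are outside string literals. A real parser (for example HTML::Parser from CPAN) would likely be more robust.

```perl
use strict;
use warnings;

# Remove /* ... */ and // comments from a chunk of JavaScript.
# String literals are matched first, so comment-like text inside
# quotes is copied through unchanged.
sub strip_js_comments {
    my ($js) = @_;
    $js =~ s{
        (                          # group 1: a string literal, kept as-is
            "(?:[^"\\\n]|\\.)*"    #   double-quoted, backslash escapes allowed
          | '(?:[^'\\\n]|\\.)*'    #   single-quoted, backslash escapes allowed
        )
      | /\*.*?\*/                  # block comment, may span lines
      | //[^\n]*                   # line comment, up to the end of the line
    }{ defined $1 ? $1 : '' }gsex;
    return $js;
}

# Split the page into <script> elements and everything else, then
# strip JavaScript comments from the former and HTML comments from
# the latter.
sub strip_comments {
    my ($html) = @_;
    # The capturing group makes split return the <script> elements too.
    my @parts = split m{(<script\b[^>]*>.*?</script\s*>)}is, $html;
    for my $part (@parts) {
        if (my ($open, $body, $close) =
                $part =~ m{\A(<script\b[^>]*>)(.*)(</script\s*>)\z}is) {
            $part = $open . strip_js_comments($body) . $close;
        }
        else {
            $part =~ s{<!--.*?-->}{}gs;   # HTML comments, quoted or not
        }
    }
    return join '', @parts;
}
```

To use it, slurp the whole file into a scalar and print `strip_comments($page)`. Lines that held only a comment are left behind as blank lines, so a further pass would be needed if those should be dropped as well.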

In reply to HTML stripper... by Anonymous Monk
