There are a lot of reasons why people have answered your question by telling you to use a module to parse HTML. One of the most important is that HTML cannot reliably be handled by a regular expression. Even in the simple case of removing just the comments from HTML, there is more to consider than a simple regexp can accomplish.

You will run into nested comments, server-side includes (which look like comments), comments with multiple --comment-- blocks, and who knows what else, any of which could foul up your regexp plan.
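To make a few of those pitfalls concrete, here is a minimal sketch in core Perl. The sub name strip_comments_naively and the sample strings are made up for illustration; the regexp is simply the sort of thing one might write on a first attempt, not anyone's recommended solution.

```perl
use strict;
use warnings;

# Naive approach: delete everything from "<!--" to the next "-->".
# (Hypothetical helper, named here only for illustration.)
sub strip_comments_naively {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;
    return $html;
}

# 1. A server-side include uses comment syntax but is a directive.
my $ssi = '<p>Hi</p><!--#include virtual="/footer.html" -->';
print strip_comments_naively($ssi), "\n";
# prints: <p>Hi</p>              (the include directive is silently lost)

# 2. Comment-like text inside a script string is data, not a comment.
my $script = '<script>var s = "<!-- not a comment -->";</script>';
print strip_comments_naively($script), "\n";
# prints: <script>var s = "";</script>      (the string literal is emptied)

# 3. A "nested" comment: the match stops at the first "-->".
my $nested = '<!-- outer <!-- inner --> still outer -->';
print strip_comments_naively($nested), "\n";
# prints:  still outer -->       (debris is left behind)
```

Each case can be patched with a more elaborate pattern, but every patch tends to uncover another case.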

There is also a good practical reason to use modules. You might spend five hours figuring out your regular expression, and it still won't work 100% of the time. A well-written module embodies many thousands of collective man-hours of work, not just from the primary author but also from the vast user base of that module within the Perl community.

One person alone may or may not get something right. A large base of programmers puts a module through hoops that the primary author may never have considered, and contributes suggestions, comments, and bug reports. They expose flaws, they find nits to pick, and in the end the module emerges robust, reliable, and secure. This is an evolving process; in a field that moves as fast as the Internet, or computing in general, nobody can say that a module is ever finished. But it's light-years down the development path past the regexp that you might cook up in an evening with a slice or two of pizza.

The famous quote is applicable in this situation: "You can fool all of the people some of the time, and you can fool some of the people all of the time, but you cannot fool all of the people all of the time."

A module has to stand up to the rigors of all of the people, all of the time. It has to prove itself not just with some people some of the time, but with everyone who uses it, every time. To get on your own the kind of robustness that you find in well-written, trusted modules, you would have to quit your day job for the next ten years.

One reason Perl exists is that we're all basically lazy. Perl's proponents are terribly lazy, and Perl helps them support the lazy lifestyle. (OK, many are also hard workers, but lazily so.) Modules support laziness too, which is a good thing. But refusing to learn to use modules out of laziness is false laziness, or misguided laziness, because the extra 10 minutes it takes to figure out how to use a module will save you countless hours down the road.
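For a sense of what those ten minutes buy, here is a minimal sketch using HTML::Parser from CPAN (it is not a core module, so this assumes you have it installed). The sub name strip_comments is made up for illustration, and the decision to keep anything starting with "<!--#" as a server-side include is my assumption about what you would want, not something the module decides for you.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return $html with comments removed.
# Assumption: comments that begin with "<!--#" are server-side
# includes and should be kept.
sub strip_comments {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Every event without its own handler: copy the source text through.
        default_h => [ sub { $out .= shift }, 'text' ],
        # Comment events: keep only the ones that look like SSI directives.
        comment_h => [
            sub {
                my ($text) = @_;
                $out .= $text if $text =~ /^<!--#/;
            },
            'text',
        ],
    );

    $parser->parse($html);
    $parser->eof;    # flush any text the parser is still buffering
    return $out;
}

print strip_comments('<p>a</p><!-- remove me --><p>b</p>'), "\n";
```

The point of the design is that the parser decides what is and is not a comment; the handler only decides what to do with each one.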

Another reason to use the module is that it already is the answer to your question. A module is a form of FAQ: the answer to a frequent need rather than a frequent question. When someone asks, "How do I accomplish this?", people say, "Oh, use the module." Many people have put time and effort into that module, and some of them are in this forum. If you say, "But I don't want to use the module" (despite the fact that it was designed to solve exactly your problem), you are saying, "Thanks for changing my tire for me, but could you do it again, this time lifting the car by hand instead of using the jack?"

Don't resist what works best for the vast majority. Your case is not so different from that of everyone else.

Dave

"If I had my life to do over again, I'd be a plumber." -- Albert Einstein


In reply to Re: Removing html comments with regex by davido
in thread Removing html comments with regex by n4mation
