jcvivar has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks! I'm reading a file in my Perl code and it has some HTML comments. These comments span many lines. How can I (remove|delete|don't read|transliterate to 1 simple space) all the characters and spaces between /patt1/ and /patt2/? Thank you for any help. /patt1/ =

Replies are listed 'Best First'.
Re: Remove multiple lines between two patterns
by Bloodnok (Vicar) on Oct 19, 2009 at 21:40 UTC
    The 'usual' way of achieving this is to use the flip flop operator (or range operator).
    use warnings;
    use strict;
    use autodie;

    while (<DATA>) {
        next if (/<!--/ .. /-->/);
        print;
    }

    __DATA__
    <html>
    <head>
    <!-- some html comment -->
    </head>
    <body>
    <!-- some
    multi-line
    html comment -->
    </body>
    </html>
    Giving
    $ perl tst.pl
    <html>
    <head>
    </head>
    <body>
    </body>
    </html>
    A user level that continues to overstate my experience :-))
Re: Remove multiple lines between two patterns
by GrandFather (Saint) on Oct 19, 2009 at 22:48 UTC

    If it's HTML, use a suitable module. You don't tell us what you want to do with the contents of the file, so it's hard to advise on the most suitable module; however, HTML::TreeBuilder addresses many HTML parsing and munging needs.


    True laziness is hard work
Re: Remove multiple lines between two patterns
by kennethk (Abbot) on Oct 19, 2009 at 21:21 UTC
    Are you streaming the file? Are you slurping it into an array? Are you slurping it into a string? The answer to your question depends strongly on how you are dealing with the file. Please read How do I post a question effectively? and post your code and sample input so we can understand the context. For example, if you have slurped it into a string and wish to remove all text between two tags but not the tags themselves, you can use a regular expression:

    $string =~ s/(\/patt1\/).*?(\/patt2\/)/$1$2/sg;
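
    As a rough sketch of the slurped-string approach, assume the two patterns are the HTML comment delimiters <!-- and --> (as in Bloodnok's reply). This variant removes each comment, delimiters included, and leaves a single space in its place. The sample input and the variable $clean are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; in practice $string would hold the slurped file.
my $string = "<head>\n<!-- one\ntwo -->\n</head>\n<p>a<!-- b -->c</p>\n";

# /s lets . match newlines, so a comment may span several lines.
# .*? is non-greedy, so each match stops at the nearest closing -->.
# /g repeats the substitution for every comment in the string.
(my $clean = $string) =~ s/<!--.*?-->/ /sg;

print $clean;
```

    Without the /s modifier the multi-line comment would survive, and with a greedy .* everything from the first <!-- to the last --> would be removed in one go.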

    If you are streaming, you can use a flag to monitor whether you are inside a comment or not.
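
    A minimal sketch of the flag idea, again assuming the patterns are <!-- and -->. The names $in_comment, $input and $out are hypothetical, and the in-memory filehandle merely stands in for the real file. Unlike skipping whole lines, this keeps any text that shares a line with the start or end of a comment; lines left with only whitespace are dropped.

```perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle.
my $input = <<'END';
<html>
<head>
<!-- some html comment -->
</head>
<body>
<p>kept <!-- starts here
and ends here --> also kept</p>
</body>
</html>
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my $in_comment = 0;    # true while we are inside an unfinished comment
my $out        = '';

while (my $line = <$fh>) {
    if ($in_comment) {
        # Drop everything up to and including the closing -->,
        # or skip the whole line if the comment does not end here.
        next unless $line =~ s/^.*?-->//;
        $in_comment = 0;
    }

    # Drop comments that open and close on this line.
    $line =~ s/<!--.*?-->//g;

    # A comment that opens here but does not close: drop its start
    # (including the newline, which is part of the comment) and set the flag.
    $in_comment = 1 if $line =~ s/<!--.*//s;

    # Keep only lines that still contain something other than whitespace.
    $out .= $line if $line =~ /\S/;
}
close $fh;

print $out;
```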