
eq does not work with regular expressions; it only tests for an exact string match:

  my $a = "foo";
  my $b = "bar";
  my $c = ".*";

  print "eq" if ($a eq $a);    # prints "eq"
  print "ne" if ($a ne $b);    # prints "ne"
  print "ne" if ($a ne $a);    # prints nothing
  print "RE" if $a =~ /$c/;    # prints "RE"
  print "RE" if $a =~ /f.*/;    # prints "RE"
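
If the string you want to find may itself contain regex metacharacters, the two approaches can be combined by quoting it with \Q...\E. The following is only a small sketch of that idea; the helper name contains_literal is made up for illustration and is not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains $needle as a literal
# substring, even when $needle contains regex metacharacters.
sub contains_literal {
    my ($text, $needle) = @_;
    return $text =~ /\Q$needle\E/ ? 1 : 0;
}

my $pattern = ".*";

print "eq\n"      if "foo" eq "foo";                       # exact string comparison
print "regex\n"   if "foo" =~ /$pattern/;                  # ".*" used as a pattern
print "literal\n" if contains_literal("foo",  $pattern);   # no literal ".*" in "foo"
print "literal\n" if contains_literal("a.*b", $pattern);   # literal ".*" is present
```

So eq compares whole strings, a plain =~ treats the variable as a pattern, and \Q...\E gives you a substring search for the literal characters.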

What you probably wanted was something along these lines (tested :) ):

#!/usr/bin/perl -w

  use strict;

  my $filename = $ARGV[0] || "temp.html";

  my $open;

  undef $/;             # undefine all line separators
  open( FILE, $filename ) or die "Couldn't open $filename : $!\n";
  $open = <FILE>;    # This slurps the whole file into one scalar (instead of an array)
  close FILE;

  # I'll take a simplistic approach that assumes that
  # the only place where a ">" occurs is at the end of
  # a tag. This fails when you have, for example:
  # <IMG src="less.png" alt="a > b">
  # which is valid HTML as far as I know.
  # I also ignore script and comment handling.

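  while (length $open) {
    # Match text into $1 and, if a tag follows, the tag into $2:
    $open =~ s/^([^<]+)?(<[^>]+>)?//;
    my ($text, $tag) = ($1, $2);
    # Nothing matched (e.g. a stray "<" without a closing ">"):
    # stop here instead of looping forever.
    last unless defined $text or defined $tag;
    print "Text : $text\n" if defined $text;
    print "HTML: $tag\n" if defined $tag;
  }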

  # The real meat of the code is the "s///;" line.
  # It works as follows:
  # The two parenthesized parts capture stuff:
  # the first parentheses capture non-tagged text,
  # the second parentheses capture text that is
  # within "<" and ">".
  # One or both of the parentheses are allowed to be empty.
  # Everything that is found is deleted from the start of
  # the string.
  # Repeat as long as there is stuff left in the slurped string.
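
Since the thread is about replacing text everywhere except inside tags, here is a rough sketch of how the same simplistic tag pattern could be used for that. It is not the code from the question; the helper name replace_outside_tags and the sample HTML are made up for illustration, and it shares the limitation described above (a ">" inside an attribute value will confuse it).

```perl
use strict;
use warnings;

# Hypothetical helper: apply $code to every text segment of $html,
# leaving anything that looks like <...> untouched.
sub replace_outside_tags {
    my ($html, $code) = @_;
    # With a capturing group in the pattern, split also returns
    # the separators (here: the tags) in the result list.
    my @parts = split /(<[^>]+>)/, $html;
    for my $part (@parts) {
        next if $part =~ /^<[^>]+>\z/;   # a tag: keep it as it is
        $part = $code->($part);          # plain text: transform it
    }
    return join '', @parts;
}

my $html = '<p class="foo">foo and <b>foo</b></p>';
print replace_outside_tags($html, sub { (my $t = shift) =~ s/foo/bar/g; $t }), "\n";
# prints: <p class="foo">bar and <b>bar</b></p>
```

Note that the "foo" inside the class attribute is left alone, because it is part of a tag.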

Of course, everything above could probably be done more correctly with one of the HTML modules, such as HTML::Parser, so you may want to take a look at them. takshaka has mentioned a previous discussion of this topic in which he posted a working example using HTML::Parser - a direct link is here.

For more information about regular expressions, read the perlre manpage.

In reply to Re: Search and replace everything except html tags by Corion
in thread Search and replace everything except html tags by thatguy
