Re: looking for a regexp

Look into HTML::Parser. HTML is very hard to match with a regexp because you can have nested < >. merlyn's Web Techniques columns have many HTML::Parser examples, as did the latest issue of The Perl Journal.

That said, a basic Regexp to match simple HTML is:

/<[^>]*>/  #matches an HTML tag
#So you would want:
# 
s/(<[^>]*>[^<\s]*)\s+/$1\&nbsp;/g 
# Should match a tag followed by some non-tag, non-whitespace text,
# followed by whitespace.  Untested.

You will also have to match any whitespace before the first tag, but you can probably handle that.
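For what it's worth, here is a small sketch of what that substitution does on a made-up sample string (the input is my own assumption, not from the original question). It suggests that only the first run of whitespace after each tag gets replaced, so later gaps in the same text run are left alone:

```perl
use strict;
use warnings;

# Hypothetical sample input: simple HTML with no attributes.
my $html = '<p>one two three</p> <b>bold</b> text';

# The substitution from the post, applied to a copy of the input.
(my $out = $html) =~ s/(<[^>]*>[^<\s]*)\s+/$1\&nbsp;/g;

print "$out\n";
# prints: <p>one&nbsp;two three</p>&nbsp;<b>bold</b>&nbsp;text
```

Note that the space between "two" and "three" survives, because nothing in the pattern can reach it from the preceding tag.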


RE: Re: looking for a regexp by merlyn (Sage) on Jun 09, 2000 at 02:54 UTC
And swiftone was known to speak:

    That said, a basic Regexp to match simple HTML is:
    `/<[^>]*>/ #matches an HTML tag`

Uh, no. This incorrectly stops on `<hello there="inside > foo">` too early. Please use `HTML::Filter` or one of the other `HTML::Parser`-derived modules.

-- Randal L. Schwartz, Perl hacker
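The early stop is easy to reproduce with core Perl alone. A minimal sketch, using the tag quoted above as input (the variable names are only for illustration):

```perl
use strict;
use warnings;

# The tag from the reply above: a '>' sits inside a quoted attribute value.
my $tag = '<hello there="inside > foo">';

# The simple pattern stops at the first '>', wherever it appears.
my ($naive) = $tag =~ /(<[^>]*>)/;

print "naive match: $naive\n";
# prints: naive match: <hello there="inside >
```

The match ends inside the attribute value, so everything after it (` foo">`) would be treated as text.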
RE:(2) looking for a regexp by swiftone (Curate) on Jun 09, 2000 at 17:39 UTC
    Uh, no. This incorrectly stops on `<hello there="inside > foo">`

That's not _simple_ HTML anymore. :) Packages are better (as I suggested), but sometimes you need a quick script for a simple task and you don't want to have to learn a package to do it.
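For the quick-script case, one possible middle ground is a pattern that skips over quoted attribute values. This is only a sketch: the pattern and the `tags_in` helper are my own assumptions, not something posted in this thread, and it still knows nothing about comments, `<script>` bodies, or unbalanced quotes, so a real parser remains the safer choice.

```perl
use strict;
use warnings;

# Assumed pattern (not from the thread): '<', then any mix of
# non-quote, non-'>' characters or complete quoted strings, then '>'.
my $tag_re = qr/<(?:[^>"']|"[^"]*"|'[^']*')*>/;

# Hypothetical helper: return every tag found in a string.
sub tags_in {
    my ($html) = @_;
    return $html =~ /($tag_re)/g;
}

for my $html ('<hello there="inside > foo">', q{<a href='x>y'>link</a>}) {
    print join(' | ', tags_in($html)), "\n";
}
# prints:
# <hello there="inside > foo">
# <a href='x>y'> | </a>
```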