in reply to regex and dot to star

Hiya - dot star will match *anything* (update: it matches anything in your case because you've used /s, but normally it doesn't match \n - good point tilly), whereas you want to match *anything except a quote*. So you want ([^"]*)

What's going wrong at the moment is this: some of your links have the 'group' parameter, some don't. Consider the case of one which doesn't, followed by one which does. So the text is

<a href="http://www.foo.com/">Foo</a> wibble <a href="http://www.bar.com/" group="1,2,3">Bar</a>

Your first match will be

$1= http://www.foo.com/">Foo</a> wibble <a href="http://www.bar.com/
$2= 1,2,3
$3= Bar

i.e. the problem is that if a tag *doesn't* have the 'group' parameter, then the first .* will swallow all the text until it can find
" group= etc...
which will be in the *next* 'a' tag that *does* have a 'group'.
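
To make the difference concrete, here is a minimal sketch that runs both patterns over the sample text above. The greedy pattern is an assumption about what the original regexp looked like (the question isn't quoted here); the second uses the negated character classes suggested above.

```perl
use strict;
use warnings;

my $html = '<a href="http://www.foo.com/">Foo</a> wibble '
         . '<a href="http://www.bar.com/" group="1,2,3">Bar</a>';

# Greedy dot star (assumed shape of the original pattern): the first .*
# runs past the end of the first tag until it can find '" group="'.
my @greedy = $html =~ /<a href="(.*)" group="(.*)">(.*)<\/a>/is;

# Negated character classes: each capture has to stop at the next quote
# (or the next '<'), so the first tag cannot match and is skipped.
my @strict = $html =~ /<a href="([^"]*)" group="([^"]*)">([^<]*)<\/a>/is;

print "greedy \$1 = $greedy[0]\n";
print "strict \$1 = $strict[0]\n";
```

The greedy version captures the whole stretch from the first href to the second one, while the stricter version only picks up the link that really has a 'group'.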

Even without dot star, this solution isn't too robust - what if someone misses out a quote mark (and people will in HTML)? You might want to think of a different WTDI.

andy.

update: and physi is right, you either need to escape the slash e.g. <\/a> or use a different delimiter for your regexp e.g. s#original#substitute#isge
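
A small sketch of the two options, on a made-up string (the sample text and the '[end]' replacement are just placeholders for illustration):

```perl
use strict;
use warnings;

my $text = 'see <a href="x">link</a> here';

# With / as the delimiter, the slash in </a> has to be escaped.
(my $escaped = $text) =~ s/<\/a>/[end]/ig;

# With # as the delimiter, the slash needs no escaping.
(my $hashed = $text) =~ s#</a>#[end]#ig;

print "$escaped\n$hashed\n";
```

Both substitutions do the same thing; the second is just easier to read once the pattern is full of closing tags.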

update2: have a look at HTML::Parser and maybe template toolkit. Any monks have other ideas?

update3: Text::TagTemplate? (never used it though, don't know if it's any good) - you could use this module to define your own tag e.g. <#GROUPLINK href="http://whatever.com" group="1,2,3" text="click here for link"> so that it called your change() subroutine when it found the special tag. Then the module would parse the HTML for you.

The reason why it's a good idea to get a module to parse the HTML for you is that it's surprisingly difficult to do correctly. E.g. what if the tag is inside a comment? what if the tag reads <a group="1,2,3" href="whatever">? If you're going to be the only one writing the HTML, then you're probably OK with a regexp - otherwise you probably do need to use a module. In any case, good luck with it. andy.
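
To show how those two cases go wrong, here is a rough sketch using the negated-character-class pattern. The change() here is a hypothetical stand-in (the real routine isn't shown in the thread), and the two input strings are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real change() routine.
sub change {
    my ($href, $group, $text) = @_;
    return "[$text|$group]";
}

my $re = qr/<a href="([^"]*)" group="([^"]*)">([^<]*)<\/a>/is;

# Attributes in the other order: the pattern silently fails to match,
# so the link is left untouched.
my $swapped = '<a group="1,2,3" href="whatever">Bar</a>';
(my $out1 = $swapped) =~ s/$re/change($1, $2, $3)/ge;

# Link inside a comment: the pattern knows nothing about comments,
# so it rewrites the link anyway.
my $commented = '<!-- <a href="whatever" group="1,2,3">Bar</a> -->';
(my $out2 = $commented) =~ s/$re/change($1, $2, $3)/ge;

print "$out1\n$out2\n";
```

Neither case raises an error, which is what makes this kind of breakage easy to miss.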

Re: Re: regex and dot to star
by LiTinOveWeedle (Scribe) on Apr 03, 2001 at 14:16 UTC
    Thanks,

    Physi was right - the mistake crept in when I wrote the post.

    It's working now... as you said.

    As for robustness - it's for my own use, so I think this is better than inventing my own HTML tag to enter the group value. But many thanks for the explanation... I had read Death to Dot Star by Ovid before, but because of my poor English I didn't understand all of it.

    Li Tin O've Weedle
    mad Tsort's philosopher

Re (tilly) 2: regex and dot to star
by tilly (Archbishop) on Apr 04, 2001 at 04:54 UTC
    While .* can match anything, it won't match \n by default. You need the /s modifier to do that.
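
A quick sketch of the difference, on a made-up two-line string:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s, dot does not match "\n", so .* stops at the end of line one.
my ($no_s) = $text =~ /(.*)/;

# With /s, dot matches "\n" too, so .* takes the whole string.
my ($with_s) = $text =~ /(.*)/s;

print "without /s: $no_s\n";
print "with /s:    $with_s\n";
```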
Re: Re: regex and dot to star
by LiTinOveWeedle (Scribe) on Apr 03, 2001 at 22:08 UTC
    Thanks to Andye,
    the latest version, which seems to work OK, is:

    $html =~ s/<a href="([^"]*)" group="([^"]*)">([^<]*)<\/a>/&change($1, $2, $3)/isge;

    For more tags, or for open use, a parser is the better way...
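
For anyone who wants to try the substitution above, here is a self-contained sketch. The change() body is purely hypothetical (the real one isn't shown in the thread), and the input is the two-link sample from Andye's reply.

```perl
use strict;
use warnings;

# Hypothetical change(): just formats its three arguments for display.
sub change {
    my ($href, $group, $text) = @_;
    return "$text ($href) [groups: $group]";
}

my $html = '<a href="http://www.foo.com/">Foo</a> wibble '
         . '<a href="http://www.bar.com/" group="1,2,3">Bar</a>';

# /e evaluates the replacement as code, /g repeats over every match,
# /i ignores case, /s lets dot match newlines (no dot is used here).
$html =~ s/<a href="([^"]*)" group="([^"]*)">([^<]*)<\/a>/change($1, $2, $3)/isge;

print "$html\n";
```

The link without a 'group' is left alone, and only the second link is handed to change().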

    Li Tin O've Weedle
    mad Tsort's philosopher