LiTinOveWeedle has asked for the wisdom of the Perl Monks concerning the following question:

Hi brothers, I have written a script that reads an HTML template file containing hyperlinks in this syntax:

<a href="http://my.domain.com" group="1,2,5">This URL</a>

The script validates the user from a cookie (in which it finds the usergroup value). Now the script should check every hyperlink: if the usergroup from the cookie is listed in the hyperlink's group="value" attribute, then it should remove the group attribute and simply print this:

<a href="http://my.domain.com">This URL</a>

If not, the script should print only This URL. There are several hyperlinks in the HTML template.

So assume that I have got the usergroup value from the cookie and read the HTML source from the file into $html.

$html =~ s/<a href="(.*?)" group="(.*?)">(.*?)</a>/&change($1, $2, $3)/isge;
print $html;

sub change {
    my ( $group, $temp, $url, $hyp, @groups );
    $url = @_[0];
    $hyp = @_[2];
    @groups = split( /,/@_[1] );
    foreach $group (@groups) {
        if ($group == $usergroup) {
            $temp = "<a href=\"$url\">$hyp</a>";
            return $temp;
        }
        return $hyp;
    }
}

This is the long version. I know it can be written more briefly, but this is a test... It works in some cases but not in others. Sometimes it prints only part of my template, sometimes all of it. It depends on the HTML template source. When all hyperlinks have the group parameter, the result is different from when only some of them do... So the problem is probably in the regex..... I read Death to Dot Star by Ovid but didn't find the answer in it....

So if you know something.... I wrote the code from memory, so I hope it's all right...

Li Tin O've Weedle
mad Tsort's philosopher

Replies are listed 'Best First'.
Re: regex and dot to star
by andye (Curate) on Apr 03, 2001 at 13:28 UTC
    Hiya - dot star will match *anything* (update: will match anything in your case because you've used /s, but normally doesn't match \n - good point tilly), you want to match *anything except a quote*. So you want ([^"]*)

    What's going wrong at the moment is this: some of your links have the 'group' parameter, some don't. Consider the case of one which doesn't, followed by one which does. So the text is

    <a href="http://www.foo.com/">Foo</a> wibble <a href="http://www.bar.com/" group="1,2,3">Bar</a>
    Your first match will be
    $1= http://www.foo.com/">Foo</a> wibble <a href="http://www.bar.com/
    $2= 1,2,3
    $3= Bar
    i.e. the problem is that if a tag *doesn't* have the 'group' parameter, then the first .* will swallow all the text until it can find
    " group= etc...
    which will be in the *next* 'a' tag that *does* have a 'group'.
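
    A minimal sketch of the behaviour described here, modelled on the two-link case (one link without a group attribute, followed by one with it). The variable names @lazy and @strict are made up for the illustration. It compares the non-greedy dot star with a negated character class for the href value:

```perl
use strict;
use warnings;

# Two links: the first has no group attribute, the second one does.
my $text = '<a href="http://www.foo.com/">Foo</a> wibble '
         . '<a href="http://www.bar.com/" group="1,2,3">Bar</a>';

# Non-greedy dot star: the href capture is still free to run across
# the first tag until it finds '" group="'.
my @lazy = $text =~ m{<a href="(.*?)" group="(.*?)">(.*?)</a>}is;

# Negated character class: the href capture cannot cross a quote,
# so the match can only start at the second tag.
my @strict = $text =~ m{<a href="([^"]*)" group="([^"]*)">(.*?)</a>}is;

print "lazy:   $lazy[0]\n";
print "strict: $strict[0]\n";
```

    With the non-greedy version, $1 holds everything from the first URL up to the end of the second one; with the character class, $1 is only the second URL.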

    Even without dot star, this solution isn't too robust - what if someone misses out a quote mark (and people will in HTML)? You might want to think of a different WTDI (way to do it).

    andy.

    update: and physi is right, you either need to escape the slash e.g. <\/a> or use a different delimiter for your regexp e.g. s#original#substitute#isge
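
    For illustration, a small sketch of the two options on a made-up one-link string (the file name x.html and the variable names are assumptions, not from the original script). Both substitutions do the same thing:

```perl
use strict;
use warnings;

my $html = '<a href="x.html">X</a>';

# Default delimiter: every slash in pattern and replacement is escaped.
(my $escaped = $html) =~ s/<\/a>/<\/A>/;

# Braces as delimiters: the slash needs no escaping.
(my $braces = $html) =~ s{</a>}{</A>};

print "$escaped\n$braces\n";
```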

    update2: have a look at HTML::Parser and maybe template toolkit. Any monks have other ideas?

    update3: Text::TagTemplate? (never used it though, don't know if it's any good) - you could use this module to define your own tag e.g. <#GROUPLINK href="http://whatever.com" group="1,2,3" text="click here for link"> so that it called your change() subroutine when it found the special tag. Then the module would parse the HTML for you.

    The reason why it's a good idea to get a module to parse the HTML for you is that it's surprisingly difficult to do correctly. E.g. what if the tag is inside a comment? what if the tag reads <a group="1,2,3" href="whatever">? If you're going to be the only one writing the HTML, then you're probably OK with a regexp - otherwise you probably do need to use a module. In any case, good luck with it. andy.
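
    If you do stay with a regexp for HTML you write yourself, one way to at least tolerate a different attribute order is to capture the whole start tag and pull the group attribute out in a second step. This is only a sketch: rewrite_link is a hypothetical helper, $usergroup is assumed to come from the cookie, and it still does not cope with comments, unquoted attributes, or a > inside an attribute value.

```perl
use strict;
use warnings;

my $usergroup = 2;    # assumption: normally read from the cookie

# Hypothetical helper: gets the raw attribute string and the link text.
sub rewrite_link {
    my ($attrs, $text) = @_;

    # No group attribute at all: leave the link as it is.
    return "<a$attrs>$text</a>"
        unless $attrs =~ s/\s+group="([^"]*)"//i;

    # group was removed from $attrs; $1 holds its value.
    my @groups = split /,/, $1;
    return $text unless grep { $_ == $usergroup } @groups;
    return "<a$attrs>$text</a>";
}

my $html = '<a group="1,2" href="a.html">A</a> '
         . '<a href="b.html" group="5">B</a> '
         . '<a href="c.html">C</a>';

$html =~ s{<a(\s[^>]*)>(.*?)</a>}{rewrite_link($1, $2)}isge;
print "$html\n";
```

    The first link keeps its href and loses group, the second is reduced to its text, and the third is left untouched.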

      THX,

      Physi was right - the mistake happened when I wrote the post.

      It's working now.... as you said.

      As for robustness: this is only for my own use, so I think it is better than inventing my own HTML tag to hold the group value. But many thanks for the explanation... I read Death to Dot Star by Ovid before, but because of my poor English I did not understand all of it.

      Li Tin O've Weedle
      mad Tsort's philosopher

      While .* can match anything, it won't match \n by default. You need the /s modifier to do that.
      Thanks to Andye,
      the last version, which seems to work OK, is:

      $html =~ s/<a href="([^"]*)" group="([^"]*)">([^<]*)<\/a>/&change($1, $2, $3)/isge;
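
      For reference, a self-contained sketch of the whole thing. The sample links and the value of $usergroup are invented for the test. Two deliberate differences from the code posted in this thread: the link text is captured with (.*?) rather than [^<]*, because [^<]* cannot match link text that contains tags such as <b>, and change() returns the bare text only after every group has been checked (in the question the return sat inside the loop, so only the first group was ever compared).

```perl
use strict;
use warnings;

my $usergroup = 1;    # assumption: normally read from the cookie

# Keep the link if the user's group is listed, otherwise keep only its text.
sub change {
    my ($url, $groups, $text) = @_;
    foreach my $group (split /,/, $groups) {
        return qq{<a href="$url">$text</a>} if $group == $usergroup;
    }
    return $text;    # reached only when no group matched
}

my $html = join "\n",
    '<td><a href="list.pl?sort=id" group="1,2,5"><b>list</b></a></td>',
    '<td><a href="add.shtml" group="2"><b>add</b></a></td>',
    '<td><a href="order.shtml"><b>order</b></a></td>';

$html =~ s{<a href="([^"]*)" group="([^"]*)">(.*?)</a>}{change($1, $2, $3)}isge;
print "$html\n";
```

      The link without a group attribute is not matched at all, so it passes through unchanged.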

      For more tags or for wider use, a parser is the better way...

      Li Tin O've Weedle
      mad Tsort's philosopher

Re: regex and dot to star
by physi (Friar) on Apr 03, 2001 at 13:10 UTC
    Ok, in your regex line there's a little error, though that may just come from typing the code from memory.
    $html =~ s/<a href="(.*?)" group="(.*?)">(.*?)</a>/&change($1, $2, $3)/isge;
    must be
    $html =~ s/<a href="(.*?)" group="(.*?)">(.*?)<\/a>/&change($1, $2, $3)/isge;
    This works fine for the given line.
    Maybe you could give us an example of a working and a not working input line?
    -----------------------------------
    --the good, the bad and the physi--
    -----------------------------------
      Yes THX,
      You are right, but the syntax I actually tried is all right - that one was my mistake (made when I wrote the question). I read some new posts about (.*?)..... What about using [^"]+ instead? To describe more closely what the script does:

      It seems that the regex takes two or more hyperlinks as one..... This is part of the HTML source:

      <tr>
        <td width="11%" align="center"><a href="readcsvplus.pl?config=firms.pl&sort_a=id&template=5" group="1"><b><font size="2">výpis<br>společností</font></b></a></td>
        <td width="11%" align="center"><a href="../../../../addfirm.shtml" group="2"><b><font size="2">přidej<br>společnost</font></b></a></td>
        <td width="11%" align="center"><a href="cookie.pl?config=login.pl&method=logout" group="1"><b><font size="2">logout</font></b></a></td>
        <td width="11%" align="center"><b><font size="2">výpis<br>zakázek</font></b></td>
        <td width="11%" align="center"><a href="../../../../addorder.shtml"><b><font size="2">přidej<br>zakázku</font></b></a></td>
        <td width="11%" align="center"><b><font size="2">&nbsp;</font></b></td>
        <td width="11%" align="center"><b><font size="2">výpis<br>uživatelů</font></b></td>
        <td width="11%" align="center"><a href="../../../../adduser.shtml" group="5"><b><font size="2">přidej<br>uživatele</font></b></a></td>
        <td width="12%" align="center"><b><font size="2">&nbsp;</font></b></td>
      </tr>

      this is the result:

      <tr>
        <td width="11%" align="center"><a href="readcsvplus.pl?config=firms.pl&sort_a=id&template=5"><b><font size="2">výpis<br>společností</font></b></a></td>
        <td width="11%" align="center"><b><font size="2">přidej<br>společnost</font></b></td>
        <td width="11%" align="center"><a href="cookie.pl?config=login.pl&method=logout"><b><font size="2">logout</font></b></a></td>
        <td width="11%" align="center"><b><font size="2">výpis<br>zakázek</font></b></td>
        <td width="11%" align="center"><b><font size="2">přidej<br>uživatele</font></b></td>
        <td width="12%" align="center"><b><font size="2">&nbsp;</font></b></td>
      </tr>

      Li Tin O've Weedle
      mad Tsort's philosopher