vbrtrmn has asked for the wisdom of the Perl Monks concerning the following question:

I've been meditating on this problem for some time now, but have come to no enlightenment. So, I have come to receive wisdom from the great monastery.

I am trying to get the paths and filenames of includes from a line of HTML. For example:

<b>here lies my HTML</b><!--#include virtual="/inculde/somedir/somefile.html" --> more HTML here followed by another inclued <!--#include virtual="/include/somedir/somefile.html" --> end of HTML is here

The HTML may contain one or more includes. I understand how to cut up the file using regular expressions and split(). I am unsure about how to do this with an unknown number of includes.

I was thinking about using HTML::Parse, but our security "force" will not allow my team to add anything to their server.

TIA
--
initiate paul

Replies are listed 'Best First'.
Re: HTML INCLUDES
by chromatic (Archbishop) on Apr 30, 2001 at 20:02 UTC
    You've identified the solution I'd choose (HTML::Parse or another module), but in the absence of that, you can get a pretty good solution with split:
    my @chunks = split(/<!--#include virtual="([^"]+)" -->/, $data);
    while (@chunks) {
        # ought to splice here
        my ($html, $inc) = (shift @chunks, shift @chunks);
        # print $html to file
        # include $inc if possible
    }
    Beware that @chunks may contain an odd number of elements, so $inc may be undefined on the last iteration.
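
    A minimal runnable sketch of the split approach above, using splice as the comment suggests and skipping the undefined $inc at the end. The sample $data and the @html and @includes arrays are made up for illustration and are not from the original post:

```perl
use strict;
use warnings;

# Hypothetical sample input, modelled on the line in the question.
my $data = '<b>here lies my HTML</b>'
         . '<!--#include virtual="/include/somedir/first.html" -->'
         . ' more HTML here '
         . '<!--#include virtual="/include/somedir/second.html" -->'
         . ' end of HTML is here';

# Because the pattern has a capture group, split returns the HTML pieces
# interleaved with the captured paths: html, path, html, path, ..., html.
my @chunks = split /<!--#include virtual="([^"]+)" -->/, $data;

my (@html, @includes);
while (@chunks) {
    # splice removes and returns up to two elements from the front.
    my ($html, $inc) = splice @chunks, 0, 2;
    push @html, $html;
    # $inc is undef when only trailing HTML is left.
    push @includes, $inc if defined $inc;
}

print "include: $_\n" for @includes;
```

    With this input, @includes should end up holding the two paths and @html the three surrounding pieces of HTML.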

    It's not the *best* way to do it, but it's one way to do it.

      I think that's going to do it for me

      thanks a lot!!

      --
      paul
Re: HTML INCLUDES
by the_slycer (Chaplain) on Apr 30, 2001 at 18:45 UTC
    Well, perhaps something like:
    while (<FILE>) {
        if (/your regex to match includes/) {
            # do something with it
        }
    }
    This will read the file line by line. You could push the matches into an array, or do whatever else you need to use the pathnames.

    Unless I completely misunderstand you, that may do the trick.
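
    One way the loop above might be filled in, as a sketch only. The pattern is borrowed from chromatic's reply, and the sample text, the in-memory filehandle and the @paths array are assumptions made so the example runs on its own:

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for the real file.
my $text = <<'END_HTML';
<b>header</b>
<!--#include virtual="/include/somedir/nav.html" -->
<p>body text</p>
<!--#include virtual="/include/somedir/footer.html" -->
END_HTML

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @paths;
while (my $line = <$fh>) {
    # Without /g this records at most one include per line.
    if ($line =~ /<!--#include virtual="([^"]+)" -->/) {
        push @paths, $1;
    }
}
close $fh;

print "$_\n" for @paths;
```

    Note that a plain if only catches the first include on each line, so this form suits files with at most one include per line.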
Re: HTML INCLUDES
by DrZaius (Monk) on Apr 30, 2001 at 18:49 UTC
    Hmmm, you seem to be reinventing the wheel, especially if you are using Perl for this. Check out HTML::Template -- it does this plus more for you already.

    If you want to have includes that aren't path specific, look into using INC => [] in the constructor.

    Also, if you are looking to componentize your website, take a peek at Apache::PageKit -- it uses HTML::Template underneath, but creates a nice framework for you.

    cheers.

      I think you missed his reason for not using extra modules: his security "force" will not allow his team to add anything to the server (i.e., Perl modules). See the_slycer's post for an answer, however. It should do the trick for you. You may want to try adding a global and/or case-insensitive flag to the regex just in case.
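
      A small sketch of what those two flags buy you, assuming the same pattern as chromatic's reply. The sample $line, with its upper-case second directive, is invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical line with two includes, the second in upper case.
my $line = '<b>here lies my HTML</b>'
         . '<!--#include virtual="/include/somedir/first.html" -->'
         . ' more HTML here '
         . '<!--#INCLUDE VIRTUAL="/include/somedir/second.html" -->'
         . ' end of HTML is here';

# In list context, /g returns every captured group from every match;
# /i lets the directive match regardless of case.
my @paths = $line =~ /<!--#include virtual="([^"]+)" -->/gi;

print scalar(@paths), " include(s) found\n";
print "$_\n" for @paths;
```

      Without /g only the first path would come back, and without /i the upper-case directive would be skipped.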

      -Adam Stanley
      Nethosters, Inc.