HTML INCLUDES

vbrtrmn has asked for the wisdom of the Perl Monks concerning the following question:

I've been meditating on this problem for some time now, but have come to no enlightenment. So, I have come to receive wisdom from the great monastery.

I am trying to get the paths and filenames of includes from a line of HTML. For example:

<b>here lies my HTML</b><!--#include virtual="/inculde/somedir/somefile.html" --> more HTML here followed by another inclued <!--#include virtual="/include/somedir/somefile.html" --> end of HTML is here

The HTML may contain one or more includes. I understand how to cut up the file using regular expressions and split(). I am unsure about how to do this with an unknown number of includes.

I was thinking about using HTML::Parse, but our security "force" will not allow my team to add anything to their server.

TIA
--
initiate paul


Replies are listed 'Best First'.
Re: HTML INCLUDES by chromatic (Archbishop) on Apr 30, 2001 at 20:02 UTC
You've identified the solution I'd choose (HTML::Parse or another module), but in the absence of that, you can get a pretty good solution with split:

my @chunks = split(/<!--#include virtual="([^"]+)" -->/, $data);
while (@chunks) {
    # ought to splice here
    my ($html, $inc) = (shift @chunks, shift @chunks);
    # print $html to file
    # include $inc if possible
}

Beware that @chunks may contain an odd number of elements, so $inc may be undefined on the last iteration. It's not the best way to do it, but it's one way to do it.
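A self-contained sketch of the split approach above, using splice as the comment suggests. The sample string is the line from the question; the @html and @includes arrays are names introduced here for illustration, and the pattern assumes every include is written exactly as in that sample (lowercase, single spaces).

```perl
use strict;
use warnings;

# Sample input: the line from the question, kept as posted.
my $data = '<b>here lies my HTML</b><!--#include virtual="/inculde/somedir/somefile.html" -->'
         . ' more HTML here followed by another inclued'
         . ' <!--#include virtual="/include/somedir/somefile.html" --> end of HTML is here';

# Because the pattern has a capturing group, split returns the captured
# paths interleaved with the surrounding HTML: html, path, html, path, ...
my @chunks = split /<!--#include virtual="([^"]+)" -->/, $data;

my (@html, @includes);
while (@chunks) {
    # Take up to two elements at a time: one HTML chunk and one path.
    my ($html, $inc) = splice @chunks, 0, 2;
    push @html, $html;
    # The final chunk of HTML has no path after it, so $inc may be undef.
    push @includes, $inc if defined $inc;
}

print "$_\n" for @includes;
```

Since the loop runs until @chunks is empty, the same code handles zero, one, or many includes without knowing the count in advance.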
Re: Re: HTML INCLUDES by vbrtrmn (Pilgrim) on Apr 30, 2001 at 23:07 UTC
I think that's going to do it for me. Thanks a lot!! -- paul
Re: HTML INCLUDES by the_slycer (Chaplain) on Apr 30, 2001 at 18:45 UTC
Well, perhaps something like:

while (<FILE>) {
    if (/your regex to match includes/) {
        # do something with it
    }
}

This reads the file line by line; you could push the matches onto an array, or do whatever else you need with the pathnames. Unless I completely misunderstand you, that may do the trick.
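One way the placeholder regex above might be filled in, as a rough sketch. An inner while loop with /g is used instead of a single if, so that a line holding several includes yields all of them. The sample text, its /include/a.html style paths, and the @paths array are hypothetical; an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real file.
my $sample = <<'HTML';
<b>some HTML</b><!--#include virtual="/include/a.html" --> text <!--#include virtual="/include/b.html" -->
<p>no includes on this line</p>
<!--#include virtual="/include/c.html" -->
HTML

# Open a filehandle on the string so the loop reads it like a file.
open my $fh, '<', \$sample or die "Cannot open sample: $!";

my @paths;
while (my $line = <$fh>) {
    # In scalar context, /g resumes after the previous match, so the
    # inner loop visits every include on the line, however many there are.
    while ($line =~ /<!--#include virtual="([^"]+)" -->/g) {
        push @paths, $1;
    }
}
close $fh;

print "$_\n" for @paths;
```

Reading line by line assumes an include directive never spans a line break; if it can, reading the whole file into one string first would be the safer choice.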
Re: HTML INCLUDES by DrZaius (Monk) on Apr 30, 2001 at 18:49 UTC
Hmmm, you seem to be reinventing the wheel, especially if you are using Perl for this. Check out HTML::Template; it already handles includes (through its own template tags) and more. If you want includes that aren't path specific, look into passing path => [] to the constructor. Also, if you are looking to componentize your website, take a peek at Apache::PageKit; it uses HTML::Template underneath, but creates a nice framework for you. Cheers.
Re: Re: HTML INCLUDES by astanley (Beadle) on Apr 30, 2001 at 19:08 UTC
I think you missed his reason for not using extra modules: his security "force" will not allow his team to add anything to the server (i.e., Perl modules). See the_slycer's post for an answer, however. It should do the trick for you. You may want to try adding a global and/or case-insensitive flag to the regex just in case. -Adam Stanley Nethosters, Inc.
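To illustrate the flags mentioned above: a short sketch in which /g, used in list context, returns every captured path at once, and /i lets the pattern match directives written in any case. The \s+ and \s* allowances for variable spacing are an extra assumption not raised in the thread, and the sample string and its paths are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input mixing upper and lower case and uneven spacing.
my $html = '<!--#INCLUDE VIRTUAL="/include/header.html"--> body '
         . '<!--#include  virtual="/include/footer.html" -->';

# /g in list context returns all captures in one go; /i ignores case.
my @paths = $html =~ /<!--#include\s+virtual="([^"]+)"\s*-->/gi;

print "$_\n" for @paths;
```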