
This step may be totally unnecessary. On a UNIX system, you can use the cat command to concatenate all of these files into one file. On a Windows/DOS box, I believe the copy command supports the + syntax (e.g. copy file1+file2+file3 file4).

Another way to do it is with a script like this:

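use strict;
use warnings;

my $flip = 1;                  # start "on" so the first file's head is kept
while (<>) {                   # every line of every file on the command line
  $flip = 0 if /<\/body/i;     # switch off at each closing </body> tag
  print if $flip;
  $flip = 1 if /<body/i;       # switch back on after the next <body> tag
}
print "</body>\n</html>\n";    # close the merged document once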

This is a quick-and-dirty script that assumes each <body> and </body> tag sits on a line by itself (or at least that nothing else on those lines needs to be preserved in the final version).

It would be called like this: faqcat.pl file1.html file2.html filen.html > newfile.html
(This should work on both UNIX and Win/DOS.)

The next step is making sure that any internal links work correctly. If the files are plain text (i.e. HTML without links), this will be enough, but they probably contain internal links to a table of contents and to different sections. I can't say exactly how to correct those, because a lot depends on how the files are written, but if the syntax is simple you can do it with regexes; if it is more complex, you can do it with HTML::Parser.
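For the simple case, here is a minimal regex sketch. It assumes links between the merged files look like href="fileN.html#anchor" (double-quoted, with an anchor), and that anchor names are unique across all the files. The file list in %merged and the fix_links name are hypothetical; adjust them to your own files.

```perl
use strict;
use warnings;

# Hypothetical list of the files that were merged (same names as given to faqcat.pl).
my %merged = map { $_ => 1 } qw(file1.html file2.html filen.html);

# Turn href="fileN.html#anchor" into href="#anchor" when fileN.html was merged;
# links to other documents and plain "#anchor" links are left unchanged.
sub fix_links {
    my ($line) = @_;
    $line =~ s{href="([^"#]+)#([^"]*)"}
              {$merged{$1} ? qq{href="#$2"} : qq{href="$1#$2"}}gie;
    return $line;
}

# Filter the files named on the command line, e.g. the merged newfile.html.
if (@ARGV) {
    print fix_links($_) while <>;
}
```

It could be run as fixlinks.pl newfile.html > final.html. Links of the form href="file2.html" (no anchor) are not touched, and if two files use the same anchor name the merged document will have duplicates; that is the point where HTML::Parser starts to pay off.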

Update: Some explanation of how the script works:
The idea is to skip everything from each </body> line up to and including the next <body> line, while still printing everything before (and including) the first <body>, i.e. the first file's head.
The <> operator automagically reads the next line from each file named on the command line in turn; see perlop.
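To see which lines survive, the same $flip logic can be wrapped in a sub and run on a few sample lines instead of <>. The merge_html name and the sample data are hypothetical; the loop body is the one from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: the faqcat.pl loop applied to a list of lines
# (all files, in order) instead of <>, returning the lines it would print.
sub merge_html {
    my @out;
    my $flip = 1;                          # start on: keep the first file's head
    for my $line (@_) {
        $flip = 0 if $line =~ /<\/body/i;  # off from a closing </body> ...
        push @out, $line if $flip;
        $flip = 1 if $line =~ /<body/i;    # ... until just after the next <body>
    }
    push @out, "</body>\n", "</html>\n";   # close the merged document once
    return @out;
}

my @lines = (
    "<html><head><title>FAQ</title></head>\n",    # file1.html
    "<body>\n",
    "Part one\n",
    "</body></html>\n",
    "<html><head><title>FAQ 2</title></head>\n",  # file2.html
    "<body>\n",
    "Part two\n",
    "</body></html>\n",
);
print merge_html(@lines);
```

This prints the first file's head and <body> line, then "Part one" and "Part two", then a single </body> and </html>. Perl's scalar range ("flip-flop") operator can express the same on/off state in one statement inside the original while (<>) loop: print unless m{</body}i .. m{<body}i;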

In reply to Re: html parse - concatenation by swiftone
in thread html parse - concatenation by Buckaroo Buddha
