monsieur_champs has asked for the wisdom of the Perl Monks concerning the following question:
Fellow monks
I'm working with HTML files to generate TIFF images as part of a fax-to-email gateway. I currently have a funny problem: my fax-to-email gateway generates blank pages (really no content at all) depending on whether the body of the email is HTML-encoded or not.
I decided to work around this problem by checking whether the HTML file used to generate the TIFF image can produce any content at all. To that end, I ended up writing this little module (for reusability) and a complementary test script (below):
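    # File IsHTMLEmpty.pm:
    package IsHTMLEmpty;

    use strict;
    use warnings;
    use Carp qw/ croak /;
    use base qw/ Exporter /;
    our @EXPORT = qw/ isHTMLEmpty /;

    # Returns true if the <body> of the given HTML file holds nothing but
    # whitespace; returns undef if the file is not readable.
    sub isHTMLEmpty($) {
        my $filename = shift;
        return undef unless -r $filename;

        open my $in, '<', $filename or croak "Can't open $filename: $!";
        my $html = do { local $/; <$in> };    # slurp the whole file
        close $in or croak "Can't close $filename: $!";

        # /i so that <BODY> ... </BODY> is matched too
        return $html =~ m{<body[^>]*>\s*</body>}i;
    }

    1;
    __END__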
This module just wraps a single function, isHTMLEmpty(), that decides whether the given file can produce viewable content. To use it, you could write something like this example script:
    #!/usr/bin/perl
    # File test:
    use warnings;
    use strict;
    use lib '/path/to/my/lib_dir/';
    use Carp qw/ croak confess /;
    use IsHTMLEmpty;

    confess "isHTMLEmpty isn't defined. I'm sorry.\n"
        unless defined &isHTMLEmpty;
    confess "Sorry, this HTML can generate content.\n"
        unless isHTMLEmpty './test.html';
    print "Ok.\n" if isHTMLEmpty './test.html';
    __END__
Finally, when given the file below, I get the right answer, namely that this file is expendable: it can safely be discarded, leaving one less TIFF image to send via fax:
    <html>
    <head>
    <title>Titles aren't considered content.</title>
    </head>
    <body> </body>
    </html>
But if I add a single <br> tag, the file is still expendable and should be discarded, just as it should be if it contains nothing but an empty <p>. The problem is that my code can't detect this (yet): it tells me the file is needed because it can (?) generate viewable content.
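To make the failure concrete, here is a small self-contained check that applies the body regex from isHTMLEmpty() (made case-insensitive) to in-memory strings instead of a file. The helper name body_is_blank and the sample strings are hypothetical, made up for illustration. Only the truly empty body is reported as expendable; the <br> and empty <p> cases are reported as having content, because the regex allows nothing but whitespace between the body tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same test as in IsHTMLEmpty.pm,
# but applied to a string instead of a file.
sub body_is_blank {
    my ($html) = @_;
    return $html =~ m{<body[^>]*>\s*</body>}i ? 1 : 0;
}

# Illustrative samples, not real gateway input.
my %samples = (
    'empty body' => '<html><head><title>t</title></head><body> </body></html>',
    'only <br>'  => '<html><body> <br> </body></html>',
    'empty <p>'  => '<html><body><p></p></body></html>',
);

for my $name (sort keys %samples) {
    printf "%-10s => %s\n", $name,
        body_is_blank($samples{$name}) ? 'expendable' : 'has content';
}
```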
Ok, enough talk. The question is: should I write a big regular expression that handles all (or most) of the cases and forget about it, or is There A Perlish Way To Do It(tm)?
What I'm hoping for: suggestions, snippets, or pointers to modules that can do this as quickly as possible, since I need this ready soon. And yes, feel free to golf my problem down if you can.
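One possible middle ground between a giant regex and a full parser is to throw away everything that cannot be seen and then check whether anything visible is left. The sketch below is a heuristic under stated assumptions, not an established rule: the function name html_has_content is hypothetical, treating <img> as visible content is an assumption, and only head, script and style blocks, comments, and non-breaking-space entities are handled. Regex tag stripping is fragile on unusual markup (for example a > inside an attribute value); a real parser such as HTML::Parser from CPAN would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of IsHTMLEmpty.pm): returns 1 if the HTML
# would probably render something visible, 0 otherwise. A regex heuristic,
# not a real HTML renderer.
sub html_has_content {
    my ($html) = @_;

    # Drop comments and whole blocks that never appear on the page.
    $html =~ s{<!--.*?-->}{}gs;
    for my $tag (qw(head script style)) {
        $html =~ s{<$tag\b.*?</$tag\s*>}{}gis;
    }

    # Assumption: an image counts as visible content.
    return 1 if $html =~ m{<img\b}i;

    # Strip all remaining tags (<br>, <p>, </p>, <div>, <body>, ...).
    $html =~ s{<[^>]*>}{}g;

    # Non-breaking spaces render as blank space, so treat them as whitespace.
    $html =~ s{&(?:nbsp|#160|#xa0);}{ }gi;

    # Any non-blank character left over is text the fax would show.
    return $html =~ /\S/ ? 1 : 0;
}

for my $html (
    '<html><body> </body></html>',
    '<html><body><br></body></html>',
    '<html><body><p></p></body></html>',
    '<html><body><p>&nbsp;</p></body></html>',
    '<html><body><p>Hello</p></body></html>',
) {
    my $verdict = html_has_content($html) ? 'keep' : 'discard';
    print "$verdict: $html\n";
}
```

If a heuristic like this is good enough, isHTMLEmpty() could end with `return !html_has_content($html);` instead of the body regex, keeping the module's interface unchanged.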
Thank you all for your attention, and may the gods bless you all.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
monsieur_champs
Re: How to decide if an HTML is expendable?
by fglock (Vicar) on Jul 14, 2003 at 21:36 UTC

Re: How to decide if an HTML is expendable?
by diotalevi (Canon) on Jul 14, 2003 at 21:26 UTC

Re: How to decide if an HTML is expendable?
by TVSET (Chaplain) on Jul 14, 2003 at 21:28 UTC
by Willard B. Trophy (Hermit) on Jul 15, 2003 at 21:11 UTC