in reply to Using a config file in my regexp script.
Say your config file (foo.conf) looks like this:

    PATH = "E:/path/to/html/file"
    FILE = "test1InLineCSS.html"

And here is code to read it and open the file:

    use Config::General;

    # load foo.conf into a plain hash
    my $conf   = Config::General->new("foo.conf");
    my %config = $conf->getall;

    # build the full path from the two config values
    my $filename = join('/', $config{PATH}, $config{FILE});
    open INFILE, '<', $filename or die "Can't open $filename: $!";

You could also devise a scheme to load your regexes with the config file ... but hold the press right there. I for one am really getting tired of shouting "Please use an HTML parser for this!"
Please use an HTML parser for this!
You state that "I know that the HTML parser modules are better..." No. You don't know that the HTML parser modules are better, you haven't used one yet! We keep telling you that they are better but you keep on trucking with your array of HTML lines. (And what happens when tags are split across lines? Your array solution falls apart!)

You also stated that you are "...exploring this approach for my MSc." What?!? There is nothing "Masters" about "parsing" HTML contained in an array with regular expressions. (UPDATE: i should have said "directly parsing with regexes" - subtle difference) No, that is very UNDERgraduate, my friend. Still don't believe me? Read on.
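To make the split-tag problem concrete, here is a minimal sketch using only core Perl. The three input lines are made up for illustration (they are not from your file), and the regex is just a stand-in for the kind of per-line substitution an array-of-lines script would do:

```perl
use strict;
use warnings;

# Hypothetical input: a <body> tag whose attribute wraps onto the next line.
my @lines = (
    qq{<html><body\n},
    qq{   bgcolor="black">\n},
    qq{<p>hello</p></body></html>\n},
);

# Line-at-a-time matching, as with an array of HTML lines.
my $per_line_hits = 0;
for my $line (@lines) {
    $per_line_hits++ if $line =~ /<body[^>]*>/i;
}

# The same pattern against the whole document as one string.
my $whole      = join '', @lines;
my $whole_hits = () = $whole =~ /<body[^>]*>/gi;

print "per-line matches: $per_line_hits\n";
print "whole-document matches: $whole_hits\n";
```

The per-line loop never sees the complete tag, so it finds nothing, while the tag is plainly there in the joined document. A parser reads the stream as tokens, so where the line breaks fall inside a tag stops mattering.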
Now. Because i am really crazy, here is a rewrite of your code. Maybe this will finally convince you to get on the right track. Maybe. ;)

It does what your current method does, except that i do not write back to the original file.

    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    my $parser = HTML::TokeParser::Simple->new('tricky.html');

    # these are the tags we just want to skip
    my %skip = (
        u => 1, b => 1, i => 1, em => 1,
        big => 1, img => 1, strong => 1,
    );

    # these are the styles we are going to add to h, p, and li tags
    my %modify = (
        h  => ';text-indent: 10px; word-spacing: 30px; letter-spacing: 3px; color: black',
        p  => ';text-indent: 10px; word-spacing: 10px; letter-spacing: 2px; color: black',
        li => ';text-indent: 10px; word-spacing: 10px; letter-spacing: 2px; color: black',
    );

    while (my $token = $parser->get_token) {

        # replace body bgcolor
        if ($token->is_start_tag('body')) {
            $token->set_attr(style => 'background-color: white');
        }

        # find and skip our "skip" tags
        # (get_tag/get_attr are the 3.x names for return_tag/return_attr)
        next if $token->is_tag and $skip{$token->get_tag};

        # find and modify attributes for our "modify" tags
        if ($token->is_start_tag) {
            my $candidate = $token->get_tag;
            $candidate =~ s/^h[1-6]$/h/i;   # hack to handle all h tags

            # here we get the original style attr and add the new CSS
            if (my $add_attr = $modify{$candidate}) {
                my $orig_attr = $token->get_attr;
                my $style = defined $orig_attr->{style} ? $orig_attr->{style} : '';
                $token->set_attr(style => $style . $add_attr);
            }
        }

        # just print to STDOUT ... change to fit your needs
        print $token->as_is;
    }

Simply amazing, no? :) Yes. But even more amazing would be to simply OVERRIDE THE CSS! Maybe something like:

    body { color: black; }
    u, b, i, em, img, big, strong { text-decoration: none; }
    h1, h2, h3, h4, h5, h6 { text-indent: 10px; word-spacing: 30px; letter-spacing: 3px; color: black; }
    p, li { text-indent: 10px; word-spacing: 10px; letter-spacing: 2px; color: black; }

No Perl needed at all. Not sure if this will work, but had your web page used proper CSS in the first place (that is, CSS defined in a separate file, not inlined into the HTML), this would have made your task next to trivial.
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Extract inline styles to an external style sheet (was Re: Re: Using a config file in my regexp script.)
by clscott (Friar) on Sep 17, 2003 at 19:13 UTC
by Anonymous Monk on Jul 08, 2015 at 14:36 UTC