in reply to Converting HTML tags into uppercase using Perl

It would be really simple to knock up something that did this using HTML::Parser, but it's perhaps worth pointing out that if you are at all interested in XHTML compatibility then valid XHTML tags are all lower case.

Update: Here's a basic HTML::Parser solution. It can almost certainly be improved and/or simplified.

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser;

my $p = HTML::Parser->new(start_h => [\&start, 'tagname, attr, attrseq'],
                          end_h   => [\&end, 'tagname'],
                          text_h  => [\&text, 'text']);

$p->parse_file(shift);

sub start {
  my ($name, $attr, $attrseq) = @_;

  print '<' . uc($name);
  if (keys %$attr) {
    foreach (@$attrseq) {
      print ' ' . uc($_) . '="' . $attr->{$_} . '"';
    }
  }
  print '>';
}

sub end {
  print '</' . uc($_[0]) . '>';
}

sub text {
  print $_[0];
}
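
One possible refinement, offered as a rough sketch rather than a finished tool: collect the output in a string and add a default handler so that comments, declarations and plain text pass through untouched. The helper name uc_tags is made up for this example. Note that the 'attr' argspec hands back decoded attribute values, so a value containing entities or double quotes would need re-escaping, which this sketch does not attempt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not part of the original post): returns a copy of
# $html with tag names and attribute names uppercased.
sub uc_tags {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($name, $attr, $attrseq) = @_;
                $out .= '<' . uc $name;
                # attrseq keeps the attributes in their original order
                $out .= ' ' . uc($_) . '="' . $attr->{$_} . '"' for @$attrseq;
                $out .= '>';
            },
            'tagname, attr, attrseq',
        ],
        end_h => [ sub { $out .= '</' . uc($_[0]) . '>' }, 'tagname' ],
        # Everything without its own handler (text, comments,
        # declarations) is copied through unchanged.
        default_h => [ sub { $out .= $_[0] if defined $_[0] }, 'text' ],
    );

    $p->parse($html);
    $p->eof;    # tell the parser there is no more input
    return $out;
}

print uc_tags('<p class="x">Hello <b>world</b></p>'), "\n";
```

Returning a string instead of printing makes it easier to write the result back to a file afterwards.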
--
<http://dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Re^2: Converting HTML tags into uppercase using Perl
by steve_g50 (Initiate) on Nov 30, 2005 at 11:10 UTC
    I've tried this, but I can't get it to register the filename after I've entered it. Any ideas?
    #!/usr/bin/perl
    use warnings;
    use HTML::Parser;

    print("Enter an html file (with either a .html or .htm extension): ");
    $file=<STDIN>;
    my $file = $ARGV[0];
    unless ($file) {
        print ("No filename given\n");
        exit;
    }

    my $new;
    my $p = HTML::Parser->new(
        start_h   => [ \&start_h, 'tagname, text' ],
        end_h     => [\&end_h, 'tagname, text' ],
        default_h => [sub { $new .= shift }, 'text'],
    );
    $p->parse_file($file);

    # Rename the old file
    my $newfile = $file.'.old';
    rename($file, $newfile) or die "Can't rename $file: $!";

    # Write the new text to the old filename
    open my $fh, ">", $file or die "Can't create new file: $!";
    print $fh $new;
    close $fh;

    sub start_h {
        my($tag, $text) = @_;
        my $uc = uc $tag;
        $text =~ s/$tag/$uc/;
        $new .= $text;
    }

    sub end_h {
        my($tag, $text) = @_;
        my $uc = uc $tag;
        $text =~ s/$tag/$uc/;
        $new .= $text;
    }

      $file=<STDIN>;

      my $file = $ARGV[0];

      This looks pretty confused to me. You read the filename from STDIN into a package variable called $file (incidentally, you don't chomp that value, so it still has a newline character on the end). You then ignore that value and create a new lexical variable, also called $file, and copy the first command line argument into it. You don't say how you call the program, but if you don't give it any command line arguments then that will be undef. From then on the program uses the lexical variable (which probably contains undef) and ignores the package variable (which has the correct value, albeit with an extra newline).

      So, no, it almost certainly won't do what you want :)
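
The mix-up can be reproduced in a few lines. This is only a sketch: a literal string stands in for the line read from STDIN so that it runs without any input, and strict is left off on purpose.

```perl
#!/usr/bin/perl
use warnings;
# Deliberately no "use strict", to reproduce the behaviour in question.

# Stand-in for "$file=<STDIN>;" (assumption: the user typed index.html).
$file = "index.html\n";    # this is the package variable $main::file

my $file = $ARGV[0];       # a brand new lexical; undef without arguments

# From here on, a bare $file means the lexical one.
print 'lexical: ', (defined $file ? $file : 'undef'), "\n";
print 'package: ', $main::file;
```

Run without arguments, the lexical variable is undef while the package variable still holds the typed name, newline included.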

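      This is a good example of why you should always have use strict in your programs. With strict in force, the assignment to the undeclared $file would have been a compile-time error instead of silently creating a package variable.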
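
To see what strict would have reported, the offending assignment can be compiled inside a string eval so that the error is caught and printed. This is just an illustration; the exact wording of the message varies a little between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The string eval inherits "use strict" from this file, so the undeclared
# $file fails to compile and STDIN is never actually read.
my $ok    = eval q{ $file = <STDIN>; 1 };
my $error = $@;

print $ok ? "compiled\n" : "strict rejected it: $error";
```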

      You probably want to write that code something like this (untested):

      # check to see if you have a command line argument
      my $file = $ARGV[0];

      # if not, or if it's not an HTML file, then prompt for one
      until ($file && ($file =~ /\.html?$/i)) {
        print('Enter an html file (with either a .html or .htm extension): ');
        $file=<STDIN>;
        chomp $file;
      }
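
A slightly more defensive take on the same loop, again only a sketch: it gives up when input runs out instead of prompting forever. The helper names is_html_name and get_html_name are invented for this example, and an in-memory filehandle stands in for STDIN so that it can be run as is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $name is defined and ends in .html or .htm.
sub is_html_name {
    my ($name) = @_;
    return (defined $name && $name =~ /\.html?$/i) ? 1 : 0;
}

# Hypothetical helper: use $name if it is acceptable, otherwise prompt and
# read from the filehandle $in until a usable name arrives or input ends.
sub get_html_name {
    my ($name, $in) = @_;
    until (is_html_name($name)) {
        print 'Enter an html file (with either a .html or .htm extension): ';
        $name = <$in>;
        return unless defined $name;    # end of input: give up
        chomp $name;
    }
    return $name;
}

# Demonstration: two typed lines, the first of which is rejected.
my $typed = "notes.txt\nindex.HTM\n";
open my $in, '<', \$typed or die "Can't open in-memory input: $!";
my $file = get_html_name(undef, $in);
print "\nUsing $file\n";
```

In the real script the call would be something like get_html_name($ARGV[0], \*STDIN), followed by a check that a name actually came back.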

        Cheers dave. It works. Thanks to everyone who helped.