To get a bit familiar with PPI i quickly hacked together this script. You feed it with a source-file and it spits out colored html. The colors and formatting of the individual sourcecode elements can be configured via the settings after __DATA__. The config sections relate to the classes of PPI. Check out its' docs to find out what you need to tweak.

Do you think this is worth it to be canned in a module?

Update:
#!perl # # perl2html # 2007 Markus Holzer # # script for convertin Perl-Sourcecode to Html with # syntax-highlighting # # call with sourcefile as first parameter # if the second parameter is true, output contains line numbers # # never leave home without it use warnings; use strict; use Data::Dumper; # we need this to escape special html chars use HTML::Entities; # performs the config parsing from the DATA-section use Config::IniFiles; my $cfg = Config::IniFiles->new( -file => *DATA ); unless ( $cfg ) { die join ("\n", @Config::IniFiles::errors) . "\n"; } # performs the parsing of the sourcecode use PPI; # open the sourcefile my $document = PPI::Document->new( $ARGV[0] ); #print Dumper ($document); # see if we need to output line_numbers my $line_number; my $line_numbers; my $line_number_format; if ( $ARGV[1] ) { open IN, $ARGV[0] or die "cannot count line numbers\n"; while (<IN>) { $line_numbers++ }; close IN; $line_number_format = "%0".length($line_numbers)."d"; } unless ( $document ) { die "Dokument could not be parsed\n"; } # counter for skipping elements after a heredoc my $skip_next; # Routinen für die Behandlung der Spezialfälle my %dispatch = ( # ugly but necessary # when a heredoc is parsed the elemtens after <<"FOO" # and until the semicolon are siblings of the herdoc-element # for the output to be correct we have to preread those elements 'Token::HereDoc' => sub { my ( $section, $node ) = @_; #<<"HERE" my $content = $node->{content}; # inner text of the heredoc my $text = join "", @{$node->{_heredoc}}; #HERE my $terminator = $node->{_terminator_line}; chomp $terminator; my $skip = 0; print encode($content); # preread until we find the semicolon after <<"HERE" while ( 1 ) { $skip++; $node = $node->next_sibling; process_node ($node,1); last if $node->{content} eq ";" } print "<br />", encode("$text$terminator"); $skip_next = $skip; } ); print join ("\n", @{$cfg->{v}->{Html}->{header}}), "\n"; $line_number++, print line_number($line_number) if $line_numbers; process_node($document); print "\n", join ("\n", @{$cfg->{v}->{Html}->{footer}}); sub process_node { my $node = shift; my $self = shift; my @list; if ( $self ) { @list = ($node); } else { @list = @{$node->{children}} if $node->{children}; } for my $child ( @list ) { if ( $skip_next ) { $skip_next--; next; } # what have we got here? my $class = ref( $child ); $class =~ s/^PPI:://; # opening element? if ( $child->{start} ) { out( "brackets", $child, "start"); } # do we have a special case for this element? if ( $dispatch{$class} ) { # yes. use the dispatcher $dispatch{$class}->( $class, $child ); } else { # no, so standard ouptput out ( $class, $child ); } # process the children process_node( $child ); # closing element? if ( $child->{finish} ) { out( "brackets", $child, "finish"); } } } # Outputs an element as Html aus und formats it # using the styles from the configuration sub out { my ( $section, $node, $index ) = @_; my $content; # Index is true for opening and closing elements # so we can use the correct bracket # otherwise we use the elements text value if ( $index ) { $content = $node->{$index}->{content} || ""; } else { # Text des Elements ausgeben $content = $node->{content} || ""; } $content = encode($content); # is there a style for this element? if ( $cfg->{v}->{$section} ) { my $class = $cfg->{v}->{$section}->{ $content } || $cfg->{v}->{$section}->{ '*default' } || ""; print "<span style=\"$class\">$content</span>"; } else { print $content; } } # encodes html entities and converts whitespace sub encode { my $content = encode_entities(shift); $content =~ s^ ^&nbsp;^msg; if ( $line_numbers ) { while ( $content =~ /\n/ ) { $line_number++; my $ln = line_number($line_number); $content =~ s^\n^<br />$ln^ms; } } else { $content =~ s^\n^<br />^msg; } $content =~ s^\t^&nbsp;&nbsp;&nbsp;&nbsp;^msg; return $content; } sub line_number { my $ln = sprintf($line_number_format, shift); return '<span style="' . $cfg->{v}->{LineNumber}->{'*default'} . " +\">$ln: &nbsp;</span>"; } __DATA__ [brackets] *default = color:teal; [Token::Symbol] *default = color:maroon; [Token::ArrayIndex] *default = color:maroon; [Token::Cast] *default = color:maroon; [Token::Magic] *default = color:maroon; [Token::Quote::Double] *default = color:purple; [Token::Quote::Interpolate] *default = color:purple; [Token::Quote::Literal] *default = color:purple; [Token::Quote::Single] *default = color:purple; [Token::QuoteLike] *default = color:purple; [Token::QuoteLike::Backtick] *default = color:purple; [Token::QuoteLike::Command] *default = color:purple; [Token::QuoteLike::Readline] *default = color:purple; [Token::QuoteLike::Regexp] *default = color:purple; [Token::QuoteLike::Words] *default = color:purple; [Token::DashedWord] *default = color:purple; [Token::HereDoc] *default = color:purple; [Token::Data] *default = color:purple; [Token::Separator] *default = color:purple; [Token::End] *default = color:purple; [Token::Number] *default = color:purple; [Token::Structure] *default = color:black; [Token::Operator] *default = color:lime; and = color:lime; cmp = color:lime; eq = color:lime; le = color:lime; ge = color:lime; ne = color:lime; or = color:lime; q = color:lime; qq = color:lime; qw = color:lime; qx = color:lime; tr = color:lime; xor = color:lime; [Token::Word] *default = color:black; AUTOLOAD = color:yellow; BEGIN = color:yellow; CORE = color:yellow; DESTROY = color:yellow; END = color:yellow; STDERR = color:gray; STDIN = color:gray; STDOUT = color:gray; local = color:blue; my = color:blue; package = color:blue; return = color:blue; sub = color:blue; use = color:blue; require = color:blue; do = color:blue; else = color:blue; elsif = color:blue; foreach = color:blue; for = color:blue; if = color:blue; unless = color:blue; until = color:blue; continue = color:blue; while = color:blue; goto = color:blue; import = color:blue; last = color:blue; next = color:blue; no = color:blue; our = color:blue; redo = color:blue; bless = color:red; close = color:red; closedir = color:red; die = color:red; eval = color:red; exit = color:red; grep = color:red; map = color:red; open = color:red; opendir = color:red; print = color:red; splice = color:red; split = color:red; sysopen = color:red; warn = color:red; each = color:red; values = color:red; accept = color:red; alarm = color:red; atan2 = color:red; bind = color:red; binmode = color:red; caller = color:red; chdir = color:red; chmod = color:red; chomp = color:red; chop = color:red; chown = color:red; chr = color:red; chroot = color:red; abs = color:red; connect = color:red; cos = color:red; crypt = color:red; dbmclose = color:red; dbmopen = color:red; defined = color:red; delete = color:red; dump = color:red; endgrent = color:red; endhostent = color:red; endnetent = color:red; endpwent = color:red; endservent = color:red; eof = color:red; exec = color:red; exists = color:red; exp = color:red; fcntl = color:red; fileno = color:red; flock = color:red; fork = color:red; format = color:red; formline = color:red; getc = color:red; getgrent = color:red; getgrgid = color:red; getgrnam = color:red; gethostbyaddr = color:red; gethostbyname = color:red; gethostent = color:red; getlogin = color:red; getnetbyaddr = color:red; getnetbyname = color:red; getnetent = color:red; getpeername = color:red; getpgrp = color:red; getppid = color:red; getpriority = color:red; getprotobyname = color:red; getprotobynumber = color:red; getprotoent = color:red; getpwent = color:red; getpwnam = color:red; getpwuid = color:red; getservbyname = color:red; getservbyport = color:red; getservent = color:red; getsockname = color:red; getsockopt = color:red; glob = color:red; gmtime = color:red; hex = color:red; index = color:red; int = color:red; ioctl = color:red; join = color:red; keys = color:red; kill = color:red; lc = color:red; lcfirst = color:red; length = color:red; link = color:red; listen = color:red; localtime = color:red; log = color:red; lstat = color:red; mkdir = color:red; msgctl = color:red; msgget = color:red; msgrcv = color:red; msgsnd = color:red; new = color:red; oct = color:red; ord = color:red; pack = color:red; pipe = color:red; pop = color:red; pos = color:red; quotemeta = color:red; read = color:red; readdir = color:red; readline = color:red; readlink = color:red; readpipe = color:red; recv = color:red; ref = color:red; rename = color:red; reset = color:red; reverse = color:red; rewinddir = color:red; rindex = color:red; rmdir = color:red; scalar = color:red; seek = color:red; seekdir = color:red; select = color:red; semctl = color:red; semget = color:red; semop = color:red; send = color:red; setgrent = color:red; sethostent = color:red; setnetent = color:red; setpgrp = color:red; setpriority = color:red; setprotoent = color:red; setpwent = color:red; setservent = color:red; setsockopt = color:red; shmctl = color:red; shmget = color:red; shmread = color:red; shmwrite = color:red; shutdown = color:red; sin = color:red; sleep = color:red; socket = color:red; socketpair = color:red; sort = color:red; sprintf = color:red; printf = color:red; sqrt = color:red; srand = color:red; rand = color:red; stat = color:red; shift = color:red; study = color:red; substr = color:red; symlink = color:red; syscall = color:red; sysread = color:red; system = color:red; syswrite = color:red; tell = color:red; telldir = color:red; time = color:red; times = color:red; truncate = color:red; uc = color:red; ucfirst = color:red; umask = color:red; undef = color:red; unlink = color:red; unpack = color:red; unshift = color:red; untie = color:red; utime = color:red; vec = color:red; wait = color:red; waitpid = color:red; wantarray = color:red; write = color:red; [Token::Attribute] *default = color:black; [Token::Comment] *default = color:green; [Token::Label] *default = color:black; [Token::Pod] *default = color:silver; [Token::Prototype] *default = color:navy; [Token::Regexp::Match] *default = color:olive; [Token::Regexp::Substitute] *default = color:olive; [Token::Regexp::Transliterate] *default = color:olive; [LineNumber] *default = color:olive; [Html] header=<<HEADER <html> <head></head> <body> <div style="font-family:Courier;"> HEADER footer=<<FOOTER </div></body></html> FOOTER


holli, /regexed monk/

Replies are listed 'Best First'.
Re: perlcode to html with syntax highlighting
by neniro (Priest) on May 07, 2007 at 05:15 UTC
    Hello Holli, It's really cool! But PPI::HTML already exists. Here is a small example using it, that works a lot like your example:
    #!/usr/bin/perl use strict; use warnings; use HTML::Template; use HTML::Entities; use PPI; use PPI::HTML; use Data::Dumper; use MIME::Base64; die "script needs at least one file to process!\n" unless @ARGV; my @content; my $template = HTML::Template->new(filehandle => *DATA); foreach my $source (@ARGV) { my ($type) = $source =~ /\.(.*)$/; my $part; open(FILE, '<', $source) or die "dieing while trying to open $source cause of: $!\n"; my $native = join '', (<FILE>); if ($type eq 'pl' or $type eq 'pm') { my $perldoc = PPI::Document->new( \$native ); my $highlight = PPI::HTML->new(); $part = '<pre>' . $highlight->html( $perldoc ) . '</ +pre>'; } elsif ($type eq 'png') { $part = '<div class="embimg"><img src="data:image/png;base64, +'; $part .= encode_base64($native) . '" alt=""/></div>'; } else { $part = '<pre>' . encode_entities($native) . '</pre>'; } close(FILE); push @content, { source => $source, content => $part, type => $type, }; } $template->param( content => \@content, ); print $template->output(); __DATA__ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta name="generator" content="perl script: project_to_html.pl" /> <title>Documentation</title> <style type="text/css" media="screen"> /* <![CDATA[ */ h1 { color: #ff9; } a { color: #99f; } pre { border-width: 1px; border-style: solid; padding: 4px; background-color: #001; color: #ddd; line-height: 0.65em; } strong { color: #f99; } .default { color: #fff; background-color: #444; font-family: Arial, Helvectia; margin: 8px; } .embimg { border-width: 1px; border-style: solid; border-color: #000; background-color: #fff; margin-top: 4px; } .comment { color: #6600dd; } .keyword { color: #ff9900; } .pragma { text-style: italic; } .substitute { color: #eeee00; } .operator { color: #6666ff; } .single { color: #0c0; } .double { color: #3c3; } .symbol { color: #d66; } .structure {color: #ccc;} .word {color: #fff;} /* ]]> */ </style> </head> <body class="default"> <TMPL_LOOP NAME="content"> <h1><TMPL_VAR NAME="source"></h1> Type of file is <strong><TMPL_VAR NAME="type"></strong><br /> <TMPL_VAR NAME="content"><br /> <hr /> </TMPL_LOOP> </body> </html>
      I was aware of PPI::Html. As said the script is little more than a finger exercise. What I like about this solution is that you can tweak each and everything. You want double quoted strings that say foo in pink? Just say
      [Token::Quote::Double] *default = color:blue; "foo" = color:pink;


      holli, /regexed monk/
Re: perlcode to html with syntax highlighting
by jwkrahn (Abbot) on May 07, 2007 at 03:52 UTC
      Thanks. It's corrected.


      holli, /regexed monk/