When I'm trying to reply to someone's post, I often want to take their code and play with it. Unfortunately, if I cut and paste from my browser, I lose all formatting and the code all runs together in one big line. I find it a pain to try to break all of the lines of code apart by hand.

If I "view source", I get a bunch of funny HTML character entities that have to be translated to their ASCII equivalents. I wrote this little program to deal with this. To use it, I "view source", take the appropriate code section and save it to a file. I then run this program with the path and filename of the saved Perlmonks code as the argument. The program converts the HTML character codes to their ASCII equivalents and also corrects for that annoying $some_var[0] problem that happens when people post with pre tags instead of code tags (yes, I wrote $some_var[0] incorrectly on purpose). It then writes the new code back to the file it read it from.

Incidentally, I am not posting this to the CODE section as it's just a quick hack and not worthy of being there. Also, I have overcommented the program so new Monks can understand some of the weird bits.

#!/usr/bin/perl -w
use strict; # Don't you dare write code without this line and the -w switch

my %charcodes;
my $filename = shift @ARGV or die "Usage: $0 filename\n";

%charcodes = (
    "&#091;" => "[",
    "&#093;" => "]",
    "&#91;"  => "[",
    "&#93;"  => "]",
    "&quot;" => "\"",
    "&lt;"   => "<",
    "&gt;"   => ">",
    "&amp;"  => "&"
);

# Using '+<' to open the file in update mode
open (FILEHANDLE, "+< $filename") || die "Can't open $filename in update mode: $!\n";

# Reading the entire file into the array
my @program_line = <FILEHANDLE>;

foreach (@program_line){
    # The regex below might confuse some people new to perl,
    # so I'll do some explaining here.
    # You might think that I could use &.*; to match a hash key.
    # This fails for two reasons:
    # 1. We might have a sub which is identified with ampersand
    # 2. If there is more than one semicolon after the ampersand,
    #    the regex will be "greedy" and will include the
    #    rightmost semicolon. We can use &.*?; to try to force
    #    the regex to be lazy, but this could involve a lot of
    #    backtracking and make the regex less efficient.
    # &[^;]{2,6}; is a good regex. The negated character class guarantees
    # that we will only match 2 to 6 non-semicolons after the ampersand
    # (and we go out to six characters in case this script is upgraded
    # to translate things like &eacute; to é.)
    # The right side of this substitution uses the trinary operator
    # ($x = ($a > $b) ? $a : $c) to substitute the hash value of the
    # character code if such a hash value exists, otherwise it substitutes $1
    # back to itself. This is not the most efficient way of doing this as
    # we have a null substitution, but it works.
    # The /e modifier makes the trinary operator executable.
    # The /g modifier makes the regex global (i.e. we will modify every
    # character code on a single line).

    s/(&[^;]{2,6};)/(exists $charcodes{$1}) ? $charcodes{$1} : $1/eg;

    # The following code will correct for URL expansion of code like
    # $some_hash_var[0] which gets posted with <PRE> tags rather
    # than <CODE> tags. Don't use it for other URL substitutions
    # because it relies on Perlmonks specific syntax.

    s/(<a href="[^"]+">(\d+)<\/a>)/[$2]/g;
}

# Go back to start of file
seek(FILEHANDLE, 0, 0) or die ("Seek failed on $filename: $!\n");
print FILEHANDLE @program_line or die ("Print failed on $filename: $!\n");

# truncate the file so we don't have excess garbage at the end
truncate(FILEHANDLE, tell(FILEHANDLE)) or die ("Truncate failed on $filename: $!\n");
close (FILEHANDLE) or die ("Close failed on $filename: $!\n");

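
For anyone who wants to see what the two substitutions do without overwriting a file, here is a minimal sketch that applies them to a few strings in memory. The helper name clean_line and the sample lines (including the href in the second one) are made up for illustration and are only my guess at what "view source" might give you; the two regexes are the same idea as in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same entity table as the script.
my %charcodes = (
    "&#091;" => "[",
    "&#093;" => "]",
    "&#91;"  => "[",
    "&#93;"  => "]",
    "&quot;" => "\"",
    "&lt;"   => "<",
    "&gt;"   => ">",
    "&amp;"  => "&",
);

# clean_line is a hypothetical helper name, not part of the original script.
sub clean_line {
    my ($line) = @_;

    # Translate known entities; anything not in the table is put back as-is.
    $line =~ s/(&[^;]{2,6};)/exists $charcodes{$1} ? $charcodes{$1} : $1/eg;

    # Collapse a linkified array index back to [n].
    $line =~ s/<a href="[^"]+">(\d+)<\/a>/[$1]/g;

    return $line;
}

# Made-up sample lines (single-quoted so nothing is interpolated).
my @samples = (
    'if ($x &lt; 10 &amp;&amp; $y &gt; 2) { print &quot;ok&quot;; }',
    'my $first = $some_var<a href="/index.pl?node=0">0</a>;',
    'print &eacute;',    # not in the table, so it passes through unchanged
);

print clean_line($_), "\n" for @samples;
```

With those samples, the first line should come out as if ($x < 10 && $y > 2) { print "ok"; }, the second as my $first = $some_var[0]; and the third is left alone because &eacute; has no entry in the hash.
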
I hope someone finds this helpful. Also, any suggestions for improvements would be most welcome.

Cheers,
Ovid


In reply to Using code posted on PerlMonks by Ovid
