tallCoolOne has asked for the wisdom of the Perl Monks concerning the following question:

I have recently overcome a major hurdle assembling a website of purely the hobby sort, by using a series of perl programs to put together my simple html pages, and links to those pages and so forth. The end result was very pleasing to my eye, and as I had completed assembling all of my original reference documents (text files), and assembling them into nice, uniform html pages, I sought out and found a web hosting service to my liking and bought space.
Once I had "moved in", I found that they offered a nice, easy perl page counter, which was one of the little missing details that I had promised myself I would get around to. This involved writing a small perl program that would create a text file containing "0" to initialize the count for a particular page, where the name of the counter file corresponds to the file being counted. I am sure that you are all familiar with this general approach.
Well, it was easy enough to create the "counter files" with the names of all the pages on the site (why count some pages, when perl will let you count them all?). I then wrote a quick and dirty perl program to go through all of my previously created html pages, and add the single line needed to call the counter program, which also had to contain the name of the page itself.
Now to the part where it got sticky for me. The pages are named for what they are, or the title of the material being discussed on the page, and in many cases (43 out of 423 total pages), the title of the page contains an apostrophe - ' , as in:

Joe's_Page.shtml
Mark's_Page.shtml
Moe's_Reasons_For_Putting_A_Site_Together.shtml
Really_'Cool'_Reasons_For_This_Site_To_Exist.shtml

Well, it turns out that those apostrophes (single quotes) are a problem, and cause syntax errors in the perl program for the pages that contain the single quotes in the titles, but the pages without work fine. And I didn't have any trouble with the first little kludge to create "counter files" which had the same single quotes in the names. Here's the first program:
#!/usr/bin/perl
@textFile = `cat Brain1.txt`;
foreach $line (@textFile) {
    chop $line;
    open(COUNTERFILE, ">./$line");
    print COUNTERFILE "0";
    close(COUNTERFILE);
}

Brain1.txt is just a list of all the files to make counter files for, one file name to a line(filenames only, no extensions). This worked great, and made 423 files each containing "0" in it, and named exactly as I wanted. I had to get a little more kludgy to insert the call for the perl program into the actual files as the below travesty reveals:
#!/usr/bin/perl
@textFile = `cat Brain1.txt`;
foreach $line (@textFile) {
    chop $line;
    $pagefile = $line.".shtml";
    @ModPageFile = `cat $pagefile`;
    foreach $fileline (@ModPageFile) {
        $fileline =~ s/$line.shtml"-->/$line.shtml"-->\n<SCRIPT language="JavaScript" SRC="http:\/\/mygreatwebsite.net\/cgi-bin\/gcountdir\/gcount.pl?0=$line"> <\/SCRIPT>/
    }
    open(NEWPAGEFILE, ">./$pagefile");
    print NEWPAGEFILE @ModPageFile;
    close(NEWPAGEFILE);
}
exit;
Can anyone tell me how to get around this apostrophe problem? It's rather annoying, and I figure in cases like this, perl is smarter than I am, and can be told how to deal with it. I thought of various ways of renaming the files to get the single quotes out of the names, and processing them that way, and then changing them back, but it all gets ugly rather fast. I'd really appreciate a lesson on how-to handle file names as strings, and not worry about certain characters.
Because my next big idea is to create a perl script that will reach out into deep internet space, open each of those counter files, and pick out the number contained inside, all from my laptop down here on earth, and make up a pretty little list of which pages get read and which don't. I am hoping to not have to deal with the apostrophe problem when I get to that point.
It's all about using perl to help me stroke my tiny ego and show me that my site gets read by people.
Thanks, Monks

Replies are listed 'Best First'.
Re: What's the deal with apostrophes?
by JavaFan (Canon) on Jun 09, 2009 at 09:29 UTC
    The problem is that the apostrophe is special to the shell, and your quick way of reading the content of a file uses the shell.

    If you don't have any double quotes in your filenames, a quick way to do what you want is:

    @ModPageFile = `cat "$pagefile"`;
    this will "escape" the single quotes. Of course, if you have double quotes, it breaks down again.
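    If you would rather keep calling an external cat but want file names with either kind of quote to work, one option is the list form of a pipe open, which hands the arguments to the program directly instead of to a shell. The following is only a sketch: the sub name cat_without_shell is made up for illustration, and it assumes a cat program is on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read a file through an
# external `cat` without involving the shell. With the list form of a pipe
# open, each argument reaches the program as-is, so quotes in the file name
# need no escaping.
sub cat_without_shell {
    my ($pagefile) = @_;

    open my $fh, '-|', 'cat', $pagefile
        or die "cannot run cat on $pagefile: $!";
    my @lines = <$fh>;

    # For a pipe, close returns false if the command exited with an error.
    close $fh or die "cat failed for $pagefile (exit status $?)";
    return @lines;
}
```

    It would be called as @ModPageFile = cat_without_shell($pagefile); in place of the backticks.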

    Alternatively, avoid the shell. Another quick way of slurping in the content of a file is:

    @ModPageFile = do {local @ARGV = $pagefile; <>};
    Or you could use a module like Perl6::Slurp. Or just open the file yourself, checking for errors and reading in the content:
    open my $fh, "<", $pagefile or die "open $pagefile: $!";
    @ModPageFile = <$fh>;
    close $fh or die "close $pagefile: $!";
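    Putting the open-it-yourself approach together with the substitution from the original post might look like the sketch below. The sub name add_counter_call and the $dir parameter are invented for illustration. The \Q...\E around the page name is an addition not present in the original regex: it makes the dot and any other punctuation in the name match literally.

```perl
use strict;
use warnings;

# Hypothetical helper: insert the counter call into one page, reading and
# writing the file with open() only, so no shell ever sees the file name.
sub add_counter_call {
    my ($dir, $name) = @_;    # $name is the page name without ".shtml"
    my $pagefile = "$dir/$name.shtml";

    open my $in, '<', $pagefile or die "open $pagefile: $!";
    my @lines = <$in>;
    close $in or die "close $pagefile: $!";

    # The line to insert, using the URL from the original post.
    my $script = qq{<SCRIPT language="JavaScript" }
               . qq{SRC="http://mygreatwebsite.net/cgi-bin/gcountdir/gcount.pl?0=$name"> </SCRIPT>};

    for my $fileline (@lines) {
        # Keep the matched text ($1) and append the script tag on a new line.
        $fileline =~ s{(\Q$name.shtml\E"-->)}{$1\n$script};
    }

    open my $out, '>', $pagefile or die "open $pagefile for writing: $!";
    print {$out} @lines;
    close $out or die "close $pagefile: $!";
    return;
}
```

    Because the file name only ever passes through open, apostrophes and double quotes in it are treated as ordinary characters.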
Re: What's the deal with apostrophes?
by dsheroh (Monsignor) on Jun 09, 2009 at 11:11 UTC
    JavaFan's already answered your core question quite thoroughly, but I'd like to offer a little side advice on the true travesty[1] in your demo code:
    $fileline =~ s/$line.shtml"-->/$line.shtml"-->\n<SCRIPT language="JavaScript" SRC="http:\/\/mygreatwebsite.net\/cgi-bin\/gcountdir\/gcount.pl?0=$line"> <\/SCRIPT>/
    Although / is the default regex delimiter in Perl, you can use different punctuation in that role to avoid "leaning toothpick syndrome":
    $fileline =~ s#$line.shtml"-->#$line.shtml"-->\n<SCRIPT language="JavaScript" SRC="http://mygreatwebsite.net/cgi-bin/gcountdir/gcount.pl?0=$line"> </SCRIPT>#
    or even, if you want to get fancy:
    $fileline =~ s[$line.shtml"-->] [$line.shtml"-->\n<SCRIPT language="JavaScript" SRC="http://mygreatwebsite.net/cgi-bin/gcountdir/gcount.pl?0=$line"> </SCRIPT>]
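    A small self-contained check that the three spellings behave identically may help. Whatever character follows the s is taken as the delimiter for that one statement, so there is no variable or setting to change. The sub names and the old.example / new.example hosts below are made up for the demonstration.

```perl
use strict;
use warnings;

# The same substitution written three ways. Only the delimiters differ.

sub with_slashes {
    my ($url) = @_;
    # Default delimiter: every literal / must be escaped.
    $url =~ s/http:\/\/old\.example\//http:\/\/new.example\//;
    return $url;
}

sub with_hashes {
    my ($url) = @_;
    # "#" as delimiter: slashes need no escaping.
    $url =~ s#http://old\.example/#http://new.example/#;
    return $url;
}

sub with_brackets {
    my ($url) = @_;
    # Paired brackets: pattern and replacement each get their own pair.
    $url =~ s[http://old\.example/] [http://new.example/];
    return $url;
}
```

    The same freedom applies to m, qr, tr, q and qq.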

    Edit: [1] Just to clarify, I referred to the regex in question as a "travesty" solely to echo the OP's description of the code it came from as "the below travesty". I would not normally consider it nearly bad enough to qualify as such.

      esper, javafan,

      Thanks to both of you - these are both some of the "rough edges" in my coding knowledge. I really appreciate the input. I will add these approaches to my scripts, and I am sure that they will work as I had hoped that they would originally.
      *bows before the monks*
      This one is grateful for the wisdom you have shared with me.

        I still see a problem with your code: You are not counting "pages requested", but "pages requested with a browser that supports and allows Javascript". The real page request counts are in the access log of your web server. If you don't have access to that, you could deliver all pages through a perl script that counts the request.

        Oh, and by the way: You are using file locks when you update the counter file, right? And you check that the file name passed by the browser is one of the files you want to update, right?
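        In case it is not obvious what that means in practice, here is a rough sketch of both checks. It says nothing about how the hosting service's gcount.pl actually works. The sub name increment_counter, the $dir parameter and the $allowed hash are all hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Hypothetical helper: bump the number stored in a counter file while
# holding an exclusive lock. $allowed is a hash ref whose keys are the
# counter names that may be updated.
sub increment_counter {
    my ($dir, $name, $allowed) = @_;

    # Refuse any name that is not on the list of known pages.
    die "unknown counter '$name'\n" unless $allowed->{$name};

    my $file = "$dir/$name";
    open my $fh, '+<', $file or die "open $file: $!";
    flock($fh, LOCK_EX)      or die "lock $file: $!";

    # Read the old value; treat anything that is not a number as zero.
    my $line  = <$fh>;
    my $count = (defined $line && $line =~ /^(\d+)/) ? $1 + 1 : 1;

    # Rewind and overwrite the file with the new value.
    seek($fh, 0, SEEK_SET) or die "seek $file: $!";
    truncate($fh, 0)       or die "truncate $file: $!";
    print {$fh} $count;

    # Closing the handle also releases the lock.
    close $fh or die "close $file: $!";
    return $count;
}
```

        Without the lock, two requests arriving at the same moment can both read the old value and one increment is lost. Without the name check, a visitor can ask the script to overwrite some other file.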

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Esper, Is there a particular command or system variable to re-set to change from the default search delimiter / to the examples you have given # ?
        It seems a little odd to me to just change characters for something like that and expect perl to "know" that's what they are there for. Does it really work that way?
        Again, you have my gratitude for your enlightening information.
        Mark