in reply to Re: Re: Intercharacter spacing
in thread Intercharacter spacing

I'm just guessing (never heard of "DzSoft Perl Editor"), but if this happens to be "line 8" of your test script:
open FILE, 'C:\Perl\Perl practice\test.txt' or die $!;
then the error report would have something to do with the "open" statement and the file name string that you're giving it.

If the perl interpreter (perl.exe) is in a directory that's covered by your PATH environment variable in a DOS shell, try stepping away from the DzSoft IDE for a bit, and use the shell. Go to the directory where your test perl script is kept, and do:

perl name_of_test_script
If it gives a similar error report, try using forward slashes "/" instead of backslashes "\" in the file name that you pass to "open()". (I did say I was guessing...) Then make sure that the "test.txt" file really does exist in that exact path.
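A minimal sketch of that check, assuming the path from the guessed "line 8" above. The helper name `check_file` is made up for illustration; it only reports whether the file is missing or exists but cannot be opened, with `$!` giving the system's reason.

```perl
use strict;
use warnings;

# Path taken from the guessed "line 8"; forward slashes work in Perl on Windows.
my $path = 'C:/Perl/Perl practice/test.txt';

# Hypothetical helper: returns a short diagnosis instead of dying,
# so the "missing" and "unreadable" cases can be told apart.
sub check_file {
    my ($file) = @_;
    return "no such file: $file" unless -e $file;
    open my $fh, '<', $file or return "exists but can't be opened: $!";
    close $fh;
    return "ok: $file";
}

print check_file($path), "\n";
```

If this prints "no such file", the path string is the problem rather than the "open" call itself.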

Just out of curiosity, what do you get when you run this command in a DOS shell:

perl -V
For that matter, if you went to a directory that contains some longer file names (and names with spaces in them, etc.), what would you get if you tried this command (double quotes on the outside, because the DOS shell does not treat single quotes as quoting characters):
perl -e "opendir(D,'.'); print join($/,readdir(D)),$/"
Do all the complete file names show up (long, with spaces, etc.)? How about when you run that one-liner from within the IDE?
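Since shell quoting rules differ and an IDE usually wants a script file, here is the same directory listing as a small script. This is only a sketch; the sub name `list_dir` is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every entry in a directory,
# including "." and "..", exactly as readdir reports them.
sub list_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";
    my @names = readdir $dh;
    closedir $dh;
    return @names;
}

# One name per line, so truncated or mangled names are easy to spot.
print "$_\n" for list_dir('.');
```

Running it once from the shell and once from the IDE, in the same directory, should show whether the two environments see the same file names.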

One other point: don't even think about trying to do regex substitutions on HTML text data for the sake of "expanding" visible white-space. It'll give you a headache. Doing it without HTML::TokeParser would be utterly wrong. Doing it with HTML::TokeParser (and, say, adding &nbsp; in strategic spots) would just be misguided and unsatisfying (you'd see some results, but you'd rarely see results that look good).

Re: Re: Re: Re: Intercharacter spacing
by Tricky (Sexton) on Aug 07, 2003 at 13:48 UTC
    Thanks very much for the advice! I'm no longer an 'opening files for reading' virgin. I rechecked the file path, and I'd gotten it wrong. Doh. It's solved now, thankfully. The next problem is to extract an image HTML tag from the file that I'm reading and print it out (sent that question today). After that, I'll need to write to the HTML file. Starting to find the hacking a lot of fun... Cheers, Richard
Re: Re: Re: Re: Intercharacter spacing
by Tricky (Sexton) on Aug 11, 2003 at 13:37 UTC
    Graff, cheers for the pointers. I've opened my HTML test file, written regexps to remove image and anchor tags, and printed the result out. I need to write these mods to the original file, then refresh the HTML page - Happy Days! My supervisor mentioned that regexps may have limitations, so I'm beginning to look into the HTML parse-tree approach (is that the same approach you recommended?). Here's the source code I've put together so far! Rich
    #!/usr/bin/perl
    # remove img & anchor tags.plx
    # Program will read in an html file, remove the img tag and print out entire doc.

    # 1. No need for file variable yet: open (INFILE, "<".$htmlFile) or die("Can't read source file!\n");
    # 2. Alternative: m/<A\s+HREF=[^>]+>(.*?)<\/A>/ - Will not remove closing tag though - why?
    # 3. Why is interpreter flipping-out over an 'undefined variable', when
    #    original regexp, m/<A\s+HREF=[^>]+>(.*?)<\/A>/, is known to work. What am I missing?

    use warnings;
    use diagnostics;
    use strict;
    use HTML::Parser; # Include this module for future reference - may need to abandon
                      # regexps in favour of parse-trees.

    # Declare and initialise variables.
    my $pattern1 = '<IMG\s+(.*)>';
    my $pattern2 = '<A\s+HREF\s*=[^>]+>';
    my $pattern3 = '</A>';
    my @htmlLines;

    # Open HTML test file and read into array.
    open INFILE, "E:\\Documents and Settings\\Richard Lamb\\My Documents\\HTMLworkspace\\HTML practice\\My First Page!\\firsttest.html"
        or die "Sod! Can't open this file.\n";
    @htmlLines = <INFILE>;
    close (INFILE);

    # Test for presence of patterns in HTML file
    if($pattern1) {
        scrapImageTag(); # calls to remove image tags
    } else {
        print "No tags matching this pattern within the HTML document.\n";
    }

    if($pattern2 && $pattern3) {
        scrapAnchorTag();
    } else {
        print "No tags matching this pattern within the HTML document.\n";
    }

    # Removes image tag elements in array
    sub scrapImageTag {
        foreach my $line (@htmlLines) {
            # replace <IMG ...> with nothing.
            $line =~ s/$pattern1//ig; # case insensitivity and global search for pattern
        }
    }

    # Removes anchor tag elements in array
    sub scrapAnchorTag {
        foreach my $line (@htmlLines) {
            # replace <A HREF ...> with nothing.
            $line =~ s/$pattern2//ig; # case insensitivity and global search for pattern
            $line =~ s/$pattern3//ig; # case insensitivity and global search for pattern
        }
    }

    printHTML(); # prints the reformatted HTML doc

    sub printHTML {
        for my $i (0..@htmlLines-1) {
            print $htmlLines[$i];
        }
    }

    print "\n\n";
    sleep 2;
    print "Success?!\n";
      Okay -- that is very likely what you intend most of the time, in terms of getting rid of unwanted tags. But you should note that some of the conditionals are not doing what the comments and messages say they are doing:
      # Test for presence of patterns in HTML file
      if($pattern1) {
          scrapImageTag(); # calls to remove image tags
      } else {
          print "No tags matching this pattern within the HTML document.\n";
      }
      Well, the condition "if($pattern1)" does NOT test for the presence of image tags in the html data. It merely tests that some (non-empty, non-zero) value has been assigned to the scalar $pattern1, and since you have done so a few lines above this, the test will always be true -- it would be true even if no data were read in from the html file.

      To test for the presence of image tags in the html data, the condition would have to be:

      if ( grep /$pattern1/i, @htmlLines )
      but there's really no reason to do the test -- just go ahead and call the "scrap" functions. If those regex substitutions apply, fine. If not, no harm done (and not that much cpu work either).
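To make the difference concrete, here is a small sketch that runs both conditions against data with no image tag in it. The one-line `@htmlLines` array is invented sample input; `$pattern1` is the pattern from the script above.

```perl
use strict;
use warnings;

my $pattern1  = '<IMG\s+(.*)>';                # pattern from the script above
my @htmlLines = ("<p>No pictures here</p>\n"); # made-up input with no image tag

# True for any non-empty, non-zero string, whatever the data contains.
my $always = $pattern1 ? 1 : 0;

# Assigned to a scalar, grep returns the number of lines that match.
my $found = grep /$pattern1/i, @htmlLines;

print "if(\$pattern1) is true: $always\n";
print "lines with an image tag: $found\n";
```

The first condition reports true even though nothing in the data matches, while the grep count is zero.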
        I see your point regarding the test conditions, and have slimmer code as a result. The DzSoft Perl Editor has an 'In Browser' facility where I can view the fruits of my code. It displays the HTML exactly as it would if I'd been able to write the altered code back to the source file on my hard drive, i.e. images and anchors have been removed. Which is where I'm having (more) problems. This hacking business is certainly hard work, though fun (when I can get code to run)! I'm trying to write the changed code back to the file on the hard drive, by writing to a filehandle, so I can re-open the HTML document. Do I use a print operator? I have to come clean and say that the file writing's confusing the Hell out of me. Are file tests the answer, or assigning to a new list variable? Time to try. Here's the code I've written so far - the file won't open for writing (yet). Confusion!!! Rich
        #!/usr/bin/perl
        # write mods to HTML file.plx
        # Program will read in an html file, remove the img tag and rewrite HTML on E-drive.

        # 1. No need for file variable yet: open (INFILE, "<".$htmlFile) or die("Can't read source file!\n");
        # 2. Alternative: m/<A\s+HREF=[^>]+>(.*?)<\/A>/ - Will not remove closing tag though - why?
        # 3. Why is interpreter flipping-out over an 'undefined variable', when
        #    original regexp, m/<A\s+HREF=[^>]+>(.*?)<\/A>/, is known to work. What am I missing?

        use warnings;
        use diagnostics;
        use strict;

        # Declare and initialise variables.
        my $pattern1 = '<IMG\s+(.*)>';
        my $pattern2 = '<A\s+HREF\s*=[^>]+>';
        my $pattern3 = '</A>';
        my @htmlLines;
        my @htmlFile;

        # Open HTML test file and read into array.
        open INFILE, "E:/Documents and Settings/Richard Lamb/My Documents/HTML/test1InDocCSS.html"
            or die "Sod! Can't open this file.\n";
        @htmlLines = <INFILE>;
        close (INFILE);

        scrapImageTag();
        scrapAnchorTag();

        # Removes image tag elements in array
        sub scrapImageTag {
            foreach my $line (@htmlLines) {
                # replace <IMG ...> with nothing.
                $line =~ s/$pattern1//ig; # case insensitivity and global search for pattern
            }
        }

        # Removes anchor tag elements in array
        sub scrapAnchorTag {
            foreach my $line (@htmlLines) {
                # replace <A HREF ...> with nothing.
                $line =~ s/$pattern2//ig; # case insensitivity and global search for pattern
                $line =~ s/$pattern3//ig; # case insensitivity and global search for pattern
            }
        }

        # Am I deleting the contents of the list with this? Not sure...
        open (OUTFILE, ">@htmlLines") or die("Can't rewrite the HTML file.\n");
        print OUTFILE "@htmlLines\n";
        close (OUTFILE);
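A minimal sketch of the write-back step that the script above is attempting, offered as one possible approach rather than the poster's code. The file name `test_output.html` and the three-line array are stand-ins; in the real script the name would presumably be the same path that was opened for reading. The key point is that "open" takes the file name, not the array, and that the array is printed unquoted.

```perl
use strict;
use warnings;

# Hypothetical output path, used here so the sketch can run anywhere.
my $file = 'test_output.html';

# Stand-in for @htmlLines after the tags have been stripped.
my @htmlLines = ("<html>\n", "<body>Hello</body>\n", "</html>\n");

# The argument after the mode is the file name, not the data.
# '>' truncates the file, so the array must already hold everything to keep.
open my $out, '>', $file or die "Can't rewrite $file: $!\n";

# Printing the array unquoted avoids the space that "@htmlLines"
# would insert between elements.
print {$out} @htmlLines;

close $out or die "Can't close $file: $!\n";
```

Opening for writing does not touch the contents of the array; it only empties the file on disk, which is why the read and "close" need to happen before this step.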