Tricky has asked for the wisdom of the Perl Monks concerning the following question:

Hello holy ones, I am working on a piece of code that will extract an image tag from an HTML doc I've read in from my hard drive. The '$i' index into the array is throwing 'Global symbol "$i" requires explicit package name...'. Here's the code:
#!/usr/bin/perl
#htmltest2.plx
# Program will read in an html file, remove the img tag and print out
# no need for file variable yet: open (INFILE, "<".$htmlFile) or die("Can't read source file!\n");
use warnings;
use diagnostics;
use strict;

my @htmlLines;
open INFILE, "E:\\Documents and Settings\\Richard Lamb\\My Documents\\HTMLworkspace\\HTML practice\\My First Page!\\firsttest\.html"
    or die ("Sod! Can't open this file.\n");
@htmlLines = <INFILE>;

scrapTag(); # calls method to remove image tags

sub scrapTag # removes image tags from HTML document
{
    while($htmlLines[$i] =~ m/<IMG\s+([^>]+)>/ig) # finds each instance of image tag in the input file
    {
        s/<IMG\s+([^>]+)>/ig//ig # replaces each instance of image tag with nothing!
    }
}

for my $i (0..@htmlLines-1)
{
    print $htmlLines[$i];
}
print "\n\n";
sleep 2;
print "Success?!\n";
Where am I going wrong, folks? Cheers, Richard

Re: Global symbol probs...
by Abigail-II (Bishop) on Aug 07, 2003 at 13:31 UTC
    while($htmlLines[$i] =~ m/<IMG\s+([^>]+)>/ig)

    Well, $i is undefined and undeclared here. Perhaps you want another loop? Not that processing HTML line by line is very useful, since a single tag can span several lines....
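    To make the line-by-line problem concrete: when a tag is split across lines, no single line holds the whole <IMG ...>. The sketch below uses made-up sample lines (an assumption, standing in for the file contents); joining them, or slurping the whole file by setting local $/ to undef, lets one substitution see the complete tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one <IMG> tag split over two lines of the file.
my @htmlLines = ("<p>pic: <IMG\n", "  SRC=\"dicky.jpg\">done</p>\n");

# Line by line: neither line holds a complete tag, so nothing is removed.
my @per_line = @htmlLines;
s/<IMG\s+[^>]+>//ig for @per_line;

# Whole document: [^>] also matches "\n", so the split tag is found.
my $doc = join '', @htmlLines;
$doc =~ s/<IMG\s+[^>]+>//ig;

print "per line: ", @per_line;
print "joined:   $doc";
```

    The per-line copy comes out unchanged, while the joined string loses the tag.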

    s/<IMG\s+([^>]+)>/ig//ig

    You are dividing the result of the substitution by the result of calling the (undefined) function ig?
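    For comparison, a sketch of the correctly delimited substitution applied to a made-up sample line: the modifiers go after the third slash, and in scalar context s///g returns the number of replacements it made.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In s/PATTERN/REPLACEMENT/MODIFIERS the modifiers follow the third
# delimiter. Written as s/.../ig//ig, Perl takes "ig" as the replacement
# text and reads the trailing "/ig" as a division.
my $line = 'Before <IMG SRC="a.jpg"> middle <img src="b.jpg" alt="b"> after';

# Empty replacement, then the /i and /g modifiers.
# In scalar context s///g returns how many substitutions were made.
my $count = ($line =~ s/<IMG\s+[^>]+>//ig);

print "$count tag(s) removed: $line\n";
```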

    Abigail

      Hello Abigail, many thanks for the help. I'm just trying to find my Perl feet at the moment, so processing line by line is just an exercise for me. The regexp I put together was wrong, as you (and others) have pointed out, so I've changed it to this:
      s/<IMG\s+([^>]+)>//ig
      No more errors, now that I've declared a lexical variable, $i, for the while loop. I haven't managed to remove the image tags yet... Cheers, Richard
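      One likely reason the tags survive: inside the while loop, a bare s/// works on $_, not on $htmlLines[$i], so the array is never touched. A minimal sketch of an index loop that binds the substitution to the element with =~ (the sample lines are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines read from the file.
my @htmlLines = (
    "<p>Hello</p>\n",
    "<IMG SRC=\"dicky.jpg\" ALT=\"Dicky\"> and <img src=\"x.gif\">\n",
);

for my $i (0 .. $#htmlLines) {
    # A bare s/// would work on $_, which is not $htmlLines[$i] here.
    # Binding with =~ makes the substitution change the array element,
    # and /g removes every tag on the line, so no while loop is needed.
    $htmlLines[$i] =~ s/<IMG\s+[^>]+>//ig;
}

print @htmlLines;
```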
Re: Global symbol probs...
by mpeppler (Vicar) on Aug 07, 2003 at 13:50 UTC
    Without commenting on the general technique used to extract data from html:
    sub scrapTag # removes image tags from HTML document
    {
        while($htmlLines[$i] =~ m/<IMG\s+([^>]+)>/ig)
    I think you want something like this here:
    sub scrapTag {
        foreach my $line (@htmlLines) {
            # replace <IMG ...> with nothing.
            $line =~ s/<IMG\s+([^>]+)>//ig;
        }
    }
    which will walk the list of lines and execute the substitution for each line. Note: I made no effort to write a correct regexp for the desired result.
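    This works because foreach makes $line an alias for each element of @htmlLines rather than a copy. A small sketch of the difference, using invented sample strings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @aliased = ('a <IMG SRC="1.jpg"> b', 'no tags here');
my @copied  = ('x <img src="2.gif"> y');

# $line is an alias: editing it edits the array element itself.
foreach my $line (@aliased) {
    $line =~ s/<IMG\s+[^>]+>//ig;
}

# $copy is a separate scalar: the array element is left unchanged.
foreach my $line (@copied) {
    my $copy = $line;
    $copy =~ s/<IMG\s+[^>]+>//ig;
}

print "aliased: $_\n" for @aliased;
print "copied:  $_\n" for @copied;
```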

    Michael

      Hello Michael, Your help's much appreciated. I've put together a script to help with testing my regexp - any flaws?
      #!/usr/bin/perl
      # imageregextest.plx
      # To remove an image tag: /<IMG\s+([^>]+)>/ig or /<IMG\s+(.*)>/ig
      # To remove anchor tag: /<[aA]\s+[hH][rR][eE][fF]=[^>]*>/
      # Preamble: This program asks for a regular expression to be input, to test for
      # a match to an HTML image tag.
      use warnings;
      use diagnostics;
      use strict;

      $_ = '<IMG SRC="C:\Perl\HTMLworkspace\HTML practice\My First Page!\first.html\dicky.jpg" ALT="Dicky Mintos!"/> ';
      print "Enter a regular expression: ";
      my $pattern = <STDIN>;
      chomp($pattern);
      if(/$pattern/) {
          print "The text matches the pattern $pattern.\n";
      } else {
          print "'$pattern' was not found\n";
      }
      Experimenting with regexps is fun, though hard work (I was diagnosed dyslexic in June!)! Learning to hack, albeit slowly... Cheers, Richard
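      One flaw worth guarding against: if the typed-in pattern is not a valid regex (an unbalanced "(" for instance), /$pattern/ dies at run time and the script stops. A sketch that compiles the pattern inside eval first; try_pattern is a hypothetical helper, and a fixed list of patterns stands in for <STDIN> so the example runs unattended:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = '<IMG SRC="dicky.jpg" ALT="Dicky Mintos!"/> ';

# Hypothetical helper: returns (1 or 0, '') for match / no match,
# or (undef, error message) when the pattern does not compile.
sub try_pattern {
    my ($pattern, $target) = @_;
    my $re = eval { qr/$pattern/ };
    return (undef, $@) unless defined $re;
    return (($target =~ $re) ? 1 : 0, '');
}

# In the real script each pattern would come from <STDIN>.
for my $pattern ('<IMG\s+([^>]+)>', '<a\s+href', '<IMG(') {
    my ($matched, $error) = try_pattern($pattern, $text);
    if (!defined $matched) {
        print "'$pattern' is not a valid regex: $error";
    } elsif ($matched) {
        print "The text matches the pattern $pattern.\n";
    } else {
        print "'$pattern' was not found\n";
    }
}
```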
Re: HTML tag extraction probs...
by benn (Vicar) on Aug 07, 2003 at 13:56 UTC
    The error means exactly what it says: you're using $i in scrapTag without declaring it. There are fixes for your existing code, but I'd probably rewrite the sub simply as a map, something like...
    @htmlLines = map { s/your_regex//ig; $_ } @htmlLines;
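    A side note on that map: inside map, $_ is an alias for each element, so s/// already changes @htmlLines in place and the assignment only copies the results back. A statement-modifier for loop expresses the same edit directly; a sketch with invented sample lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @htmlLines = ('<IMG SRC="a.jpg">one', 'two<img src="b.gif">');

# $_ is aliased to each element in turn, so this edits @htmlLines itself.
s/<IMG\s+[^>]+>//ig for @htmlLines;

print "$_\n" for @htmlLines;
```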

    There's no need for a sub here at all. If you want to do many 'scrapTags', then by all means declare one, but you'll probably want to pass in @htmlLines rather than relying on a global.
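    If a sub is wanted, here is a sketch of one that takes the lines by array reference instead of reaching for the global. The name scrap_tags and its return value (the number of tags removed) are illustrative choices, not part of the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strips <IMG ...> tags from every line of the array
# passed in by reference and returns the total number of tags removed.
sub scrap_tags {
    my ($lines) = @_;
    my $removed = 0;
    for my $line (@$lines) {
        # s///g returns the count, or a false value when nothing matched;
        # "|| 0" turns that false value into a plain 0.
        $removed += ($line =~ s/<IMG\s+[^>]+>//ig) || 0;
    }
    return $removed;
}

my @htmlLines = ('<img src="a.gif"><IMG SRC="b.gif">x', 'plain');
my $n = scrap_tags(\@htmlLines);
print "removed $n; now: @htmlLines\n";
```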

    As for the regex, that's fine so long as your img tags don't contain any ">" characters (say, <img src='next_page' alt='>'> <img src='last_page' alt='>>'>, which is something I tend to do a fair amount). Check out the many HTML parsing modules that are mentioned here 10 or 20 times a day :)
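    To see the problem concretely, here is a sketch that runs the simple pattern and a quote-aware variant over the example above. The second pattern is an illustrative swap, not something from the thread, and it is still no substitute for a real parser: text inside comments or <script> blocks can still trip it up. Modules such as HTML::Parser or HTML::TokeParser on CPAN handle those cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = q{<img src='next_page' alt='>'> <img src='last_page' alt='>>'>};

# Simple pattern: [^>]+ stops at the first ">", even inside a quoted alt value.
(my $naive = $html) =~ s/<IMG\s+[^>]+>//ig;

# Quote-aware pattern: outside quotes, anything but > " or ';
# a quoted value may contain anything up to its closing quote.
(my $quoted = $html) =~ s/<img\b(?:[^>"']|"[^"]*"|'[^']*')*>//ig;

print "naive:  [$naive]\n";
print "quoted: [$quoted]\n";
```

    The simple version leaves the fragment '> >'> behind; the quote-aware one leaves only the space between the two tags.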

    Cheers, Ben.

Re: Global symbol probs...
by dragonchild (Archbishop) on Aug 07, 2003 at 14:29 UTC
    Given that you have strict turned on, I'd really like to see the version of Perl that the code you posted compiles with ... Fix those problems first, and your bug(s) will be more evident.

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.