Re: Global symbol probs...

Without commenting on the general technique used to extract data from HTML:

sub scrapTag  # removes image tags from HTML document
{
  while($htmlLines[$i] =~ m/<IMG\s+([^>]+)>/ig)

I think you want something like this here:

sub scrapTag {
    foreach my $line (@htmlLines) {
       # replace <IMG ...> with nothing.
       $line =~ s/<IMG\s+([^>]+)>//ig;
    }
}

This walks the list of lines and runs the substitution on each one. Because foreach aliases $line to the current element, the changes are made to @htmlLines itself. Note: I made no effort to write a regexp that correctly achieves the desired result.
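For anyone who wants to see the in-place effect, here is a minimal, self-contained sketch. The sample contents of @htmlLines are made up for the demonstration, and the regexp is the same unrefined one as above (minus the unused capture group), so treat it as an illustration of the loop rather than a robust way to strip tags:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real @htmlLines.
my @htmlLines = (
    '<p>Hello <IMG SRC="a.jpg" ALT="A"> world</p>',
    '<img src="b.png"><b>bold</b>',
    'no image here',
);

sub scrapTag {
    # $line is an alias for each array element, so s/// edits the array itself.
    foreach my $line (@htmlLines) {
        $line =~ s/<IMG\s+[^>]+>//ig;
    }
}

scrapTag();
print "$_\n" for @htmlLines;
```

Declaring @htmlLines with my before the sub keeps the script happy under use strict, and no loop index is needed at all.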

Michael


Replies are listed 'Best First'.
Re: Re: Global symbol probs...
by Anonymous Monk on Aug 07, 2003 at 16:08 UTC

Hello Michael,

Your help's much appreciated. I've put together a script to help with testing my regexp. Any flaws?

#!/usr/bin/perl
# imageregextest.plx
# To remove an image tag: /<IMG\s+([^>]+)>/ig or /<IMG\s+(.*)>/ig
# To remove anchor tag: /<[aA]\s+[hH][rR][eE][fF]=[^>]*>/
# Preamble: This program asks for a regular expression to be input, to test for
# a match to an HTML image tag.
use warnings;
use diagnostics;
use strict;

$_ = '<IMG SRC="C:\Perl\HTMLworkspace\HTML practice\My First Page!\first.html\dicky.jpg" ALT="Dicky Mintos!"/> ';

print "Enter a regular expression: ";
my $pattern = <STDIN>;
chomp($pattern);

if(/$pattern/)
{
  print "The text matches the pattern $pattern.\n";
}
else
{
  print "'$pattern' was not found\n";
}

Experimenting with regexp is fun, though hard work (I was diagnosed dyslexic in June!). Learning to hack, albeit slowly...

Cheers,
Richard
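One possible refinement, offered as a sketch rather than a verdict on the script: a malformed pattern typed at the prompt (an unclosed parenthesis, say) makes the match die with a regexp compilation error. Compiling the pattern with qr// inside an eval block lets the script report that instead. The helper name pattern_matches, the shortened sample tag, the hard-coded list of patterns, and the /i flag are all assumptions made for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 on a match, 0 on no match,
# and undef when the pattern is not a valid regexp.
sub pattern_matches {
    my ($pattern, $text) = @_;
    my $re = eval { qr/$pattern/i };   # qr// dies on a malformed pattern
    return undef unless defined $re;
    return $text =~ $re ? 1 : 0;
}

# Shortened stand-in for the image tag used in the test script.
my $tag = '<IMG SRC="dicky.jpg" ALT="Dicky Mintos!"/>';

# Fixed patterns replace the STDIN prompt so the sketch runs unattended.
for my $pattern ('<IMG\s+([^>]+)>', '<A\s+HREF=[^>]*>', '<IMG\s+([^>]+') {
    my $result = pattern_matches($pattern, $tag);
    my $message = !defined $result ? "is not a valid regexp"
                : $result          ? "matches"
                :                    "was not found";
    print "'$pattern' $message\n";
}
```

The same helper could be fed from <STDIN> after chomp, exactly as in the script above.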