In reply to "getting and printing form values etc from html stripping out all else"
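For your first requirement, a regex is probably adequate, since (unless I'm having a Sr. moment) a literal ">" rarely turns up inside an image tag. Be aware that HTML 4.x does permit one inside a quoted attribute value (an alt text, say), and a pattern that stops at the first ">" will cut such a tag short.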
One way to approach the job, therefore, is to extend your regex with non-greedy (aka "minimal") matching and a negated character class, [^>]+, which consumes everything up to but not including the closing ">". Here's a sketch, minus file-handling, CGI, etc.:
    #!/usr/bin/perl
    use strict;
    use warnings;
    #825146

    my @line = <DATA>;
    for my $line (@line) {
        chomp $line;
        if ( $line =~ m/(<img .*?[^>]+)/ ) {
            print "<g:image_link> " . $1 . "> </g:image_link>\n";
        }
        else {
            print "\t nope: $line \n";    # you may want to send this to a different file
        }
    }

    __DATA__
    <p><img src="http://www.mysite/graphics/blue.jpg" alt="Hey" width="100" height="100" ><br>yada yada</p>
    <p><img src="../grapics/blue1.gif" alt="Yo" width="200" height="75"></p>
    <p>foobar with no img</p>
    <blockquote><img width="75" height="75" src="blue2.png"></blockquote>
Output:
    <g:image_link> <img src="http://www.mysite/graphics/blue.jpg" alt="Hey" width="100" height="100" > </g:image_link>
    <g:image_link> <img src="../grapics/blue1.gif" alt="Yo" width="200" height="75"> </g:image_link>
    	 nope: <p>foobar with no img</p> 
    <g:image_link> <img width="75" height="75" src="blue2.png"> </g:image_link>
BUT take the advice from pemungkah above: Use a parser! Trying to deal with all the possible unwanted tags in a form with regexen is going to get you deeper and deeper into complexities.
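For comparison, here is a rough sketch of the parser route. It assumes HTML::TokeParser (from the HTML-Parser distribution on CPAN) is installed; that module is my pick for illustration, not necessarily the one pemungkah had in mind, and the sub name img_tags is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN module, part of the HTML-Parser distribution

# Hypothetical helper: return the original source text of every
# <img> start tag found in $html, in document order.
sub img_tags {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html )
        or die "Cannot create parser";
    my @tags;
    while ( my $token = $parser->get_tag('img') ) {
        # For a start tag, get_tag returns [ $tag, \%attr, \@attrseq, $text ];
        # element 3 is the tag exactly as it appeared in the source.
        push @tags, $token->[3];
    }
    return @tags;
}

my $html = <<'HTML';
<p><img src="http://www.mysite/graphics/blue.jpg" alt="Hey" width="100" height="100" ><br>yada yada</p>
<p>foobar with no img</p>
<blockquote><img width="75" height="75" src="blue2.png"></blockquote>
HTML

print "<g:image_link> $_ </g:image_link>\n" for img_tags($html);
```

The parser finds the end of each tag itself, so the script no longer depends on how the markup happens to be split across lines.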
And if you're planning to read user input from a form, for heaven's sake, read about untainting. You really don't want to let the fumble-fingered or malicious run around loose in your playground.
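A minimal sketch of the untainting idea, assuming the form value is supposed to be a simple file name. The sub name untaint_filename and the whitelist pattern are my own illustrative choices; tighten or loosen the pattern to fit your data. Under taint mode (perl -T), a value captured from a regex match is what Perl treats as clean, so the point is to match against what you expect and keep only the capture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a cleaned copy of $value if it looks like a
# plain file name (letters, digits, underscore, dot, hyphen; at most 64
# characters; no leading dot or hyphen). Return nothing otherwise.
sub untaint_filename {
    my ($value) = @_;
    return unless defined $value;
    if ( $value =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]{0,63})\z/ ) {
        return $1;    # the captured text is the untainted value
    }
    return;
}

# Stand-ins for values that would normally arrive from the form.
for my $input ( 'blue.jpg', '../../etc/passwd', 'x.png; rm -rf /' ) {
    my $clean = untaint_filename($input);
    print defined $clean ? "ok:       $clean\n" : "rejected: $input\n";
}
```

Whitelisting what you will accept, rather than trying to list everything you won't, is the safer habit here.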