IOW, use an appropriate module: one which, having stood the test of at least some time (and the terrors of CPAN's testing process), is more apt to be reliable than the one-off a newbie invents.
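Here is a minimal sketch of that advice. It assumes the CPAN module HTML::TokeParser, which ships in the HTML-Parser distribution, is installed. The helper name `text_between` and the sample markup are hypothetical and exist only for illustration. The helper returns the text between an opening tag and its closing tag:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN, HTML-Parser distribution (assumed installed)

# Hypothetical sample input, trimmed from the data shown in this post.
my $html = <<'HTML';
<html><head><title>(ww)Abcdef Hose Company #2 - Home</title></head>
<body><h1 style="text-align: center;">Abcdef Hose Company #1</h1>
<p>Special Drill, 10am, Tuesday, 14 May</p></body></html>
HTML

# Hypothetical helper: return the whitespace-trimmed text between the
# first <$tag> and the following </$tag>, or undef if <$tag> never appears.
sub text_between {
    my ($markup, $tag) = @_;
    my $p = HTML::TokeParser->new(\$markup) or die "Can't parse: $!";
    return unless $p->get_tag($tag);        # skip ahead to the start tag
    return $p->get_trimmed_text("/$tag");   # collect text up to the end tag
}

print "title: ", text_between($html, 'title'), "\n";
print "h1:    ", text_between($html, 'h1'), "\n";
```

The parser does the tokenizing, so attributes, comments and tags spread over several lines are handled without a hand-rolled regex.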
However, maybe you really meant something like this?
#!/usr/bin/perl
use strict;
use warnings;
# 888817

my @array;
my $file = "888817.txt";
open my $fh, '<', $file or die "Can't open $file: $!";
my @line = <$fh>;
close $fh;

for my $line (@line) {
    next if $line =~ /^\s*$/;                # skip blank lines
    # capture only the FIRST tag on each line
    my ($found) = $line =~ m/(<.[^>]*>)/;
    next unless defined $found;              # line contained no tag
    print "\$found: $found \n";
    push @array, $found;
}

for my $i (0 .. $#array) {
    print "The Element $i is $array[$i]\n";
}

# report every pair of identical tags
for my $j (0 .. $#array) {
    for my $k ($j + 1 .. $#array) {
        if ($array[$j] eq $array[$k]) {
            print "substring($array[$j],$array[$k])\n";
        }
    }
}
Where the data looked like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
<link type="text/css" rel="stylesheet" href="NHC1.css" />
<link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
<title>(ww)Abcdef Hose Company #2 - Home</title>
</head>
<body>
<div id="title">
<span style="color: #cc0000; background-color: black;">Address: </span> 26 New Avenue, Abcdef, NY <span style="color: #cc0000; background-color: black;">phone: </span>nnn.nnn.nnnn</div>
<address>
Abcdef Hose Co. #1<br />
</address>
<br />
<p style="color: black; background-color: transparent; line-height: 99%;">Do you have something that you would like to see on the website? If so, let us know (use the email link below) and we will try to incorporate it.</p>
<p><a href="mailto:ww@Abcdef.org"><img src="gfx/box.gif" alt="contact webmaster" width="43" height="55" />Email webmaster</a></p>
</div> <!-- end div left sidebar (lsb) -->
<div id="main">
<div id="main_header" style="width:100%;">
<h1 style="text-align: center;">Abcdef Hose Company #1</h1>
<img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
</div> <!-- end main_header -->
<div id="news">
<div style="text-align:left;" class="style1"><strong>Latest Hot Stuff...</strong></div>
<p>Special Drill, 10am, Tuesday, 14 May: MA companies at Hoovertown Mall</p>
</div> <!-- end div news -->
</div> <!-- end div main -->
</body>
</html>
producing this output:
$found: <?xml version="1.0" encoding="UTF-8"?>
$found: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
$found: <html xmlns="http://www.w3.org/1999/xhtml">
$found: <head>
$found: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
$found: <meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
$found: <link type="text/css" rel="stylesheet" href="NHC1.css" />
$found: <link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
$found: <title>
$found: </head>
$found: <body>
$found: <div id="title">
$found: <span style="color: #cc0000; background-color: black;">
$found: <address>
$found: <br />
$found: </address>
$found: <br />
$found: <p style="color: black; background-color: transparent; line-height: 99%;">
$found: <p>
$found: </div>
$found: <div id="main">
$found: <div id="main_header" style="width:100%;">
$found: <h1 style="text-align: center;">
$found: <img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
$found: </div>
$found: <div id="news">
$found: <div style="text-align:left;" class="style1">
$found: <p>
$found: </div>
$found: </div>
$found: </body>
$found: </html>
The Element 0 is <?xml version="1.0" encoding="UTF-8"?>
The Element 1 is <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The Element 2 is <html xmlns="http://www.w3.org/1999/xhtml">
The Element 3 is <head>
The Element 4 is <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The Element 5 is <meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
The Element 6 is <link type="text/css" rel="stylesheet" href="NHC1.css" />
The Element 7 is <link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
The Element 8 is <title>
The Element 9 is </head>
The Element 10 is <body>
The Element 11 is <div id="title">
The Element 12 is <span style="color: #cc0000; background-color: black;">
The Element 13 is <address>
The Element 14 is <br />
The Element 15 is </address>
The Element 16 is <br />
The Element 17 is <p style="color: black; background-color: transparent; line-height: 99%;">
The Element 18 is <p>
The Element 19 is </div>
The Element 20 is <div id="main">
The Element 21 is <div id="main_header" style="width:100%;">
The Element 22 is <h1 style="text-align: center;">
The Element 23 is <img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
The Element 24 is </div>
The Element 25 is <div id="news">
The Element 26 is <div style="text-align:left;" class="style1">
The Element 27 is <p>
The Element 28 is </div>
The Element 29 is </div>
The Element 30 is </body>
The Element 31 is </html>
substring(<br />,<br />)
substring(<p>,<p>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
...(in which I still see neither rhyme nor reason; but, as they say, diff'rent strokes for diff'rent folks).
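If the goal is simply to find tags that occur more than once, a hash count does the job in one pass. It replaces the pairwise O(n²) comparison in the nested loops of the script above. Using `/g` in list context also collects every tag on a line instead of only the first one. This is a core-Perl sketch only. The sub name `count_tags` and the sample lines are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often each distinct tag occurs.
# The /g match in list context returns every tag on a line, not just the first.
sub count_tags {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        $count{$_}++ for $line =~ m/<[^>]+>/g;
    }
    return %count;
}

# A few hypothetical input lines shaped like the data above.
my @sample = (
    qq{<p><a href="mailto:ww\@Abcdef.org">Email webmaster</a></p>\n},
    qq{</div> <!-- end div left sidebar (lsb) -->\n},
    qq{<p>Special Drill</p>\n},
    qq{</div>\n},
);

my %count = count_tags(@sample);
for my $tag (sort keys %count) {
    next unless $count{$tag} > 1;    # report duplicates only
    print "$tag appears $count{$tag} times\n";
}
```

Sorting the keys only makes the report order repeatable. The counting itself happens in a single pass over the lines.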
In reply to Re: extract string between 2 elements by ww, in thread extract string between 2 elements by satishchandra