NetWallah nailed it (if, indeed, we understand what you're trying to accomplish) because parsing HTML with homebrew regexen is simply too easy to screw up.

IOW, use an appropriate module which, having stood the test of at least some time (and the terrors of CPAN's testing process), is more apt to be reliable than the one-off the newbie invents.
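For what it's worth, here is a minimal sketch of the module route. It assumes HTML::TokeParser is installed (it ships with the HTML-Parser distribution on CPAN, so it is not core Perl). The sub name text_between and the sample markup are made up for illustration; they are not from the original question, and this is only one plausible reading of "extract string between 2 elements".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;    # assumption: installed from CPAN (HTML-Parser dist)

# Hypothetical helper: return the trimmed text found inside every
# <$tag> ... </$tag> pair in the document held in $html.
sub text_between {
    my ( $html, $tag ) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new( \$html );

    my @found;
    while ( $parser->get_tag($tag) ) {    # skip ahead to the next <$tag>
        # Collect text up to the matching end tag, whitespace collapsed.
        push @found, $parser->get_trimmed_text("/$tag");
    }
    return @found;
}

# Illustrative sample only, loosely modelled on the test data in this node.
my $html = <<'HTML';
<html><head><title>(ww)Abcdef Hose Company #2 - Home</title></head>
<body>
<h1 style="text-align: center;">Abcdef Hose Company #1</h1>
<p>Special Drill, 10am, Tuesday, 14 May</p>
</body></html>
HTML

print "$_\n" for text_between( $html, 'h1' );
```

Run as is, this should print "Abcdef Hose Company #1". The parser copes with attributes, odd spacing and tags split across lines, which is exactly where a line-by-line regex tends to fall over.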

However, maybe you really meant something like this?

    #!/usr/bin/perl
    use strict;
    use warnings;
    # 888817

    my @array;
    my $file = "888817.txt";
    open FH, '<', $file or die "Can't open $file: $!";
    # while ( $file ) {
    my @line = <FH>;
    for my $line(@line) {
        if ( $line =~ /^\n/ ) {
            next;
        } else {
            (my $found) = $line =~ m/<.[^>]*>/g;
            print "\$found: $found \n";
            push @array, $found;
        }
    }

    for( my $i=0; $i<@array; $i++){
        print "The Element $i is $array[$i]\n";
    }

    for( my $j =0; $j<@array; $j++){
        for( my $k=$j+1; $k<@array; $k++) {
            if( $array[$j] eq $array[$k]) {
                print "substring($array[$j],$array[$k])\n";
            }
        }
    }

Where the data looked like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
    <link type="text/css" rel="stylesheet" href="NHC1.css" />
    <link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
    <title>(ww)Abcdef Hose Company #2 - Home</title>
    </head>
    <body>
    <div id="title">
    <span style="color: #cc0000; background-color: black;">Address: </span> 26 New Avenue, Abcdef, NY &nbsp; <span style="color: #cc0000; background-color: black;">phone: </span>nnn.nnn.nnnn</div>
    <address>
    Abcdef Hose Co. #1<br />
    </address>
    <br />
    <p style="color: black; background-color: transparent; line-height: 99%;">Do you have something that you would like to see on the website? If so, let us know (use the email link below) and we will try to incorporate it.</p>
    <p><a href="mailto:ww@Abcdef.org"><img src="gfx/box.gif" alt="contact webmaster" width="43" height="55" />Email webmaster</a></p>
    </div> <!-- end div left sidebar (lsb) -->
    <div id="main">
    <div id="main_header" style="width:100%;">
    <h1 style="text-align: center;">Abcdef Hose Company #1</h1>
    <img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
    </div> <!-- end main_header -->
    <div id="news">
    <div style="text-align:left;" class="style1"><strong>Latest Hot Stuff...</strong></div>
    <p>Special Drill, 10am, Tuesday, 14 May: MA companies at Hoovertown Mall</p>
    </div> <!-- end div news -->
    </div> <!-- end div main -->
    </body>
    </html>

producing this output:

    $found: <?xml version="1.0" encoding="UTF-8"?>
    $found: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    $found: <html xmlns="http://www.w3.org/1999/xhtml">
    $found: <head>
    $found: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    $found: <meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
    $found: <link type="text/css" rel="stylesheet" href="NHC1.css" />
    $found: <link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
    $found: <title>
    $found: </head>
    $found: <body>
    $found: <div id="title">
    $found: <span style="color: #cc0000; background-color: black;">
    $found: <address>
    $found: <br />
    $found: </address>
    $found: <br />
    $found: <p style="color: black; background-color: transparent; line-height: 99%;">
    $found: <p>
    $found: </div>
    $found: <div id="main">
    $found: <div id="main_header" style="width:100%;">
    $found: <h1 style="text-align: center;">
    $found: <img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
    $found: </div>
    $found: <div id="news">
    $found: <div style="text-align:left;" class="style1">
    $found: <p>
    $found: </div>
    $found: </div>
    $found: </body>
    $found: </html>
    The Element 0 is <?xml version="1.0" encoding="UTF-8"?>
    The Element 1 is <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    The Element 2 is <html xmlns="http://www.w3.org/1999/xhtml">
    The Element 3 is <head>
    The Element 4 is <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    The Element 5 is <meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
    The Element 6 is <link type="text/css" rel="stylesheet" href="NHC1.css" />
    The Element 7 is <link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
    The Element 8 is <title>
    The Element 9 is </head>
    The Element 10 is <body>
    The Element 11 is <div id="title">
    The Element 12 is <span style="color: #cc0000; background-color: black;">
    The Element 13 is <address>
    The Element 14 is <br />
    The Element 15 is </address>
    The Element 16 is <br />
    The Element 17 is <p style="color: black; background-color: transparent; line-height: 99%;">
    The Element 18 is <p>
    The Element 19 is </div>
    The Element 20 is <div id="main">
    The Element 21 is <div id="main_header" style="width:100%;">
    The Element 22 is <h1 style="text-align: center;">
    The Element 23 is <img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
    The Element 24 is </div>
    The Element 25 is <div id="news">
    The Element 26 is <div style="text-align:left;" class="style1">
    The Element 27 is <p>
    The Element 28 is </div>
    The Element 29 is </div>
    The Element 30 is </body>
    The Element 31 is </html>
    substring(<br />,<br />)
    substring(<p>,<p>)
    substring(</div>,</div>)
    substring(</div>,</div>)
    substring(</div>,</div>)
    substring(</div>,</div>)
    substring(</div>,</div>)
    substring(</div>,</div>)

...(in which I still see neither rhyme nor reason, but as they say: diff'rent strokes for diff'rent folks).


In reply to Re: extract string between 2 elements by ww
in thread extract string between 2 elements by satishchandra
