Here's one way (using basicly the method you describe in the last para above (+ +), short-circuited to skip any <para id="n"> unless "n" is "2"):
#!/usr/bin/perl use strict; use warnings; use 5.012; # 918508 wants linknames from para 2 only my @data = ('<p id=paragraph_1>', '<a href="http://www.link1.com">Link1</a>', '<a href="http://www.link2.com">Link2</a>', '<a href="http://www.link3.com">Link3</a>', '</p>', '<p id=paragraph_2>', '<a href="http://www.link4.com">Link4</a>', '<a href="http://www.link5.com">Link5</a>', '<a href="http://www.link6.com">Link6</a>', '</p>',); my (@linkName, $linkName); my $flag = 0; for my $data(@data) { chomp $data; if ( $data =~ /<p id=paragraph_2>/ ) { # when the above is true, we've found para 2 $flag = 1; } if ( $flag && ($data !~ /<p id=paragraph_2>/) ) { # now, we want to skip the data --the para heading -- # the first time we arrive here, but if it's not the h +eading, # then capture the link title if ( $data =~ m/<a href=.+>(Link\d)<.+/ ) { $linkName = $1; push @linkName, $linkName; } } } for my $extracted(@linkName) { say $extracted; }
Output:
Link4 Link5 Link6
Generalization -- to suit specific needs -- is left as an exercise for the OP.
And BTW, html attribute values should be inside quotes... <p id="para1" class="b">.Update: Para 1 extended to note that OP was on the right track with the method in the last para of his last previous post.
In reply to Re^7: regexp over multiple lines
by ww
in thread regexp over multiple lines
by liverpaul
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |