in reply to Re^6: regexp over multiple lines
in thread regexp over multiple lines
Here's one way (using basically the method you describe in the last para above (++), short-circuited to skip any <para id="n"> unless "n" is "2"):
    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.012;

    # 918508 wants linknames from para 2 only

    my @data = (
        '<p id=paragraph_1>',
        '<a href="http://www.link1.com">Link1</a>',
        '<a href="http://www.link2.com">Link2</a>',
        '<a href="http://www.link3.com">Link3</a>',
        '</p>',
        '<p id=paragraph_2>',
        '<a href="http://www.link4.com">Link4</a>',
        '<a href="http://www.link5.com">Link5</a>',
        '<a href="http://www.link6.com">Link6</a>',
        '</p>',
    );

    my (@linkName, $linkName);
    my $flag = 0;

    for my $data (@data) {
        chomp $data;
        if ( $data =~ /<p id=paragraph_2>/ ) {
            # when the above is true, we've found para 2
            $flag = 1;
        }
        if ( $flag && ($data !~ /<p id=paragraph_2>/) ) {
            # now, we want to skip the data -- the para heading --
            # the first time we arrive here, but if it's not the heading,
            # then capture the link title
            if ( $data =~ m/<a href=.+>(Link\d)<.+/ ) {
                $linkName = $1;
                push @linkName, $linkName;
            }
        }
    }

    for my $extracted (@linkName) {
        say $extracted;
    }
Output:
    Link4
    Link5
    Link6
Generalization -- to suit specific needs -- is left as an exercise for the OP.
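For anyone who wants a starting point on that exercise, here is a rough sketch of one possible generalization, not a definitive answer. It assumes one tag per line (as in the sample data), accepts the paragraph id quoted or unquoted, stops collecting at the closing </p>, and captures any link text rather than only "LinkN". The sub name links_in_para and the extra paragraph_3 test data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): return the link texts
# found between <p id=ID> and the next </p>. Assumes one tag per line.
sub links_in_para {
    my ($id, @lines) = @_;
    my @names;
    my $in_para = 0;
    for my $line (@lines) {
        # opening tag of the wanted paragraph; id may be quoted or bare
        if ( $line =~ /<p\s+id=(["']?)\Q$id\E\1[\s>]/ ) {
            $in_para = 1;
            next;
        }
        # closing tag: stop collecting until the wanted id shows up again
        if ( $in_para && $line =~ m{</p>} ) {
            $in_para = 0;
            next;
        }
        next unless $in_para;
        # capture whatever text sits between <a ...> and </a>
        push @names, $1 if $line =~ m{<a\s[^>]*>([^<]*)</a>};
    }
    return @names;
}

# Made-up sample data: mixed quoting, plus a paragraph after the wanted one
my @data = (
    '<p id="paragraph_1">',
    '<a href="http://www.link1.com">Link1</a>',
    '</p>',
    '<p id=paragraph_2>',
    '<a href="http://www.link4.com">Link4</a>',
    '<a href="http://www.link5.com">Link5</a>',
    '</p>',
    '<p id="paragraph_3">',
    '<a href="http://www.link7.com">Link7</a>',
    '</p>',
);

print "$_\n" for links_in_para('paragraph_2', @data);
```

Resetting the flag at </p> is what keeps Link7 out of the result; the original script never clears $flag, which is harmless only because paragraph 2 is the last one in its data. For real-world HTML a proper parser is the safer route than line-by-line regexes.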
And BTW, HTML attribute values should be inside quotes, e.g. <p id="para1" class="b">.
Update: Para 1 extended to note that the OP was on the right track with the method in the last para of his previous post.
Re^8: regexp over multiple lines
by liverpaul (Acolyte) on Aug 05, 2011 at 10:24 UTC