in reply to Re^6: regexp over multiple lines
in thread regexp over multiple lines

Here's one way (using basically the method you describe in the last para above, short-circuited to skip any <para id="n"> unless "n" is "2"):

#!/usr/bin/perl
use strict;
use warnings;
use 5.012;

# 918508 wants linknames from para 2 only

my @data = ('<p id=paragraph_1>',
            '<a href="http://www.link1.com">Link1</a>',
            '<a href="http://www.link2.com">Link2</a>',
            '<a href="http://www.link3.com">Link3</a>',
            '</p>',
            '<p id=paragraph_2>',
            '<a href="http://www.link4.com">Link4</a>',
            '<a href="http://www.link5.com">Link5</a>',
            '<a href="http://www.link6.com">Link6</a>',
            '</p>',);

my (@linkName, $linkName);
my $flag = 0;

for my $data(@data) {
    chomp $data;
    if ( $data =~ /<p id=paragraph_2>/ ) {
        # when the above is true, we've found para 2
        $flag = 1;
    }
    if ( $flag && ($data !~ /<p id=paragraph_2>/) ) {
        # now, we want to skip the data --the para heading --
        # the first time we arrive here, but if it's not the heading,
        # then capture the link title
        if ( $data =~ m/<a href=.+>(Link\d)<.+/ ) {
            $linkName = $1;
            push @linkName, $linkName;
        }
    }
}

for my $extracted(@linkName) {
    say $extracted;
}

Output:

Link4
Link5
Link6

Generalization -- to suit specific needs -- is left as an exercise for the OP.

And BTW, HTML attribute values should be inside quotes: <p id="para1" class="b">.

Update: Para 1 extended to note that the OP was on the right track with the method in the last para of his previous post.
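One possible take on the generalization mentioned above, offered as a sketch rather than a definitive answer. The flag in the loop above is never cleared, so links from any paragraph after paragraph 2 would be collected as well. The version below stops at the closing </p>. The helper name links_in_para() and the third paragraph in the sample data are assumptions added for illustration; they are not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# links_in_para() is a hypothetical helper (not from the original post).
# It returns the link texts found between <p id=$id> and the next </p>.
sub links_in_para {
    my ($id, @lines) = @_;
    my (@names, $inside);
    for my $line (@lines) {
        if ( $line =~ /<p id="?\Q$id\E"?>/ ) {   # opening tag, quoted or unquoted id
            $inside = 1;
            next;                                # nothing to capture on the tag line
        }
        next unless $inside;                     # still before the wanted paragraph
        last if $line =~ m{</p>};                # end of the wanted paragraph: stop
        push @names, $1 if $line =~ m{<a href=[^>]+>([^<]+)</a>};
    }
    return @names;
}

# Sample data; paragraph_3 is made up to show that the scan stops at </p>.
my @data = (
    '<p id=paragraph_1>',
    '<a href="http://www.link1.com">Link1</a>',
    '</p>',
    '<p id=paragraph_2>',
    '<a href="http://www.link4.com">Link4</a>',
    '<a href="http://www.link5.com">Link5</a>',
    '</p>',
    '<p id=paragraph_3>',
    '<a href="http://www.link7.com">Link7</a>',
    '</p>',
);

my @got = links_in_para('paragraph_2', @data);
print "$_\n" for @got;
```

Called with 'paragraph_2', this prints Link4 and Link5 and leaves Link7 alone. It is still line-oriented pattern matching, so for real-world HTML a proper parser is the safer route.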

Re^8: regexp over multiple lines
by liverpaul (Acolyte) on Aug 05, 2011 at 10:24 UTC

    Thanks for the detailed reply :-)

    Although I can follow what you are doing in your code example, I'm a bit confused about one thing. You seem to be processing the @data array line by line, whereas I'm reading my data into a single string variable (slurping, as recommended by the forum). Please correct me if I've misunderstood :-)

    Where you have:

    my @data = ('<p id=paragraph_1>',
                '<a href="http://www.link1.com">Link1</a>',
                '<a href="http://www.link2.com">Link2</a>',
                '<a href="http://www.link3.com">Link3</a>',
                '</p>',
                '<p id=paragraph_2>',
                '<a href="http://www.link4.com">Link4</a>',
                '<a href="http://www.link5.com">Link5</a>',
                '<a href="http://www.link6.com">Link6</a>',
                '</p>',);

    I seem to have:

    my $myData = <WEB_DATA>;

    While googling, I came across the pos() function, which is used to find the offset or position of the last matched substring. Maybe this could help me when slurping files?
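A minimal sketch of how slurping and pos() might fit together here, under some assumptions. A plain `<WEB_DATA>` in scalar context returns only one line, so the sketch undefines `$/` locally to read everything at once. The in-memory filehandle and the shortened sample data are stand-ins for the real input. pos() is used to find where paragraph 2 starts, after which the links are pulled from that slice of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the real input: an in-memory filehandle, so the sketch runs
# without a file. In the real script this would be the WEB_DATA handle.
my $html = join "\n",
    '<p id=paragraph_1>',
    '<a href="http://www.link1.com">Link1</a>',
    '</p>',
    '<p id=paragraph_2>',
    '<a href="http://www.link4.com">Link4</a>',
    '<a href="http://www.link5.com">Link5</a>',
    '</p>',
    '';
open my $fh, '<', \$html or die "Cannot open in-memory handle: $!";

# With $/ undefined, a single read returns the whole input as one string.
my $myData = do { local $/; <$fh> };
close $fh;

my @linkName;

# A /g match in scalar context records where it ended; pos() reports that offset.
if ( $myData =~ /<p id=paragraph_2>/g ) {
    my $start = pos($myData);                    # just past the opening tag
    my $end   = index($myData, '</p>', $start);  # first </p> after that point
    $end = length($myData) if $end < 0;          # no closing tag: take the rest
    my $para  = substr($myData, $start, $end - $start);

    # In list context, /g returns every captured link text in the slice.
    @linkName = $para =~ m{<a href="[^"]*">([^<]+)</a>}g;
}

print "$_\n" for @linkName;
```

With the sample data this prints Link4 and Link5. The same result could be had from one regex with the /s modifier; the pos() route is shown only because it was asked about.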