Re^7: regexp over multiple lines

Here's one way, using basically the method you describe in the last paragraph of your previous post, short-circuited to skip every paragraph except the one whose id is "paragraph_2":

#!/usr/bin/perl
use strict;
use warnings;
use 5.012;

# 918508 wants linknames from para 2 only

my @data =
    ('<p id=paragraph_1>',
    '<a href="http://www.link1.com">Link1</a>', 
    '<a href="http://www.link2.com">Link2</a>',
    '<a href="http://www.link3.com">Link3</a>', 
    '</p>',
    '<p id=paragraph_2>',
    '<a href="http://www.link4.com">Link4</a>', 
    '<a href="http://www.link5.com">Link5</a>', 
    '<a href="http://www.link6.com">Link6</a>',
    '</p>',);

my (@linkName, $linkName);
my $flag = 0;
for my $data (@data) {
    chomp $data;
    if ( $data =~ /<p id=paragraph_2>/ ) {
        # when the above is true, we've found para 2
        $flag = 1;
    }

    if ( $flag && ($data !~ /<p id=paragraph_2>/) ) {
        # we skip the para heading itself the first time we arrive here;
        # stop at the closing </p> of para 2 so that links in any later
        # paragraphs are not picked up; otherwise capture the link title
        last if $data =~ m{</p>};
        if ( $data =~ m/<a href=.+>(Link\d)<.+/ ) {
            $linkName = $1;
            push @linkName, $linkName;
        }
    }
}

for my $extracted(@linkName) {
    say $extracted;
}

Output:

Link4
Link5
Link6
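For comparison, here is a sketch of the same extraction using Perl's scalar range ("flip-flop") operator instead of a hand-managed flag. This is a swapped-in alternative technique, not the method used above: the operator becomes true on the line matching its left operand and stays true through the line matching its right operand. It reuses the sample @data from above as a stand-in for the real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.012;

# Same sample data as above (a stand-in for the real page).
my @data = (
    '<p id=paragraph_1>',
    '<a href="http://www.link1.com">Link1</a>',
    '<a href="http://www.link2.com">Link2</a>',
    '<a href="http://www.link3.com">Link3</a>',
    '</p>',
    '<p id=paragraph_2>',
    '<a href="http://www.link4.com">Link4</a>',
    '<a href="http://www.link5.com">Link5</a>',
    '<a href="http://www.link6.com">Link6</a>',
    '</p>',
);

my @linkName;
for my $line (@data) {
    # Scalar ".." turns on at the line opening paragraph 2 and
    # turns off after the next line containing </p>.
    if ( $line =~ /<p id=paragraph_2>/ .. $line =~ m{</p>} ) {
        push @linkName, $1 if $line =~ m{<a href=[^>]*>(Link\d)</a>};
    }
}

say for @linkName;
```

The flip-flop keeps its on/off state internally, so no $flag variable is needed, and the end of paragraph 2 is handled by the right-hand operand.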

Generalizing this to suit your specific needs is left as an exercise for the OP.

And BTW, HTML attribute values should be quoted, e.g. <p id="para1" class="b">.
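Since the sample data uses unquoted ids while well-formed pages quote them, a pattern that tolerates both styles may be handy. A minimal sketch; the pattern name $para2_open and the sample tags are hypothetical, and regexes stay fragile for real-world HTML, where a parser such as HTML::TreeBuilder is sturdier:

```perl
use strict;
use warnings;
use 5.012;

# Hypothetical pattern: accepts id=paragraph_2, id="paragraph_2" or id='paragraph_2'.
# (["']?) captures the opening quote (or nothing); \1 requires the same
# closing quote, and [\s>] rejects longer ids such as paragraph_20.
my $para2_open = qr{<p\b[^>]*?\bid\s*=\s*(["']?)paragraph_2\1[\s>]};

my @samples = (
    '<p id=paragraph_2>',
    '<p id="paragraph_2">',
    q{<p id='paragraph_2' class="b">},
    '<p id="paragraph_20">',
);

for my $tag (@samples) {
    say $tag =~ $para2_open ? "match:    $tag" : "no match: $tag";
}
```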

Update: Para 1 extended to note that the OP was on the right track with the method in the last para of his previous post.


Re^8: regexp over multiple lines
by liverpaul (Acolyte) on Aug 05, 2011 at 10:24 UTC

Thanks for the detailed reply :-)

Although I can follow what you are doing in your code example, I'm a bit confused about one thing. You seem to be processing the @data array line by line, whereas I'm reading my data into a single string variable (slurping, as recommended by the forum). Please correct me if I've misunderstood :-)

Where you have:

my @data =
    ('<p id=paragraph_1>',
    '<a href="http://www.link1.com">Link1</a>', 
    '<a href="http://www.link2.com">Link2</a>',
    '<a href="http://www.link3.com">Link3</a>', 
    '</p>',
    '<p id=paragraph_2>',
    '<a href="http://www.link4.com">Link4</a>', 
    '<a href="http://www.link5.com">Link5</a>', 
    '<a href="http://www.link6.com">Link6</a>',
    '</p>',);

I seem to have:

my $myData = <WEB_DATA>;

While googling, I came across the pos() function, which returns the offset at which the last m//g search on a variable left off. Maybe this could help me when slurping files?
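For the slurped-string case, one option is to isolate paragraph 2 with a single /s match and then walk its links with a m//g loop, which resumes each search at pos(). This is a sketch under two assumptions: the heredoc stands in for the real page, and the real file is read with local $/ undefined, because <WEB_DATA> in scalar context otherwise returns only one line. Alternatively, looping over split /\n/, $myData would let the line-by-line code above run unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.012;

# Stand-in for the slurped page. Real code would read the whole file with
#   my $myData = do { local $/; <WEB_DATA> };
my $myData = <<'HTML';
<p id=paragraph_1>
<a href="http://www.link1.com">Link1</a>
<a href="http://www.link2.com">Link2</a>
<a href="http://www.link3.com">Link3</a>
</p>
<p id=paragraph_2>
<a href="http://www.link4.com">Link4</a>
<a href="http://www.link5.com">Link5</a>
<a href="http://www.link6.com">Link6</a>
</p>
HTML

my @linkName;

# /s lets . match newlines, so the non-greedy .*? spans lines up to
# the first </p> after the paragraph_2 opening tag.
if ( $myData =~ m{<p id=paragraph_2>(.*?)</p>}s ) {
    my $para = $1;

    # In scalar (while) context, m//g resumes each search at pos($para),
    # i.e. just after the previous match.
    while ( $para =~ m{<a href=[^>]*>(Link\d)</a>}g ) {
        push @linkName, $1;
        printf "found %s, pos now %d\n", $1, pos($para);
    }
}

say for @linkName;
```

Capturing the paragraph first keeps the link loop from running past paragraph 2, so pos() only needs to track progress within that one block.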
