how to match after the n-th occurrence?

johnnywang has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've got a page-scraping script that needs to get a particular column (say the 13th) from a 20-column HTML table. Nothing else distinguishes the columns from one another, and I only want the first row. Is there an easy regex to do this? Thanks.

my $text=<<END;
<TR>
<TD nowrap>4.2</TD> 
<TD nowrap>-1.2</TD> 
<TD nowrap>3.5</TD> 
<TD nowrap>6.2</TD> 
<TD nowrap>5</TD> 
<TD nowrap>2e-8</TD> 
<TD nowrap>1.3</TD> 
<TD nowrap>12.0</TD> 
<TD nowrap>text</TD> 
<TD nowrap>other</TD> 
<TD nowrap>-23</TD> 
<TD nowrap>2.3</TD> 
<TD nowrap>0.2</TD> 
<TD nowrap>1.4</TD> 
<TD nowrap>4</TD> 
<TD nowrap>6</TD> 
<TD nowrap>03</TD> 
<TD nowrap>2.3</TD>
<TD nowrap>e12</TD> 
<TD nowrap>4</TD>
</TR>
END
# want to get the number 0.2 which is in the 13th column

Re: how to match after the n-th occurrence? by japhy (Canon) on Jun 03, 2005 at 06:27 UTC
While I don't condone regexes on HTML, the general idea is:

`($nth_chunk) = $text =~ /(?:PREFACE(MATCH)POSTFIX){N}/;`

In your case, that'd be:

`my ($number13) = $text =~ m{(?:<TD nowrap>(.*?)</TD>\s*){13}};`

It works because a capturing group inside a quantifier keeps overwriting the associated `$DIGIT` variable on each repetition, so after 13 repetitions `$1` holds the contents of the 13th cell. Here's a simpler example:

`"abcdef" =~ /(.)/;  # $1 is "a"`
`"abcdef" =~ /(.)+/; # $1 is "f"`

Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
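
For anyone who wants to try the quantified-group idea, here is a minimal, self-contained sketch. The `@cells` list and the way `$text` is assembled are only a compact stand-in for the sample row in the question, and the variable names `$n`, `$nth` and `$last` are made up for illustration. The `\s*` after `</TD>` is an assumption that tolerates any whitespace between cells.

```perl
use strict;
use warnings;

# Stand-in for the 20-cell sample row from the question.
my @cells = qw(4.2 -1.2 3.5 6.2 5 2e-8 1.3 12.0 text other
               -23 2.3 0.2 1.4 4 6 03 2.3 e12 4);
my $text = "<TR>\n" . join('', map { "<TD nowrap>$_</TD>\n" } @cells) . "</TR>\n";

my $n = 13;    # 1-based column number we are after

# The non-capturing group is repeated $n times. The capture inside it is
# overwritten on every repetition, so only the $n-th cell survives.
my ($nth) = $text =~ m{(?:<TD nowrap>(.*?)</TD>\s*){$n}};

# Same effect on a simpler string: the capture keeps the last repetition.
my ($last) = "abcdef" =~ /(.)+/;

print "column $n: $nth\n";          # prints "column 13: 0.2"
print "last repetition: $last\n";   # prints "last repetition: f"
```

If the row has fewer than `$n` cells the match fails and `$nth` is left undefined, so real code should check for that before using the value.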
Re: how to match after the n-th occurrence? by Zaxo (Archbishop) on Jun 03, 2005 at 06:36 UTC
You can do a global match for a single instance and index the result:

`my $foo = ($text =~ m!<TD nowrap>([^<]*)</TD>!g)[12];`

Indexing places the global match in list context, where it returns every captured cell; index 12 is the 13th cell because list indices start at 0.

After Compline, Zaxo
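
The global-match approach can be wrapped up as below. This is only a sketch: `nth_cell` is a hypothetical helper name, the 1-based `$n` argument is my own convention, and `$text` is rebuilt from a cell list as a compact stand-in for the sample row in the question.

```perl
use strict;
use warnings;

# Stand-in for the 20-cell sample row from the question.
my @cells = qw(4.2 -1.2 3.5 6.2 5 2e-8 1.3 12.0 text other
               -23 2.3 0.2 1.4 4 6 03 2.3 e12 4);
my $text = "<TR>\n" . join('', map { "<TD nowrap>$_</TD>\n" } @cells) . "</TR>\n";

# Hypothetical helper: return the contents of the $n-th cell (1-based),
# or undef if there are fewer than $n cells.
sub nth_cell {
    my ($html, $n) = @_;
    # In list context, m//g returns the captured text of every match.
    my @all = $html =~ m!<TD nowrap>([^<]*)</TD>!g;
    return $all[$n - 1];
}

my $foo = nth_cell($text, 13);
print "$foo\n";    # prints "0.2"
```

Compared with the quantified group, this collects every cell first, which is handy if more than one column is needed from the same row.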
Re: how to match after the n-th occurrence? by tlm (Prior) on Jun 03, 2005 at 13:00 UTC
Accept `HTML::TableExtract` into your life.

the lowliest monk
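
A rough sketch of what that might look like, assuming `HTML::TableExtract` is installed from CPAN. The enclosing `<TABLE>` tags are my addition, since the module works on whole tables and the sample in the question shows only a row. The variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Stand-in for the sample row, wrapped in a table (an assumption: the
# fragment in the question has no enclosing <TABLE> element).
my @cells = qw(4.2 -1.2 3.5 6.2 5 2e-8 1.3 12.0 text other
               -23 2.3 0.2 1.4 4 6 03 2.3 e12 4);
my $html = "<TABLE>\n<TR>\n"
         . join('', map { "<TD nowrap>$_</TD>\n" } @cells)
         . "</TR>\n</TABLE>\n";

# With no constraints given, every table in the document is extracted.
my $te = HTML::TableExtract->new;
$te->parse($html);

my ($table)     = $te->tables;     # first table found
my ($first_row) = $table->rows;    # each row is an array reference of cell text
my $value       = $first_row->[12];    # 13th column, zero-based index 12

print "$value\n";
```

The parser copes with attributes, case differences and line breaks inside the markup, which is where the regex solutions tend to break on real pages.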