perlmuser has asked for the wisdom of the Perl Monks concerning the following question:

I have a simple HTML file, the contents of which I have included below:
###############
<table class="gridtable" summary="RegTable">
<tr><th>Address</th><th>Register</th><th>7</th><th>6</th><th>5</th><th>4</th><th>3</th><th>2</th><th>1</th><th>0</th><th>Reset</th><th>Description</th></tr>
<tr><td>0x00000001</td><td><a href="#RegisterMap:REG0000">REG0000</a></td><td align=center colspan=6> TEMP </td><td align=center > STOP </td><td align=center > START </td><td>'h14</td><td>TEMPORARY REG.</td></tr>
</table>
<table class="gridtable" summary="RegTable">
<tr><th>Address</th><th>Register</th><th>15</th><th>14</th><th>13</th><th>12</th><th>11</th><th>10</th><th>9</th><th>8</th><th>7</th><th>6</th><th>5</th><th>4</th><th>3</th><th>2</th><th>1</th><th>0</th><th>Reset</th><th>Description</th></tr>
<tr><td>0x00000100</td><td><a href="#FuseMap:FUSE0">FUSE0</a></td><td align=center colspan=8> F_1 </td><td align=center colspan=8> F_0 </td><td>'h0000</td><td>FUSE0.</td></tr>
</table>
###########
It basically has two tables. I wrote the following Perl script to extract a table based on a header match:
###########
use HTML::TableExtract;
my $file = 'temp.html';
@headers = qw( Address Register 15 14 13 12 11 10 9 8 7 6 5);
print " \n h:@headers:\n";
$te = new HTML::TableExtract( keep_html=>1,headers => \@headers);
$te->parse_file($file);
@tcount1 = $te->counts(0);
print " tcount1 : @tcount1:\n";
########
Basically I would like to extract the second table, but for some reason the extraction does not seem to work. If, however, I remove the last entry in the header list, i.e. if I have the header as just
@headers = qw( Address Register 15 14 13 12 11 10 9 8 7 6);
It works fine .. but with the header as :
@headers = qw( Address Register 15 14 13 12 11 10 9 8 7 6 5);
It does not work. I am not sure whether I have done something wrong here, but can someone help me out? I would like to have the header as
@headers = qw( Address Register 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0);
for various reasons, and not a truncated header. Any suggestions would be great. Perl version 5.14.2

Replies are listed 'Best First'.
Re: Table Extract Header Match
by AppleFritter (Vicar) on Jul 09, 2014 at 10:19 UTC

    I've spent far too much time (that is to say, almost an hour) trying to understand this, and I'm convinced that this indicates a bug in HTML::TableExtract now. Here's my current testing script:

    use HTML::TableExtract;
    use feature qw/say/;
    use strict;
    use warnings;

    my $file = 'temp.html';
    my @headers1 = qw( Address Register 15 14 13 12 11 10 9 8 7 6 5 );
    my @headers2 = qw( Address Register 15 14 13 12 11 10 9 8 7 6 );

    sub try_match {
        my ($headers_ref) = @_;
        my $te = new HTML::TableExtract(
            debug   => 5,
            headers => $headers_ref,
        );
        $te->parse_file($file);
        my @tcount1 = $te->counts(0);
        say "headers = @$headers_ref";
        say "tcount1 = @tcount1";
        foreach my $ts ($te->tables) {
            print "Table (", join(',', $ts->coords), "):\n";
            foreach my $row ($ts->rows) {
                print join(',', @$row), "\n";
            }
        }
        say "-" x 79;
    }

    try_match(\@headers1);
    try_match(\@headers2);

    And my current temp.html (based on yours, but formatted nicely and with slightly different data, to rule out colspan attributes as part of the issue):

    <table class="gridtable" summary="RegTable">
      <tr>
        <th>Address</th>
        <th>Register</th>
        <th>7</th>
        <th>6</th>
        <th>5</th>
        <th>4</th>
        <th>3</th>
        <th>2</th>
        <th>1</th>
        <th>0</th>
        <th>Reset</th>
        <th>Description</th>
      </tr>
      <tr>
        <td>0x00000001</td>
        <td><a href="#RegisterMap:REG0000">REG0000</a></td>
        <td align=center colspan=6> TEMP </td>
        <td align=center > STOP </td>
        <td align=center > START </td>
        <td>'h14</td>
        <td>TEMPORARY REG.</td>
      </tr>
    </table>
    <table class="gridtable" summary="RegTable">
      <tr>
        <th>Address</th>
        <th>Register</th>
        <th>15</th>
        <th>14</th>
        <th>13</th>
        <th>12</th>
        <th>11</th>
        <th>10</th>
        <th>9</th>
        <th>8</th>
        <th>7</th>
        <th>6</th>
        <th>5</th>
        <th>4</th>
        <th>3</th>
        <th>2</th>
        <th>1</th>
        <th>0</th>
        <th>Reset</th>
        <th>Description</th>
      </tr>
      <tr>
        <td>Address</td>
        <td>Register</td>
        <td>15</td>
        <td>14</td>
        <td>13</td>
        <td>12</td>
        <td>11</td>
        <td>10</td>
        <td>9</td>
        <td>8</td>
        <td>7</td>
        <td>6</td>
        <td>5</td>
        <td>4</td>
        <td>3</td>
        <td>2</td>
        <td>1</td>
        <td>0</td>
        <td>Reset</td>
        <td>Description</td>
      </tr>
    </table>

    And here's the script's output:

    $ perl 1092870.pl TE here, headers: Address,Register,15,14,13,12,11,10,9,8,7,6,5 TABLE: cdepth 0, ccount 0, it: 1 Flesh row 0 (11) to 11 111111111111 Flesh row 1 (6) to 11 111000001111 HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)| +(12)|(11)|(10)))/ attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address) HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)| +(10)))/ HIT on 'Address' (Address) in Address (0,0) attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)| +(14)|(13)|(12)|(11)|(10)))): (Register) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on 'Register' (Register) in Register (0,1) attempt match on 7 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11 +)|(10)))): (7) HPAT: /(?^mi:((9)|(8)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on '7' (7) in 7 (0,2) attempt match on 6 ((?^mi:((9)|(8)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(1 +0)))): (6) HPAT: /(?^mi:((9)|(8)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on '6' (6) in 6 (0,3) attempt match on 5 ((?^mi:((9)|(8)|(5)|(15)|(14)|(13)|(12)|(11)|(10))) +): (5) HPAT: /(?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on '5' (5) in 5 (0,4) attempt match on 4 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 3 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 2 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 1 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 0 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on Reset ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10))) +): 0 attempt match on Description ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)| +(10)))): 0 Incomplete header match (left: 10, 11, 12, 13, 14, 15, 8, 9) in row 0, + resetting scan HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)| +(12)|(11)|(10)))/ attempt match on 0x00000001 
((?^mi:((Register)|(Address)|(9)|(8)|(7)|( +6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on REG0000 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on TEMP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|( +5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15 +)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15 +)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15 +)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15 +)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15 +)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on STOP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|( +5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on START ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 'h14 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5) +|(15)|(14)|(13)|(12)|(11)|(10)))): (14) HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(13)|(12)| +(11)|(10)))/ HIT on '14' (14) in 'h14 (1,10) attempt match on TEMPORARY REG. 
((?^mi:((Register)|(Address)|(9)|(8)|( +7)|(6)|(5)|(15)|(13)|(12)|(11)|(10)))): 0 Incomplete header match (left: 10, 11, 12, 13, 15, 5, 6, 7, 8, 9, Addr +ess, Register) in row 1, resetting scan LEAVE: cdepth: -1, ccount: 0, it: 0 TABLE: cdepth 0, ccount 1, it: 1 Flesh row 0 (19) to 19 11111111111111111111 Flesh row 1 (19) to 19 11111111111111111111 HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)| +(12)|(11)|(10)))/ attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address) HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)| +(10)))/ HIT on 'Address' (Address) in Address (0,0) attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)| +(14)|(13)|(12)|(11)|(10)))): (Register) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on 'Register' (Register) in Register (0,1) attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(1 +1)|(10)))): (15) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on '15' (5) in 15 (0,2) attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|( +10)))): (14) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))/ HIT on '14' (14) in 14 (0,3) attempt match on 13 ((?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10))) +): (13) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))/ HIT on '13' (13) in 13 (0,4) attempt match on 12 ((?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))): (1 +2) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))/ HIT on '12' (12) in 12 (0,5) attempt match on 11 ((?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))): (11) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(10)))/ HIT on '11' (11) in 11 (0,6) attempt match on 10 ((?^mi:((9)|(8)|(7)|(6)|(15)|(10)))): (10) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)))/ HIT on '10' (10) in 10 (0,7) attempt match on 9 ((?^mi:((9)|(8)|(7)|(6)|(15)))): (9) HPAT: /(?^mi:((8)|(7)|(6)|(15)))/ HIT on '9' (9) in 9 (0,8) attempt match on 8 
((?^mi:((8)|(7)|(6)|(15)))): (8) HPAT: /(?^mi:((7)|(6)|(15)))/ HIT on '8' (8) in 8 (0,9) attempt match on 7 ((?^mi:((7)|(6)|(15)))): (7) HPAT: /(?^mi:((6)|(15)))/ HIT on '7' (7) in 7 (0,10) attempt match on 6 ((?^mi:((6)|(15)))): (6) HPAT: /(?^mi:((15)))/ HIT on '6' (6) in 6 (0,11) attempt match on 5 ((?^mi:((15)))): 0 attempt match on 4 ((?^mi:((15)))): 0 attempt match on 3 ((?^mi:((15)))): 0 attempt match on 2 ((?^mi:((15)))): 0 attempt match on 1 ((?^mi:((15)))): 0 attempt match on 0 ((?^mi:((15)))): 0 attempt match on Reset ((?^mi:((15)))): 0 attempt match on Description ((?^mi:((15)))): 0 Incomplete header match (left: 15) in row 0, resetting scan HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)| +(12)|(11)|(10)))/ attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address) HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)| +(10)))/ HIT on 'Address' (Address) in Address (1,0) attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)| +(14)|(13)|(12)|(11)|(10)))): (Register) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on 'Register' (Register) in Register (1,1) attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(1 +1)|(10)))): (15) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on '15' (5) in 15 (1,2) attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|( +10)))): (14) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))/ HIT on '14' (14) in 14 (1,3) attempt match on 13 ((?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10))) +): (13) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))/ HIT on '13' (13) in 13 (1,4) attempt match on 12 ((?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))): (1 +2) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))/ HIT on '12' (12) in 12 (1,5) attempt match on 11 ((?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))): (11) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(10)))/ 
HIT on '11' (11) in 11 (1,6) attempt match on 10 ((?^mi:((9)|(8)|(7)|(6)|(15)|(10)))): (10) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)))/ HIT on '10' (10) in 10 (1,7) attempt match on 9 ((?^mi:((9)|(8)|(7)|(6)|(15)))): (9) HPAT: /(?^mi:((8)|(7)|(6)|(15)))/ HIT on '9' (9) in 9 (1,8) attempt match on 8 ((?^mi:((8)|(7)|(6)|(15)))): (8) HPAT: /(?^mi:((7)|(6)|(15)))/ HIT on '8' (8) in 8 (1,9) attempt match on 7 ((?^mi:((7)|(6)|(15)))): (7) HPAT: /(?^mi:((6)|(15)))/ HIT on '7' (7) in 7 (1,10) attempt match on 6 ((?^mi:((6)|(15)))): (6) HPAT: /(?^mi:((15)))/ HIT on '6' (6) in 6 (1,11) attempt match on 5 ((?^mi:((15)))): 0 attempt match on 4 ((?^mi:((15)))): 0 attempt match on 3 ((?^mi:((15)))): 0 attempt match on 2 ((?^mi:((15)))): 0 attempt match on 1 ((?^mi:((15)))): 0 attempt match on 0 ((?^mi:((15)))): 0 attempt match on Reset ((?^mi:((15)))): 0 attempt match on Description ((?^mi:((15)))): 0 Incomplete header match (left: 15) in row 1, resetting scan LEAVE: cdepth: -1, ccount: 1, it: 0 headers = Address Register 15 14 13 12 11 10 9 8 7 6 5 tcount1 = ---------------------------------------------------------------------- +--------- TE here, headers: Address,Register,15,14,13,12,11,10,9,8,7,6 TABLE: cdepth 0, ccount 0, it: 1 Flesh row 0 (11) to 11 111111111111 Flesh row 1 (6) to 11 111000001111 HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12) +|(11)|(10)))/ attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(15)|(14)|(13)|(12)|(11)|(10)))): (Address) HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10) +))/ HIT on 'Address' (Address) in Address (0,0) attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14) +|(13)|(12)|(11)|(10)))): (Register) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on 'Register' (Register) in Register (0,1) attempt match on 7 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(1 +0)))): (7) HPAT: /(?^mi:((9)|(8)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on '7' (7) 
in 7 (0,2) attempt match on 6 ((?^mi:((9)|(8)|(6)|(15)|(14)|(13)|(12)|(11)|(10))) +): (6) HPAT: /(?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on '6' (6) in 6 (0,3) attempt match on 5 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 4 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 3 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 2 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 1 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 0 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on Reset ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10))) +): 0 attempt match on Description ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)| +(10)))): 0 Incomplete header match (left: 10, 11, 12, 13, 14, 15, 8, 9) in row 0, + resetting scan HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12) +|(11)|(10)))/ attempt match on 0x00000001 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|( +6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on REG0000 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on TEMP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|( +15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(1 +4)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(1 +4)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(1 +4)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(1 +4)|(13)|(12)|(11)|(10)))): 0 attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(1 +4)|(13)|(12)|(11)|(10)))): 0 attempt match on STOP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|( +15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on START ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(15)|(14)|(13)|(12)|(11)|(10)))): 0 attempt match on 'h14 
((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15 +)|(14)|(13)|(12)|(11)|(10)))): (14) HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(13)|(12)|(11) +|(10)))/ HIT on '14' (14) in 'h14 (1,10) attempt match on TEMPORARY REG. ((?^mi:((Register)|(Address)|(9)|(8)|( +7)|(6)|(15)|(13)|(12)|(11)|(10)))): 0 Incomplete header match (left: 10, 11, 12, 13, 15, 6, 7, 8, 9, Address +, Register) in row 1, resetting scan LEAVE: cdepth: -1, ccount: 0, it: 0 TABLE: cdepth 0, ccount 1, it: 1 Flesh row 0 (19) to 19 11111111111111111111 Flesh row 1 (19) to 19 11111111111111111111 HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12) +|(11)|(10)))/ attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)| +(15)|(14)|(13)|(12)|(11)|(10)))): (Address) HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10) +))/ HIT on 'Address' (Address) in Address (0,0) attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14) +|(13)|(12)|(11)|(10)))): (Register) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/ HIT on 'Register' (Register) in Register (0,1) attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|( +10)))): (15) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(14)|(13)|(12)|(11)|(10)))/ HIT on '15' (15) in 15 (0,2) attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(14)|(13)|(12)|(11)|(10))) +): (14) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(13)|(12)|(11)|(10)))/ HIT on '14' (14) in 14 (0,3) attempt match on 13 ((?^mi:((9)|(8)|(7)|(6)|(13)|(12)|(11)|(10)))): (1 +3) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(12)|(11)|(10)))/ HIT on '13' (13) in 13 (0,4) attempt match on 12 ((?^mi:((9)|(8)|(7)|(6)|(12)|(11)|(10)))): (12) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(11)|(10)))/ HIT on '12' (12) in 12 (0,5) attempt match on 11 ((?^mi:((9)|(8)|(7)|(6)|(11)|(10)))): (11) HPAT: /(?^mi:((9)|(8)|(7)|(6)|(10)))/ HIT on '11' (11) in 11 (0,6) attempt match on 10 ((?^mi:((9)|(8)|(7)|(6)|(10)))): (10) HPAT: /(?^mi:((9)|(8)|(7)|(6)))/ HIT on '10' (10) in 10 (0,7) attempt 
match on 9 ((?^mi:((9)|(8)|(7)|(6)))): (9) HPAT: /(?^mi:((8)|(7)|(6)))/ HIT on '9' (9) in 9 (0,8) attempt match on 8 ((?^mi:((8)|(7)|(6)))): (8) HPAT: /(?^mi:((7)|(6)))/ HIT on '8' (8) in 8 (0,9) attempt match on 7 ((?^mi:((7)|(6)))): (7) HPAT: /(?^mi:((6)))/ HIT on '7' (7) in 7 (0,10) attempt match on 6 ((?^mi:((6)))): (6) HPAT: /(?^mi:())/ HIT on '6' (6) in 6 (0,11) Captured table (0,1) LEAVE: cdepth: -1, ccount: 1, it: 0 headers = Address Register 15 14 13 12 11 10 9 8 7 6 tcount1 = 1 Table (0,1): Address,Register,15,14,13,12,11,10,9,8,7,6 ---------------------------------------------------------------------- +--------- $

    Now, why do I think this is a bug? Take a look at how the 15 header is being matched in the second table during the first invocation of try_match:

    attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (15)
    HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
    HIT on '15' (5) in 15 (1,2)
    attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (14)
    [...]
    attempt match on 5 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
    attempt match on 4 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
    [...]

    Explanation: the module is attempting to match the header content, 15, against a pattern it constructed. It finds that the content does indeed match, and then proceeds to remove the relevant bit from the pattern. But it removes the wrong part of the pattern: the (5), not the (15). This is why the subsequent match on 5 fails: the part of the pattern it would match against is not there anymore, as 15 was already (wrongly!) matched against it.

    The reason why this happens is that the pattern is sorted by "generality": 5 appears before 15 because "15" =~ m/5/, but not vice versa. What the module should therefore do is remove the rightmost matching pattern (the most specific one), not the leftmost one (the most general one).

    The reason you're only seeing this once you add 5 to the list of headers that you're interested in is that 5 happens to be the first header in your list that also matches part of an earlier header (15 contains 5).
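To make the explanation above concrete, here is a toy model of the behaviour seen in the debug output. It is only a sketch and not the module's real code: `leftover_headers` is a hypothetical helper that, for each header cell, drops the first pending header that matches as a substring, using the pattern order copied from the HPAT lines in the log.

```perl
use strict;
use warnings;

# Toy model of the behaviour seen in the debug output above; this is NOT the
# module's real code. leftover_headers() is a hypothetical helper: for every
# cell it removes the first pending header that matches as a substring.
sub leftover_headers {
    my ($headers, $cells) = @_;
    my @pending = @$headers;
    for my $cell (@$cells) {
        for my $i (0 .. $#pending) {
            if ($cell =~ /\Q$pending[$i]\E/) {
                splice @pending, $i, 1;    # drop the first match, right or wrong
                last;
            }
        }
    }
    return @pending;
}

# Numeric header cells of the second table: 15, 14, ..., 0.
my @cells = reverse 0 .. 15;

# Pattern order copied from the HPAT lines in the log.
my @with_5    = (9, 8, 7, 6, 5, 15, 14, 13, 12, 11, 10);
my @without_5 = (9, 8, 7, 6, 15, 14, 13, 12, 11, 10);

my @left_with    = leftover_headers(\@with_5, \@cells);
my @left_without = leftover_headers(\@without_5, \@cells);

print "with 5:    left over = (@left_with)\n";
print "without 5: left over = (@left_without)\n";
```

In this model the list containing 5 ends with 15 left over, which mirrors the "Incomplete header match (left: 15)" line in the log, while the list without 5 is consumed completely.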

    So it's a bug in someone else's code. Where does that leave you? There's a way around it, fortunately, as HTML::TableExtract accepts not just strings but also regular expressions to match headers against, so if you specify your list of desired headers like this:

    my @headers1 = (
        "Address", "Register",
        qr/^15$/, qr/^14$/, qr/^13$/, qr/^12$/,
        qr/^11$/, qr/^10$/, qr/^9$/,  qr/^8$/,
        qr/^7$/,  qr/^6$/,  qr/^5$/,  qr/^4$/,
        qr/^3$/,  qr/^2$/,  qr/^1$/,  qr/^0$/,
    );

    then you're golden.
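Why the anchors help can be checked with plain regular expressions, independent of the module. This small sketch only compares an unanchored and an anchored pattern for the header 5 against the numeric header cells; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @cells = reverse 0 .. 15;    # header cell texts: 15, 14, ..., 0

# Which cells does a pattern for the header "5" accept?
my @loose_hits    = grep { /5/ }   @cells;   # unanchored substring match
my @anchored_hits = grep { /^5$/ } @cells;   # whole cell must equal "5"

print "unanchored /5/ matches: @loose_hits\n";      # prints "15 5"
print "anchored /^5\$/ matches: @anchored_hits\n";  # prints "5"
```

With the anchors in place, the cell 15 can no longer be claimed by the pattern meant for 5.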

      Thanks AppleFritter, I will surely try your suggestion. The problem is that I generate the header on the fly based on the input given: if the input is, say, 8, the header would be 7 6 5 4 3 2 1 0, and if the input is 15, the header would be 15 14 ... 0. Can your solution be used for this?

        Of course, you just need to assemble the list of headers the right way. For instance:

        #!/usr/bin/perl
        use feature qw/say/;
        use warnings;
        use strict;

        my $input = 8;
        my @headers = reverse map { qr/^$_$/; } (0..($input - 1));
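The snippet above builds only the bit-column patterns. A possible way to assemble the complete list, including the two fixed columns, is sketched below. `build_headers` is a hypothetical helper, not part of HTML::TableExtract, and it assumes the input is the number of bit columns (so 8 gives 7..0 and 16 gives 15..0).

```perl
use strict;
use warnings;

# build_headers() is a hypothetical helper, not part of HTML::TableExtract.
# Assumption: $width is the number of bit columns (8 gives columns 7..0).
sub build_headers {
    my ($width) = @_;
    # Anchored patterns, highest bit first, so "15" cannot be claimed by "5".
    my @bits = map { qr/^$_$/ } reverse 0 .. $width - 1;
    return ('Address', 'Register', @bits);
}

my @headers = build_headers(16);
print scalar(@headers), " headers built\n";

# The list would then be passed to the module as in the scripts above:
#   my $te = new HTML::TableExtract( headers => \@headers );
```

The plain strings "Address" and "Register" stay unanchored here, as in the hand-written list earlier in the thread, since no other header contains them.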

        HTH!