I've spent far too much time (that is to say, almost an hour) trying to understand this, and I'm now convinced that it's a bug in HTML::TableExtract. Here's my current test script:
use HTML::TableExtract;
use feature qw/say/;
use strict;
use warnings;
my $file = 'temp.html';
my @headers1 = qw( Address Register 15 14 13 12 11 10 9 8 7 6 5 );
my @headers2 = qw( Address Register 15 14 13 12 11 10 9 8 7 6 );
sub try_match {
    my ($headers_ref) = @_;
    my $te = HTML::TableExtract->new(
        debug   => 5,
        headers => $headers_ref,
    );
    $te->parse_file($file);
    my @tcount1 = $te->counts(0);
    say "headers = @$headers_ref";
    say "tcount1 = @tcount1";
    foreach my $ts ($te->tables) {
        print "Table (", join(',', $ts->coords), "):\n";
        foreach my $row ($ts->rows) {
            print join(',', @$row), "\n";
        }
    }
    say "-" x 79;
}
try_match(\@headers1);
try_match(\@headers2);
And my current temp.html (based on yours, but formatted nicely and with slightly different data, to rule out colspan attributes as part of the issue):
<table class="gridtable" summary="RegTable">
  <tr>
    <th>Address</th>
    <th>Register</th>
    <th>7</th>
    <th>6</th>
    <th>5</th>
    <th>4</th>
    <th>3</th>
    <th>2</th>
    <th>1</th>
    <th>0</th>
    <th>Reset</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>0x00000001</td>
    <td><a href="#RegisterMap:REG0000">REG0000</a></td>
    <td align=center colspan=6> TEMP </td>
    <td align=center > STOP </td>
    <td align=center > START </td>
    <td>'h14</td>
    <td>TEMPORARY REG.</td>
  </tr>
</table>
<table class="gridtable" summary="RegTable">
  <tr>
    <th>Address</th>
    <th>Register</th>
    <th>15</th>
    <th>14</th>
    <th>13</th>
    <th>12</th>
    <th>11</th>
    <th>10</th>
    <th>9</th>
    <th>8</th>
    <th>7</th>
    <th>6</th>
    <th>5</th>
    <th>4</th>
    <th>3</th>
    <th>2</th>
    <th>1</th>
    <th>0</th>
    <th>Reset</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>Address</td>
    <td>Register</td>
    <td>15</td>
    <td>14</td>
    <td>13</td>
    <td>12</td>
    <td>11</td>
    <td>10</td>
    <td>9</td>
    <td>8</td>
    <td>7</td>
    <td>6</td>
    <td>5</td>
    <td>4</td>
    <td>3</td>
    <td>2</td>
    <td>1</td>
    <td>0</td>
    <td>Reset</td>
    <td>Description</td>
  </tr>
</table>
And here's the script's output:
$ perl 1092870.pl
TE here, headers: Address,Register,15,14,13,12,11,10,9,8,7,6,5
TABLE: cdepth 0, ccount 0, it: 1
Flesh row 0 (11) to 11
111111111111
Flesh row 1 (6) to 11
111000001111
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address)
HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Address' (Address) in Address (0,0)
attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Register)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Register' (Register) in Register (0,1)
attempt match on 7 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (7)
HPAT: /(?^mi:((9)|(8)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '7' (7) in 7 (0,2)
attempt match on 6 ((?^mi:((9)|(8)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (6)
HPAT: /(?^mi:((9)|(8)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '6' (6) in 6 (0,3)
attempt match on 5 ((?^mi:((9)|(8)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (5)
HPAT: /(?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '5' (5) in 5 (0,4)
attempt match on 4 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 3 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 2 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 1 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 0 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on Reset ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on Description ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
Incomplete header match (left: 10, 11, 12, 13, 14, 15, 8, 9) in row 0, resetting scan
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
attempt match on 0x00000001 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on REG0000 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on TEMP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on STOP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on START ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 'h14 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (14)
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(13)|(12)|(11)|(10)))/
HIT on '14' (14) in 'h14 (1,10)
attempt match on TEMPORARY REG. ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(13)|(12)|(11)|(10)))): 0
Incomplete header match (left: 10, 11, 12, 13, 15, 5, 6, 7, 8, 9, Address, Register) in row 1, resetting scan
LEAVE: cdepth: -1, ccount: 0, it: 0
TABLE: cdepth 0, ccount 1, it: 1
Flesh row 0 (19) to 19
11111111111111111111
Flesh row 1 (19) to 19
11111111111111111111
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address)
HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Address' (Address) in Address (0,0)
attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Register)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Register' (Register) in Register (0,1)
attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (15)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '15' (5) in 15 (0,2)
attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (14)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))/
HIT on '14' (14) in 14 (0,3)
attempt match on 13 ((?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))): (13)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))/
HIT on '13' (13) in 13 (0,4)
attempt match on 12 ((?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))): (12)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))/
HIT on '12' (12) in 12 (0,5)
attempt match on 11 ((?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))): (11)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(10)))/
HIT on '11' (11) in 11 (0,6)
attempt match on 10 ((?^mi:((9)|(8)|(7)|(6)|(15)|(10)))): (10)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)))/
HIT on '10' (10) in 10 (0,7)
attempt match on 9 ((?^mi:((9)|(8)|(7)|(6)|(15)))): (9)
HPAT: /(?^mi:((8)|(7)|(6)|(15)))/
HIT on '9' (9) in 9 (0,8)
attempt match on 8 ((?^mi:((8)|(7)|(6)|(15)))): (8)
HPAT: /(?^mi:((7)|(6)|(15)))/
HIT on '8' (8) in 8 (0,9)
attempt match on 7 ((?^mi:((7)|(6)|(15)))): (7)
HPAT: /(?^mi:((6)|(15)))/
HIT on '7' (7) in 7 (0,10)
attempt match on 6 ((?^mi:((6)|(15)))): (6)
HPAT: /(?^mi:((15)))/
HIT on '6' (6) in 6 (0,11)
attempt match on 5 ((?^mi:((15)))): 0
attempt match on 4 ((?^mi:((15)))): 0
attempt match on 3 ((?^mi:((15)))): 0
attempt match on 2 ((?^mi:((15)))): 0
attempt match on 1 ((?^mi:((15)))): 0
attempt match on 0 ((?^mi:((15)))): 0
attempt match on Reset ((?^mi:((15)))): 0
attempt match on Description ((?^mi:((15)))): 0
Incomplete header match (left: 15) in row 0, resetting scan
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address)
HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Address' (Address) in Address (1,0)
attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (Register)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Register' (Register) in Register (1,1)
attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (15)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '15' (5) in 15 (1,2)
attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (14)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))/
HIT on '14' (14) in 14 (1,3)
attempt match on 13 ((?^mi:((9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))): (13)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))/
HIT on '13' (13) in 13 (1,4)
attempt match on 12 ((?^mi:((9)|(8)|(7)|(6)|(15)|(12)|(11)|(10)))): (12)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))/
HIT on '12' (12) in 12 (1,5)
attempt match on 11 ((?^mi:((9)|(8)|(7)|(6)|(15)|(11)|(10)))): (11)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(10)))/
HIT on '11' (11) in 11 (1,6)
attempt match on 10 ((?^mi:((9)|(8)|(7)|(6)|(15)|(10)))): (10)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)))/
HIT on '10' (10) in 10 (1,7)
attempt match on 9 ((?^mi:((9)|(8)|(7)|(6)|(15)))): (9)
HPAT: /(?^mi:((8)|(7)|(6)|(15)))/
HIT on '9' (9) in 9 (1,8)
attempt match on 8 ((?^mi:((8)|(7)|(6)|(15)))): (8)
HPAT: /(?^mi:((7)|(6)|(15)))/
HIT on '8' (8) in 8 (1,9)
attempt match on 7 ((?^mi:((7)|(6)|(15)))): (7)
HPAT: /(?^mi:((6)|(15)))/
HIT on '7' (7) in 7 (1,10)
attempt match on 6 ((?^mi:((6)|(15)))): (6)
HPAT: /(?^mi:((15)))/
HIT on '6' (6) in 6 (1,11)
attempt match on 5 ((?^mi:((15)))): 0
attempt match on 4 ((?^mi:((15)))): 0
attempt match on 3 ((?^mi:((15)))): 0
attempt match on 2 ((?^mi:((15)))): 0
attempt match on 1 ((?^mi:((15)))): 0
attempt match on 0 ((?^mi:((15)))): 0
attempt match on Reset ((?^mi:((15)))): 0
attempt match on Description ((?^mi:((15)))): 0
Incomplete header match (left: 15) in row 1, resetting scan
LEAVE: cdepth: -1, ccount: 1, it: 0
headers = Address Register 15 14 13 12 11 10 9 8 7 6 5
tcount1 =
-------------------------------------------------------------------------------
TE here, headers: Address,Register,15,14,13,12,11,10,9,8,7,6
TABLE: cdepth 0, ccount 0, it: 1
Flesh row 0 (11) to 11
111111111111
Flesh row 1 (6) to 11
111000001111
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address)
HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Address' (Address) in Address (0,0)
attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (Register)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Register' (Register) in Register (0,1)
attempt match on 7 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (7)
HPAT: /(?^mi:((9)|(8)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '7' (7) in 7 (0,2)
attempt match on 6 ((?^mi:((9)|(8)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (6)
HPAT: /(?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '6' (6) in 6 (0,3)
attempt match on 5 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 4 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 3 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 2 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 1 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 0 ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on Reset ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on Description ((?^mi:((9)|(8)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
Incomplete header match (left: 10, 11, 12, 13, 14, 15, 8, 9) in row 0, resetting scan
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
attempt match on 0x00000001 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on REG0000 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on TEMP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on STOP ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on START ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): 0
attempt match on 'h14 ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (14)
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))/
HIT on '14' (14) in 'h14 (1,10)
attempt match on TEMPORARY REG. ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(13)|(12)|(11)|(10)))): 0
Incomplete header match (left: 10, 11, 12, 13, 15, 6, 7, 8, 9, Address, Register) in row 1, resetting scan
LEAVE: cdepth: -1, ccount: 0, it: 0
TABLE: cdepth 0, ccount 1, it: 1
Flesh row 0 (19) to 19
11111111111111111111
Flesh row 1 (19) to 19
11111111111111111111
HPAT: /(?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
attempt match on Address ((?^mi:((Register)|(Address)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (Address)
HPAT: /(?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Address' (Address) in Address (0,0)
attempt match on Register ((?^mi:((Register)|(9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (Register)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on 'Register' (Register) in Register (0,1)
attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (15)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(14)|(13)|(12)|(11)|(10)))/
HIT on '15' (15) in 15 (0,2)
attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(14)|(13)|(12)|(11)|(10)))): (14)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(13)|(12)|(11)|(10)))/
HIT on '14' (14) in 14 (0,3)
attempt match on 13 ((?^mi:((9)|(8)|(7)|(6)|(13)|(12)|(11)|(10)))): (13)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(12)|(11)|(10)))/
HIT on '13' (13) in 13 (0,4)
attempt match on 12 ((?^mi:((9)|(8)|(7)|(6)|(12)|(11)|(10)))): (12)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(11)|(10)))/
HIT on '12' (12) in 12 (0,5)
attempt match on 11 ((?^mi:((9)|(8)|(7)|(6)|(11)|(10)))): (11)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(10)))/
HIT on '11' (11) in 11 (0,6)
attempt match on 10 ((?^mi:((9)|(8)|(7)|(6)|(10)))): (10)
HPAT: /(?^mi:((9)|(8)|(7)|(6)))/
HIT on '10' (10) in 10 (0,7)
attempt match on 9 ((?^mi:((9)|(8)|(7)|(6)))): (9)
HPAT: /(?^mi:((8)|(7)|(6)))/
HIT on '9' (9) in 9 (0,8)
attempt match on 8 ((?^mi:((8)|(7)|(6)))): (8)
HPAT: /(?^mi:((7)|(6)))/
HIT on '8' (8) in 8 (0,9)
attempt match on 7 ((?^mi:((7)|(6)))): (7)
HPAT: /(?^mi:((6)))/
HIT on '7' (7) in 7 (0,10)
attempt match on 6 ((?^mi:((6)))): (6)
HPAT: /(?^mi:())/
HIT on '6' (6) in 6 (0,11)
Captured table (0,1)
LEAVE: cdepth: -1, ccount: 1, it: 0
headers = Address Register 15 14 13 12 11 10 9 8 7 6
tcount1 = 1
Table (0,1):
Address,Register,15,14,13,12,11,10,9,8,7,6
-------------------------------------------------------------------------------
$
Now, why do I think this is a bug? Take a look at how the 15 header is being matched in the first invocation of try_match (the one whose header list includes 5), against the second table:
attempt match on 15 ((?^mi:((9)|(8)|(7)|(6)|(5)|(15)|(14)|(13)|(12)|(11)|(10)))): (15)
HPAT: /(?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))/
HIT on '15' (5) in 15 (1,2)
attempt match on 14 ((?^mi:((9)|(8)|(7)|(6)|(15)|(14)|(13)|(12)|(11)|(10)))): (14)
[...]
attempt match on 5 ((?^mi:((15)))): 0
attempt match on 4 ((?^mi:((15)))): 0
[...]
Explanation: the module attempts to match the header cell's content, 15, against a pattern it has constructed from the headers you asked for. The match succeeds (the captured text is 15), and the module then removes the corresponding alternative from the pattern. But it removes the wrong alternative: (5) instead of (15), as the "(5)" in the HIT line shows. That is why the later match on 5 fails: the alternative it would have matched is already gone, because 15 was (wrongly!) counted against it. And since (15) itself is never removed, the scan ends with "left: 15" and the table is rejected.
The reason why this happens is that the pattern is sorted by "generality": 5 appears before 15 because "15" =~ m/5/, but not vice versa. What the module should therefore do is remove the rightmost matching pattern (the most specific one), not the leftmost one (the most general one).
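The difference between removing the leftmost and the rightmost matching alternative can be reproduced without HTML::TableExtract at all. This is a hypothetical model, not the module's actual code: `remove_match` and its `'first'`/`'last'` switch are made up for illustration, and the pending list is copied from the HPAT lines in the debug output. Scanning from the left drops 5 when the cell says 15, which leaves exactly the pattern printed in the HPAT line after the 15 hit; scanning from the right drops 15 instead.

```perl
use strict;
use warnings;

# Pending header patterns in the order the debug output lists them:
# "5" sits before "15" because "15" =~ /5/ but not the other way round.
my @pending = qw( 9 8 7 6 5 15 14 13 12 11 10 );

# Hypothetical helper: remove ONE pending pattern that matches $text,
# scanning from the left ('first', most general) or from the right
# ('last', most specific). Returns the remaining patterns.
sub remove_match {
    my ( $text, $from, @pats ) = @_;
    my @order = $from eq 'first' ? ( 0 .. $#pats ) : reverse( 0 .. $#pats );
    for my $i (@order) {
        my $pat = $pats[$i];
        next unless $text =~ /$pat/i;
        splice @pats, $i, 1;    # this header is now considered "found"
        last;
    }
    return @pats;
}

my @leftmost  = remove_match( '15', 'first', @pending );
my @rightmost = remove_match( '15', 'last',  @pending );

print "leftmost:  @leftmost\n";     # leftmost:  9 8 7 6 15 14 13 12 11 10
print "rightmost: @rightmost\n";    # rightmost: 9 8 7 6 5 14 13 12 11 10
```

With the leftmost strategy, 15 is still pending and 5 is gone, which matches the "left: 15" failure in the log.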
The reason you only see this once you add 5 to the list of headers you're interested in is that 5 is the first header in your list that also matches an earlier header (namely 15).
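To check a header list for this kind of overlap before handing it to HTML::TableExtract, one quick sketch is to test every header, used as a case-insensitive pattern, against every other header's text. `find_collisions` is a hypothetical helper, not part of the module, and it only handles plain-string headers:

```perl
use strict;
use warnings;

# Return "pattern => text" pairs where one header, used as a
# case-insensitive regex, also matches the text of a different header.
sub find_collisions {
    my @headers = @_;
    my @hits;
    for my $pat (@headers) {
        for my $text (@headers) {
            next if $pat eq $text;
            push @hits, "$pat => $text" if $text =~ /$pat/i;
        }
    }
    return @hits;
}

my @headers1 = qw( Address Register 15 14 13 12 11 10 9 8 7 6 5 );
my @headers2 = qw( Address Register 15 14 13 12 11 10 9 8 7 6 );

my @c1 = find_collisions(@headers1);
my @c2 = find_collisions(@headers2);

print "headers1: ", ( @c1 ? "@c1" : "no overlaps" ), "\n";    # headers1: 5 => 15
print "headers2: ", ( @c2 ? "@c2" : "no overlaps" ), "\n";    # headers2: no overlaps
```

Adding 4, 3, 2, 1 and 0 as plain strings would introduce many more overlaps; for example, 1 would also match 10 through 15.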
So it's a bug in someone else's code. Where does that leave you? There's a way around it, fortunately, as HTML::TableExtract accepts not just strings but also regular expressions to match headers against, so if you specify your list of desired headers like this:
my @headers1 = (
    "Address",
    "Register",
    qr/^15$/,
    qr/^14$/,
    qr/^13$/,
    qr/^12$/,
    qr/^11$/,
    qr/^10$/,
    qr/^9$/,
    qr/^8$/,
    qr/^7$/,
    qr/^6$/,
    qr/^5$/,
    qr/^4$/,
    qr/^3$/,
    qr/^2$/,
    qr/^1$/,
    qr/^0$/,
);
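then you're golden: because each pattern is anchored at both ends, it can only match a cell whose text is exactly that number, so 5 can no longer be used up by 15 (nor 1 by 10, and so on).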
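The same list can be written more compactly with `map`, and it is easy to sanity-check that the anchored patterns really are unambiguous. This sketch assumes, as the hand-written list does, that each bit header cell contains only the number with no surrounding whitespace; `ambiguous_bits` is a hypothetical helper used only for the check:

```perl
use strict;
use warnings;

# Compact spelling of the workaround list: the two literal headers, then
# patterns anchored at both ends for bit numbers 15 down to 0.
my @headers1 = ( 'Address', 'Register', map { qr/^$_$/ } reverse 0 .. 15 );

# For comparison: the same bit numbers as unanchored patterns.
my @unanchored = map { qr/$_/ } reverse 0 .. 15;

# Hypothetical helper: return the bit labels 0..15 whose text is matched
# by zero or by more than one of the given patterns.
sub ambiguous_bits {
    my @patterns = @_;
    return grep {
        my $text = $_;
        ( grep { $text =~ $_ } @patterns ) != 1;
    } 0 .. 15;
}

my @anchored_bad   = ambiguous_bits( @headers1[ 2 .. $#headers1 ] );
my @unanchored_bad = ambiguous_bits(@unanchored);

print "anchored:   ", ( @anchored_bad   ? "@anchored_bad"   : "none ambiguous" ), "\n";
print "unanchored: ", ( @unanchored_bad ? "@unanchored_bad" : "none ambiguous" ), "\n";
# anchored:   none ambiguous
# unanchored: 10 11 12 13 14 15
```

The unanchored variant shows why plain strings would not work for a full 15..0 list: every two-digit bit label is matched by more than one pattern.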