This script is intended to parse the Membership Management page in the Mailman administrative interface in order to harvest the name and email address of each subscriber.

Using HTML::TableExtract in text mode (that is, with the tree import commented out) gives me access to the email address of each subscriber. But as the sample at the bottom of the script shows, I am unable to extract the name from the HTML form input tag, where it exists as the default value of the text box. I have found no documentation on how to extract the raw HTML so that I can parse it myself. Uncommenting the import on line 4 gives me access to objects which presumably include that data, but which so far seem impenetrable.

Can anyone please advise how I can get unstuck on this project?

#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TableExtract; # qw(tree);
use HTML::ElementTable;
use Data::Dumper;
use FindBin;
use File::Util;

# This script is intended to parse the Membership Management page 
# in the mailman administrative interface in order to harvest 
# the name and email address of each subscriber.

my($f) = File::Util->new();
my (@html_files) = $f->list_dir("$FindBin::Bin",'--files-only','--pattern=05\.html');
foreach my $html_file ( @html_files ){
    my $html;
    # Use a lexical filehandle and report the OS error if the open fails.
    open( my $fh, '<', $html_file ) or die "Unable to open $html_file: $!\n";
    while(<$fh>){ $html .= $_; }
    close($fh);
    parse_subscriber_list( $html );
}

sub parse_subscriber_list {
    my $html = shift;
    my $te = HTML::TableExtract->new(
        headers => [ 'unsub', 'member', 'mod', 'hide', 'nomail', 'ack',
            'not metoo', 'nodupes', 'digest', 'plain', 'language' ] );

    my $row_count;
    $te->parse($html);
    foreach my $ts ($te->tables){
        foreach my $row ($ts->rows){
            $row_count++;
            # chomp( @{$row} );
            print "name:  email: $row->[1] \n";
        }        
    }
}

exit;

__DATA__

<td><a href="http://lists.example.net/options.cgi/updates-example.net/hesco--at--example.net">hesco@example.net</a><br><input name="hesco%40example.net_realname" type="TEXT" value="Hugh Esco" size="33"><input name="user" type="HIDDEN" value="hesco%40example.net"></td>

Please see comment below for final solution.
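
For anyone landing here first, this is a minimal sketch of one way to get at the name. It is not necessarily the final solution mentioned above. It assumes the name always sits in the value attribute of an input whose name ends in _realname, and that the address is the text of the link in the same cell, as in the sample. It sidesteps HTML::TableExtract and walks the page with HTML::TreeBuilder instead. The extract_members helper and the table wrapper around the sample cell are my own illustration, not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns a list of { name => ..., email => ... }
# hashrefs, one per "<address>_realname" input found in the HTML.
sub extract_members {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @members;

    # The parser lowercases tag and attribute names; attribute values
    # (such as type="TEXT") are kept as written.
    my @inputs = $tree->look_down(
        _tag => 'input',
        name => qr/_realname\z/,
    );

    for my $input (@inputs) {
        my $cell = $input->parent;    # the enclosing <td> in the sample
        my $link = $cell ? $cell->look_down( _tag => 'a' ) : undef;
        push @members, {
            name  => $input->attr('value') // '',
            email => $link ? $link->as_text : '',
        };
    }

    $tree->delete;    # release the parse tree
    return @members;
}

# The sample cell from the post, wrapped in a table (an assumption)
# so that it parses as a complete fragment.
my $sample = <<'HTML';
<table><tr>
<td><a href="http://lists.example.net/options.cgi/updates-example.net/hesco--at--example.net">hesco@example.net</a><br><input name="hesco%40example.net_realname" type="TEXT" value="Hugh Esco" size="33"><input name="user" type="HIDDEN" value="hesco%40example.net"></td>
</tr></table>
HTML

for my $m ( extract_members($sample) ) {
    print "name: $m->{name} email: $m->{email}\n";
}
```

In the original script, the same helper could be called on the slurped $html in place of parse_subscriber_list. Whether the real Membership Management page matches the sample closely enough is something to check against an actual saved page.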

Thanks,

-- Hugh Esco

if( $lal && $lol ) { $life++; }
if( $insurance->rationing() ) { $people->die(); }
Vote Jill Stein on November 6th!

In reply to Parsing html snippet, help appreciated. by hesco
