if ($table->cell($rownum, 0..2) =~ /\xa0/) {
    s/\xa0\d+/ /;
} else {
    s/\xa0//;
}
Will the above substitutions replace the desired pattern with the desired..er..pattern? What I mean is, do I need three separate if/else control structures to do the above successfully, or is doing the above the same as using $_ =~ s/.../../; ?
Is there a more elegant way of doing this?
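To make the question concrete, here is a minimal sketch of the difference between a bare s/// and one bound with =~. The variable names and sample strings are made up for illustration, and it assumes the cell text contains a literal \xa0 character.

```perl
use strict;
use warnings;

my $cell = "Sally\xa0Mally";      # hypothetical cell value
$_       = "default\xa0topic";    # whatever happens to be in $_

s/\xa0/ /;                        # no =~, so this edits $_, not $cell
my $cell_after_bare = $cell;      # copy taken to show $cell is unchanged so far

$cell =~ s/\xa0/ /;               # bound with =~, so this edits $cell

print "$_\n$cell\n";
```

In other words, testing a cell in the if condition does not make the following s/// act on that cell; the substitution only touches a cell's text if it is bound to a variable holding that text.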
Update: Heh, thanks GrandFather. Not really sure why I confused loops and control constructs.
Update2: I'll post more code to lessen the confusion. I forgot that most monks cannot read my mind.
#!/usr/bin/perl -w
use strict;
use HTML::TableExtract;
use DBI;

my $te = HTML::TableExtract->new();
$te->parse_file("arbitraryname.html");
my $table = $te->first_table_found;

my @totalrows;
push @totalrows, $_ foreach $table->rows();

my (@title, @teach, @aides);
foreach my $rownum (0..$#totalrows) {
    # A cell can be called by $table->cell(row,column)
    if ($table->cell($rownum, 0..2) =~ /\xa0/) {
        s/\xa0\d+/ /;
    } else {
        s/\xa0//;
    }
    push @title, $table->cell($rownum, 0) ? $table->cell($rownum, 0) : '';
    push @teach, $table->cell($rownum, 1);
    push @aides, $table->cell($rownum, 2) ? $table->cell($rownum, 2) : '';
}

foreach my $ele (0..$#title) {
    print "$title[$ele] - $teach[$ele] - $aides[$ele]\n"; # Testing output before I uncomment database section

=for comment
    # inserting into a database, etc
=cut

}
Update3: Ok, now for the whole story.
<html><table align="center" border="0" cellpadding="2" cellspacing="0"><tbody>
<tr><th align="center" height="24" valign="middle" width="171">Boss </th><th colspan="2" align="left" height="24" valign="middle" width="421">Firstname Surname </th></tr>
<tr><td align="center" height="23" valign="middle" width="171">Secretary </td><td colspan="2" align="left" height="23" valign="middle" width="421">Name Surname, Mr Jones Smith </td></tr>
<tr><td align="center" height="23" valign="middle" width="171">Medical Doctor </td><td colspan="2" align="left" height="23" valign="middle" width="421">Bob Middlename Hope </td></tr>
<tr><td align="center" height="23" valign="middle" width="171">Position 1 </td><td align="center" height="23" valign="middle" width="202">Worker </td><td align="center" height="23" valign="middle" width="219">Secretary </td></tr>
<tr><td height="45" valign="top" width="171"></td><td align="left" height="45" valign="top" width="202">Asdf Ghjk </td><td align="left" height="45" valign="middle" width="219">Name Lastname, First Last </td></tr>
<tr><td height="68" valign="top" width="171"></td><td align="left" height="68" valign="top" width="202">Sally&nbsp;Mally </td><td align="left" height="68" valign="top" width="219">Joe Smoe, The Who, Will Timberland </td></tr>
<tr><td align="center" height="23" valign="middle" width="171">Position 2 </td><td align="left" height="23" valign="middle" width="202">Paula Simon </td><td align="left" height="23" valign="middle" width="219">Raymonde Maalouf </td></tr>
</html>
The file follows this format, with three columns: one for the title of the position, then a person's name(s), then that person's secretary(ies). I am trying to extract all three elements (only two for the first few rows) and insert them into a database like this:
my $dbh = DBI->connect("DBI:mysql:$dbname:$dburl", "$dbuser", "$dbpass")
    or die "Could not connect";
my $sth = $dbh->prepare("INSERT INTO $dbtable (position, name, email) VALUES (?, ?, ?)")
    or die "Could not prepare";
$sth->execute($position, $name, $email)
    or die "Could not execute";
$sth->finish();
$dbh->disconnect;
}
With the above example, please note that the position will be either Position 1 or Position 1 Secretary, depending on the column, and that the real position name is random. Also note that I can generate their email address easily, and that it is unrelated to the problem. I just wanted to include that in case anything came up.
Oh, and I just remembered: the regex is there to strip the &nbsp;s, replacing each with a space if it comes before the last name, or with nothing if it is at the end of the name (I meant to use /...\w/, not /...\d/). I used \xa0 at the time of writing because that is what I thought I had to strip (today is not my day :\).
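A rough sketch of that intent, assuming the &nbsp; entities have already been decoded to \xa0 by the time the cell text is read. The helper name strip_nbsp and the sample string are hypothetical, not part of my script.

```perl
use strict;
use warnings;

# Hypothetical helper: tidy the text of one cell
sub strip_nbsp {
    my ($text) = @_;
    return '' unless defined $text;   # empty cells become ''
    $text =~ s/\xa0(?=\w)/ /g;        # followed by a word character: becomes a space
    $text =~ s/\xa0//g;               # anywhere else (e.g. end of name): removed
    return $text;
}

my $cleaned = strip_nbsp("Bob\xa0Middlename\xa0Hope\xa0");
print "$cleaned\n";
```

The lookahead (?=\w) checks the next character without consuming it, so the letter that follows stays in place; the second substitution then drops whatever \xa0 characters are left.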
I'm so adjective, I verb nouns!
chomp; # nom nom nom