if ($table->cell($rownum, 0..2) =~ /\xa0/) {
    s/\xa0\d+/ /;
}
else {
    s/\xa0//;
}

Will the above substitutions replace the desired pattern with the desired..er..pattern? What I mean is, do I need three separate if/else control structures to do the above successfully, or is the above the same as writing $_ =~ s/.../../; ?

Is there a more elegant way of doing this?
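For reference, a minimal sketch of the behaviour in question: a substitution written without =~ operates on $_, whatever expression the surrounding if happened to test. The @cells array is a hypothetical stand-in for the three cells of one row, not something returned by HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the three cells of one row.
my @cells = ("Bob\xa0Hope", "Sally\xa0Mally", "Paula Simon");

# A bare s/// acts on $_, no matter what the enclosing if tested.
$_ = "topic\xa0variable";
if ($cells[0] =~ /\xa0/) {
    s/\xa0/ /;    # rewrites $_, leaves $cells[0] alone
}
my $topic_after = $_;          # copy of $_ after the substitution
my $cell_after  = $cells[0];   # still contains \xa0

# foreach aliases $_ to each element, so here the bare s/// edits the cells.
s/\xa0/ /g foreach @cells;

print "$topic_after\n";
print join(' | ', @cells), "\n";
```

Looping over the values with foreach (or binding each one explicitly with =~) seems to avoid writing three separate if/else blocks.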

Update: Heh, thanks GrandFather. Not really sure why I confused loops and control constructs.

Update2: I'll post more code to lessen the confusion. I forgot that most monks cannot read my mind.

#!/usr/bin/perl -w
use strict;
use HTML::TableExtract;
use DBI;

my $te = HTML::TableExtract->new();
$te->parse_file("arbitraryname.html");
my $table = $te->first_table_found;

my @totalrows;
push @totalrows, $_ foreach $table->rows();

my (@title, @teach, @aides);

foreach my $rownum (0..$#totalrows) {
    # A cell can be called by $table->cell(row,column)
    if ($table->cell($rownum, 0..2) =~ /\xa0/) {
        s/\xa0\d+/ /;
    }
    else {
        s/\xa0//;
    }
    push @title, $table->cell($rownum, 0) ? $table->cell($rownum, 0) : '';
    push @teach, $table->cell($rownum, 1);
    push @aides, $table->cell($rownum, 2) ? $table->cell($rownum, 2) : '';
}

foreach my $ele (0..$#title) {
    print "$title[$ele] - $teach[$ele] - $aides[$ele]\n"; # Testing output before I uncomment database section

=for comment
    # inserting into a database, etc
=cut

}

Update3: Ok, now for the whole story.

I am parsing an HTML table converted from a PDF file that lists names and job titles. What makes this annoying is that each table cell contains more than one name, and not every row has a job title. Example:

<html><table align="center" border="0" cellpadding="2" cellspacing="0"><tbody>
<tr><th align="center" height="24" valign="middle" width="171">Boss </th><th colspan="2" align="left" height="24" valign="middle" width="421">Firstname Surname </th></tr>
<tr><td align="center" height="23" valign="middle" width="171">Secretary </td><td colspan="2" align="left" height="23" valign="middle" width="421">Name Surname, Mr Jones Smith </td></tr>
<tr><td align="center" height="23" valign="middle" width="171">Medical Doctor </td><td colspan="2" align="left" height="23" valign="middle" width="421">Bob&nbsp;Middlename Hope </td></tr>
<tr><td align="center" height="23" valign="middle" width="171">Position 1 </td><td align="center" height="23" valign="middle" width="202">Worker </td><td align="center" height="23" valign="middle" width="219">Secretary </td></tr>
<tr><td height="45" valign="top" width="171"></td><td align="left" height="45" valign="top" width="202">Asdf Ghjk </td><td align="left" height="45" valign="middle" width="219">Name Lastname, First Last </td></tr>
<tr><td height="68" valign="top" width="171"></td><td align="left" height="68" valign="top" width="202">Sally&nbsp;Mally </td><td align="left" height="68" valign="top" width="219">Joe Smoe, The Who, Will Timberland </td></tr>
<tr><td align="center" height="23" valign="middle" width="171">Position 2 </td><td align="left" height="23" valign="middle" width="202">Paula Simon </td><td align="left" height="23" valign="middle" width="219">Raymonde Maalouf </td></tr>
</html>

The file follows this format, with three columns: one for the title of the position, then a person's name(s), then that person's secretary(ies). I am trying to extract all three elements (both elements for the first few rows) and insert them into a database like so:

my $dbh = DBI->connect("DBI:mysql:$dbname:$dburl", "$dbuser", "$dbpass")
    or die "Could not connect";
my $sth = $dbh->prepare("INSERT INTO $dbtable (position, name, email) VALUES (?, ?, ?)")
    or die "Could not prepare";
$sth->execute($position, $name, $email)
    or die "Could not execute";
$sth->finish();
$dbh->disconnect;
}

With the above example, please note that the position will be either Position 1 or Position 1 Secretary, depending on the column, and that the real position names are arbitrary. Also note that I can generate the email addresses easily; that part is unrelated to the problem. I just wanted to include it in case anything came up.
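To make the intended output concrete, here is a rough sketch of turning one row into (position, name) pairs. expand_row is a hypothetical helper name, and the sketch assumes the title has already been carried down to the row that holds the names (in the sample, the title and the names sit on separate rows) and that multiple names are separated by commas.

```perl
use strict;
use warnings;

# Hypothetical helper: one [position, name] pair per person in the row.
sub expand_row {
    my ($title, $names, $aides) = @_;
    my @records;
    # Column 1 keeps the title; column 2 gets "<title> Secretary".
    for my $pair ([$title, $names], ["$title Secretary", $aides]) {
        my ($position, $list) = @$pair;
        next unless defined $list && length $list;
        for my $name (split /\s*,\s*/, $list) {
            $name =~ s/^\s+|\s+$//g;    # trim stray whitespace
            push @records, [$position, $name] if length $name;
        }
    }
    return @records;
}

my @records = expand_row('Position 1', 'Asdf Ghjk ', 'Name Lastname, First Last ');
print join(' - ', @$_), "\n" for @records;
```

Each pair could then be passed to $sth->execute along with the generated email address.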

Oh, and just remembered, the regex is there to strip the &nbsp;'s and replace them with a space if it is before the last name, or nothing if it is at the end of the name (I meant to use /...\w/, not /...\d/). I used \xa0 at the time of writing because that is what I thought I had to strip (today is not my day :\).
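A sketch of that cleanup as described, assuming the non-breaking space may show up either already decoded to \xa0 or as a literal &nbsp; in the cell text. clean_nbsp is a hypothetical helper name, and the final trim is an extra that the description above does not ask for.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::TableExtract.
sub clean_nbsp {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/(?:\xa0|&nbsp;)(?=\w)/ /g;   # nbsp before a word character: space
    $text =~ s/(?:\xa0|&nbsp;)//g;          # any remaining nbsp: removed
    $text =~ s/^\s+|\s+$//g;                # extra: trim leading/trailing whitespace
    return $text;
}

print clean_nbsp("Bob\xa0Middlename Hope "), "\n";
print clean_nbsp("Sally&nbsp;Mally "), "\n";
```

Because the value is bound with =~ inside the helper, nothing here depends on what $_ currently holds.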

I'm so adjective, I verb nouns!

chomp; # nom nom nom


In reply to Will a substitution in an if/else control structure default to $_? by Lawliet
