I’m stuck. I’ve been trying to clean up my mistakes, and there are a few I just can’t figure out. Maybe I’m too close to it, so I decided an extra set of eyes may be just what I need.

I have written a script that monitors the output of a database that is pushed to a website. I’m just scraping the site and then parsing the table, and this is where I seem to run into problems.

First, the site can take a little while to respond every now and then, and when it does the script shuts down. I may have solved that with the goto statement, but if there is a better way, I’m always willing to learn.

Next, either the data I get or the way I parse the table is putting some character (a space? a null? a non-printable character? Something that looks like a degree symbol followed by a middle dot, °∙) in front of the incident number only. Every now and then another column has the same thing happen, and that shuts down the script. I’ve tried everything I can think of, to the point of pulling out my hair; now I’m just taking shots in the dark hoping something will hit. The stray character creates problems with naming, saving, and recalling the file, since the incident number is used as the file name.
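For the slow site, one alternative to the `goto` loop is a bounded retry with a pause between attempts. One thing worth knowing: LWP::Simple's `getstore` returns the HTTP status code rather than true/false, and every status code is a true number, so `getstore(...) or goto GET` never actually retries. Wrapping the call in `is_success(...)` (also exported by LWP::Simple) turns it into a real success test. The helper below is a minimal core-Perl sketch; `with_retry` and its arguments are hypothetical names, not part of the original script or of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: run $try up to $tries times, sleeping $pause seconds
# between failed attempts. $try is a code ref that returns true on success
# and receives the attempt number as its only argument.
# Returns 1 as soon as one attempt succeeds, 0 if every attempt fails.
sub with_retry {
    my ( $try, $tries, $pause ) = @_;
    for my $attempt ( 1 .. $tries ) {
        return 1 if $try->($attempt);
        warn "Attempt $attempt of $tries failed\n";
        sleep $pause if $attempt < $tries;    # no pause after the last attempt
    }
    return 0;
}

# Example use in the original script. getstore() returns an HTTP status
# code, so it is tested with is_success() instead of plain truth:
#
#   use LWP::Simple qw(getstore is_success);
#   with_retry( sub { is_success( getstore( $url, 'tempincid.html' ) ) }, 5, 10 )
#       or die "Site did not respond after 5 attempts\n";
```

Giving up after a fixed number of attempts (5 tries, 10 seconds apart in the comment, both arbitrary) means a site that is down for good ends the run with a clear message instead of spinning forever.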

BTW, I’m not a developer; I’m just dangerous with a little knowledge of Perl. That being said, I have a lot of gaps in my knowledge, but I’m willing to learn.

### Script trimmed down #####
GET:
getstore('http://URL.com/tast.cfm?jump=true&dropboxvalue=0&nblock=50&Sort_By=INC_Incident.IncidentNumber&Sort_Type=ASC&displayColumnList=1,2,3,4,5,6,7', 'tempincid.html') or goto GET;
my $html = 'tempincid.html';
my $te = HTML::TableExtract->new(
    headers => [("", "Incident # ", "Dispatch Time", "Incident Type", "Address", "Apt. #", "Postal Code", "Unit Dispatched")]
);
$te->parse_file($html);

$config{'header'} = <<"EOF";
<html>
<head>
<meta http-equiv="Content-Type" content="text/html"><title>ESSI | Online Home</title>
<meta http-equiv="cache-control" content="no-cache" />
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<META HTTP-EQUIV="REFRESH" CONTENT="15">
<link href="/Page_style/page.css" rel="stylesheet" type="text/css"></head>
<body>
EOF

my $file = "tempincid.html";
my $date = POSIX::strftime( "%c", localtime( ( stat $file )[9] ) );
my $row  = @{$te->rows};
print $config{'header'};
print "<TABLE align=\"center\" border=\"1\" cellpadding=\"2\" cellspacing=\"0\" width=\"100%\" style=\"font-size: 12px;\">";
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        print "<tr><td> ", join(' </td><td> ', @$row), "\n";
        #########Every now and then the cell prints a stream of errors. I'm Guessing it is that character I can identify ####
        print "</td></tr>", "\n";
    }
}
print "</Table>", "Dispatcher data last read $date", "</body></html>";

my $numColumns = @{$te->rows->[0]};
my $numRows    = @{$te->rows};
for my $rowIndex ( 0..$numRows-3 ) {
    for my $columnIndex ( 0..$numColumns-1 ) {
        my $cellvalue = $te->rows->[$rowIndex][7];
        foreach ($cellvalue) {chomp;}
        {
            $cellvalue = uc($cellvalue);
            if (($cellvalue =~ /BT/) || ($cellvalue =~ /FB/)) {
                my $path = "C:/incidentnum/";
                my $cellmatch = $te->rows->[$rowIndex][$columnIndex];
                #******************************************************************************************##############
                #---------------Read each cell and as they are opened trim all whitespaces from the left and right side----#
                #------There is still some non-viewable (Null space, non-printable character to the left of the incident number.--#
                ##############################################################################################################
                my $cell1 = $te->rows->[$rowIndex][1];
                $cell1 =~ s/^\s+|\s+//g;
                $cell1 =~ s/^\s+|\s(?=\s)|\s+$//g;
                my $cell2 = $te->rows->[$rowIndex][2];
                $cell2 =~ s/^\s+|\s+$//g;
                my $cell3 = $te->rows->[$rowIndex][3];
                $cell3 =~ s/^\s+|\s+$//g;
                my $cell4 = $te->rows->[$rowIndex][4];
                $cell4 =~ s/^\s+|\s+$//g;
                my $cell5 = $te->rows->[$rowIndex][5];
                $cell5 =~ s/^\s+|\s+$//g;
                my $cell7 = $te->rows->[$rowIndex][7];
                $cell7 =~ s/^\s+|\s+$//g;
                my $row = "$cell1, $cell2, $cell3, $cell4, $cell5, $cell7";
                my $filename = $cell1 . "." . "txt";
                $filename =~ s/^\s+|\s+$//g;
                #$filename =~ s/^S+|\S+$//g;
                $path =~ s/^\s+|\s+$//g;
                $path =~ s/^\s+|\s(?=\s)|\s+$//g;
                my $full = "$path$filename";
                open (FILE, '>', $filename) or die("Couldn't open $filename");
                print FILE "$row";
                close(FILE) or die $!;
                my $powershell = 'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe';
                my $mboxScript = 'C:\inetpub\cgi-bin\EmailAlert.ps1';
                my $result = `$powershell -command "$mboxScript"`;
                print "$powershell\n";
                print "$mboxScript\n";
                goto EEND;
            }
        }
    }
}
EEND: exit 0;
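For the stray character, one likely culprit (a guess, not something the post confirms) is a decoded `&nbsp;`. HTML::TableExtract normally decodes entities, so `&nbsp;` arrives as chr(160), and on a plain byte string under Perl's default rules `\s` does not match chr(160), which means `s/^\s+|\s+$//g` leaves it in place. Printing `sprintf '%vd', $cell1` shows the ordinal of every character and would confirm what is really there. Separately, empty cells may come back as `undef`, which would explain a stream of "uninitialized value" warnings from the `join` line. The sketch below covers both cases with two hypothetical helpers, `clean_cell` and `safe_filename`, and whitelists file-name characters so that anything else that sneaks in cannot break the file name.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise one scraped table cell.
#  - undef (e.g. an empty <td>) becomes '' so join/print stop warning
#  - control characters and chr(160) (a decoded &nbsp;) become plain spaces
#  - leading and trailing whitespace is trimmed
sub clean_cell {
    my ($cell) = @_;
    return '' unless defined $cell;
    $cell =~ tr/\x00-\x1F\x7F\xA0/ /;    # every listed char -> ' '
    $cell =~ s/^\s+|\s+$//g;             # trim both ends
    return $cell;
}

# Hypothetical helper: keep only characters that are safe in a Windows
# file name; anything else (including unknown stray bytes) is dropped.
sub safe_filename {
    my ($name) = @_;
    ( my $safe = clean_cell($name) ) =~ s/[^A-Za-z0-9._-]//g;
    return $safe;
}

# Example use inside the original loops:
#   print "<tr><td> ", join( ' </td><td> ', map { clean_cell($_) } @$row ), "\n";
#   my $cell1    = clean_cell( $te->rows->[$rowIndex][1] );
#   my $filename = safe_filename($cell1) . '.txt';
#   open( my $fh, '>', "$path$filename" ) or die "Couldn't open $path$filename: $!";
```

Note also that the original `open` writes to `$filename` rather than `$full`, so the .txt files land in the script's working directory instead of C:/incidentnum/; the usage comment opens `$path$filename` instead.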

In reply to HTML Parser strange Null Character in data by caind
