Weird Date::Manip DateParse fail

cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good day bros. The following snippet:

#!/usr/bin/perl -w
use strict;
use Date::Manip;
use HTML::TreeBuilder;
my $htm = '     <html><div class="posthead">
            <span class="postdate new"><span class="date">14th
            August 2017,&nbsp;<span class=
            "time">21:07</span></span></span> <span class=
            "nodecontrols"><a name="post27949278" href=
            "threads/2360460-product-reviews.htm"
            class="postcounter">#1937</a></span>
          </div></html>';
my $tree = HTML::TreeBuilder->new_from_content($htm);
my $postdate = $tree->look_down('class','date')->as_text();
print "postdate: $postdate\n";
print "postdate parsed: ",ParseDate($postdate),"\n";
my $timestamp = '14th August 2017, 21:07';
print "string parsed: ",ParseDate($timestamp),"\n";

yields output:

postdate: 14th August 2017, 21:07
postdate parsed: 
string parsed: 2017081421:07:00

So it fails to parse a date when it's passed to ParseDate as the contents of a variable gotten with HTML::Element, but if I take the exact same text, assign it to a variable as a string literal, and pass it to ParseDate, it parses fine. I've debugged into Date::Manip and it seems to be getting the same string in both cases. Anyone know what's going on here?!?
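When two strings print identically but behave differently, stepping through the callee won't reveal the difference. Escaping every character outside printable ASCII will. Below is a minimal sketch of that check. The helper name `visible` is hypothetical. The HTML-derived value is an assumed stand-in that contains a non-breaking space, as the replies suggest, rather than output captured from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Render every character outside printable ASCII (0x20..0x7E) as \x{..},
# so look-alike characters such as a non-breaking space become visible.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/sprintf('\\x{%X}', ord $1)/ge;
    return $s;
}

# Assumed stand-in for what as_text() returned, versus the typed literal.
my $from_html = "14th August 2017,\x{A0}21:07";
my $literal   = '14th August 2017, 21:07';

print "from HTML: ", visible($from_html), "\n";  # 14th August 2017,\x{A0}21:07
print "literal:   ", visible($literal), "\n";    # 14th August 2017, 21:07
print "equal? ", ($from_html eq $literal ? "yes" : "no"), "\n";  # no
```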


Re: Weird Date::Manip DateParse fail by Corion (Patriarch) on Aug 17, 2017 at 18:37 UTC
Maybe this: `14th August 2017,&nbsp;` HTML-decodes not to a plain space after the comma but to `\x{A0}`, which looks like a plain space but is a non-breaking space?
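If that is the cause, converting the character back to an ordinary space before calling ParseDate should be enough. Below is a minimal sketch, assuming the `as_text()` result contains U+00A0 (the character `&nbsp;` decodes to). The helper name `nbsp_to_space` is hypothetical, and ParseDate is left out so the snippet runs without Date::Manip installed.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every NO-BREAK SPACE (U+00A0) with an
# ordinary ASCII space. tr/// maps characters one-for-one, with no regex.
sub nbsp_to_space {
    my ($text) = @_;
    $text =~ tr/\x{A0}/ /;
    return $text;
}

my $postdate = "14th August 2017,\x{A0}21:07";   # assumed as_text() result
my $clean    = nbsp_to_space($postdate);
print "$clean\n";    # 14th August 2017, 21:07

# With Date::Manip loaded, ParseDate($clean) now receives the same
# characters as the string literal that parsed correctly in the question.
```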
Re^2: Weird Date::Manip DateParse fail by dbander (Scribe) on Aug 17, 2017 at 19:21 UTC
For kicks, I ran it on a Windows system, and the console output supports Corion's conclusion (note the `á` where the `&nbsp;` would be; on the original example it showed as whitespace):

M:\PerlMonks>perl parsedate.pl
postdate: 14th August 2017,á21:07
postdate parsed:
string parsed: 2017081421:07:00
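The `á` is consistent with a non-breaking space. Code point 160 printed as a raw byte is `0xA0`, and a Windows console using OEM code page 437 draws that byte as `á`. That the console was on code page 437 is an assumption, since the post doesn't say. The sketch below uses the core Encode module to show the mapping.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A non-breaking space is code point 160 (0xA0). Interpret that single
# byte as code page 437 (assumed console code page) to see its glyph.
my $byte  = chr(0xA0);
my $glyph = decode('cp437', $byte);

binmode STDOUT, ':encoding(UTF-8)';
printf "byte 0x%02X in CP437 is U+%04X (%s)\n", ord $byte, ord $glyph, $glyph;
# byte 0xA0 in CP437 is U+00E1 (á)
```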
Re^3: Weird Date::Manip DateParse fail by cormanaz (Deacon) on Aug 17, 2017 at 19:31 UTC
Thanks guys. That fixed it. I guess I should have checked that before posting!	[reply]
Re^4: Weird Date::Manip DateParse fail by ExReg (Priest) on Aug 18, 2017 at 15:33 UTC
Re^2: Weird Date::Manip DateParse fail by snax (Hermit) on Sep 29, 2017 at 02:09 UTC
Thank you for this. I'm trying to capture some table data, and the `&#160;` entities in the source (which show up as `&nbsp;` when I inspect the HTML::Element `as_HTML()` output) come through HTML::Element's `as_text()` method as weird characters. I couldn't figure out how to clean them with regexes. Now I just do

my $el = $_->as_text();
my $nbsp = chr(160);
$el =~ s/$nbsp/ /g;

and all is well :)
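The post doesn't say why the earlier regex attempts failed. One plausible reason is Perl's character-class rules, though that is an assumption rather than something stated here. For a string without the internal UTF-8 flag, `\s` follows ASCII-only rules and does not match U+00A0. `\s` under the `/u` modifier does match it, and so does `\h`, which matches the same characters regardless of those rules. The sketch checks all three, then cleans some hypothetical table cells with `\h+`. Note that this also collapses tabs and runs of ordinary spaces, unlike the one-character substitution above.

```perl
use strict;
use warnings;

my $nbsp = chr(160);   # same character as "\x{A0}"; not UTF-8 flagged

# Default (/d) rules on a non-UTF-8 string: \s is ASCII-only here.
printf "%-6s %s\n", '\s',   ($nbsp =~ /\s/  ? 'match' : 'no match');  # no match
# /u forces Unicode rules, where U+00A0 counts as whitespace.
printf "%-6s %s\n", '\s/u', ($nbsp =~ /\s/u ? 'match' : 'no match');  # match
# \h (horizontal whitespace) always includes U+00A0.
printf "%-6s %s\n", '\h',   ($nbsp =~ /\h/  ? 'match' : 'no match');  # match

# Hypothetical table cells: turn each run of horizontal whitespace,
# NBSP included, into a single ordinary space (modifies @cells in place).
my @cells = ("12${nbsp}kg", "${nbsp}n/a${nbsp}");
s/\h+/ /g for @cells;
print join('|', @cells), "\n";   # 12 kg| n/a 
```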