Have you ever had a sudden moment of enlightenment and discovered that you, stupid, had been struggling to find something deeply wrong that just couldn't be found?
That's what I experienced while coding a simple script to rename a bunch of HTML files containing web emails according to the sender's name and the date.
I usually check my email on the web, then dump each message as a single HTML file and delete the whole folder's content on the website.
My old script was a mess of regexes, difficult to maintain and to adjust to the frequent changes in my webmail service's HTML structure; a question in "Seekers of Perl Wisdom" resolved my doubts and made me discover the good HTML::TokeParser and the killer Date::Manip, which turned out to be by far the best solution I could have expected to find.
But something didn't work... I mean, there were some specific files whose "Date" field seemed to be haunted... The script was correctly parsing the HTML and storing the string containing the date, but then Date::Manip failed to convert it into the format I like, i.e. "%y%m%d". I started thinking that the HTML could be somewhat different from file to file, and checked.
Nothing.
Then I spent the following 30 minutes reading the Date::Manip manpage word by word, only to find that I was using it correctly.
But some files were still haunted, and the date seemed to disappear as soon as I tried to convert it with the magic module...
Then, I started inspecting the Date strings, which usually look like this one:
Wed, 28 Jul 2004 9:12 PM ( 14 hours 34 mins ago )
and the regex which gets rid of that non-standard useless reminder between parentheses:
$value=~s/M.*$/M/;
What did I mean? I wanted to replace PM or AM and the following characters with just PM or AM. So I thought it was a good idea to refer just to the M, which was the character in common, and build my killer regex on it.
No way, man: with some files it failed, and the whole date disappeared in a puff of smoke...

until...

...until I discovered that sometimes it's Monday!

Needless to say, my regex is now:
$value=~s/[AP]M.*$//;
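
To make the failure concrete, here is a minimal sketch comparing the two substitutions. The Monday string and the helper names strip_old and strip_new are made up for illustration; only the Wednesday string and the two regexes come from the post.

```perl
use strict;
use warnings;

# First attempt: anchor on the first "M" anywhere in the string.
sub strip_old { my ($s) = @_; $s =~ s/M.*$/M/; return $s }

# Current version: anchor on "AM" or "PM" (this drops the AM/PM itself too).
sub strip_new { my ($s) = @_; $s =~ s/[AP]M.*$//; return $s }

my @dates = (
    'Wed, 28 Jul 2004 9:12 PM ( 14 hours 34 mins ago )',
    'Mon, 26 Jul 2004 9:12 PM ( 2 days ago )',   # hypothetical Monday sample
);

for my $date (@dates) {
    printf "old: '%s'\nnew: '%s'\n", strip_old($date), strip_new($date);
}
```

On the Monday string, the first "M" is the one in "Mon", so the old regex leaves only "M" for Date::Manip to chew on.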


Edited: I forgot an M in my first regex!

janitored by ybiC: Retitle from one-word "Epiphany" because one-word nodetitles hinder site search

Re: Regexp Epiphany
by wfsp (Abbot) on Aug 21, 2004 at 16:26 UTC
    I think Mondays _should_ disappear in a puff of smoke, 'shoot the whole day down'!

    Glad to hear you abandoned the regex approach.

    wfsp

Re: Regexp Epiphany
by bart (Canon) on Aug 21, 2004 at 19:12 UTC
    Aren't you supposed to keep that "AM"/"PM" intact?
    s/(?<=[AP]M).+//;
      Oh, it actually doesn't matter.
      I can safely remove AM or PM and it all works OK!

        How can it? You lose information. That string contains nothing else that would indicate whether the times are AM or PM. Date::Manip is probably assuming 24-hour time, effectively assuming AM for all times.

        Did you actually test your assumption or are you simply programming by coincidence?

        $ perl -MDate::Manip -le'print UnixDate "Wed, 28 Jul 2004 9:12 PM", "%s"'
        1091041920
        $ perl -MDate::Manip -le'print UnixDate "Wed, 28 Jul 2004 9:12", "%s"'
        1090998720

        You didn't notice that because you're not asking for the time, only for the date. Are you sure you will never want to see the time in a future change to the script? Will you remember this idiosyncrasy of your solution when that time comes? You should at least document this issue in your script; the better approach would of course be either to strip the entire time out of the string (Date::Manip will then assume 12 AM sharp) or not to destroy this information in the first place.

        Makeshifts last the longest.

Re: Regexp Epiphany
by Aristotle (Chancellor) on Aug 22, 2004 at 18:00 UTC

    That explains why your previous regex won't work, but why did you fix it by still matching a location other than what you want to remove?

    $value =~ s/ \(.*\)$//;

    Makeshifts last the longest.

      Because I don't care at all about keeping that AM or PM in place; I don't need it to format my date correctly.
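
For comparison, a small sketch that runs the three substitutions proposed in this thread on the same input. The sample string and the variable names are made up for illustration; only the regexes are taken from the posts above.

```perl
use strict;
use warnings;

# Hypothetical sample in the webmail's date format.
my $raw = 'Mon, 26 Jul 2004 9:12 PM ( 2 days ago )';

# Original poster's fix: removes AM/PM along with the reminder.
(my $drop_ampm = $raw) =~ s/[AP]M.*$//;

# Lookbehind variant: removes everything after AM/PM, keeping AM/PM.
(my $lookbehind = $raw) =~ s/(?<=[AP]M).+//;

# Parenthesis variant: removes only the trailing " ( ... )" part.
(my $parens = $raw) =~ s/ \(.*\)$//;

print "'$_'\n" for $drop_ampm, $lookbehind, $parens;
```

Only the first variant loses the AM/PM marker; the other two leave a string ending in "9:12 PM", so the time of day survives if the script ever needs it.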