in reply to Re^2: Replacing 3 and 4 digit numbers.
in thread Replacing 3 and 4 digit numbers.

What is the context in which these digit groups appear? With what other four-digit groups that you want to process might they be confused? Can you give some brief example input and desired output? What code have you written so far? What is the "this" that works?


Give a man a fish:  <%-{-{-{-<

Re^4: Replacing 3 and 4 digit numbers.
by htmanning (Friar) on Apr 10, 2016 at 18:24 UTC
    This is a log for an apartment building. Each apartment/unit number is 3 or 4 digits. When a log entry is entered, I use the code to link the 3- or 4-digit number to an informational page about the apartment that is generated by resident-info.pl. It works well, except sometimes people update the log and enter a date in the log itself. When that happens, the year is tagged as a unit number when it shouldn't be. Also, there have been times when someone has included HTML tags in the log entry, and perhaps a URL that includes a number that is not a unit number. The code above then isolates part of that HTML and tries to place a new URL with bold tags around the number in the manually entered HTML. That screws things up. I hope that's clear.

      One approach might be to extend the SKIP-FAIL trick for excluding URLs to also exclude dates:
          $text =~ s{ (?: $url | $date) (*SKIP) (*FAIL) | ($digits_3_4) }{...}xmsg
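      To make the one-liner above concrete, here is a minimal, self-contained sketch. The bodies of $url, $date and $digits_3_4, the link_units() helper, and the "unit" query parameter are all assumptions made up for illustration (only resident-info.pl comes from the thread), and the patterns are deliberately simplistic. It needs Perl 5.10 or later because of the backtracking verbs.

```perl
use strict;
use warnings;
use 5.010;    # (*SKIP) and (*FAIL) need Perl 5.10 or later

# All of these patterns are illustrative assumptions; extend them for real data.
my $url        = qr{ \b https?:// [^\s"'<>]+ }xmsi;
my $yr         = qr{ 20 \d\d }xms;                                  # 21st century only
my $sep        = qr{ [-/.] }xms;
my $date       = qr{ \b \d{1,2} $sep \d{1,2} $sep $yr \b }xms;      # e.g. 4/10/2016
my $digits_3_4 = qr{ (?<! \d) \d{3,4} (?! \d) }xms;

# Hypothetical helper: link bare unit numbers, leave URLs and dates alone.
sub link_units {
    my ($text) = @_;

    # A URL or date is matched first; (*SKIP)(*FAIL) then rejects that match
    # and resumes the search after it, so its digits are never seen by the
    # second alternative. Only the remaining 3-4 digit groups are captured.
    $text =~ s{ (?: $url | $date ) (*SKIP) (*FAIL) | ($digits_3_4) }
              {<a href="resident-info.pl?unit=$1"><b>$1</b></a>}xmsg;

    return $text;
}

print link_units('Leak in 1204 on 4/10/2016, see http://example.com/p/5678'), "\n";
```

      With this input, only 1204 should end up wrapped in a link; the year in the date and the number inside the URL are skipped.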

      Of course, this leaves you with the headache of trying to define a regex to match every possible format of date that a human bean might imagine. Here's a start, but please be aware that this code is untested and also that the $date regex does not nearly cover every possible permutation of day-month-year ordering or the many internal separator sequences that might be used; you will have to extend this definition as needed. (Also note that the $yr pattern is limited to the 21st century.)

      (This regex is 5.8.9 compatible.) The first thing you will want to do is write a Test::More script to test your $date regex against every possible date format you've ever encountered and any others you can imagine.
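      As a rough idea of what such a test script could look like, here is a sketch. The $date regex in it is a hypothetical stand-in, not the one discussed above, and the sample strings are invented; replace both with your own regex and with dates taken from your real log entries.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the $date regex under test; swap in your own.
my $yr   = qr{ 20 \d\d }xms;    # 21st century only
my $sep  = qr{ [-/.] }xms;
my $mon  = qr{ (?: Jan | Feb | Mar | Apr | May | Jun
                 | Jul | Aug | Sep | Oct | Nov | Dec ) [a-z]* }xmsi;
my $date = qr{
    \b
    (?: \d{1,2} $sep \d{1,2} $sep $yr       # 4/10/2016 or 10-04-2016
      | $yr $sep \d{1,2} $sep \d{1,2}       # 2016-04-10
      | $mon \.? \s+ \d{1,2} , \s* $yr      # Apr 10, 2016 or April 10, 2016
    )
    \b
}xmsi;

# Strings that must be recognised as dates, and strings that must not.
my @dates     = ('4/10/2016', '10-04-2016', '2016-04-10',
                 'Apr 10, 2016', 'April 10, 2016');
my @not_dates = ('1204', 'Unit 315', '12/2016');

plan tests => @dates + @not_dates;

like(   $_, qr{ \A $date \z }xms, "date: $_"     ) for @dates;
unlike( $_, qr{ $date }xms,       "not a date: $_" ) for @not_dates;
```

      Each new date format you find in the log becomes one more entry in @dates, and anything that was wrongly treated as a date goes into @not_dates, so the lists grow along with the regex.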

      BTW: You have never said if your version of Perl is 5.10 or later, so I don't even know if the SKIP-FAIL trick is possible for you. What version of Perl are you using?


      Give a man a fish:  <%-{-{-{-<