in reply to Recognizing 3 and 4 digit number

G'day htmanning,

Rather than drip-feeding us additional requirement changes, it would be much better if you started with something like this:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    use Test::More;

    my @tests = (
        ['12', '12'],
        ['123', '>123<>123<'],
        ['1234', '>1234<>1234<'],
        ['12345', '12345'],
        ['123 4567 890', '>123<>123< >4567<>4567< >890<>890<'],
        ['123 4567 89', '>123<>123< >4567<>4567< 89'],
        ['123-4567-890', '123-4567-890'],
        ['01/02/2017', '01/02/2017'],
        ['2017-01-02T17:01:34', '2017-01-02T17:01:34'],
        ["12\n345\n6789\n0", "12\n>345<>345<\n>6789<>6789<\n0"],
    );

    plan tests => scalar @tests;

    my $re = qr{(?x: (?<![/-]) \b ( [0-9]{3,4} ) \b (?![/-]) )};

    for my $test (@tests) {
        my ($string, $exp) = @$test;

        (my $got = $string) =~ s/$re/>$1<>$1</g;

        is($got, $exp, "Testing: $string");
    }

All of those tests were successful. Here's the output:

    1..10
    ok 1 - Testing: 12
    ok 2 - Testing: 123
    ok 3 - Testing: 1234
    ok 4 - Testing: 12345
    ok 5 - Testing: 123 4567 890
    ok 6 - Testing: 123 4567 89
    ok 7 - Testing: 123-4567-890
    ok 8 - Testing: 01/02/2017
    ok 9 - Testing: 2017-01-02T17:01:34
    ok 10 - Testing: 12
    # 345
    # 6789
    # 0

This helps both you and us. You can add examples of representative input and the wanted output. There's a clear indication of the test data used along with expected and actual results. You can add new tests if necessary; tweak the regex if required; and ensure previous tests still pass. If you run into difficulties, we have all the information we need to provide immediate help. You get a faster, useful response and we don't have the frustration of an ever changing specification.

As I said above, all of those tests were successful. If my test data is fully representative of your data, and my expectations match yours, then you may have a solution. However, if you have other use cases (the more likely scenario), modify the code above, change the regex if need be, and get back to us if you have further problems.

Here are some notes on your code and what I did differently.

Modifiers
You've used a lot of modifiers, most in three places, and most are unnecessary.
  • x: you can specify this once, as I did, with qr{(?x: ... )}. You could have done the same with m & s if they were needed (see the next two points).
  • m: you haven't used any assertions regarding the start/end of line/string - this one is unnecessary. My last test shows this: it has four lines and substitutions occur correctly on lines 2 and 3.
  • s: you haven't used a '.' in the regex; this modifier allows '.' to (also) match newlines - this one is unnecessary.
  • g: this one is fine (although see Source Data below regarding using it twice).
  • See also: "perlre: Modifiers".
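
The points about m and s can be checked directly. The following is only a sketch: it reuses the regex from the test script, compiles it with x alone, then with x plus m, then with x plus s, and runs each against the multi-line string from the last test. The %variant and %got hashes and the simplified >$1< replacement are names made up for this illustration.

```perl
use strict;
use warnings;

# The multi-line string from the last test case.
my $text = "12\n345\n6789\n0";

# Same pattern compiled three ways; only the inline modifiers differ.
my %variant = (
    x  => qr{(?x:  (?<![/-]) \b ( [0-9]{3,4} ) \b (?![/-]) )},
    xm => qr{(?xm: (?<![/-]) \b ( [0-9]{3,4} ) \b (?![/-]) )},
    xs => qr{(?xs: (?<![/-]) \b ( [0-9]{3,4} ) \b (?![/-]) )},
);

my %got;
for my $name (sort keys %variant) {
    my $re = $variant{$name};

    # Copy the string, then substitute on the copy.
    ($got{$name} = $text) =~ s/$re/>$1</g;

    # Show newlines as a literal \n so each result fits on one line.
    (my $shown = $got{$name}) =~ s/\n/\\n/g;
    print "$name: $shown\n";
}
```

All three variants should print the same result: the pattern contains no ^, $ or '.', so m and s have nothing to act on.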
Captures
Instead of wrapping your regex in a capture as part of the substitution, add it to the regex when created, cf. qr{... ( [0-9]{3,4} ) ...} in my code. This would have removed the problem discussed elsewhere in this thread.
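
A minimal sketch of the two styles, using a cut-down pattern without the lookarounds. The sample string, the [$1] replacement and the variable names are invented for this example; they are not taken from your code.

```perl
use strict;
use warnings;

my $text = 'rooms 12, 345 and 6789';

# Style 1: the capture group is part of the compiled pattern.
my $re = qr{(?x: \b ( [0-9]{3,4} ) \b )};
(my $linked = $text) =~ s/$re/[$1]/g;

# Style 2: the pattern has no capture; one is wrapped around it at the
# point of substitution.
my $bare = qr{(?x: \b [0-9]{3,4} \b )};
(my $wrapped = $text) =~ s/($bare)/[$1]/g;

print "$linked\n";     # rooms 12, [345] and [6789]
print "$wrapped\n";    # rooms 12, [345] and [6789]
```

Both give the same result here. The difference shows up when the pattern already contains its own captures: wrapping it in another pair of parentheses makes the outer group $1 and pushes the inner ones to $2 and beyond. Keeping the capture inside the qr{} means $1 always refers to the same thing wherever the regex is used.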
Source Data
You probably don't want two lots of substitutions on the same string ($text). In my code, [0-9]{3,4} handles all the use cases; of course, you may have other use cases.
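
As a sketch of what a single pass buys you, the following compares one substitution using {3,4} with two separate substitutions, one for three digits and one for four. The two-pass split is an assumption made for illustration; I'm not claiming it is exactly what your code does. The lookarounds are left out to keep the example short.

```perl
use strict;
use warnings;

my $text = '123 4567 89';

# Hypothetical two-pass approach: one regex per digit count.
my $three = qr{(?x: \b ( [0-9]{3} ) \b )};
my $four  = qr{(?x: \b ( [0-9]{4} ) \b )};

# Single-pass approach: one regex covering both counts.
my $both  = qr{(?x: \b ( [0-9]{3,4} ) \b )};

my $two_pass = $text;
$two_pass =~ s/$three/>$1<>$1</g;
# The second pass scans the string again, including the text that the
# first pass inserted.
$two_pass =~ s/$four/>$1<>$1</g;

(my $one_pass = $text) =~ s/$both/>$1<>$1</g;

print "two passes: $two_pass\n";
print "one pass:   $one_pass\n";
```

For this input the two approaches agree. The single pass is still the safer choice: it never rescans its own output, so the replacement text can't be matched a second time.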

See also: "perlre: Lookaround Assertions" and "perlrecharclass: Bracketed Character Classes".

— Ken