Umdurman has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I need a regex that can check whether a string contains exactly two 'x' and one '/', plus some numbers, as in this example: '240 x 240 x 2/3600'. I strip all spaces and stuff out of the string upfront. The numbers can be anything from 1 up to 10000. I've been trying, but so far I haven't found a solution. Anyone? Kind regards, Ton

Replies are listed 'Best First'.
Re: Regex question
by BillKSmith (Monsignor) on Sep 17, 2023 at 04:16 UTC
    If numbers, operators, and whitespace are as shown:
        use strict;
        use warnings;
        use Test::More tests => 1;

        my $sample_string = '240 x 240 x 2/3600';
        my $number        = qr/1?\d{1,4}/;
        my $valid         = $sample_string =~ m{$number x $number x $number/$number};
        ok($valid, 'Sample string');

    UPDATE: Add revised code incorporating comments below.

        use strict;
        use warnings;
        use Test::More tests => 6;

        my @valid_strings = (
            ['240 x 240 x 2/3600',   'Sample'],
            ['10000 x 240 x 2/3600', 'Max number'],
            ['1 x 240 x 2/3600',     'Min number'],
        );

        #my $number = qr/1?\d{1,4}/;
        my $number = qr/10000|[1-9]\d{0,3}/;
        my $valid  = qr{\A$number x $number x $number/$number\z};

        foreach my $case (@valid_strings) {
            like $case->[0], $valid, $case->[1];
        }

        my @invalid_strings = (
            ['240 * 240 x 2/3600',   'Missing operator'],
            ['10001 x 240 x 2/3600', 'number exceeds max'],
            ['0 x 240 x 2/3600',     'number less than minimum'],
        );

        foreach my $case (@invalid_strings) {
            unlike $case->[0], $valid, $case->[1];
        }

    OUTPUT:

        1..6
        ok 1 - Sample
        ok 2 - Max number
        ok 3 - Min number
        ok 4 - Missing operator
        ok 5 - number exceeds max
        ok 6 - number less than minimum

    Bill

      G'day Bill,

      I'd say you're on the right track creating a regex for $number and building up a more complex regex from there. Unfortunately, you're matching some things that you shouldn't.

          $ perl -E '
              say "$_: ", /1?\d{1,4}/ ? "Y" : "N"
                  for qw{240 2 3600 1 10000 0 0000 19999 999999999};
          '
          240: Y
          2: Y
          3600: Y
          1: Y
          10000: Y
          0: Y
          0000: Y
          19999: Y
          999999999: Y

      I'd aim for a more stringent regex for $number.

          $ perl -E '
              say "$_: ", /(?<![0-9])(?:10000|[1-9][0-9]{0,3})(?![0-9])/ ? "Y" : "N"
                  for qw{240 2 3600 1 10000 0 0000 19999 999999999};
          '
          240: Y
          2: Y
          3600: Y
          1: Y
          10000: Y
          0: N
          0000: N
          19999: N
          999999999: N

      The OP is somewhat unclear in that it shows an example with spaces then says spaces are removed. Stuff is also removed, whatever that refers to. There's not much we can do about that beyond requesting clarification.

      I'd also add ^ and $ (or equivalent) assertions to the final regex.

      — Ken
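One way to gain confidence in the stricter number pattern is to compare it with a plain arithmetic range check over a run of integers. The following is only a sketch and not code from the thread: the helper name in_range_by_regex is made up for illustration, and it assumes the pattern is anchored with \A and \z instead of the lookarounds shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is an integer from 1 to 10000
# written without leading zeros.
sub in_range_by_regex {
    my ($s) = @_;
    return $s =~ /\A(?:10000|[1-9][0-9]{0,3})\z/ ? 1 : 0;
}

# Compare the regex with an arithmetic check for every integer 0 .. 20000
# and collect the values on which the two disagree.
my @mismatches = grep {
    in_range_by_regex($_) != ( ( $_ >= 1 && $_ <= 10000 ) ? 1 : 0 )
} 0 .. 20000;

print "mismatches: ", scalar(@mismatches), "\n";
```

If the list of mismatches is empty, the regex and the numeric check agree on that range. Strings with leading zeros such as '0000' are a separate case that the numeric loop does not generate.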

        Well, I would use \b,
        say "$_: ", /\b(10000|[1-9][0-9]{0,3})\b/ ? "Y" : "N" for qw{240 2 3600 1 10000 0 0000 19999 999999999};
        Same as your result:
            240: Y
            2: Y
            3600: Y
            1: Y
            10000: Y
            0: N
            0000: N
            19999: N
            999999999: N
        updated: made capturing group
      I would add \b boundaries at the front and rear; otherwise, for example, 2/365555555 would be accepted as a valid number.

          m{\b$number x $number x $number/$number\b}

      Update: alternatively, using {\A$number x $number x $number/$number\z} or {^$number x $number x $number/$number$} also looks fine to me.
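The two styles are not interchangeable: \b only requires a word boundary around the match, while \A and \z require the whole string to match. A small sketch to illustrate the difference, reusing the $number pattern from the update above (the variable names $bounded and $anchored and the sample strings are invented for this example):

```perl
use strict;
use warnings;

my $number   = qr/10000|[1-9]\d{0,3}/;
my $bounded  = qr{\b$number x $number x $number/$number\b};
my $anchored = qr{\A$number x $number x $number/$number\z};

for my $s (
    '240 x 240 x 2/3600',
    '240 x 240 x 2/365555555',
    'size: 240 x 240 x 2/3600 mm',
) {
    # Print Y/N for each pattern so the two can be compared side by side.
    printf "%-30s bounded=%s anchored=%s\n", $s,
        ( $s =~ $bounded  ? 'Y' : 'N' ),
        ( $s =~ $anchored ? 'Y' : 'N' );
}
```

Both patterns reject the overlong denominator, but only the \b version accepts the expression when it is embedded in surrounding text. Which behaviour is wanted depends on whether the input is supposed to be the expression and nothing else.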

      In addition, to validate the string further, either make $number itself capturing or put parentheses around each $number in the longer regex. The captured values are only visible after the statement that declares them, so the comparison goes inside the block:

          if ( my ($n1, $n2, $n3, $n4) =
                   $string =~ m{\A($number) x ($number) x ($number)/($number)\z} ) {
              if ( $n1 == $n2 ) {
                  # valid format and square
              }
          }
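Once the numbers are captured, the range check can also be done arithmetically instead of inside the regex. This is a rough sketch of that alternative, not something posted in the thread: the function name parse_dimensions is hypothetical, it allows optional whitespace, and unlike the regex-only versions it accepts leading zeros such as '0240'.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four numbers on success, or an
# empty list if the shape is wrong or any number is outside 1..10000.
sub parse_dimensions {
    my ($string) = @_;

    # Check the shape only: digits x digits x digits / digits.
    my @n = $string =~ m{\A([0-9]+)\s*x\s*([0-9]+)\s*x\s*([0-9]+)\s*/\s*([0-9]+)\z}
        or return;

    # Check the range numerically rather than with a regex.
    return if grep { $_ < 1 || $_ > 10000 } @n;

    return @n;
}

my @dims = parse_dimensions('240 x 240 x 2/3600');
print "valid format and square\n" if @dims && $dims[0] == $dims[1];
```

Splitting shape and range this way keeps the regex short, at the cost of a second step after the match.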

      What does this mean: 'I strip all spaces and stuff out of the string upfront.' ???

      Make whitespace optional with \s* and add a few more tests:

          my @valid_strings = (
              ['240 x 240 x 2/3600',   'Sample'],
              ['10000 x 240 x 2/3600', 'Max number'],
              ['1 x 240 x 2/3600',     'Min number'],
              ['120x240x2/3600',       'Number sans whitespace'],
              ['120x240x2 / 3600',     'Fraction with whitespace'],
          );

          my $number = qr/10000|[1-9]\d{0,3}/;
          my $valid  = qr{\b$number\s*x\s*$number\s*x\s*$number\s*/\s*$number\b};
          #my $valid = qr{\A$number\s*x\s*$number\s*x\s*$number\s*/\s*$number\z};
          #my $valid = qr{^$number\s*x\s*$number\s*x\s*$number\s*/\s*$number$};

          my @invalid_strings = (
              ['240 * 240 x 2/3600',    'Missing operator'],
              ['10001 x 240 x 2/3600',  'number exceeds max'],
              ['100 x 10001 x 2/3600',  'number exceeds max'],
              ['100 x 10001 x 2/10001', 'number exceeds max'],
              ['240x240x10001/3600',    'number exceeds max'],
              ['0 x 240 x 2/3600',      'number less than minimum'],
              ['240 x 240 x 2/0',       'Cannot divide by zero'],
              ['240 x 240 x 0/3600',    'Numerator cannot be zero'],
          );
Re: Regex question
by Umdurman (Acolyte) on Sep 17, 2023 at 09:57 UTC
    Hey guys, thank you so far. I see a lot of complex solutions and was hoping for a simpler, more elegant regex. The 'stuff' I strip is everything except numbers, X and /. So I always have a clean string like '240x240x2/3600'. Later I convert this string to '240*240*2/3600'. A regex to check this string, '240*240*2/3600', would be preferable. I think I am there. I am testing this: =~ m/^\d{1,4}\*\d{1,4}\*\d{1,4}\/\d{1,4}$/) { Thank you all in advance. Ton
      If you truly believed that you always had a 'clean string', you would not need a regex test at all. Your latest regex would match almost any valid string (the exception is 10000, since \d{1,4} allows at most four digits) and much more (in fact, nearly every "invalid_string" in my UPDATE). If you post a list of valid strings that demonstrate the variation that is allowed and a list of invalid strings that demonstrate the types of errors you want to detect, we can probably make a simpler regex that does the job. You can modify my UPDATE to test a possible regex against all those strings.

      Please note that every regex suggestion you have received is only one or two lines long. The best solution probably will be a better fit to your problem, but not much shorter.

      Bill
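To see concretely where the \d{1,4} version and the stricter number pattern disagree, here is a small comparison sketch. It is an illustration only: the names $loose and $tight and the sample strings are made up, and it assumes the converted form with '*' and no whitespace.

```perl
use strict;
use warnings;

# The regex from the post above, and a version built from the
# stricter number pattern suggested earlier in the thread.
my $loose  = qr/^\d{1,4}\*\d{1,4}\*\d{1,4}\/\d{1,4}$/;
my $number = qr/10000|[1-9]\d{0,3}/;
my $tight  = qr{\A$number\*$number\*$number/$number\z};

for my $s ('240*240*2/3600', '0*240*2/3600', '10000*240*2/3600', '240*240*2/0') {
    printf "%-18s loose=%s tight=%s\n", $s,
        ( $s =~ $loose ? 'Y' : 'N' ),
        ( $s =~ $tight ? 'Y' : 'N' );
}
```

The \d{1,4} version lets 0 through in any position and rejects the five-digit 10000, while the stricter version does the opposite on both counts.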

      This looks like it works ..

          tab@music4:~/Pianoforte/Development/Perlmonks 12:19:50 $ cat 11154494.pl
          #!/usr/bin/perl
          use strict;
          use warnings;

          my $str = '240*240*2/3600';
          use Test::More;
          {
              like ( $str, qr{^(\d+[*/])+(\d+)$}, 'Check regex' );
              done_testing;
          }
          tab@music4:~/Pianoforte/Development/Perlmonks 12:19:55 $ prove !$
          prove 11154494.pl
          11154494.pl .. ok
          All tests successful.
          Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.06 cusr  0.00 csys =  0.09 CPU)
          Result: PASS
          tab@music4:~/Pianoforte/Development/Perlmonks 12:19:58 $

      This assumes that you've stripped out the spaces.

      The regex I've used is just 'number plus operator' repeated at least once, followed by 'number'. So this should work for unsigned integers. You'd need to expand the regex for signed numbers, fractional numbers, numbers with exponents, numbers in a base higher than ten, and so forth.
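As a rough idea of what such an expansion could look like, the sketch below handles just two of the cases mentioned, an optional sign and an optional decimal part. The number syntax in $num is an assumption chosen for illustration, not something specified in the thread; exponents and other bases are not covered.

```perl
use strict;
use warnings;

# Assumed number syntax: optional sign, digits, optional decimal part.
my $num  = qr/[+-]?[0-9]+(?:\.[0-9]+)?/;

# One number followed by one or more "operator number" pairs.
my $expr = qr{\A$num(?:[*/]$num)+\z};

for my $s ('240*240*2/3600', '-2.5*4/3', '2**3', '2*') {
    printf "%-16s %s\n", $s, ( $s =~ $expr ? 'match' : 'no match' );
}
```

Keeping the number pattern in its own qr// object means only $num has to change when the accepted number syntax changes.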

      Update: And if you want to look for a more specific pattern (as in your original post, sorry!)

          tab@music4:~/Pianoforte/Development/Perlmonks 12:33:45 $ cat !$
          cat 11154494-2.pl
          #!/usr/bin/perl
          use strict;
          use warnings;

          my $str = '240*240*2/3600';
          use Test::More;
          {
              like ( $str, qr{^(\d+\*){2}(\d+)/(\d+)$}, 'Check regex' );
              done_testing;
          }
          tab@music4:~/Pianoforte/Development/Perlmonks 12:33:50 $ prove !$
          prove 11154494-2.pl
          11154494-2.pl .. ok
          All tests successful.
          Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.05 cusr  0.00 csys =  0.08 CPU)
          Result: PASS
          tab@music4:~/Pianoforte/Development/Perlmonks 12:33:58 $

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.