in reply to validate variable-length lines in one regex?

I'm not sure this is of any use but I'll offer it anyway. The idea is to create a mask of codes which is then used to select the correct regex for that column.

#!perl
use strict;

my %REGEX = (
  'A'  => qr'^[A-Z]\d\d$',
  'N'  => qr'^[0-9]+$',
  'N3' => qr'^[0-3]+$',
  'D'  => qr'^\d+\.\d+$',
);

my @p = qw(A N N N3 D D D D D D);

while (<DATA>){
  chomp;
  my @f = split '\s+';
  my $chk = 'OK ';
  for my $i (0..$#f){
    if ($f[$i] !~ $REGEX{$p[$i]}){
      $chk = 'ERR';
      $f[$i] = '**'.$f[$i]." $REGEX{$p[$i]}**";
    }
  }
  print join ' ',$chk,@f,"\n";
}
__DATA__
C3 6 3 2.4 1.5 2.6
C32 2 7 3 1.0
H31 1 1 0 21.0 11.2 5.3 1.4
T11 2 1 0 6.0 1.1 2.2
L06 1 1 0 1.0 3.3
L06 1 4 0 1.1 1.8
poj

Re^2: validate variable-length lines in one regex?
by uhClem (Scribe) on Jul 06, 2015 at 20:38 UTC

    Oho! Now that is slick, in a gruesome way. I like it; just might use that. Bonus for doing exactly what I want in an almost unrelated way. It even seems like that could be the basis of a script that could figure out for itself what the probable pattern for each lousy file is, and just yank any outliers... Let's see how big a mess I can make with THAT!
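
    A rough sketch of the "figure out the pattern for itself" idea, assuming a simple majority vote per column is good enough. The names classify and guess_mask are made up for illustration, and only the A, N and D codes are guessed; telling N3 from N would need an extra range check that is left out here.

    ```perl
    use strict;
    use warnings;

    # Candidate codes, tried in this order. The patterns are the ones
    # from the mask script; N3 is deliberately not guessed.
    my @CODES = (
        [ A => qr/^[A-Z]\d\d$/ ],
        [ N => qr/^[0-9]+$/    ],
        [ D => qr/^\d+\.\d+$/  ],
    );

    # Name of the first code that matches a field, or '?' if none does.
    sub classify {
        my ($field) = @_;
        for my $c (@CODES) {
            return $c->[0] if $field =~ $c->[1];
        }
        return '?';
    }

    # Majority vote per column over all lines; returns the guessed mask.
    sub guess_mask {
        my @lines = @_;
        my @votes;    # $votes[$col]{$code} = number of lines voting for it
        for my $line (@lines) {
            my @f = split ' ', $line;
            $votes[$_]{ classify($f[$_]) }++ for 0 .. $#f;
        }
        return map {
            my $v = $_;
            # Highest count first; ties broken alphabetically.
            (sort { $v->{$b} <=> $v->{$a} || $a cmp $b } keys %$v)[0];
        } @votes;
    }
    ```

    A line whose fields disagree with the guessed mask would then be the outlier to yank. On a file where most lines are bad, the vote will of course guess wrong.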

    And just the same, there is still that "D D D D D D": a fixed-length sequence, so you just have to hope you don't run into any lines with seven Ds. I bet there's a way around that (and I know I'll never have more than nine, in this file...), but that does point back to my original question: can a single regex carefully validate a variable number of fields (and return all matches)? Can Perl regexes do that, or does it exceed the possibilities?
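
    On the single-regex question, one possible approach, sketched under assumptions: a quantified capture group in a Perl regex keeps only its last repetition, so one pattern can validate the whole line but cannot hand back every repeated field by itself. The sketch below captures the run of D fields as one string and splits it afterwards. The name parse_line, the limit of nine D fields, and the A N N N3 prefix are assumptions taken from the mask above, not something tested against the real files.

    ```perl
    use strict;
    use warnings;

    my $D = qr/\d+\.\d+/;

    # Assumed layout: A N N N3, then one to nine D fields.
    my $LINE = qr{
        ^ ([A-Z]\d\d)            # A
        \s+ (\d+)                # N
        \s+ (\d+)                # N
        \s+ ([0-3]+)             # N3
        ( (?: \s+ $D ){1,9} )    # the run of D fields, captured as one string
        \z
    }x;

    # Returns the list of fields if the line is valid, an empty list if not.
    sub parse_line {
        my ($line) = @_;
        my ($code, $n1, $n2, $n3, $tail) = $line =~ $LINE
            or return;
        # The repeated group only remembers its last match, so the D fields
        # are recovered by splitting the captured tail.
        return ($code, $n1, $n2, $n3, split ' ', $tail);
    }
    ```

    The line has to be chomped first, since the pattern ends in \z. In the mask script, the fixed run of Ds could also be avoided by reusing the last code for any extra columns, for example with $p[$i] // $p[-1] (Perl 5.10 or later).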

    Anyway, thanks!

      Another thought: if it's possible to edit the file, I would put the mask on the first line, so there is no need to edit the script. Failing that, put something in the filename that chooses the correct mask for you. This would of course mean editing the file for each new mask.

      poj
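
      A minimal sketch of the mask-on-the-first-line idea, assuming the first line of the file looks like "A N N N3 D". The sub name check_with_header_mask is made up for illustration, and reusing the last code for extra columns is an assumption about how a variable number of trailing fields should be treated.

      ```perl
      use strict;
      use warnings;

      my %REGEX = (
          A  => qr/^[A-Z]\d\d$/,
          N  => qr/^[0-9]+$/,
          N3 => qr/^[0-3]+$/,
          D  => qr/^\d+\.\d+$/,
      );

      # Reads the mask from the first line of $fh, then returns one
      # 'OK' or 'ERR' status per non-blank data line.
      sub check_with_header_mask {
          my ($fh) = @_;
          my $header = <$fh>;
          my @mask = defined $header ? split(' ', $header) : ();
          die "no mask on first line\n" unless @mask;
          for my $code (@mask) {
              die "unknown mask code '$code'\n" unless exists $REGEX{$code};
          }
          my @status;
          while (my $line = <$fh>) {
              my @f = split ' ', $line;
              next unless @f;    # skip blank lines
              # Count the fields that fail their column's regex.
              my $bad = grep {
                  my $code = $mask[$_] // $mask[-1];    # extra fields reuse the last code
                  $f[$_] !~ $REGEX{$code};
              } 0 .. $#f;
              push @status, $bad ? 'ERR' : 'OK';
          }
          return @status;
      }
      ```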

        Hmmm... Not sure that I would trust the preparer of such files that much. Nor would I, personally, want to so much as touch the data. However, you might have some kind of catalog or configuration file, external to the script, which provides the necessary information. (And if the script could not locate exactly one appropriate entry for whatever file it has been given, it would obligingly die().)
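
        Such a catalog lookup might look roughly like this. The filename patterns and the name mask_for are hypothetical, and the catalog is inlined only to keep the sketch short; in practice it would be read from a file kept outside the script.

        ```perl
        use strict;
        use warnings;

        # Hypothetical catalog: filename pattern => mask.
        my @CATALOG = (
            [ qr/^survey_.*\.txt\z/ => 'A N N N3 D' ],
            [ qr/^counts_.*\.txt\z/ => 'A N N N'    ],
        );

        # Returns the mask (as a list of codes) for $filename,
        # and dies unless exactly one catalog entry matches.
        sub mask_for {
            my ($filename) = @_;
            my @hits = grep { $filename =~ $_->[0] } @CATALOG;
            die "no catalog entry for '$filename'\n"          if @hits == 0;
            die "ambiguous catalog entries for '$filename'\n" if @hits > 1;
            return split ' ', $hits[0][1];
        }
        ```

        The data files stay untouched this way, and a file nobody has described yet stops the run instead of being checked against the wrong mask.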