$_ = "foo\n Oops";
print /^(?=.{0,50}$).*foo/i ? "Matches!" : $_;
This fixes your regex:
/(?=^.{0,50}\z).*foo/si
\z only matches at the very end of the string (unlike $, which can also match just before a trailing \n), and /s lets . match everything, including \n. Strictly, you could keep the $ in this context, since it only differs from \z when the string ends in a newline, but it is good to know the difference. My first version also moved the ^ .... \z into the lookahead and dropped the .* (Death to dot star!), but as jehuni points out below, the lookahead is zero-width and anchored at the start of the string, so the .* is still needed to reach foo. Oops, updated accordingly.
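A quick sketch to see both points in action. The check() helper and the test strings are made up for illustration. The expected result of each line is in its comment.

```perl
use strict;
use warnings;

# Returns 1 if $re matches $str, 0 otherwise (throwaway helper).
sub check { my ($re, $str) = @_; return $str =~ $re ? 1 : 0 }

my $embedded = "foo\n Oops";        # newline in the middle
my $trailing = ("x" x 50) . "\n";   # 50 chars plus a trailing newline (51 total)

# Without /s, . cannot cross the embedded newline, so the lookahead fails.
print check(qr/^(?=.{0,50}$).*foo/i,   $embedded), "\n";   # 0
# With /s, . matches "\n" too, so the whole string is measured.
print check(qr/(?=^.{0,50}\z).*foo/si, $embedded), "\n";   # 1
# $ may match just before a final newline, so a 51-char string slips through...
print check(qr/^(?=.{0,50}$)/s,  $trailing), "\n";         # 1
# ...while \z only matches at the very end of the string.
print check(qr/^(?=.{0,50}\z)/s, $trailing), "\n";         # 0
```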
This is a really silly way to do it that fulfils your criteria of being a single regex :-)
$str =~ s/(foo)/&do_stuff($str) if length $str <= 50; $1/e;
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Even though I don't expect to come across newline characters, your suggestion of using \z and /s is a good one. I definitely only want strings of 50 characters or less -- no matter what characters they are -- to match.
Unfortunately, I don't see a way around using the dreaded .* before the actual pattern I'm looking for. Otherwise it will only match when the pattern occurs at the beginning of the string, since (?=^.{0,50}\z) is a zero-width assertion that's anchored to the start of the string. My original version had the ^ anchor inside the lookahead, like yours, but since it seemed that I had to use .* in either case, I decided to move it outside of the lookahead. However, it's probably clearer to leave it inside.
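A small sketch of why the .* can't be dropped. The two helper names and the sample strings are made up; the only difference between the two patterns is the .* after the lookahead.

```perl
use strict;
use warnings;

# Zero-width lookahead pinned to position 0, followed directly by foo:
# foo must therefore also start at position 0.
sub without_dot_star { return $_[0] =~ /(?=^.{0,50}\z)foo/s ? 1 : 0 }

# Same lookahead, but .* lets the match move forward to find foo anywhere.
sub with_dot_star { return $_[0] =~ /(?=^.{0,50}\z).*foo/s ? 1 : 0 }

for my $s ("foo at start", "ends with foo", "no match here") {
    printf "%-15s without .*: %d  with .*: %d\n",
        "'$s'", without_dot_star($s), with_dot_star($s);
}
```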
Also, to clarify my question further, I actually need a matching regex and not a substitution regex. Maybe this is more like golf than I originally realized ...
-jehuni
Hmmm, that looks for 0-50 characters after the beginning of the string up to the end of the string, then it matches any character as many times as possible up to the last 'foo' in the string. So I guess that'll match your requirements (and testing seems to prove so).
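A quick way to confirm the "up to the last 'foo'" part is to look at where the match ends ($+[0] holds the end offset of the whole match). The sample string is made up:

```perl
use strict;
use warnings;

my $str = "foo, then foo again";

# Greedy .* grabs as much as it can, then backtracks just far enough
# to find a "foo", so the match ends at the LAST foo.
$str =~ /(?=^.{0,50}\z).*foo/s or die "no match";
my $greedy_end = $+[0];

# A non-greedy .*? stops at the FIRST foo instead.
$str =~ /(?=^.{0,50}\z).*?foo/s or die "no match";
my $lazy_end = $+[0];

print "greedy match ends at offset $greedy_end\n";
print "lazy match ends at offset $lazy_end\n";
```

With this string it reports 13 for the greedy version and 3 for the lazy one. For a plain yes/no test, either form gives the same answer.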
I can't think of a way to nicely match both length *and* a string with one regex (but then again, I'm no regex ninja ;-), so perhaps this would suffice.
&do_stuff($str) if length($str) <= 50 and $str =~ /foo/;
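If you want to convince yourself that the single regex and this two-step check agree, a small comparison helps. two_step() and one_regex() are throwaway names, and the test strings (including the 50/51-character boundary) are made up:

```perl
use strict;
use warnings;

# Two-step check from the line above, minus the do_stuff() call.
sub two_step  { my ($s) = @_; return length($s) <= 50 && $s =~ /foo/ ? 1 : 0 }

# Single-regex version from earlier in the thread.
sub one_regex { my ($s) = @_; return $s =~ /(?=^.{0,50}\z).*foo/s ? 1 : 0 }

my @cases = (
    "foo",
    ("x" x 47) . "foo",      # exactly 50 characters
    ("x" x 48) . "foo",      # 51 characters
    "foo\n" . ("x" x 10),    # embedded newline
    "bar",
);

for my $s (@cases) {
    printf "len %2d  two-step %d  one-regex %d\n",
        length $s, two_step($s), one_regex($s);
}
```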
For a far better explanation of the regex, check out japhy's superb YAPE::Regex::Explain.
HTH
broquaint
but in this case I need one regex and nothing more.
I always question design goals such as these. I distrust question posers who give
such artificial requirements. It's a bit like watching an episode of MacGyver,
in how the responses come out.
Trouble is, programming isn't MacGyver. We do have the ability to
include a call to length somewhere, so why the artificial requirement?
Two reasons come to mind:
- The design that forced this decision is bad, in which case we must know more
about the context to help redesign that part of the program, or
- It's Homework, in which case we should not answer the question at all.
I smell homework. Please prove otherwise by stating the bad design decision more
fully.
-- Randal L. Schwartz, Perl hacker
Here's the reason: I'm using Data::FormValidator to validate HTML form input. For those not familiar with this module, basically it allows you to pass in a validation profile which contains various "input specifications" that tell it how to validate your data. There is a "constraints" input specification which allows you to specify validation constraints, including coderefs, so no problem using length there (see the example below). However, there is also an input specification "constraint_regexp_map" which allows you to apply a constraint to any fields whose names match a supplied regex (also in the example below). Unfortunately, in this case, you cannot pass it a coderef -- only a regex or the name of a built-in (built into Data::FormValidator, that is) validation function.
Here's an example profile:
my $profile = {
    index => {
        required    => [ qw(firstname surname address1 postcode email) ],
        optional    => [ qw(middlename address2 address3 address4) ],
        constraints => {
            # UK-style postcode
            postcode => '/^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$/i',
            email    => {
                # coderef: e-mail check plus a length limit
                constraint => sub {
                    return valid_email($_[0]) && length($_[0]) <= 100;
                },
                params => [ 'email' ],
            },
        },
        # keys match field NAMES; values constrain the matching fields
        constraint_regexp_map => {
            '/name$/'    => '/(?=^.{0,25}$)[[:print:]]*$/i',
            '/^address/' => '/(?=^.{0,50}$)[[:print:]]*$/i',
        },
    },
};
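For anyone unfamiliar with the module, here is a rough plain-Perl imitation of what a constraint_regexp_map-style lookup does with those two entries. It is a hypothetical sketch, not Data::FormValidator's own code: invalid_fields() and the sample form data are made up, the patterns are qr// objects rather than '/.../' strings, and $ is swapped for \z as suggested earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical imitation: [ field-name pattern, value pattern ] pairs.
my @regexp_map = (
    [ qr/name$/,    qr/(?=^.{0,25}\z)[[:print:]]*\z/ ],
    [ qr/^address/, qr/(?=^.{0,50}\z)[[:print:]]*\z/ ],
);

# Returns the sorted names of fields whose values fail a matching rule.
sub invalid_fields {
    my ($data) = @_;
    my @bad;
    FIELD: for my $field (sort keys %$data) {
        for my $rule (@regexp_map) {
            my ($name_re, $value_re) = @$rule;
            next unless $field =~ $name_re;    # rule does not apply
            if ($data->{$field} !~ $value_re) {
                push @bad, $field;
                next FIELD;                    # report each field once
            }
        }
    }
    return @bad;
}

my %form = (
    firstname => "Ada",
    surname   => "L" x 30,            # too long for a name field
    address1  => "1 Example Street",
    address2  => "Flat 2\tRear",      # tab is not [[:print:]]
);
print join(", ", invalid_fields(\%form)), "\n";
```

Here it reports address2 and surname: one regex per field handles both the length limit and the character check.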
So, the answer is probably 1) it's a poor design on the part of Data::FormValidator. I looked at the internals of Data::FormValidator, and I could patch it to accept coderefs, but at the time it was more work than I was willing to do. As in the example above, the "constraints" input specification allows you to supply both a coderef and a list of parameters, which have to be names of form fields. It then calls your coderef with the values of those fields as the parameters. In most cases, you would obviously want to pass it the name of the field to which the constraint applies, although it's not required.
I pondered adding support for backreferences within the names of the form fields, so you could have it match /^address(.*)$/ and then pass it "address$1" as a param. However, due to issues with scoping, eval, etc., etc., I decided not to mess with a patch for now and just see if I could come up with a single matching regex. Hence the question.
-jehuni
Rather than a callback coderef, you could simply add a "max length" parameter.
That'd be a little more specific, and in line with the other parameters.
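To make the suggestion concrete, here is a sketch of what such a rule could look like. The max_length key, %rules and passes() are all hypothetical names invented for illustration; this is not an existing Data::FormValidator option.

```perl
use strict;
use warnings;

# Hypothetical rule format: a plain pattern plus an optional max_length,
# so the length limit no longer has to be folded into a lookahead.
my %rules = (
    name    => { regex => qr/^[[:print:]]*\z/, max_length => 25 },
    address => { regex => qr/^[[:print:]]*\z/, max_length => 50 },
);

# Returns 1 if $value satisfies $rule, 0 otherwise.
sub passes {
    my ($rule, $value) = @_;
    return 0 if defined $rule->{max_length}
             && length($value) > $rule->{max_length};
    return $value =~ $rule->{regex} ? 1 : 0;
}

print passes($rules{name}, "Ada"), "\n";          # 1
print passes($rules{name}, "L" x 26), "\n";       # 0: too long
print passes($rules{address}, "Flat\t2"), "\n";   # 0: tab is not printable
```

Keeping the limit as data means the regex only has to describe which characters are allowed.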
So, it was "bad design" rather than homework. Yup. Was equally likely in my book,
hence the question.
-- Randal L. Schwartz, Perl hacker
<wild speculation>
Although it's not my question, I can imagine
a situation where someone is validating a set of
variables which contain user-supplied data. A simple
design for validating them might involve a hash table
which correlates each data type to an "allowable" regex pattern.
Not that there aren't ways around this (perhaps using subroutines instead of regexes). But I'll admit that
I've done it before, cramming a lot of data validation
into a single regex for this purpose.
</wild speculation>
Update: I'm good. But I'm slow. Ah, well...
buckaduck
Bull's eye!
Anyway, this is my first experience with Data::FormValidator (formerly HTML::FormValidator, I believe). It seems to do what I want, but if anyone has any other recommendations for a generic sort of input validation module, please share. My use of Data::FormValidator is based partly on a Super Search of perlmonks, so I'd love to hear of any other possibilities that I may have overlooked.
-jehuni
It could just be a badly-worded golf.
#234567890#234567890#234567890
$_=$foo;
$_&&length($_)<=50&&/foo/;
/^(?=.{0,50}\z).*foo/s;
Take your pick. :-)
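One thing to watch with the lookahead version: it has to be anchored at the start of the string. Without the ^, the lookahead is only tested where foo begins, so it measures the text from foo onward rather than the whole string. A small check (the test string is made up):

```perl
use strict;
use warnings;

my $long = ("x" x 100) . "foo";    # 103 characters, well over the limit

# Unanchored: the lookahead is tried where foo starts, so it only
# measures "foo" to the end of the string, and the match succeeds.
my $unanchored = $long =~ /(?=.{0,50}\z)foo/s ? 1 : 0;

# Anchored with ^, plus .* to reach foo: the whole string is measured.
my $anchored = $long =~ /^(?=.{0,50}\z).*foo/s ? 1 : 0;

print "unanchored: $unanchored, anchored: $anchored\n";
```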
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.