Re: old Perl regex problem
by BrowserUk (Patriarch) on Aug 01, 2002 at 21:02 UTC
while (<>) {
    chomp and print if /^PH.*$/ and !/[HI]-000$/;
}
Re: old Perl regex problem
by sauoq (Abbot) on Aug 01, 2002 at 21:34 UTC
First, I don't think the regular expression you have works at all. BrowserUk's version works nicely.
I don't remember if look-behind was supported in 5.004. If so, you might use something like: /^PH.*(?<![HI]-000)$/
One question: is a file named "PH-000" ok, or not? From your description, I'd think it was but the solutions so far (including the one above) don't allow it. This would: /^PH(.*)/ and $1 !~ /[HI]-000$/;
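To see how the two suggestions above differ on the bare name PH-000, here is a small sketch. The sub names lookbehind_ok and capture_ok are made up for illustration, and the look-behind version needs a perl that supports (?<!...), which as far as I know means 5.005 or later.

```perl
use strict;
use warnings;

# Single regex: starts with PH and does not end in H-000 or I-000.
sub lookbehind_ok {
    my ($name) = @_;
    my $ok = $name =~ /^PH.*(?<![HI]-000)$/;
    return $ok ? 1 : 0;
}

# Two steps: only the part after the leading PH is checked for the suffix.
sub capture_ok {
    my ($name) = @_;
    my $ok = ($name =~ /^PH(.*)/ and $1 !~ /[HI]-000$/);
    return $ok ? 1 : 0;
}

for my $name (qw(PH-000 PH0022080209401500001PE-000 PH0024080209401400001PH-000)) {
    printf "%-28s look-behind: %d  capture: %d\n",
        $name, lookbehind_ok($name), capture_ok($name);
}
```

The two should only disagree on PH-000 itself: the look-behind treats the H of the prefix as part of the suffix, while the capture version never looks at it.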
-sauoq
"My two cents aren't worth a dime.";
     It works on 5.6.1. Someone tested it for me.
     Using the info that I put in the original post, PH-000 should not be ok. I didn't really think about that too much because I know the format of the actual files that I'm going through, but my original regex would probably be better represented with ".+" after the "PH" instead of ".*?".
Here's an actual file name: PH0022080209401500001PE-000
     BrowserUk's version would work great, but I'm reading the pattern from a configuration file and I can't change anything outside of the slashes.
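For anyone wondering why the two-regex answers don't help, this is roughly the situation, as a sketch only. The config line, the pattern= key and the sub name are all made up, and the real script may look quite different. The point is just that the program applies one pattern and the code around it is fixed.

```perl
use strict;
use warnings;

# Hypothetical config line; only the text between the slashes is editable.
my $config_line = 'pattern=/^PH.*(?<![HI]-000)$/';

# Pull out whatever sits between the first and the last slash.
sub pattern_from_config {
    my ($line) = @_;
    my ($body) = $line =~ m{/(.*)/};
    return unless defined $body;
    return qr/$body/;
}

my $re    = pattern_from_config($config_line);
my @files = qw(PH0022080209401500001PE-000 PH0024080209401400001PH-000);

# The surrounding code can only do a single match per file name.
my @kept = grep { $_ =~ $re } @files;
print "$_\n" for @kept;
```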
Invulnerable. Unlimited XP. Unlimited Votes. I must be...
        GhodMode
I doubt it worked under 5.6.1. Perhaps it was tested, but
then the test was insufficient.
/^PH.*?[^(H\-000)(IF\-000)]$/
will reject any file name ending in a 0, including
PH0022080209401500001PE-000
yet that is a legal file name according to your specifications.
$ /opt/perl/5.6.1/bin/perl -wle 'print "Reject" unless
"PH0022080209401500001PE-000" =~ /^PH.*?[^(H\-000)(IF\-000)]$/'
Reject
$
Abigail
Re: old Perl regex problem
by crenz (Priest) on Aug 01, 2002 at 21:07 UTC
while (<>) {
    chomp;
    next if (/(H|IF)-000$/ or !/^PH/);
    print "$_\n";
}
Second thought: I just noticed that your description and your regexes differ, so I'm not sure whether you want to match IF-000 or I-000. If you want the latter, use
next if (/[HI]-000$/ or !/^PH/);
But I guess that's obvious.
Re: old Perl regex problem
by Abigail-II (Bishop) on Aug 02, 2002 at 11:43 UTC
I wonder why none of the people who responded to this post actually
read the post carefully. They all came up with solutions that
use two regexes. What's so hard to understand about:
I'm actually reading the pattern from a
configuration file. Everything outside of the slashes is not changeable.
You have a misunderstanding about the meaning of
[ ] inside a regular expression.
[ ] is a character class, and matches exactly
one character. Inside you either list the characters that
are allowed to match, or the characters that aren't allowed
to match. [^(H\-000)(IF\-000)] means the same
as [^(H\-0)IF], which means "match a single character;
the character can be anything except a (, an H, a dash,
a 0, a ), an I or an F".
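A quick way to check the character-class point is to try both classes on every printable ASCII character. This is only a sketch; the variable names are invented.

```perl
use strict;
use warnings;

# The two classes from the paragraph above, anchored to a single character.
my $long  = qr/^[^(H\-000)(IF\-000)]$/;
my $short = qr/^[^(H\-0)IF]$/;

my @chars = map { chr } 32 .. 126;

# Characters on which the two classes disagree (there should be none).
my @differ = grep { ($_ =~ $long) xor ($_ =~ $short) } @chars;
print @differ ? "classes differ\n" : "classes are equivalent\n";

# Characters the class refuses to match.
my @excluded = grep { $_ !~ $long } @chars;
print "excluded: @excluded\n";
```

Repeating a character inside the brackets changes nothing, and the parentheses are just two more literal characters in the set.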
If I understand your requirements, you are looking for all
files that start with PH, and do not end with either "H-000" or
"I-000". The following regex ought to work:
/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/
It works with 5.004_02.
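A small sketch to try the regex on names that appear in this thread, plus the bare PH-000. The hash and variable names are made up.

```perl
use strict;
use warnings;

my $re = qr/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/;

my @names = qw(
    PH0022080209401500001PE-000
    PH0024080209401400001PH-000
    PH0026072913114200001IF-000
    PH0029072911352700001AF-000
    PH-000
);

# Record the verdict for each name, then print a small table.
my %accepted;
$accepted{$_} = ($_ =~ $re) ? 1 : 0 for @names;

printf "%-28s %s\n", $_, ($accepted{$_} ? "accept" : "reject") for @names;
```

Note that the name ending in IF-000 is accepted here: its last five characters are F-000, which is neither H-000 nor I-000. If IF-000 is what has to be excluded, the pattern needs to say so.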
Abigail
PH0024080209401400001PH-000
PH0026072913114200001IF-000
PH0029072911352700001AF-000
     I wonder why it worked on 5.6.1? I'll have to double-check that.
Invulnerable. Unlimited XP. Unlimited Votes. I must be...
        GhodMode
/^PH.+[^HI]F?\-000/ will accept a file
called PHfooIF-000. After all, the
.+ can match the 'fooI', the [^HI]
can match the 'F', and the F? matches nothing.
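Adding capture groups makes it easy to see which piece of the pattern takes which characters. This is a sketch; the groups and variable names are not part of the original pattern.

```perl
use strict;
use warnings;

my $name = "PHfooIF-000";

# Same regex, with capture groups added around .+, [^HI] and F?
my ($dot_plus, $class, $opt_f) = $name =~ /^PH(.+)([^HI])(F?)\-000/;

if (defined $dot_plus) {
    printf "matched: .+ took '%s', [^HI] took '%s', F? took '%s'\n",
        $dot_plus, $class, $opt_f;
} else {
    print "no match\n";
}
```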
Abigail
Close, but that still matches PH-000 due to the .{0,4} clause. I think that
/^PH(?:.{0,3}|.*(?![HI]-000).{5})$/
does it.
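A sketch comparing the {0,4} and {0,3} variants side by side. PHabcd is an invented name, included only to show a side effect of the change.

```perl
use strict;
use warnings;

my $abigail = qr/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/;
my $sauoq   = qr/^PH(?:.{0,3}|.*(?![HI]-000).{5})$/;

my @names = qw(
    PH-000
    PHabcd
    PH0022080209401500001PE-000
    PH0024080209401400001PH-000
);

for my $name (@names) {
    printf "%-28s {0,4}: %s  {0,3}: %s\n", $name,
        ($name =~ $abigail ? "accept" : "reject"),
        ($name =~ $sauoq   ? "accept" : "reject");
}
```

The {0,3} variant turns away every name with exactly four characters after PH, not just PH-000. That should not matter if the real names are all long.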
-sauoq
"My two cents aren't worth a dime.";
but that still matches PH-000
That's because I wrote the regex before he revealed that
the things could overlap. However, later he suggests that
the file names are all pretty long, so PH-000 can't happen
anyway.
Abigail
     I pasted your regex into my script exactly and it didn't work either :(. I found one that did, though.
     First, I want to make sure I understand yours. Please tell me if the following is correct...
/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/
- starts with PH
- (?:) these parentheses aren't memory parentheses
- followed by 0 to 4 of any character or zero or more of any character
- (?!) return true if "H-000" or "I-000" would not match next
- followed by 5 of any character
- followed by the end of the line
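The reading above can be written out with the /x modifier, which lets a pattern carry whitespace and comments. This is a sketch of the same regex, not a different one. One thing it makes visible is that the | separates .{0,4} from the whole of .*(?![HI]-000).{5}, not just from the .* part.

```perl
use strict;
use warnings;

my $re = qr/
    ^PH                 # starts with PH
    (?:                 # grouping only, nothing is captured
        .{0,4}          #   either: at most four more characters
      |                 #   or:
        .*              #     any run of characters,
        (?! [HI]-000 )  #     where H-000 or I-000 does not come next,
        .{5}            #     then exactly five characters
    )
    $                   # end of the string
/x;

for my $name (qw(PH0022080209401500001PE-000 PH0024080209401400001PH-000)) {
    printf "%-28s %s\n", $name, ($name =~ $re ? "accept" : "reject");
}
```

The look-ahead consumes nothing; it only checks the five characters that .{5} is about to match.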
     At first, I thought it failed because of the .{5} part. So, I changed it to {4,5} because the possibilities are "?H-000" (only 4 characters after the H) or "I?-000". That still didn't work. I was confused about why you used the (?:) and the quantifier and the or after the first wildcard dot, so I removed them. It still didn't work.
     After all that I re-started. I counted the characters between the PH and the part I wanted to check (\d{19}) and tried /^PH\d{19}[^I][^H]-000$/. It's a little easier because all of these files have a fixed format. That did work, but I wasn't sure I needed to quantify the characters between the PH and the end of the file name. So, I ended up with /^PH.*[^I][^H]-000$/ which works fine.
     I'm not sure why I couldn't get yours to work. I haven't used regex extensions or assertions before, so I want to understand them better.
Many thanks for your input.
Invulnerable. Unlimited XP. Unlimited Votes. I must be...
        GhodMode
/^PH.*(?:[^I][^H]|I[^HF])-000$/
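A sketch comparing this pattern with /^PH.*[^I][^H]-000$/. The post does not spell out the difference, so the comparison below is only what the two patterns themselves do. The last name, ending in IX-000, is invented for the test.

```perl
use strict;
use warnings;

my $simple  = qr/^PH.*[^I][^H]-000$/;
my $refined = qr/^PH.*(?:[^I][^H]|I[^HF])-000$/;

# The first four names are from this thread; the IX one is invented.
my @names = qw(
    PH0022080209401500001PE-000
    PH0024080209401400001PH-000
    PH0026072913114200001IF-000
    PH0029072911352700001AF-000
    PH0030072911352700001IX-000
);

for my $name (@names) {
    printf "%-28s simple: %s  refined: %s\n", $name,
        ($name =~ $simple  ? "accept" : "reject"),
        ($name =~ $refined ? "accept" : "reject");
}
```

The simpler pattern rejects any name with an I two characters before -000, whatever follows it. The refined one only rejects an I when the next character is H or F.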
Abigail