PerlMonks  

regex grouping issue

by nmerriweather (Friar)
on Aug 31, 2006 at 19:11 UTC

nmerriweather has asked for the wisdom of the Perl Monks concerning the following question:

Sadly, I need to perform some regex operations on a MySpace page, and I'm running into a wall tuning a regex to parse a user id link.
This works:
my $RE_myspace_user= qr/myspace\.com\/([\d\w]*)(:?\/|$)/;
However, that could match some unwanted things, so I updated it. This works (it excludes their index and browse pages):
my $RE_myspace_user= qr/myspace\.com\/[^(?:index\.cfm|browse)([\d\w]*)](:?\/|$)/;
But that's messy. Sure, they only use index.cfm, but it would be better to block all .cfm pages, so... This doesn't work:
my $RE_myspace_user= qr/myspace\.com\/[^(?:[\w\d\_]\.cfm|browse)([\d\w]*)](:?\/|$)/;
It just generates an error. I guess I can't put a character class in that nesting, which is fine. Does anyone have a suggestion to approximate it, though?

Replies are listed 'Best First'.
Re: regex grouping issue
by Fletch (Bishop) on Aug 31, 2006 at 19:14 UTC

    Erm, [^(?:index\.cfm|browse)([\d\w]*)] doesn't mean what you think it means. You're conflating a complemented character class with negative lookahead; look in perlre for (?!).
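A minimal sketch of what the (?!) approach might look like for this case. The exact exclusions (any .cfm page, plus the browse page), the helper name myspace_user, and the sample URLs are assumptions made for illustration, not something confirmed in the thread:

```perl
use strict;
use warnings;

# Sketch only: the exclusions below are assumed from the question's wording.
my $RE_myspace_user = qr{
    myspace\.com/
    (?!                    # negative lookahead: fail if what follows is ...
        \w+\.cfm           #   any .cfm page, index.cfm included
      | browse (?:/|$)     #   or the browse page itself
    )
    (\w+)                  # capture the user id
    (?:/|$)                # which must end at a slash or at end of string
}x;

# Hypothetical helper: returns the user id, or undef when the URL is excluded.
sub myspace_user {
    my ($url) = @_;
    return $url =~ $RE_myspace_user ? $1 : undef;
}

# Made-up sample URLs.
for my $url (
    'http://www.myspace.com/tom',
    'http://www.myspace.com/index.cfm',
    'http://www.myspace.com/browse/',
    'http://www.myspace.com/some_band/',
) {
    my $id = myspace_user($url);
    printf "%-40s %s\n", $url, defined $id ? "user id: $id" : 'skipped';
}
```

The lookahead consumes nothing, so the capture group still starts right after the slash. Ending the browse alternative with (?:/|$) keeps an id that merely begins with "browse" from being rejected.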

      i totally forgot about negative lookaheads...

      that should cut down on a lot of my non-capturing parentheses.

      thanks for the tip!
        The primary problem is that a character class is for characters, not sequences of characters. You can't use a character class to match any of a group of strings, and you can't use a negated character class to NOT match any of a group of strings.
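To see the point about character classes concretely, here is a small sketch that anchors the bracket expression from the question so it has to match a whole string. The test strings are made up for illustration:

```perl
use strict;
use warnings;

# The bracket expression from the question, anchored at both ends.
# Inside [...] the text "(?:index\.cfm|browse)" is only a set of single
# characters: ( ? : i n d e x . c f m | b r o w s e )
my $RE_class = qr/^[^(?:index\.cfm|browse)]$/;

# Hypothetical test strings.
for my $s ('z', 'i', '|', 'zz') {
    printf "%-5s %s\n", "'$s'", ($s =~ $RE_class ? 'matches' : 'does not match');
}
```

Only the single character 'z' matches. 'i' and '|' are members of the negated set, and 'zz' is two characters, while the class always matches exactly one.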

        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: regex grouping issue
by Velaki (Chaplain) on Sep 01, 2006 at 10:40 UTC

    Something you might wish to consider is the restructuring of the logic to take advantage of DeMorgan's Laws, which I found myself using the other day on a particularly funky regex at work.

    Essentially, DeMorgan's Laws, in my own words, are:

    The complement of a conjunction is the disjunction of the complements.
    and
    The complement of a disjunction is the conjunction of the complements.

    In other words, not(P and Q) = (not-P or not-Q), and vice versa: not(P or Q) = (not-P and not-Q).

    In my application, rather than using alternations of certain characters with positive lookarounds, I checked that the match text contained none of those characters, using negative lookarounds. (That is an "AND" condition, and it was much easier to express in my case.)
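As a rough illustration of how that law shows up in a regex (this is not Velaki's actual pattern, which the post does not show), a single negative lookahead over an alternation, "not (A or B)", can be rewritten as two chained negative lookaheads, "not A and not B". The pattern names and sample strings below are assumptions:

```perl
use strict;
use warnings;

# not (index.cfm OR browse)
my $RE_single  = qr{^(?!index\.cfm|browse)\w+};
# (not index.cfm) AND (not browse)
my $RE_chained = qr{^(?!index\.cfm)(?!browse)\w+};

# Hypothetical helper: true when both forms give the same verdict on $s.
sub forms_agree {
    my ($s) = @_;
    my $single  = $s =~ $RE_single  ? 1 : 0;
    my $chained = $s =~ $RE_chained ? 1 : 0;
    return $single == $chained;
}

# Made-up sample strings.
for my $s (qw(tom index.cfm browse some_band)) {
    printf "%-10s single=%d chained=%d\n",
        $s, ($s =~ $RE_single ? 1 : 0), ($s =~ $RE_chained ? 1 : 0);
}
```

Both forms accept and reject the same strings. The chained form can be easier to extend, since each excluded string gets its own lookahead.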

    Hope this helped a little,
    -v.

    "Perl. There is no substitute."
