Re: Cleaning Regexp

This is untested, but I think this should do the trick:

for ($i = 0; $i < $glob; $i++) {
    $_ = $text[$i];
    if (/href="?([^">]*)\?eid=/is) {
        $counter = 1 - $counter;    # flips between 1 and 0 on each match
        print $1 if $counter;       # so only every other match is printed
    }
}

Explanation:

"?: The quote mark is optional
([^">]*): The parens capture whatever matches inside them. The [^">]* limits that capture to a run of characters that are neither a quote nor a closing angle bracket.
\?eid=: This is the key phrase

I'd include more code, but it's prone to errors. The href="? marks the beginning of your search pattern, and the \?eid= marks the end. If the link has an eid, the match puts everything within the parens into the special regex variable $1; if it doesn't, the pattern simply doesn't match.
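To see the pattern at work, here is a minimal sketch run against a few made-up lines. The sample URLs and the @samples and @captured names are assumptions for illustration, not taken from the original page.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the URLs are invented for illustration.
my @samples = (
    '<a href="http://example.com/page.html?eid=42">one</a>',
    '<A HREF=http://example.com/other.html?EID=7>two</A>',
    '<a href="http://example.com/plain.html">three</a>',
);

my @captured;
for my $line (@samples) {
    # Same pattern as above: optional quote, then capture up to ?eid=
    if ($line =~ /href="?([^">]*)\?eid=/is) {
        push @captured, $1;
    }
}

print "$_\n" for @captured;
```

The third line has no ?eid= in it, so it should contribute nothing to @captured.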

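Update: I added the "is" options to the regex. The i makes the match case-insensitive, so uppercase HREF and EID are allowed too (the s only changes what . matches, so it makes no difference to this pattern). No biggie.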

Update 2: Forgot that CODE tags don't need entity substitution, so I had &gt; in one part instead of just >. I fixed it; sorry if I confused anyone.

agermain
"I don't want the world. I just want your half."


Replies are listed 'Best First'.

(bbfu) Re2: Cleaning Regexp
by bbfu (Curate) on Aug 01, 2001 at 21:57 UTC

Hrm. You might want to add \s and probably ? (at least, I think amearse doesn't want the query string included...) to your character class, since both of these would also delimit the end of the URL.
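A rough sketch of what adding \s and ? to the negated class changes. The input lines are contrived and hypothetical, and the names $loose and $tight are just labels for this example.

```perl
use strict;
use warnings;

# Contrived, hypothetical input: an unquoted href followed by another attribute.
my $messy = '<a href=http://example.com/a.html title=see?eid=9>x</a>';
my $clean = '<a href="http://example.com/b.html?eid=9">y</a>';

# Original class: stops only at a quote or a closing angle bracket.
my $loose = qr/href="?([^">]*)\?eid=/is;
# With \s and ? added, the capture cannot run past whitespace or a question mark.
my $tight = qr/href="?([^">\s?]*)\?eid=/is;

# In list context a match returns its captures (or an empty list on failure).
my ($loose_messy) = $messy =~ $loose;
my ($tight_messy) = $messy =~ $tight;
my ($tight_clean) = $clean =~ $tight;

print "loose on messy: $loose_messy\n";
print "tight on messy: ", (defined $tight_messy ? $tight_messy : "(no match)"), "\n";
print "tight on clean: $tight_clean\n";
```

With the original class the capture on the messy line runs through the space into the next attribute; with the tightened class that line is rejected, while the ordinary quoted link still matches.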

In fact, you might just be better off using an affirmative class instead of a negative one, since the list of allowable characters is only [\w/:\$\-_.+!*'(),%\@] (though I've often seen ~ unescaped, and there might be others that are commonly not escaped properly...). Again, this is only for the non-query-string part of the URL.
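For comparison, a minimal sketch of the affirmative-class approach. It assumes the character list given above, with the $, - and @ backslash-escaped so that Perl treats them as literal characters rather than as a variable or a range; the sample links are hypothetical.

```perl
use strict;
use warnings;

# Affirmative class: only characters from the allowed list may appear
# between href= and ?eid=. qr{} is used so the / needs no escaping.
my $pattern = qr{href="?([\w/:\$\-_.+!*'(),%\@]*)\?eid=}i;

# Hypothetical sample links for illustration.
my $plain = '<a href="http://example.com/a_b/page-1.html?eid=1">ok</a>';
my $tilde = '<a href="http://example.com/~user/page.html?eid=1">tilde</a>';

my ($from_plain) = $plain =~ $pattern;
my ($from_tilde) = $tilde =~ $pattern;

print "plain: ", (defined $from_plain ? $from_plain : "(no match)"), "\n";
print "tilde: ", (defined $from_tilde ? $from_tilde : "(no match)"), "\n";
```

The second link shows the caveat about ~: since the tilde is not in the class, that URL is not matched at all unless ~ is added to the list.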

Then again, there's a module on CPAN to do all of this (and more) already...

bbfu
Seasons don't fear The Reaper.
Nor do the wind, the sun, and the rain.
We can be like they are.
