Re: Regex optimization

If you're sure of the table format...

while       # use a loop to grab all instances
(m{         # use braces to delimit, so no escaping / or |
<tr         # beginning of row
.*?         # minimal match of anything
>(\d{5})    # > followed by 5 digits (remember digits)
.*?         # minimal match of anything
>(\d{2})    # > followed by 2 digits (remember digits)
.*?         # minimal match of anything
>(\d{3})    # > followed by 3 digits (remember digits)
.*?         # minimal match of anything
>(\d{3})    # > followed by 3 digits (remember digits)
.*?         # minimal match of anything
>(\d{2})    # > followed by 2 digits (remember digits)
.*?         # minimal match of anything
(&nbsp;|\w) # &nbsp; or a single word character (remember it)
</FONT>     # followed by a closing font tag
}isxg) {    # case (i)nsensitive, treat as (s)ingle line,
            # e(x)tended comments, match (g)lobally (all)
    my @row = ($1,$2,$3,$4,$5,$6);
    # now do whatever with @row
}

# condensed

while (m{<tr.*?>(\d{5}).*?>(\d{2}).*?>(\d{3}).*?>(\d{3}).*?>(\d{2}).*?(&nbsp;|\w)</FONT>}isg) {
    my @row = ($1,$2,$3,$4,$5,$6);
}

not tested, but I think it's OK :)
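
One way to check it is to run the pattern against a made-up table. The sample HTML below is only an assumption about the layout (the real table isn't shown in this thread), and $html and @rows are names invented for the test. The sketch uses brace delimiters, since a | inside the alternation would end a pipe-delimited pattern early:

```perl
use strict;
use warnings;

# Hypothetical sample rows: an assumed layout, not the original data.
my $html = <<'HTML';
<TR><TD><FONT SIZE="2">12345</FONT></TD><TD><FONT SIZE="2">12</FONT></TD>
<TD><FONT SIZE="2">123</FONT></TD><TD><FONT SIZE="2">456</FONT></TD>
<TD><FONT SIZE="2">78</FONT></TD><TD><FONT SIZE="2">&nbsp;</FONT></TD></TR>
<TR><TD><FONT SIZE="2">54321</FONT></TD><TD><FONT SIZE="2">21</FONT></TD>
<TD><FONT SIZE="2">321</FONT></TD><TD><FONT SIZE="2">654</FONT></TD>
<TD><FONT SIZE="2">87</FONT></TD><TD><FONT SIZE="2">A</FONT></TD></TR>
HTML

my @rows;

# Same pattern as the condensed version; /g in a while loop walks
# through every row, /s lets . cross the line breaks inside a row.
while ($html =~ m{<tr.*?>(\d{5}).*?>(\d{2}).*?>(\d{3}).*?>(\d{3}).*?>(\d{2}).*?(&nbsp;|\w)</FONT>}isg) {
    push @rows, [$1, $2, $3, $4, $5, $6];
}

# One comma-separated line per captured row.
print join(',', @$_), "\n" for @rows;
```

With that sample, each row should come out as six fields, the last being either &nbsp; or the single character in its place.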

cLive ;-)

Re: Re: Regex optimization by deryni (Beadle) on May 08, 2001 at 11:21 UTC
Thank you for the prompt response. First off, as I said, a lot of regex syntax escapes me, so reminding me that I can use (#) instead of repetition was good. Second, while I did remove the checks for the extra <font> tags, I did indeed forget that I needn't check for the <td> tags either. Third, while I'd imagine that this works for the parts involved, I do not need either of the nbsp's or the letter that may be in their place; it is the data in between them that is important. Thank you for all the help. -Etan
Re: Regex optimization by cLive ;-) (Prior) on May 08, 2001 at 13:11 UTC
No! # is a comment (the x modifier allows you to do this...) cLive ;-)
Re: Re: Regex optimization by deryni (Beadle) on May 08, 2001 at 13:16 UTC
Thank you, I realize that # is a comment. In this case I was using it in its more common meaning, to signify a number. -Etan