Yes, I'm still writing my book. And there's going to be a cookbook of sorts at the end, a compendium of useful regexes for all sorts of occasions. (I'm afraid I will be including some tag-parsing ones, but I'll make it clear that HTML and XML should be parsed by modules, etc.)
So I ask you, my fellow amonkicans, to help me. What regexes have you found yourselves using? Not simple dinky ones, but perhaps regexes that got you out of a bind, or were quite sneaky at what they did, or you find yourself using a lot. I'd much appreciate your input, and the proper acknowledgements will be made in my book. Thank you.
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: Regular Expressions: Call for Examples
by jryan (Vicar) on Jul 21, 2002 at 19:08 UTC
|
Well, I went trudging around and I found these:
This first one was from a client who accidentally did something like: s/\n//g; to a few hundred text files. The files happened to be lists:
1) foo and his friend bar
2) Stuff and more (stuff)
3) More (and30) more (and) more
4) garbage (8)
5) some other things
6) (7lalala)
which turned into:
1) foo and his friend bar2) Stuff and more (stuff)3) More (and30) more (and) more4) garbage (8)5) some other things6) (7lalala)
Anyways, it was my job to fix them. I was having a difficult time correctly parsing, but ended up solving the problem using your sexeger technique.
$text = reverse $copy;
$text =~
  s/
    (?<= \) )
    (\d+)
    (?= [^()]* \) )
  /$1\n/gx;
$text = reverse $text;
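For anyone who wants to try the reversed approach, here is a minimal, self-contained sketch. It assumes nothing beyond the flattened sample line shown above; the variable names follow the snippet, and the comments are my own reading of what each piece does.

```perl
use strict;
use warnings;

# The flattened list from the example above (all newlines lost).
my $copy = '1) foo and his friend bar2) Stuff and more (stuff)'
         . '3) More (and30) more (and) more4) garbage (8)'
         . '5) some other things6) (7lalala)';

# Work on the reversed string: an item label "N)" now reads ")N", so a
# fixed-width look-behind for ")" can anchor the digits.
my $text = reverse $copy;
$text =~ s/
    (?<= \) )          # digits must directly follow a ")"
    (\d+)              # the (reversed) item number
    (?= [^()]* \) )    # next paren ahead is ")": we are not inside (...)
/$1\n/gx;
$text = reverse $text; # flip back; each "\n" now sits before its label

print "$text\n";
```

The first label gets no newline in front of it, because nothing follows it in the reversed string, which is what you want for the start of a file.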
A few weeks later, for fun, I was able to solve the problem using a forward regex:
my $bal;   # declared before use so that (??{$bal}) below sees this variable
$bal =     # this is from perlre
  qr/
    \(
    (?:
      (?> [^()]+ )
      |
      (??{$bal})
    )*
    \)
  /x;
$text =~
  s/
    (
      (?:
        (??{$bal})
        [^(\d]*
      )*
    )
    (\d+)
    (?= \) )
  /$1\n$2/xg;
Which is exponentially uglier, and proves just how useful sexeger really is.
Another possibly useful example comes from IRC, where someone needed to perform a crude form of escaping that involved stripping all backslashes between brackets. This solved his problem:
$text =~
  s/
    (?<= \[ )
    ([^\]]*)
    (?= \] )
  / strip_slash($1) /gex;
# copy the argument into a lexical so the global $_ is left alone
sub strip_slash { my $s = pop; $s =~ s/\\//g; $s; }
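A small runnable sketch of the same idea, for anyone who wants to poke at it. The sample string is made up for illustration, and strip_slash here works on a lexical copy of its argument.

```perl
use strict;
use warnings;

# Remove every backslash from the string passed in, without touching $_.
sub strip_slash {
    my ($s) = @_;
    $s =~ s/\\//g;
    return $s;
}

# Made-up input: backslashes both inside and outside brackets.
my $text = 'keep\this [drop\these\here] and [th\is] but not\that';
$text =~ s/
    (?<= \[ )     # just after an opening bracket
    ([^\]]*)      # everything up to the closing bracket
    (?= \] )      # which must actually be there
/ strip_slash($1) /gex;

print "$text\n";   # keep\this [dropthesehere] and [this] but not\that
```

The look-behind and look-ahead keep the brackets themselves out of the match, so only the text between them is handed to the sub.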
Finally, there's a bunch of stuff at the end of Parsing with Perl 6 you might find useful...
|
Since I'm not yet the regex master I aspire to be, I can't authoritatively state that this solution is better, but it seems to work.
If you're working with ordered item labels you can make your assertion more specific:
$n = 1;
s/((??{$n+1})\))(?{$n++})/\n$1/g;
The first iteration matches "2)" and replaces it with "\n2)", the second "3)", and so on.
conv
Update: I should know better than to post when I'm tired. Someone just pointed out to me that it would be much neater to do:
$n = 2; ++$n while s/$n(?=\))/\n$n/
Thanks, Aristotle, you're right. The while loop substitution isn't equivalent because it will make replacements in any order (at any position in the string) while the original substitution I posted will not.
|
Actually, they are not interchangeable: the latter loses the "ordered items" assumption. Observe what they do with
2) bar 3) asfgh 7) lorem 6) ipsum 1) foo 5) baz 4) blah
I tried fixing that using \G, but didn't come up with anything useful in 5 minutes and gave up since it would have been a lot more complicated than your first regex which I believe is just perfect.
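To see the difference concretely, here is a sketch that runs both versions on the out-of-order line above. The variable names ($ordered, $looped, $m) are mine, and the code blocks inside the regex assume a reasonably recent Perl (5.18 or later handles lexicals in them cleanly).

```perl
use strict;
use warnings;

my $input = '2) bar 3) asfgh 7) lorem 6) ipsum 1) foo 5) baz 4) blah';

# Variant 1: a single left-to-right pass. (??{ $n + 1 }) builds the next
# expected label at match time; (?{ $n++ }) advances it after each hit.
my $ordered = $input;
my $n = 1;
$ordered =~ s/((??{ $n + 1 })\))(?{ $n++ })/\n$1/g;

# Variant 2: one substitution per label, each searching the whole string.
my $looped = $input;
my $m = 2;
++$m while $looped =~ s/$m(?=\))/\n$m/;

# Show the newlines as a visible "|" so the two results fit on one line each.
(my $show_ordered = $ordered) =~ s/\n/|/g;
(my $show_looped  = $looped)  =~ s/\n/|/g;
print "ordered: $show_ordered\n";
print "looped:  $show_looped\n";
```

The single pass only breaks before 2), 3) and 4), since it never looks back for a label it has already passed. The loop breaks before every label from 2) to 7), wherever it sits in the string.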
japhy: I like the scenario presented here. This is a regex (series) I'd propose you pick up; it's simple in premise and not far from something one might actually have to do one day, and it's not hard even for a novice to follow along on the subtleties in the differences between the approaches. A perfect teaching example, if you ask me.
Makeshifts last the longest.
(jeffa) Re: Regular Expressions: Call for Examples
by jeffa (Bishop) on Jul 21, 2002 at 22:09 UTC
|
Here are 3 regexes I used recently for Node Link Checker. The problem is to turn PM link settings into their respective HTML links. First, the lookup table:
my %TAG = (
   ftp     => 'ftp://',
   http    => 'http://',
   https   => 'https://',
   kobe    => 'http://theoryx5.uwinnipeg.ca/mod_perl/cpan-search?filetype=+distribution+name+or+description&j&case=clike&search=',
   kobes   => 'http://theoryx5.uwinnipeg.ca/mod_perl/cpan-search?filetype=+distribution+name+or+description&j&case=clike&search=',
   cpan    => 'http://search.cpan.org/search?mode=module&query=',
   isbn    => 'http://shop.barnesandnoble.com/booksearch/isbnInquiry.asp?isbn=',
   google  => 'http://www.google.com/search?q=',
   lucky   => 'http://www.google.com/search?btnI=I&q=',
   jargon  => 'http://www.science.uva.nl/cng/search/htsearch.CGI?restrict=%2F%7Emes%2F&jargon%2Fwords=',
   id      => '/index.pl?node_id=',
   pad     => '/index.pl?node_id=108949&user=',
   DEFAULT => '/index.pl?node=',
);
Next, the regexes:
# takes care of [tag://target|alt]
$chunk =~ s/\[(\w+):\/\/(.*?)\|([^\]]+)\]/<a href="$TAG{$1}$2">$3<\/a>/g;
# takes care of [tag://target]
$chunk =~ s/\[(\w+):\/\/([^\]]+)\]/<a href="$TAG{$1}$2">$2<\/a>/g;
# takes care of [target]
$chunk =~ s/\[([^\]]+)\]/<a href="$TAG{DEFAULT}$1">$1<\/a>/g;
The 3 regexes have to be executed in that order. Maybe they could be combined into one regex, but this worked for me. :)
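Here is a small sketch that runs the three substitutions in order on a made-up chunk of text. It uses a cut-down copy of the lookup table, and linkify() is a hypothetical wrapper name, not something from the Node Link Checker itself.

```perl
use strict;
use warnings;

# A cut-down lookup table; the full %TAG works the same way.
my %TAG = (
    http    => 'http://',
    cpan    => 'http://search.cpan.org/search?mode=module&query=',
    id      => '/index.pl?node_id=',
    DEFAULT => '/index.pl?node=',
);

# linkify() is a hypothetical helper that applies the three regexes in order.
sub linkify {
    my ($chunk) = @_;
    # [tag://target|alt]
    $chunk =~ s/\[(\w+):\/\/(.*?)\|([^\]]+)\]/<a href="$TAG{$1}$2">$3<\/a>/g;
    # [tag://target]
    $chunk =~ s/\[(\w+):\/\/([^\]]+)\]/<a href="$TAG{$1}$2">$2<\/a>/g;
    # [target]
    $chunk =~ s/\[([^\]]+)\]/<a href="$TAG{DEFAULT}$1">$1<\/a>/g;
    return $chunk;
}

my $html = linkify('See [cpan://Regexp::Common|the module], [id://180953] and [japhy].');
print "$html\n";
```

The order matters because the last pattern would happily match the first two forms as well. One thing the sketch makes easy to check: since the first pattern uses (.*?) for the target, a plain [tag://target] link followed later by a piped link is swallowed into one match; [^|\]]+ in place of .*? avoids that.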
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
|
$chunk =~
  s!
    \[
    (?(?= [^:\]]*://)
      (?:
        (\w+)://
        ([^|\]]+)
        (?:
          \| ([^\]]+)
        )?
      )
      |
      (\w+)
    )
    \]
  !
    "<a href=\"" . ( (defined $1)
        ? ( qq($TAG{$1}$2">) . ( (defined $3) ? $3 : $2 ) )
        : qq($TAG{DEFAULT}$4">$4) ) . "</a>"
  !gex;
I truly think that jeffa's approach is better than the above. It's much smarter to break a 3-case problem into 3 steps rather than use 1 gigantic regex. Just look at the "substitution" section; it's hideous. (I'd normally have used a sub to handle the above "substitution" section, but then it wouldn't be "one regex" :)
Re: Regular Expressions: Call for Examples
by Abigail-II (Bishop) on Jul 22, 2002 at 12:05 UTC
|
perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/'
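For readers meeting this one-liner for the first time, here is a sketch of the same test wrapped in a sub (is_prime is a name made up for this example). The number is written in unary as a string of 1s. The first alternative rejects 0 and 1; the second rejects any length that can be split into two or more equal chunks of at least two 1s, which is to say any composite number.

```perl
use strict;
use warnings;

# is_prime() is a hypothetical wrapper around the one-liner's regex.
sub is_prime {
    my ($n) = @_;
    my $unary = 1 x $n;    # e.g. 5 becomes "11111"
    return $unary !~ /^1?$|^(11+?)\1+$/;
}

print join(' ', grep { is_prime($_) } 0 .. 30), "\n";
# 2 3 5 7 11 13 17 19 23 29
```

All the work is done by backtracking: the engine tries chunk sizes 2, 3, 4, ... until one divides the string evenly or none does.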
And then there's my URL matcher. A bit outdated, as it only matches HTTP, FTP, News, NNTP, telnet, gopher, WAIS, mailto, file, prospero, LDAP, z39.50, CID, MID, VEMMI, IMAP and NFS URLs. Many other URL schemes have seen the light in the last 5 years. One of these days, I'll update the regex....
Here it is, just remove the newlines....
Abigail
(?:http://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.
)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)
){3}))(?::(?:\d+))?)(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))|[;:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))*)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))?)?)|(?:ftp://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?
:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-
fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-
)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?
:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!
*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:;type=[AIDaid])?)?)|(?:news:(?:
(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;/?:&=])+@(?:(?:(
?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3})))|(?:[a-zA-Z](
?:[a-zA-Z\d]|[_.+-])*)|\*))|(?:nntp://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d
])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:[a-zA-Z](?:[a-zA-Z
\d]|[_.+-])*)(?:/(?:\d+))?)|(?:telnet://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a
-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d]
)?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))/?)|(?:gopher://(?:(?:
(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:
(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+
))?)(?:/(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*)(?:%09(?:(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:%09(?:(?:[a-zA-Z\d$
\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*))?)?)?)?)|(?:wais://(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)(?:(?:/(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))*))|\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]
{2}))|[;:@&=])*))?)|(?:mailto:(?:(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%
[a-fA-F\d]{2}))+))|(?:file://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]
|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:
(?:\d+)(?:\.(?:\d+)){3}))|localhost)?/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))|[?:@&=])*))*))|(?:prospero://(?:(?:(?:(?:(?:[a-zA-Z
\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)
*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:(?:(?:(?
:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-
zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:(?:;(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)=(?:(?:(?:[a-zA-Z\d
$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)))*)|(?:ldap://(?:(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
))?/(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d])
)|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%2
0)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?
:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID
|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])
?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*)(?:(
?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:(?:(
?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|o
id)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(
?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?:
%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]|%(
?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:
\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a
-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*))*(?:(?:(?:%0[Aa])?(?:%2
0)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:\?(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:,(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-f
A-F\d]{2}))+))*)?)(?:\?(?:base|one|sub)(?:\?(?:((?:[a-zA-Z\d$\-_.+!*'(
),;/?:@&=]|(?:%[a-fA-F\d]{2}))+)))?)?)?)|(?:(?:z39\.50[rs])://(?:(?:(?
:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?
:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))
?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:\+(?:(?:
[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*(?:\?(?:(?:[a-zA-Z\d$\-_
.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?)?(?:;esn=(?:(?:[a-zA-Z\d$\-_.+!*'(),
]|(?:%[a-fA-F\d]{2}))+))?(?:;rs=(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA
-F\d]{2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*)
?))|(?:cid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=
])*))|(?:mid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@
&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=]
)*))?)|(?:vemmi://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z
\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\
.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&=])*)(?:(?:;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&])*)=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d
]{2}))|[/?:@&])*))*))?)|(?:imap://(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:\*|(?:(
?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))))?)|(?:(?:;[
Aa][Uu][Tt][Hh]=(?:\*|(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2
}))|[&=~])+)))(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~])+))?))@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])
?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:
\d+)){3}))(?::(?:\d+))?))/(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:
%[a-fA-F\d]{2}))|[&=~:@/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][Tt]|
[Ss][Uu][Bb])))|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))
|[&=~:@/])+)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~:@/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-
9]\d*)))?)|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~
:@/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\d*
)))?(?:/;[Uu][Ii][Dd]=(?:[1-9]\d*))(?:(?:/;[Ss][Ee][Cc][Tt][Ii][Oo][Nn
]=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)))?))
)?)|(?:nfs:(?:(?://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-
Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:
\.(?:\d+)){3}))(?::(?:\d+))?)(?:(?:/(?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'
(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),
])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))?)|(?:/(?:(?:(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\
-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?))|(?:(?:(?:(?:(?:[a-zA-
Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))
|
I need to create an ASCII art regex. That is, a regex that works (but perhaps doesn't have a good purpose) that, when viewed as an X-by-Y grid of characters, makes a cute picture. I swear I see something in your monstrous regex. Perhaps it's my frayed ends of sanity.
Re: Regular Expressions: Call for Examples
by hossman (Prior) on Jul 21, 2002 at 18:31 UTC
|
this is from 180953 .. it pulls out
IMDB movie titles from the genre file...
next unless m{
    ^\> \s+              # starts with "> "
    (.*?                 # main part of title (capture 1 = title)
      \((\d+)(/.*?)?\)   # year inside parens, might be (1999/I)
                         # (capture 2 = year, capture 3 = crap)
      (\s+\(.*?\))?      # (capture 4 = crap .. movies might be tv/vids/games)
    )\t+Sci\-Fi$         # must end in Sci-Fi
}x;
I use it as an example when people ask me what /x is for, and when they ask what *? means.
(It could also be a good example of non-capturing parens -- if it was changed to actually use them -- I sometimes prefer to capture and ignore unless needed ... one man's crap is another man's treasure.)
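As a sketch of what the non-capturing variant might look like: the two throwaway groups become (?:...), so only the title and the year are captured. The sample lines are made up for illustration; the real genre file may differ in detail. The comments avoid writing capture variables out literally, since variables are interpolated into a pattern even inside /x comments.

```perl
use strict;
use warnings;

my $scifi = qr{
    ^\> \s+                      # line starts with ">" and whitespace
    (                            # capture 1: the whole title
        .*?                      #   as little text as possible, up to
        \( (\d+) (?: /.*? )? \)  #   the year (capture 2), maybe "1999/I"
        (?: \s+ \( .*? \) )?     #   optional "(TV)"-style suffix, not kept
    )
    \t+ Sci-Fi $                 # tab(s), then the genre at end of line
}x;

# Made-up sample lines, not taken from the real file.
my @lines = (
    ">  Matrix, The (1999)\t\tSci-Fi",
    ">  Some Show (1999/I) (TV)\tSci-Fi",
    ">  Other Film (2001)\tDrama",
);

my @found;
for my $line (@lines) {
    next unless $line =~ $scifi;
    push @found, [ $1, $2 ];     # title, year
    print "$2: $1\n";
}
```

With the extra groups gone, the year moves from the second of four captures to the second of two, which makes the numbering easier to keep straight.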
Re: Regular Expressions: Call for Examples
by mojotoad (Monsignor) on Jul 21, 2002 at 23:24 UTC
|
I'm pretty sure that japhy is aware of Regexp::Common by Damian Conway, but fellow monks may not be. It's a nice trove of commonly desired, but not necessarily simple, tricks of the regexen trade.
Matt
Re: Regular Expressions: Call for Examples
by VSarkiss (Monsignor) on Jul 21, 2002 at 23:53 UTC
|
Well, it's not mine, it's by Abigail-II, and it may be sneakier than you intended, but the regular expression that made my jaw drop recently was the n-queen problem solver in Backtracking through the regex world. I'm not sure of its pedagogic value for beginners, but it's certainly a mind-expander.
Re: Regular Expressions: Call for Examples
by stefp (Vicar) on Jul 21, 2002 at 20:56 UTC
|
Can I throw a problem at you that I never took the time to think out and that may add some salt to your book? In perl5, it is impossible to get at a capture in an embedded regular expression:
my $qr = qr|whatever(somecapture_re)whatever|;
my ($captured) = m/$qr/; # does not work
A few weeks ago, I looked at your draft (forgot the URL). It looks promising.
My problem is: find an API to stash away that information and get it back. Probably, it will not be very pretty, but there is no way to "modularize" regexen in perl5. I mean by that, to build interesting regexen from simpler ones.
--
stefp -- check out TeXmacs wiki
|
my $string = "whateversomecapture_rewhatever";
my $qr = qr|whatever(somecapture_re)whatever|;
my ($captured) = $string =~ m/$qr/;
print $captured, " ", $1;
prints:
somecapture_re somecapture_re
Re: Regular Expressions: Call for Examples
by dws (Chancellor) on Jul 22, 2002 at 03:40 UTC
|
Not simple dinky ones, but perhaps regexes that got you out of a bind, or were quite sneaky at what they did, ...
A couple of times recently I've used a "nested" regexp to pull off a bit of tricky substitution. The "outer" regexp serves as a filter, and the "inner" regex, fired via /e, does a more targeted substitution (or no substitution at all).
In the code below, the challenge was to turn words like "cowsCanFly" into "cows-can-fly".
$phrase = "NoMatch cowsCanFly sheepAreVeryCool NoMatch";
$phrase =~ s{
    \b
    (
      [a-z]+
      (?:[A-Z][a-z]+)+
    )
    \b
}{
    my $word = $1;
    $word =~ s/([A-Z])/"-" . lc($1)/eg;
    $word;
}gex;
print $phrase, "\n";
When I first posted this fragment (in this node), there was some concern that the regex engine wasn't reentrant, and that I'd just gotten lucky. Perhaps, though I've done this a few times with 5.6.0 or later, and haven't run into any problems.
japhy, since you're now a regex UberLord, perhaps you can vet this approach for reentrancy issues.
|
There's no re-entry here. The regex engine has exited once the regex portion of the s/// ends. Once the right-hand side of the substitution is done, the regex engine starts again; it is not paused, though, it has stopped. Compare:
"japhy" =~ m{.(?{ "perlmonk" =~ /./ }).};
Watch it explode.
|
That segfaults for me on 5.005_03, and 5.6.0, but not on
5.6.1 or 5.8.0. It seems to be fixed between 5.7.0 and
5.7.1. The former segfaults, the latter doesn't.
Abigail
|
Re: Regular Expressions: Call for Examples
by Notromda (Pilgrim) on Jul 22, 2002 at 00:52 UTC
|
I needed to get some values out of a logfile, for some realtime reporting of spam blocked by our mail server.
Here's the regex:
if (/bouncer postfix\S+ reject: RCPT from (\S+) (530|554|450) (\S+): (.*) from=<(.*?)> to=<(.*?)>/) {
Here's what it was decoding:
Jul 3 11:19:00 bouncer postfix/smtpd[14071]: reject: RCPT from unknown[123.123.123.12]: 530 <qwertyy@domain.tld>: Recipient address rejected: Cannot find your hostname, [123.123.123.12]. Ask your system manager to fix your reverse domain name registration. If you are sending spam, go away. ; from=<aaaaaaaaaaaaaaaaaaaaaaaaaa@aaaa.aaa-aaaaa.com> to=<qwertyy@domain.tld>
For monks not familiar with regex, here's a brief runthrough. First it looks for "bouncer postfix" and then some non-whitespace stuff, " reject: RCPT from ", more non-whitespace (and keep track of it), " ", one of (530, 554, 450) and keep track of it, " ", more non-whitespace (keep track of it), ": ", anything (keep track of it), " from=<", anything (keep track of it) non-greedy, "> to=<", anything (keep track of it) non-greedy, ">".
In other words, from the example above, $1, $2 etc. contain "unknown[123.123.123.12]:", "530", "<qwertyy@domain.tld>", the error message, "aaaaaaaaaaaaaaaaaaaaaaaaaa@aaaa.aaa-aaaaa.com", "qwertyy@domain.tld".
I'm not a very good teacher, but this might be a good real-world example of something a regex shines in. I'll let the book author explain it better. :)
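To check the runthrough against the sample entry, here is a short sketch that feeds the log line through the regex and prints each capture. The variable names ($client, $code and so on) are mine; the pattern and the log line are the ones given above.

```perl
use strict;
use warnings;

# The sample log entry, joined back into a single line.
my $line = 'Jul 3 11:19:00 bouncer postfix/smtpd[14071]: reject: RCPT from '
         . 'unknown[123.123.123.12]: 530 <qwertyy@domain.tld>: Recipient '
         . 'address rejected: Cannot find your hostname, [123.123.123.12]. '
         . 'Ask your system manager to fix your reverse domain name '
         . 'registration. If you are sending spam, go away. ; '
         . 'from=<aaaaaaaaaaaaaaaaaaaaaaaaaa@aaaa.aaa-aaaaa.com> '
         . 'to=<qwertyy@domain.tld>';

# In list context a successful match returns the six captures in order.
my ($client, $code, $rcpt, $reason, $from, $to) = $line =~
    /bouncer postfix\S+ reject: RCPT from (\S+) (530|554|450) (\S+): (.*) from=<(.*?)> to=<(.*?)>/
    or die "line did not match\n";

print "client: $client\n";
print "code:   $code\n";
print "rcpt:   $rcpt\n";
print "reason: $reason\n";
print "from:   $from\n";
print "to:     $to\n";
```

Note that the first capture keeps its trailing colon, because \S+ grabs everything up to the next space.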
Re: Regular Expressions: Call for Examples
by Cody Pendant (Prior) on Jul 22, 2002 at 06:45 UTC
|
I would nominate two regexes that merlyn was responsible for, one being the answer to "I want to replace spaces with underscores, but only where they're found between brackets" on a newsgroup posting somewhere.
It's a relatively simple one, but it opened my eyes, as a beginner, to a whole world of nested and executing regexes.
I'm reproducing it here from memory so merlyn will forgive me if it's not quite how he did it:
$str = 'no change <these spaces need replacing> not these <these do>';
$str =~ s{(<[^>]*?>)}
         {
           my $x = $1;
           $x =~ s/ /_/g;
           $x;
         }egx;
print $str;
And the other one I can't remember at all, but I remember it involved Old MacDonald, and a regex that double-executed, and therefore ended "/eieio". Has to be included.
--
($_='jjjuuusssttt annootthheer
pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;
|
Do you mean this one?
$Old_MacDonald = q#print #;
$had_a_farm = (q-q:Just another Perl hacker,:-);
s/^/q[Sing it, boys and girls...],$Old_MacDonald.$had_a_farm/eieio;
As far as I can tell, the first (currently) recorded appearance of this can be found on this announcement regarding what I presume is the first edition of a certain book in most of our collections.
(Note: I've written as three lines because the two-line version wrapped oddly.)
--f
Re: Regular Expressions: Call for Examples
by I0 (Priest) on Jul 22, 2002 at 05:01 UTC
|
Remove nested <table>...</table> elements:
$_ = join'',<>;
($re=$_)=~
s#((<table[^>]*>)|(</table>)|<!--.*?-->|.)#${['(','']}[!$2]\Q$1\E${[')','']}[!$3]#sgi;
$re=join"|",map quotemeta,eval{/$re/};
die $@ if $@=~/unmatched/i;
s/$re//g;
print;
Re: Regular Expressions: Call for Examples
by smackdab (Pilgrim) on Jul 22, 2002 at 03:18 UTC
|
You could explain saving and restoring REs to a file... I had some help from the monks to get me going...
Re: Regular Expressions: Call for Examples
by PodMaster (Abbot) on Jul 23, 2002 at 21:16 UTC
|
sub untag {
local $_ = $_[0] || $_;
# ALGORITHM:
# find < ,
# comment <!-- ... -->,
# or comment <? ... ?> ,
# or one of the start tags which require a corresponding
# end tag, plus everything up to that end tag
# or if \s or ="
# then skip to next "
# else [^>]
# >
s{
< # open tag
(?: # open group (A)
(!--) | # comment (1) or
(\?) | # another comment (2) or
(?i: # open group (B) for /i
( TITLE | # one of start tags
SCRIPT | # for which
APPLET | # must be skipped
OBJECT | # all content
STYLE # to correspond
) # end tag (3)
) | # close group (B), or
([!/A-Za-z]) # one of these chars, remember in (4)
) # close group (A)
(?(4) # if previous case is (4)
(?: # open group (C)
(?! # and next is not : (D)
[\s=] # \s or "="
["`'] # with open quotes
) # close (D)
[^>] | # and not close tag or
[\s=] # \s or "=" with
`[^`]*` | # something in quotes ` or
[\s=] # \s or "=" with
'[^']*' | # something in quotes ' or
[\s=] # \s or "=" with
"[^"]*" # something in quotes "
)* # repeat (C) 0 or more times
| # else (if previous case is not (4))
.*? # minimum of any chars
) # end if previous char is (4)
(?(1) # if comment (1)
(?<=--) # wait for "--"
) # end if comment (1)
(?(2) # if another comment (2)
(?<=\?) # wait for "?"
) # end if another comment (2)
(?(3) # if one of tags-containers (3)
</ # wait for end
(?i:\3) # of this tag
(?:\s[^>]*)? # skip junk to ">"
) # end if (3)
> # tag closed
}{}gsx; # STRIP THIS TAG
return $_ ? $_ : "";
}
____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.
Re: Regular Expressions: Call for Examples
by stefp (Vicar) on Jul 27, 2002 at 17:49 UTC