
Learning Perl's Regular Expressions

by japhy (Canon)
on May 07, 2001 at 01:37 UTC

in reply to japhy and mystery

Here is my vision. Please entertain it, if you would. If you haven't been able to tell, I'm serious, and I'm a regex fanatic.

Learning Perl's Regular Expressions
  • Familiarize the reader with the tasks that regular expressions are intended to complete.
    • What is a regular expression (regex)?
    • What can they do?
    • What are some practical uses for them?
    • What are some impractical uses for them?
    • How can they solve daily programming problems?
  • Abstract thinking about regexes.
    • How to explain a regex's purpose.
    • How to tackle a given parsing problem with a regex.
    • How to say what you mean, and mean what you say.
    • What else is out there, apart from regexes.
  • Simple patterns.
    • Plain text: /hello world/
    • Safe patterns: /hello \Q$place\E/
    • Character classes: /[Hh]ello [Ww]orld/
    • Case-insensitivity: /hello world/i
    • Alternation: /hello|goodbye/
    • Matching something other than $_: $foo =~ /hello/
  • More "patterny" patterns.
    • Dot: /h.ll/
    • Quantifiers: /hello(?: +world)?/
    • Macros: /hello\s+world/
    • Anchors: /^hello,?\s+world$/
    • Alternate m// characters: m!a/b/c!
  • Getting something back.
    • Capturing: /hello,?\s+(\w+)/
    • Back-references: /\b(\w)\w+(\1)\b/
    • Global matching: /(\d+)/g
    • Regex-y functions: split(), grep(), map()
    • Useful variables: $1, @-, @+
    • Not-so-useful variables: $`, $&, $'
  • More modifiers.
    • Modifying .: /a.b/s
    • Modifying ^ and $: /^foo$/m
    • Explaining your regex using /x
    • Using qr// and /o
  • Breakpoint: greediness and backtracking
    • What is backtracking, and how does it work?
    • What is greediness, and how do you avoid it?
    • Minimal (lazy) vs. maximal (greedy): /a\w+?z/
    • Understanding the "left-most longest" rule.
  • Substitution.
    • Using s///.
    • Calling functions with /e.
  • Simple assertions.
    • Positive look-ahead: /foo(?=bar)/
    • Negative look-ahead: /foo(?!bar)/
    • Common traps: "aaabar" =~ /a+(?!bar)/
  • More assertions.
    • Positive look-behind: /(?<=foo)bar/
    • Negative look-behind: /(?<!foo)bar/
    • The constant-width assertion plague: /(?<=ab+a)/
    • The cut assertion: /(?>a+)a/
  • Logic-flow in regexes.
    • The conditional assertion: /(?(...)a|b)/
  • Embedding code.
    • Advanced use of the /e modifier.
    • The evaluate assertion: /(?{ code })/
    • The delayed regex assertion: /(??{ regex })/
  • Advanced global searching.
    • The pos() function
    • The \G anchor and the /gc modifiers.
  • Common mistakes.
    • $&, et al.
    • Problems with /$foo/.
    • Matching too much.
    • Wasteful modifiers.
  • Optimizing your regexes.
    • "Death to .*."
    • Match only what you want.
    • Unrolling the loop.
  • Concrete applications of regexes.
    • Data transformation for sorting.
    • Matching in reverse ("sexeger").
  • Regex tools and resources.
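A minimal sketch of the look-ahead trap listed above under "Simple assertions", paired with the cut assertion from "More assertions". The variable names are illustrative, and the ^ anchor on the second pattern is an addition made for this sketch: without it the engine would go on to find the lone "a" inside "bar", which is followed by "r" and so satisfies the look-ahead.

```perl
use strict;
use warnings;

my $str = "aaabar";

# The trap: a+ first grabs "aaa", the look-ahead sees "bar" and fails,
# so a+ backtracks to "aa". The text that follows is now "abar", which
# does not start with "bar", and the match succeeds.
my ($loose) = $str =~ /(a+(?!bar))/;

# The cut assertion (?>...) refuses to give characters back. Anchored at
# the start, a+ keeps all three a's, "bar" follows, and the match fails.
my ($strict) = $str =~ /^((?>a+)(?!bar))/;

print "backtracking pattern: ", (defined $loose  ? "matched '$loose'"  : "no match"), "\n";
print "atomic pattern:       ", (defined $strict ? "matched '$strict'" : "no match"), "\n";
```

The first pattern reports a match on "aa" even though the full run of a's is followed by "bar", which is rarely what the writer of such a pattern meant.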
Please give me your input. Is this too much for a tutorial? I want it to be in the form of Learning Perl, with exercises and such, but I want it to include as much as it should so that one has a firm grasp on Perl's regexes.

japhy -- Perl and Regex Hacker

Replies are listed 'Best First'.
Re: Learning Perl's Regular Expressions
by chromatic (Archbishop) on May 07, 2001 at 09:00 UTC
    I'd add "When A Regex Is Too Much", covering tr(), and "Other Places For a Regex" which would discuss split and grep.

    I use those more than substitutions and matches in plenty of programs.

      I'm definitely going to cover tr/// in the "optimization" section, showing how to change something as silly-looking as s/[aeiou]//gi to the far more competent tr/aeiouAEIOU//d.
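The two forms can be put side by side in a short sketch. The sample string and variable names are made up for illustration; only the two operators come from the post.

```perl
use strict;
use warnings;

my $text = "Regular Expressions Are Fun";

# Regex version: a case-insensitive character class, removed globally.
(my $via_subst = $text) =~ s/[aeiou]//gi;

# tr/// version: no regex engine involved; /d deletes the listed characters.
(my $via_tr = $text) =~ tr/aeiouAEIOU//d;

print "substitution:    $via_subst\n";
print "transliteration: $via_tr\n";
```

Both lines print the same consonant-only string. Because tr/// takes no /i modifier, both cases have to be listed explicitly.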

      Don't fear; I'll be discussing functions that are oft or best employed with regexes, especially split() and its magical first argument.
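As a small sketch of what makes that first argument "magical": a literal single-space string is treated differently from a pattern that matches one space. The sample line and array names are assumptions for illustration.

```perl
use strict;
use warnings;

my $line = "  hello   world ";

# A literal single-space string is special-cased: leading whitespace is
# skipped and the string is split on runs of whitespace, much like awk.
my @awk_style = split ' ', $line;

# A pattern matching one space splits at every single space, so leading
# and interior empty fields appear (trailing empty fields are dropped).
my @per_space = split / /, $line;

print scalar(@awk_style), " fields: ", join('|', @awk_style), "\n";
print scalar(@per_space), " fields: ", join('|', @per_space), "\n";
```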

      japhy -- Perl and Regex Hacker

      Edit: chipmunk 2001-05-07

Re: Learning Perl's Regular Expressions
by dws (Chancellor) on May 07, 2001 at 01:48 UTC
    Good outline. It layers on the knowledge and complexity nicely.

    You may be thinking of covering this under "Common Mistakes".

    • Pre-match, post-match. What they do and why to avoid them.
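One way to show the alternative is with @- and @+, which the outline lists as useful variables. On the perls current at the time of this thread, mentioning $`, $& or $' anywhere in a program makes every successful match in that program copy its target string, which is the usual reason given for avoiding them. The sketch below, with made-up sample data, recovers the same three pieces from match offsets instead.

```perl
use strict;
use warnings;

my $str = "hello, world";
my ($before, $match, $after);

if ($str =~ /o, w/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $before = substr($str, 0, $-[0]);               # what the pre-match variable would hold
    $match  = substr($str, $-[0], $+[0] - $-[0]);   # what the match variable would hold
    $after  = substr($str, $+[0]);                  # what the post-match variable would hold
}

printf "before='%s' match='%s' after='%s'\n", $before, $match, $after;
```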
Re: Learning Perl's Regular Expressions
by sharle (Acolyte) on May 07, 2001 at 04:22 UTC
    I think that this is a pretty good idea. It would probably teach the basics and focus on the initial learner of regexes better than anything else I've seen on the subject. Being fairly new to Perl, I would appreciate seeing something on the subject that was more advanced than the O'Reilly books, without going into the complexity of the Hip Owl book. I would be available to proof and test from a newbie perspective, if you'd like.
Re: Learning Perl's Regular Expressions
by MeowChow (Vicar) on May 07, 2001 at 23:03 UTC
    Perhaps a section on regex limitations - what they can't do, problems for which they are poorly suited or simply overkill, and a good explanation of why this is so, as well as brief pointers to the right tools for these sorts of tasks (eg. Parse::RecDescent, Parse::YAPP, HTML::TreeBuilder) would also be a good idea. Best of luck =)
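As one illustration of such a limitation: a plain pattern, without the embedded-code assertions from the outline, cannot verify arbitrarily deep nesting, while a counter does it in a few lines. The helper name is_balanced is hypothetical, made up for this sketch, and it checks parentheses only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every "(" has a matching ")", else 0.
sub is_balanced {
    my ($text) = @_;
    my $depth = 0;
    for my $ch (split //, $text) {
        if ($ch eq '(') {
            $depth++;
        }
        elsif ($ch eq ')') {
            return 0 if --$depth < 0;   # a ")" arrived before its "("
        }
    }
    return $depth == 0 ? 1 : 0;         # leftover "(" means unbalanced
}

print is_balanced("f(a, g(b))") ? "balanced\n" : "unbalanced\n";
print is_balanced("f(a, g(b)")  ? "balanced\n" : "unbalanced\n";
```

For anything richer than a single delimiter, the modules named above are the better fit.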
                   s aamecha.s a..a\u$&owag.print
(tye)Re: Learning Perl's Regular Expressions
by tye (Sage) on May 07, 2001 at 20:17 UTC

    I'd love to see my oft-repeated rant addressed early on:

    • Simple patterns.
      • Plain text: /hello world/, /\Q$str\E/

            - tye (but my friends call me "Tye")
      I'll be discussing the use of regexes for simple things like /foo/. I'll mention quotemeta() and \Q...\E, of course, and suggest that something like /\Q$str\E/ be rewritten without using a regex.

      I will also discuss the dangers of /$str/.
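A short sketch of both points, using made-up sample strings: what goes wrong when a string is interpolated raw, what \Q...\E changes, and how index() does the plain-substring job with no regex at all.

```perl
use strict;
use warnings;

my $needle = "a.c";   # plain text that happens to contain a metacharacter

# Interpolated raw, the dot is a wildcard, so "abc" matches by accident.
my $raw_hit = "abc" =~ /$needle/ ? 1 : 0;

# \Q...\E (or quotemeta) escapes the dot, so only a literal "a.c" matches.
my $quoted_hit = "abc" =~ /\Q$needle\E/ ? 1 : 0;

# For a fixed substring, index() avoids the regex engine entirely.
my $index_hit = index("abc", $needle) >= 0 ? 1 : 0;

# A stray "(" is worse: the pattern fails to compile at run time and dies.
my $broken   = "(oops";
my $compiled = eval { "x" =~ /$broken/; 1 } ? 1 : 0;

print "raw=$raw_hit quoted=$quoted_hit index=$index_hit compiled=$compiled\n";
```

Only the raw interpolation reports a hit, and the last case would have killed the program without the eval.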

      japhy -- Perl and Regex Hacker
Re: Learning Perl's Regular Expressions
by birdbrane (Chaplain) on May 07, 2001 at 22:46 UTC

    Sign me up! Kind of weird that you posted this; I was just at B&N the other day looking for the "Mastering Reg Ex" book. They didn't have that one, or anything else similar.

    Personally, I get the most out of these types of books when there are a fair number of examples and problems that allow me to test out the theory.

    Japhy: Is this too much for a tutorial?

    Personally, I think the type of person that will purchase this book will want a beefy tutorial. I know the basics of regex, but want to know them a lot better. I don't want a warm and fuzzy overview.

    Ready w/ cc in hand,