in reply to U.S. State Names
Here is the benchmark of the three routines:

Benchmark: timing 1000 iterations of merlyn, tye_1, tye_2...
    merlyn:  0 secs ( 0.54 usr +  0.00 sys =  0.54 CPU) @ 1851.85/s (n=1000)
     tye_1:  1 secs ( 0.65 usr +  0.00 sys =  0.65 CPU) @ 1538.46/s (n=1000)
     tye_2:  2 secs ( 1.58 usr +  0.00 sys =  1.58 CPU) @ 632.91/s (n=1000)

/
  \G            # when progressively matching a string
                # with the 'g' flag
                # you can use the \G anchor to 'hold' the position
                # just after the previous match
                # helps the regex remember where it left off
                # allows you to go through a list efficiently
                # without using split or looping
                # Mastering Regular Expressions (p 236 - 240)
  \s*           # match zero or more whitespace characters that
                # may come before a name-value pair
  (\w\w)        # match two word characters (alphanumeric plus '_')
                # parentheses assign matched letters to $1
                # this is the state abbreviation
  \s+           # match one or more whitespace characters between
                # the abbreviation and the name
  (\w+          # match one or more word characters
    (?:         # ?: allows for cluster-only parentheses,
                # no capturing and doesn't assign to $3
      \s\w+     # match one whitespace character, then one or more word characters
    )?          # match zero or one of these clusters
                # allows match of state names with multiple words
                # e.g. New York, West Virginia
                # does not match states with three words,
                # like 'Northern Mariana Islands'
                # change trailing ? to * to match those '(?:\s\w+)*'
  )             # assigns state name to $2
/gx;            # end of regex
                # g flag for global search
                # x flag to allow whitespace in regex
                # might also want to use c flag
                # c flag causes the match position to be retained
                # following an unsuccessful match
                # see: Effective Perl Programming (p.63)
                #
                # the complete regex looks like this:
                #
                # /\G\s*(\w\w)\s+(\w+(?:\s\w+)?)/g;
                #

Also, here is a link to FIPS and ISO 3166 country codes in case anybody wants to apply this snippet to countries.
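
For anyone who wants to try the \G approach end to end, here is a minimal sketch. The sample data, the sub name parse_states, and the one-pair-per-line layout are my own assumptions, not part of the original snippet. It also departs from the regex above in two ways: it uses the c flag, and it uses a literal space instead of \s inside the name, because \s matches newlines too and a greedy name could otherwise run into the next line's abbreviation.

```perl
use strict;
use warnings;

# Hypothetical sample data: one "abbreviation name" pair per line.
my $list = "AL Alabama\nNY New York\nWV West Virginia\nMP Northern Mariana Islands\n";

# Walk the string with \G and the /gc flags, collecting the pairs into a hash.
# parse_states is an illustrative name, not a routine from the thread.
sub parse_states {
    my ($text) = @_;
    my %name_for;
    # \G anchors each attempt where the previous match ended;
    # /c keeps that position if a match attempt fails.
    while ($text =~ /\G\s*(\w\w) +(\w+(?: \w+)*)/gc) {
        $name_for{$1} = $2;    # $1 = abbreviation, $2 = full name
    }
    return \%name_for;
}

my $states = parse_states($list);
print "$_ => $states->{$_}\n" for sort keys %$states;
```

The trailing * on the cluster is the change suggested in the comments above, so names of three or more words are picked up as well.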