in reply to Speeding up a large regex match pattern

You may need nested grouping for the content toward the end of the spec, but at least the early part of a METAR should give you no trouble.

The following approach embodies the advice (brilliant minds come to similar solutions?) about breaking the METAR or SPECI report down into tokens:

#!/usr/bin/perl -lw
use strict;

my $metar = 'METAR CCCC 210855Z AUTO dddff(f)Gfmfm(fm)KTv dndndnVdxdxdx VVVVVSM [RDRDR/VRVRVRVRFT ] w\'w\' [NsNsNshshshs or VVhshshs T\'T\'/T\'dT\'d APHPHPHPH';

if ( $metar =~ /^(METAR|SPECI)\s([A-Z]{4})\s(\d\d)(\d{4}Z)\s(AUTO|COR).*/ ) {
    my $type     = $1;
    my $stn      = $2;
    my $monthday = $3;
    my $obsTime  = $4;
    my $subtype  = $5;
    print $type . " Station: $stn, Date: $monthday, ObservationTime: $obsTime, Subtype: $subtype";
}
else {
    print "ww screwed up the regex.";
}
($metar is taken straight from the reference document, with digits substituted for YYGGgg and with the single quotes escaped; comprehensively understanding the spec would take more time than I have to give just now.)
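Since the advice was to break the report into tokens, here is a minimal sketch of that idea taken one step further: split the report on whitespace and consume tokens from the front, testing each against a small anchored pattern instead of matching the whole report with one regex. The sub name `parse_header`, its hash keys and the sample report are hypothetical, and only the groups covered by the regex above (type, station, day/time, optional AUTO/COR) are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse the leading groups of a METAR/SPECI report
# by consuming whitespace-separated tokens one at a time.
# Returns a hash ref of the groups found plus the unparsed remainder,
# or undef if a mandatory group is missing.
sub parse_header {
    my ($report) = @_;
    my @tok = split ' ', $report;    # split ' ' also ignores leading whitespace
    my %h;

    # Mandatory: report type.
    return unless @tok && $tok[0] =~ /^(?:METAR|SPECI)$/;
    $h{type} = shift @tok;

    # Mandatory: four-letter station identifier (CCCC).
    return unless @tok && $tok[0] =~ /^[A-Z]{4}$/;
    $h{station} = shift @tok;

    # Mandatory: day of month and time, YYGGggZ.
    return unless @tok && $tok[0] =~ /^(\d\d)(\d{4})Z$/;
    @h{qw(day time)} = ($1, $2);
    shift @tok;

    # Optional: report modifier; skipped silently if absent.
    $h{modifier} = shift @tok if @tok && $tok[0] =~ /^(?:AUTO|COR)$/;

    $h{rest} = [@tok];    # everything not yet parsed
    return \%h;
}

# Made-up sample report for illustration.
my $h = parse_header('METAR KXYZ 210855Z AUTO 27015G25KT 10SM');
print "$h->{type} $h->{station} day=$h->{day} time=$h->{time} ",
      "modifier=", ($h->{modifier} // 'none'), "\n";
```

Each pattern is anchored to a single token, so a missing optional group costs one short failed match rather than backtracking across the rest of the report.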

Re^2: Speeding up a large regex match pattern
by wossname (Novice) on Feb 04, 2009 at 21:47 UTC
    The METAR spec document seems to use a fairly peculiar method of notation (at least to my eyes) to describe the various fields, most of which are optional.

    It is a very challenging pattern to match though, no doubt about that.
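    Because most groups are optional but appear in a fixed order, one way to cope is a table of per-group patterns tried in spec order: if the next token matches a group it is consumed, otherwise that group is treated as absent and the next one is tried. The group names and the patterns below are simplified approximations for illustration (for example, the visibility pattern ignores the two-token "1 1/2SM" form), not a full reading of the spec.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of body groups in the order the spec lists them.
# Patterns are simplified approximations; the third field marks groups
# that may occur more than once (e.g. several cloud layers).
my @GROUPS = (
    [ wind       => qr/^(?:\d{3}|VRB)\d{2,3}(?:G\d{2,3})?KT$/,            0 ],
    [ wind_var   => qr/^\d{3}V\d{3}$/,                                     0 ],
    [ visibility => qr/^M?\d+(?:\/\d+)?SM$/,                               0 ],
    [ rvr        => qr/^R\d{2}[LRC]?\/[PM]?\d{4}(?:V[PM]?\d{4})?FT$/,      1 ],
    [ weather    => qr/^(?:[-+]|VC)?(?:MI|PR|BC|DR|BL|SH|TS|FZ)?(?:DZ|RA|SN|SG|IC|PL|GR|GS|UP|BR|FG|FU|VA|DU|SA|HZ|PY|PO|SQ|FC|SS|DS)+$/, 1 ],
    [ sky        => qr/^(?:(?:FEW|SCT|BKN|OVC)\d{3}(?:CB|TCU)?|VV\d{3}|CLR|SKC)$/, 1 ],
    [ temp_dew   => qr/^M?\d{2}\/M?\d{2}$/,                                0 ],
    [ altimeter  => qr/^A\d{4}$/,                                          0 ],
);

# Walk the groups in order. Each group is optional: if the next token
# does not match, move on to the next group. Returns two refs: a hash of
# group name => [tokens], and an array of the tokens left unrecognised.
sub parse_body {
    my @tok = @_;
    my %found;
    for my $g (@GROUPS) {
        my ($name, $re, $repeat) = @$g;
        while (@tok && $tok[0] =~ $re) {
            push @{ $found{$name} }, shift @tok;
            last unless $repeat;
        }
    }
    return (\%found, \@tok);
}

# Made-up body tokens (the part after the header groups).
my ($found, $rest) = parse_body(qw(27015G25KT 10SM -RA BKN020 OVC050 12/08 A2992));
print "$_: @{ $found->{$_} }\n" for sort keys %$found;
print "unparsed: @$rest\n";
```

    A token that matches none of the remaining groups stops the walk, so it and everything after it (remarks, or any group the table does not know about) end up in the unparsed list rather than being silently misassigned.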

    One thing that strikes me is that many parts of the METAR encoding would be much easier to match if the order of some of the fragments of information were reversed.

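    I had significant difficulty matching the visibility group "VVVVVSM": the value comes first and can take several forms, so the "SM" that identifies the group only turns up at the end. The most obvious optimisation would be to move the "SM" to the start of the group, not the end. Ho hum.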
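    To illustrate the problem: in US-style reports the visibility value can be a whole number, a fraction, or both separated by a space (e.g. "1 1/2SM"), so a token-based parser only learns that a bare "1" starts a visibility group after peeking at the following token. Below is a minimal sketch of that one-token lookahead; the sub name `take_visibility` is hypothetical and only statute-mile forms are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: take a visibility group (VVVVVSM) off the front
# of @$tok if one is present. Handles "10SM", "1/2SM", "M1/4SM" and the
# two-token form "1 1/2SM" (returned with its single space). Returns
# undef, leaving @$tok untouched, if the front token is not visibility.
sub take_visibility {
    my ($tok) = @_;
    return unless @$tok;

    # Single-token forms: the SM suffix is on this token.
    if ($tok->[0] =~ /^M?\d+(?:\/\d+)?SM$/) {
        return shift @$tok;
    }

    # Two-token form: a bare whole number only counts as visibility if
    # the *next* token is a fraction ending in SM. This lookahead is
    # needed because the unit marker comes last.
    if (@$tok >= 2 && $tok->[0] =~ /^\d+$/ && $tok->[1] =~ /^\d\/\d+SM$/) {
        return join ' ', splice(@$tok, 0, 2);
    }
    return;
}

my @tokens = qw(1 1/2SM BR OVC005);
my $vis = take_visibility(\@tokens);
print "visibility: $vis; remaining: @tokens\n";
```

    With a leading marker instead of a trailing "SM", the first token alone would identify the group and the two-token special case would need no lookahead.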

    Cheers.