in reply to My code seems messy...

If you want to go beyond davidrw's fine suggestions, I would recommend you take a look at TheDamian's book Perl Best Practices. It has excellent suggestions for writing readable and maintainable code.

Replies are listed 'Best First'.
Re^2: My code seems messy...
by PerlGrrl (Sexton) on May 14, 2006 at 12:09 UTC
    My text files include dates in string fromat such as the following:
    ...the next social club meeting is on April 15, 1994...

    ...September 21-23, 1994 we will be hosting visitors...

    ...submissions should be made by 11 February 1994...

    ...Mail sent 7 Feb 1994...

    ...On the 16th September 1994 Mr X will be giving a talk on ...

    ...unconfirmed conference dates are March 4, 5 and 6, 1994...

    (...is MY emphasis - for clarity's sake)
    Have absolutely no knowledge of surrounding boundaries, nor where the strings can occur in the text, just got to pull them all out, so that I can then do some date normalisation...also, i've found that some of my strings are sometimes matching numeric strings...

      Here's my attempt. I've used two regexes for the general cases of day month and month day.

      #!/usr/bin/perl use strict; use warnings; while (<DATA>){ if ( # month day / ( [JFMASOND][a-z]{2,8} # full or 3 letter month name \s [123]?\d # day number (?:-[123]?\d)? # optional dash and day number (?:,\s[123]?\d\s)* # optional list of day numbers (?:and\s)? # optional and (?:[123]?\d,?\s)? # optional end of list (?:19|20)\d{2} # year starting 19 or 20 ) /x ) { print "1: $1\n"; } elsif ( # day month / ( [123]?\d # day number (?:st|nd|rd|th)? # optional st, nd etc. \s+ [JFMASOND][a-z]{2,8} # month name \s+ (?:19|20)\d{2} # year starting 19 or 20 ) /x ) { print "2: $1\n"; } } __DATA__ ...the next social club meeting is on April 15, 1994... ...September 21-23, 1994 we will be hosting visitors... ...submissions should be made by 11 February 1994... ...Mail sent 7 Feb 1994... ...On the 16th September 1994 Mr X will be giving a talk on ... ...unconfirmed conference dates are March 4, 5 and 6, 1994...
      output:
      ---------- Capture Output ---------- > "C:\Perl\bin\perl.exe" _new.pl 1: April 15, 1994 1: September 21-23, 1994 2: 11 February 1994 2: 7 Feb 1994 2: 16th September 1994 1: March 4, 5 and 6, 1994 > Terminated with exit code 0.
      Here I start by checking if there's a valid month name in the string. If there is, I start extracting the dates from the string; else skipt that line.
      Note that I had to add a space after the first '...' because '...September' is not a valid month name.
      Updated to return valid dates
      #!/usr/bin/perl # filename: extract_dates.pl use strict; use warnings; use Data::Dumper; $Data::Dumper::Indent = 1; my @dates; push @dates, findMonth($_) while (<DATA>); print Dumper \@dates; # or do something else with your dates sub findMonth { my @words = split / /,shift; my %months = map {$_ => 1 } qw/ january jan february feb march mar april apr may may june jun july jul august aug september sep october oct november nov december dec /; foreach (@words) { if (exists $months{lc($_)} ) { return extractDate( { MONTH => $_, STRING => "@words" } ); last; } } return; } sub extractDate { my $self = shift; return makeValidDate( {STRING=>$self->{STRING},DATE=>$1} ) if ($self->{STRING} =~ / ( (?: ( [123]?\d (st|nd|rd|th)? \s+ )? $self->{MONTH} \s\d{1,4} ( (-[123]?\d)? (,\s[123]?\d\s)* (and\s\d+)? (,\s\d{4})? )? ) ) /x ); return; } sub makeValidDate { my $self = shift; my ($string) = $self->{STRING} =~/^(.+)$/; $self->{DATE} =~s/-/X/g; $self->{DATE} =~s/(^-|\W|and|th|st|nd|rd)/ /gi; my @date = split /\s+/,$self->{DATE}; my $date = {}; foreach (@date) { if ($_=~/^\d{1,2}$/) { push @{$date->{days}},$_ } elsif ($_=~/^\d{4}$/) { $date->{year} = $_ } elsif ($_=~/(\d+)X(\d+)/) { push @{$date->{days}},$1 .. $2 } else { $date->{month} = lc($_) } } use Date::Manip; my $out_date = {}; foreach (@{$date->{days}}) { my $temp_date = ParseDate( $_ . " " . $date->{month} . " " . $date->{year} ); $temp_date = &UnixDate($temp_date,"%D"); push @{$out_date->{dates}},$temp_date; $out_date->{string} = $string; } return $out_date; } __DATA__ ... the next social club meeting is on April 15, 1994... ... September 21-23, 1994 we will be hosting visitors... ... submissions should be made by 11 February 1994... ... Mail sent 7 Feb 1994... ... On the 16th September 1994 Mr X will be giving a talk on ... ... unconfirmed conference dates are March 4, 5 and 6, 1994...
      $VAR1 = [ { 'dates' => [ '04/15/94' ], 'string' => '... the next social club meeting is on April 15, 1994 +...' }, { 'dates' => [ '09/21/94', '09/22/94', '09/23/94' ], 'string' => '... September 21-23, 1994 we will be hosting visitors +...' }, { 'dates' => [ '02/11/94' ], 'string' => '... submissions should be made by 11 February 1994... +' }, { 'dates' => [ '02/07/94' ], 'string' => '... Mail sent 7 Feb 1994...' }, { 'dates' => [ '09/16/94' ], 'string' => '... On the 16th September 1994 Mr X will be giving a +talk on ...' }, { 'dates' => [ '03/04/94', '03/05/94', '03/06/94' ], 'string' => '... unconfirmed conference dates are March 4, 5 and 6 +, 1994...' } ];