GreatWhite has asked for the wisdom of the Perl Monks concerning the following question:

I have a project I have been working on. I want to find a month (Dec or December...) in a line and take that and the next two fields (Dec 11 1999) and write them to another file. I have everything working, except I can't seem to get the month search part right. How would I go about finding the month part?

Replies are listed 'Best First'.
Re: Pulling Date out of String
by VSarkiss (Monsignor) on Jun 30, 2001 at 04:00 UTC

    A lot of guessing, but it sounds like you're looking for something like this:

        my ($month, $day, $year) = $line =~ /
            (Jan|Mar|Dec)   # the month abbreviations we want
            \s+             # followed by one or more spaces
            (\d+)           # then one or more digits
            \s+             # then one or more spaces
            (\d+)           # then one or more digits
        /x;

    To clarify: I'm not sure if you're always looking for the same month, an abbreviation, a series of months, or a series of month abbreviations. This little example will work for a series of month abbreviations. Substitute whatever you need for the (Jan|Mar|Dec), but keep the parentheses, since they are what capture the month.

    Next, I'm assuming that by "fields" you mean that the month, day, and year are separated by whitespace, always appear in that order, and that the day and year are numbers. If you meant something else by "fields", you need to be more specific. This statement will assign those three parts to the three variables $month, $day, and $year. (Assuming your input string is in $line.)
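
    As a rough sketch of how that assignment might be used, the snippet below wraps the match in a helper and skips lines that contain no date. The name extract_date and the sample lines are made up for illustration, and the month list is still just the (Jan|Mar|Dec) example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "Mon DD YYYY" for the first date found
# in a line, or nothing when the pattern does not match.
sub extract_date {
    my ($line) = @_;
    my ($month, $day, $year) = $line =~ /
        (Jan|Mar|Dec)   # the month abbreviations we want
        \s+ (\d+)       # whitespace, then the day
        \s+ (\d+)       # whitespace, then the year
    /x or return;       # a failed match assigns nothing, so bail out
    return "$month $day $year";
}

# Made-up sample input; in a real script these would come from a file.
my @lines = (
    "Backup finished Dec 11 1999 at noon",
    "no date on this line",
);

for my $line (@lines) {
    my $date = extract_date($line);
    print "$date\n" if defined $date;
}
```

    To write to another file instead of the screen, open an output filehandle and print to that instead of standard output.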

    If you haven't seen it before, the /x at the end of the pattern allows embedding comments and whitespace. It increases readability a lot.

    As I said, there's a lot of guessing here on my part. If you want to be more general (for example, being able to handle more date formats), take a look at the many date-handling routines on CPAN.
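
    As one possible illustration of the date-module route (not necessarily the module you would end up choosing), Time::Piece can parse the matched text with strptime. The sample line and the format string "%b %d %Y" are assumptions about what your data looks like.

```perl
use strict;
use warnings;
use Time::Piece;   # ships with Perl 5.10 and later

# Made-up sample input.
my $line = "Backup finished Dec 11 1999 at noon";

# Grab the whole "Mon DD YYYY" chunk in a single capture.
my ($date) = $line =~ /((?:Jan|Mar|Dec)\s+\d+\s+\d+)/;

if (defined $date) {
    # %b = abbreviated month name, %d = day of month, %Y = four-digit year
    my $t = Time::Piece->strptime($date, "%b %d %Y");
    print $t->ymd, "\n";   # ISO-style year-month-day
}
```

    The advantage over bare captures is that the result is a date object, so reformatting or comparing dates does not need more regexes.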

    HTH

      Thank you for the code. I have been able to get my script to work. Thanks again.

      - GreatWhite
      "The only true wisdom is in knowing you know nothing." -Socrates
Re: Pulling Date out of String
by tachyon (Chancellor) on Jun 30, 2001 at 15:30 UTC

    Nice regex, V. You can also write it like this:

        $line = "On January 1 1970 the unix epoch commenced.";
        my @months = qw(
            January Jan February Feb March Mar April Apr May June Jun
            July Jul August Aug September Sept Sep October Oct
            November Nov December Dec
        );
        my $months = join "|", @months;
        my ($month, $day, $year) = $line =~ m/($months)\s+(\d+)\s+(\d+)/i;
        print "month\t$month\nday \t$day\nyear\t$year\n";

    Logic is as explained by V. I have built the month alternation from a list of months and abbreviations stored in the array @months, joined with | so that it can be interpolated into the regex. I have made the regex case-insensitive with the /i switch. Note he used the /x switch to allow the informative comments in his regex. The following also works: it captures full or abbreviated months, case-insensitively, by allowing any string of word characters (including none) after the first three, which identify the month:

        $line = "On January 1 1970 the unix epoch commenced.";
        my @months = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
        my $months = join "|", @months;
        # \w* lets a bare abbreviation such as "Dec" match as well as "December"
        my ($month, $day, $year) = $line =~ m/((?:$months)\w*)\s+(\d+)\s+(\d+)/i;
        print "month\t$month\nday \t$day\nyear\t$year\n";

    cheers

    tachyon