Regular Expression help

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear brethren,
I am fairly new to Perl and am having difficulty parsing a logfile that does NOT have a common delimiter. A sample from the logfile looks like this:

10/1/2003 2:06:32 AM|1|Checkout Started for US-02-14@comany.com (requested by aad6870)
10/2/2003 2:07:17 AM|1|Checkout Processed for US-02-14@company.com (requested by aad6870)
10/3/2003 2:09:37 AM|1|Checkin Processed for DN-US-02-14@company.com (requested by aad6870)
10/4/2003 9:37:53 AM|1|Checkout Started for DN-US-02-14@company.com (requested by heavis6608)
10/5/2003 9:38:29 PM|1|Checkout Processed for US-02-14@company.com (requested by heavis6608)
10/6/2003 10:10:21 AM|1|Checkout Started for US-02-17@company.com (requested by vm_karthik3521)

I need to parse out the date stamp, the activity 'Checkout Started', the machine 'US-02-14@company.com', and the username 'aad6870'. What I've started on is the following:

while (<LOGFILE1>)
{
    # parse the date field with the '|' delimiter
    @dateField = split(/\|/);

    # parse the activity field by matching '|1|'word space word
    @activityField = split(/\|1\|\w\s\w/);

    # Start populating the fields as appropriate
    $date = $dateField[0];
    $activity = $activityField[0];

    ## Write the cleaned up data to the data file
    print DATAFILE "$date , ";
    print DATAFILE "$activity ,";
    print DATAFILE "\n";
}

My problem is that parsing the activity field returns the entire row. Can anyone tell me what is wrong with my regular expression for the activity field?
Am I heading in the right direction, or have I been possessed by Perl gremlins?
Any help or suggestions greatly appreciated.


Replies are listed 'Best First'.
Re: Regular Expression help by Roger (Parson) on Nov 11, 2003 at 23:39 UTC
Perhaps you are looking for something like this instead? :)

use strict;
use Data::Dumper;

while (<DATA>) {
    chomp;                   # remove trailing \n, optional
    my @rec = split /\|/;    # split records
    my $date = $rec[0];
    my ($activity, $machine, $requester) =
        $rec[2] =~ /(.*)\sfor\s(.*)\s\(requested by (.*)\)/;
    print "$date, $activity, $machine, $requester\n";
}

__DATA__
10/1/2003 2:06:32 AM|1|Checkout Started for US-02-14@comany.com (requested by aad6870)
10/2/2003 2:07:17 AM|1|Checkout Processed for US-02-14@company.com (requested by aad6870)
10/3/2003 2:09:37 AM|1|Checkin Processed for DN-US-02-14@company.com (requested by aad6870)
10/4/2003 9:37:53 AM|1|Checkout Started for DN-US-02-14@company.com (requested by heavis6608)
10/5/2003 9:38:29 PM|1|Checkout Processed for US-02-14@company.com (requested by heavis6608)
10/6/2003 10:10:21 AM|1|Checkout Started for US-02-17@company.com (requested by vm_karthik3521)

And the output is -

10/1/2003 2:06:32 AM, Checkout Started, US-02-14@comany.com, aad6870
10/2/2003 2:07:17 AM, Checkout Processed, US-02-14@company.com, aad6870
10/3/2003 2:09:37 AM, Checkin Processed, DN-US-02-14@company.com, aad6870
10/4/2003 9:37:53 AM, Checkout Started, DN-US-02-14@company.com, heavis6608
10/5/2003 9:38:29 PM, Checkout Processed, US-02-14@company.com, heavis6608
10/6/2003 10:10:21 AM, Checkout Started, US-02-17@company.com, vm_karthik3521

I think the only trick here is with the `my ($var) = $str =~ /(.*)/;` idiom, which is a handy one to master. tachyon had a Perl Meditation not long ago on this topic (291543).

Also, you don't need to print the elements one line at a time; you can print them all at once.
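The list-context match idiom mentioned in this reply can be sketched in isolation. This is only an illustration: the variable names `$line` and `$matched` and the shortened pattern are assumptions made for the example, not part of Roger's code. The point is that the same match returns its captures when assigned to a parenthesized list, and only a true/false value when assigned to a scalar.

```perl
use strict;
use warnings;

# One record from the sample log, hard-coded for illustration.
my $line = '10/1/2003 2:06:32 AM|1|Checkout Started for US-02-14@comany.com (requested by aad6870)';

# List context: the parentheses on the left make the match return its captures.
my ($activity, $machine) = $line =~ /\|1\|(.*)\sfor\s(\S+)/;

# Scalar context: the same match only reports success or failure.
my $matched = $line =~ /\|1\|(.*)\sfor\s(\S+)/;

# prints: activity=Checkout Started machine=US-02-14@comany.com matched=1
print "activity=$activity machine=$machine matched=$matched\n";
```

Forgetting the parentheses around a single variable on the left is the usual way this idiom goes wrong, since `my $var = $str =~ /(.*)/;` stores the success flag rather than the captured text.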
Re: Re: Regular Expression help by Anonymous Monk on Nov 12, 2003 at 00:54 UTC
Hi Roger, thanks for the help; this really deepens my understanding of regular expressions. Could you tell me how I could parse the line without stripping the date out first with the /\|/ split? In other words, is there a way to parse a record without splitting? Once again, thanks for the help.
Re: Re: Re: Regular Expression help by Roger (Parson) on Nov 12, 2003 at 01:06 UTC
Ok, you can change my previous code to -

my ($activity, $machine, $requester) = /\|1\|(.*)\sfor\s(.*)\s\(requested by (.*)\)/;

I have omitted the implicit `$_ =~` part in the idiom. What the new code does is look for the |1| pattern followed by the stuff you are looking for. Note that at this point, $_ holds the entire line.
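A minimal sketch of the no-split approach that also captures the date and skips lines that do not match. The helper name `parse_line`, the `@log` array, and the slightly stricter pattern (`[^|]*` for the date, `\S+` for the machine) are assumptions made for this example rather than anything Roger posted.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper. In list context it returns
# (date, activity, machine, requester), or an empty list when the
# line does not have the expected shape.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^([^|]*)\|\d+\|(.*)\sfor\s(\S+)\s\(requested by ([^)]*)\)/;
}

# Two sample records plus one junk line, held in memory for illustration.
my @log = (
    '10/1/2003 2:06:32 AM|1|Checkout Started for US-02-14@comany.com (requested by aad6870)',
    'this line is not a log record',
    '10/6/2003 10:10:21 AM|1|Checkout Started for US-02-17@company.com (requested by vm_karthik3521)',
);

my @rows;
for my $line (@log) {
    my @fields = parse_line($line) or next;    # skip lines that do not match
    push @rows, join(', ', @fields);
}

print "$_\n" for @rows;
```

Checking the result of the match before using the captures avoids printing undefined values when a malformed line turns up in the log.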
Re: Regular Expression help by davido (Cardinal) on Nov 12, 2003 at 05:44 UTC
This may seem a little funky, and it is somewhat dependent on what you want to allow as machine and user names. But this should give you something to work with.

my ( $date, $activity, $machine, $company ) =
    $_ =~ m/^(.+(?:AM|PM))
            \|\d\|
            (.+)
            \sfor\s
            ([\w\d-]+@[\w\d.-]+)
            \s.+\bby\s
            ([\w\d]+)
            [^\w\d]+$
           /x;

I used the /x modifier to allow whitespace within the RE, so that I could group it in segments that each accomplish a different portion of the match.

You probably ought to also have a look at perlre and perlretut, as well as perlfaq6. They will go a long way toward giving you a good comfort level with REs.

Dave

"If I had my life to live over again, I'd be a plumber." -- Albert Einstein
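Because /x also permits comments inside the pattern, each segment can be labelled. The following is a sketch along the lines of the regex above, not davido's exact code: the name `$record_re`, the use of `qr//`, the variable `$user`, and the simplification of `[\w\d]` to `\w` (which already includes digits) are choices made for this example.

```perl
use strict;
use warnings;

# $record_re is an illustrative name. Whitespace and comments inside
# the pattern are ignored because of the x modifier.
my $record_re = qr/
    ^ (.+ (?:AM|PM))      # date stamp, ending in AM or PM
    \| \d \|              # the single digit between the pipes
    (.+)                  # activity, for example "Checkout Started"
    \s for \s
    ([\w-]+ \@ [\w.-]+)   # machine address
    \s .+ \b by \s
    (\w+)                 # user name
    \W+ $                 # closing parenthesis and any line ending
/x;

my $line = '10/5/2003 9:38:29 PM|1|Checkout Processed for US-02-14@company.com (requested by heavis6608)';

my ($date, $activity, $machine, $user) = $line =~ $record_re;

# prints: 10/5/2003 9:38:29 PM, Checkout Processed, US-02-14@company.com, heavis6608
print "$date, $activity, $machine, $user\n";
```

Compiling the pattern once with `qr//` keeps the loop body short when the same regex is applied to every line of a file.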
Re: Regular Expression help by ysth (Canon) on Nov 11, 2003 at 23:34 UTC
`\w` matches a word character, not a word. Repeat it like `\w+`.
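A small sketch of what this means for the split in the original question, using one sample line. The variable names `@no_match` and `@pieces` are made up for the example. It also shows that even with `\w+` the split discards the activity text, because whatever the pattern matches is treated as the separator.

```perl
use strict;
use warnings;

my $line = '10/1/2003 2:06:32 AM|1|Checkout Started for US-02-14@comany.com (requested by aad6870)';

# \w\s\w needs exactly one word character after "|1|" and then whitespace.
# "|1|" is followed by the eight-letter word "Checkout", so nothing matches
# and split returns the whole line as its only element.
my @no_match = split /\|1\|\w\s\w/, $line;

# \w+\s\w+ matches "|1|Checkout Started", so the line is cut around it.
# The matched text is the separator and is thrown away, which means split
# still does not hand back the activity itself.
my @pieces = split /\|1\|\w+\s\w+/, $line;

print scalar(@no_match), " piece: $no_match[0]\n";
print scalar(@pieces), " pieces: [$pieces[0]] [$pieces[1]]\n";
```

This is why the replies in this thread capture the fields with a match in list context instead of splitting on the activity.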