PerlMonks  

Parsing a Log File

by PrimeLord (Pilgrim)
on Nov 29, 2004 at 18:41 UTC ( [id://410999]=perlquestion )

PrimeLord has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I am here seeking your wisdom once again. I am trying to develop a program to read a log file generated by a football sim. I would like the program to compile a statistical breakdown for each player: per-quarter and per-half stats, down-and-distance stats, etc. I am having trouble deciding on the best way to attack this, though. Below is an excerpt from one of the log files.

Beginning of First Quarter.
ORL 13 Sauls kicked off 71 yards from the ORL30.
ASH 22 Noa returned the ball 14 yards to the ASH13. Tackled by ORL 82 Ferguson.
Possession to Asheville.
1-10-ASH13 (14:30) ASH 48 Hopper ran around right end for 1 yard. Tackled by ORL 94 Whiting, assisted by ORL 54 Schacht.
2-09-ASH14 (13:59) ASH 23 Theriot ran inside the left guard for 1 yard. Tackled by ORL 96 Dugger.
3-08-ASH15 (13:25) ASH 18 Hall pass completed to 23 Theriot for 10 yards. Tackled by ORL 96 Dugger.
--
1-10-ASH25 (12:50) ASH 38 Dollinger ran around right end for 6 yards. Tackled by ORL 95 Gonzalez, assisted by ORL 54 Schacht.
Key block delivered by ASH 77 Cassell.
2-04-ASH31 (12:05) ASH 18 Hall pass was overthrown, intended for 81 Perry. Penalty: ORL - Offsides.
--


I need to parse the lines to determine what type of play it was, who was involved, what the down and distance were, etc. That isn't too difficult; however, the lines are not always consistent. As you can see, some of the lines contain more information than just the play result and who made the tackle, such as who delivered a key block. There are also penalties to deal with, and the -- indicates a first down.
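A minimal sketch of the first parsing step, assuming every play line starts with the `down-distance-spot (clock)` prefix seen in the excerpt and that `--` sits on a line of its own. `parse_play_prefix` and `is_first_down_marker` are hypothetical helper names, not part of any module, and the field layout is only inferred from these few lines:

```perl
use strict;
use warnings;

# Split a play line such as
#   "1-10-ASH13 (14:30) ASH 48 Hopper ran around right end for 1 yard. ..."
# into down, yards to go, field position, game clock and the rest of the text.
# The layout is inferred from the excerpt; other log lines may differ.
sub parse_play_prefix {
    my ($line) = @_;
    return unless $line =~ m{
        ^ (\d) - (\d+) - ([A-Z]{3}\d+)   # down, yards to go, spot (e.g. ASH13)
        \s+ \( (\d+:\d\d) \)             # game clock, e.g. (14:30)
        \s+ (.*) $                       # everything else on the line
    }x;
    return { down => $1, togo => $2 + 0, spot => $3, clock => $4, text => $5 };
}

# Assumes a line containing only "--" is the first-down marker.
sub is_first_down_marker {
    my ($line) = @_;
    return $line =~ /^\s*--\s*$/;
}

my $play = parse_play_prefix(
    '2-09-ASH14 (13:59) ASH 23 Theriot ran inside the left guard for 1 yard.');
printf "down %d, %d to go at %s, clock %s\n", @{$play}{qw(down togo spot clock)};
```

Lines that do not start with that prefix (the quarter header, "Possession to ...", "Key block ...") come back as undef, so they can be routed to separate handlers.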

Does anyone have any suggestions on the best way to parse something like this? Also, what type of data structures should I store the information in? When I tried to tackle this before, I slurped all the 1st quarter lines into one array, the 2nd quarter lines into another, and so on, and also created arrays for home games and away games, but that didn't seem efficient. Is there a good way to store this information for each player using hashes of hashes or hashes of arrays?

I apologize if this isn't very clear, but I am just lost as to the best way to tackle this. I can try to clarify anything that I haven't explained well. Any help you can offer would be greatly appreciated.

Thanks!

-Prime

Re: Parsing a Log File
by BrowserUk (Patriarch) on Nov 29, 2004 at 19:28 UTC

    This might form a starting point. You'll need to tweak the regexes in light of what the rest of the logs look like, and add extra ones if you want to record timings, offsides, etc.

    #! perl -slw
    use strict;
    use Data::Dumper;

    #local $/ = '';
    my %stats;

    while( $_ = <DATA> ) {
        chomp;
        $stats{$1}{$2}{$3} += $4 if m[
            ([A-Z]{3}) \s+                        # Team
            ( \d{2}\s[A-Za-z]+ ) \s+              # Player
            (ran|kicked|pass completed|returned)  # Action
            .*? (\d+)\syards?
        ]x;

        $stats{$2}{$3}{$1}++ if m[
            (Tackled|assisted|block)              #action
            .*? by\s+([A-Z]{3}) \s+               # Team
            ( \d{2}\s[A-Za-z]+ )                  # Player
        ]x;
    }

    print Dumper \%stats;

    __DATA__
    Beginning of First Quarter.
    ORL 13 Sauls kicked off 71 yards from the ORL30.
    ASH 22 Noa returned the ball 14 yards to the ASH13. Tackled by ORL 82 Ferguson.
    Possession to Asheville.
    1-10-ASH13 (14:30) ASH 48 Hopper ran around right end for 1 yard. Tackled by ORL 94 Whiting, assisted by ORL 54 Schacht.
    2-09-ASH14 (13:59) ASH 23 Theriot ran inside the left guard for 1 yard. Tackled by ORL 96 Dugger.
    3-08-ASH15 (13:25) ASH 18 Hall pass completed to 23 Theriot for 10 yards. Tackled by ORL 96 Dugger.
    --
    1-10-ASH25 (12:50) ASH 38 Dollinger ran around right end for 6 yards. Tackled by ORL 95 Gonzalez, assisted by ORL 54 Schacht.
    Key block delivered by ASH 77 Cassell.
    2-04-ASH31 (12:05) ASH 18 Hall pass was overthrown, intended for 81 Perry. Penalty: ORL - Offsides.
    --

    Output

    [19:23:58.26] P:\test>410999
    $VAR1 = {
              'ASH' => {
                         '77 Cassell' => {
                                           'block' => 1
                                         },
                         '38 Dollinger' => {
                                             'ran' => '6'
                                           },
                         '23 Theriot' => {
                                           'ran' => '1'
                                         },
                         '48 Hopper' => {
                                          'ran' => '1'
                                        },
                         '22 Noa' => {
                                       'returned' => '14'
                                     }
                       },
              'ORL' => {
                         '95 Gonzalez' => {
                                            'Tackled' => 1
                                          },
                         '94 Whiting' => {
                                           'Tackled' => 1
                                         },
                         '96 Dugger' => {
                                          'Tackled' => 2
                                        },
                         '82 Ferguson' => {
                                            'Tackled' => 1
                                          },
                         '13 Sauls' => {
                                         'kicked' => '71'
                                       }
                       }
            };
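    Two gaps show up in that output. Hall's completion is missing because under /x the literal space in "pass completed" is ignored, so that alternative can only ever match "passcompleted". Also, each regex matches at most once per line, so the assisting tackler (54 Schacht) is never credited. A sketch of both tweaks, using a hypothetical tally_line helper fed two lines from the excerpt rather than __DATA__:

```perl
use strict;
use warnings;
use Data::Dumper;

# Two tweaks to the regexes above:
#  1. Under /x a literal space is ignored, so "pass completed" only matches
#     "passcompleted"; writing pass\s+completed fixes that.
#  2. Looping with //g credits every tackler or blocker on a line, not just
#     the first one found.
my %stats;

sub tally_line {
    my ($line) = @_;
    if ( $line =~ m[
            ([A-Z]{3}) \s+                                  # team
            ( \d{2} \s [A-Za-z]+ ) \s+                      # player
            ( ran | kicked | pass\s+completed | returned )  # action
            .*? (\d+) \s yards?                             # yardage
        ]x ) {
        $stats{$1}{$2}{$3} += $4;
    }
    while ( $line =~ m[
            ( Tackled | assisted | block )                  # action
            .*? by \s+ ([A-Z]{3}) \s+                       # team
            ( \d{2} \s [A-Za-z]+ )                          # player
        ]xg ) {
        $stats{$2}{$3}{$1}++;
    }
    return;
}

tally_line($_) for (
    '1-10-ASH13 (14:30) ASH 48 Hopper ran around right end for 1 yard. Tackled by ORL 94 Whiting, assisted by ORL 54 Schacht.',
    '3-08-ASH15 (13:25) ASH 18 Hall pass completed to 23 Theriot for 10 yards. Tackled by ORL 96 Dugger.',
);

local $Data::Dumper::Sortkeys = 1;
print Dumper \%stats;
```

    With these two lines, 18 Hall now gets 'pass completed' => 10 and 54 Schacht gets 'assisted' => 1, alongside the runs and tackles the original already caught.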

Re: Parsing a Log File
by ikegami (Patriarch) on Nov 29, 2004 at 18:54 UTC

    Start by defining what you want for output. I presume you want some kind of data structure? What would the data structure be for the snippet you provided? How about you modify the following to be what you want? Then we can help you write the parser. Or maybe you'll be able to do it on your own.

    [  # First quarter
       [  'ORL 13 Sauls kicked off 71 yards from the ORL30.',
          'ASH 22 Noa returned the ball 14 yards to the ASH13. Tackled by ORL 82 Ferguson.'
       ],
       'Possession to Asheville.',
       [  # First dashes
          [  10, 'ASH13', '14:30',
             [  'ASH 48 Hopper ran around right end for 1 yard.',
                'Tackled by ORL 94 Whiting, assisted by ORL 54 Schacht.',
             ],
          ],
          [  9, 'ASH14', '13:59',
             [  'ASH 23 Theriot ran inside the left guard for 1 yard.',
                'Tackled by ORL 96 Dugger.',
             ]
          ],
          [  8, 'ASH15', '13:25',
             [  'ASH 18 Hall pass completed to 23 Theriot for 10 yards.',
                'Tackled by ORL 96 Dugger.',
             ]
          ]
       ],
       [  # Second dashes
          [  10, 'ASH25', '12:50',
             [  'ASH 38 Dollinger ran around right end for 6 yards.',
                'Tackled by ORL 95 Gonzalez, assisted by ORL 54 Schacht.',
                'Key block delivered by ASH 77 Cassell.',
             ]
          ],
          [  4, 'ASH31', '12:05',
             [  'ASH 18 Hall pass was overthrown, intended for 81 Perry.',
                'Penalty: ORL - Offsides.',
             ]
          ]
       ]
    ]

    Keep in mind that some of us (myself included) don't know football.

      I am looking to produce a report that breaks down the statistics for the various players in different situations. For example, let's say this snippet was the entire game. I need to know which quarter each event takes place in, which is clearly marked at the top of the file here. I am not concerned about the kickoff, so the first line to parse would be:

      1-10-ASH13 (14:30) ASH 48 Hopper ran around right end for 1 yard. Tackled by ORL 94 Whiting, assisted by ORL 54 Schacht.


      That line says that on 1st and 10 at the Asheville 13-yard line, 48 Hopper ran the ball for 1 yard. In this game Asheville is the home team, so I would want to take that information and put it into several "buckets" for the player 48 Hopper. For example, from this line we would add to a "home game" bucket that he ran once for 1 yard. We would also add that information to the buckets for the first quarter, the first half, and 1st down.

      For simplicity's sake, let's say the next line, which was also a run, was also a carry by Hopper. Then I would add that information to the home game, 1st quarter, and 1st half buckets as before, but the down information would now go into a 2nd down bucket, since the play was on second down.
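      One way to hold these buckets is a hash of hashes keyed by player, then bucket name, then counter. A minimal sketch, assuming an earlier step has already pulled the quarter, down, yardage and home/away flag out of each line. buckets_for and record_run are hypothetical names, and only the four regulation quarters are handled:

```perl
use strict;
use warnings;

# Per-player "buckets": $stats{$player}{$bucket}{att} and {yards}.
# The play fields (quarter, down, is_home) are assumed to come from an
# earlier parsing step. Regulation quarters (1-4) only.
my %stats;
my @quarter_name = ( undef, '1st Quarter', '2nd Quarter', '3rd Quarter', '4th Quarter' );
my @down_name    = ( undef, '1st down', '2nd down', '3rd down', '4th down' );

# Every bucket a single play belongs to.
sub buckets_for {
    my ($play) = @_;
    return (
        'Total',
        ( $play->{is_home} ? 'Home' : 'Away' ),
        $quarter_name[ $play->{quarter} ],
        ( $play->{quarter} <= 2 ? '1st Half' : '2nd Half' ),
        $down_name[ $play->{down} ],
    );
}

# Add one rushing attempt to each of the player's buckets.
sub record_run {
    my ($play) = @_;
    for my $bucket ( buckets_for($play) ) {
        $stats{ $play->{player} }{$bucket}{att}++;
        $stats{ $play->{player} }{$bucket}{yards} += $play->{yards};
    }
    return;
}

# Hopper's 1st-down run from the log, plus the hypothetical 2nd-down run.
record_run({ player => '48 Hopper', yards => 1, quarter => 1, down => 1, is_home => 1 });
record_run({ player => '48 Hopper', yards => 1, quarter => 1, down => 2, is_home => 1 });
```

      Because each play simply fans out into several keys, adding a new situation (say, red-zone plays) means adding one more name to the list returned by buckets_for rather than another array of lines.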

      Is that making it any clearer? I also have to deal with penalties, which can negate the previous play. The final report would look something like this:

      48 Hopper

      Situation     Att  Yards
      Total           2      2
      Home            2      2
      Away            0      0
      1st Quarter     2      2
      2nd Quarter     0      0
      3rd Quarter     0      0
      4th Quarter     0      0
      1st down        1      1
      2nd down        1      1
      3rd down        0      0
      4th down        0      0


      That is a pretty simplified version of what I am looking for, but I hope it makes things a bit clearer.
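      Printing a report like that is then a walk over a fixed list of situations. A sketch, assuming the per-player buckets are stored as a hash of hashes (player, then situation, then att/yards). The data is hard-coded to the two Hopper runs, and report is a hypothetical name:

```perl
use strict;
use warnings;

# Hard-coded buckets for the two Hopper runs; a real run would fill this
# from the log. Layout: $stats{$player}{$situation}{att|yards}.
my %stats = (
    '48 Hopper' => {
        'Total'       => { att => 2, yards => 2 },
        'Home'        => { att => 2, yards => 2 },
        '1st Quarter' => { att => 2, yards => 2 },
        '1st down'    => { att => 1, yards => 1 },
        '2nd down'    => { att => 1, yards => 1 },
    },
);

# Rows in report order; buckets that were never filled print as 0.
my @situations = (
    'Total',       'Home',        'Away',
    '1st Quarter', '2nd Quarter', '3rd Quarter', '4th Quarter',
    '1st down',    '2nd down',    '3rd down',    '4th down',
);

sub report {
    my ($player) = @_;
    my $out = "$player\n";
    $out .= sprintf "%-13s %4s %6s\n", 'Situation', 'Att', 'Yards';
    for my $situation (@situations) {
        my $cell = $stats{$player}{$situation} || {};
        $out .= sprintf "%-13s %4d %6d\n",
            $situation, $cell->{att} || 0, $cell->{yards} || 0;
    }
    return $out;
}

print report('48 Hopper');
```

      Driving the rows from a fixed list, rather than from whatever keys exist, is what makes the empty Away and later-quarter rows show up as zeros.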

      -Prime
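      On penalties that negate the previous play: one hedged approach is to hold each play as pending and only commit it once the next play line is read, so a penalty seen in between can discard it. This sketch assumes the penalty sits on a line of its own and, purely for illustration, that every "Penalty:" line negates the play before it; the real rule depends on the penalty type and on the sim's log format. In the excerpt, Hall's offsides penalty shares the play's own line, in which case the check would belong in the per-line parse instead. tally_plays is a hypothetical name, and the input below is an illustrative sequence built from excerpt lines:

```perl
use strict;
use warnings;

# Deferred commit: a play is only kept once the next play line arrives, so a
# penalty line in between can cancel it. ASSUMPTION for illustration: every
# line starting with "Penalty:" negates the play before it.
sub tally_plays {
    my (@lines) = @_;
    my @kept;
    my $pending;    # most recent play, not yet committed
    for my $line (@lines) {
        if ( $line =~ /^Penalty:/ ) {
            undef $pending;    # hypothetical rule: the penalty wipes the play
            next;
        }
        # Skip headers, "--" markers and anything else that is not a play line.
        next unless $line =~ /^\d-\d+-[A-Z]{3}\d+ /;
        push @kept, $pending if defined $pending;
        $pending = $line;
    }
    push @kept, $pending if defined $pending;    # flush the last play
    return @kept;
}

my @kept = tally_plays(
    'Beginning of First Quarter.',
    '1-10-ASH13 (14:30) ASH 48 Hopper ran around right end for 1 yard.',
    '2-09-ASH14 (13:59) ASH 23 Theriot ran inside the left guard for 1 yard.',
    'Penalty: ORL - Offsides.',    # hypothetical standalone penalty line
    '2-04-ASH31 (12:05) ASH 18 Hall pass was overthrown, intended for 81 Perry.',
);
print "$_\n" for @kept;
```

      Here Theriot's run is dropped and the Hopper and Hall lines are kept; only committed plays would then be passed on to the bucket counting.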
Re: Parsing a Log File
by tall_man (Parson) on Nov 29, 2004 at 19:25 UTC
    This could be done using Parse::RecDescent. For example, here is a simple, stripped-down grammar for part of a play. I used 'autotree' so you would get a parse tree automatically. You can either write code to work with the parse tree itself, or define your own actions for each grammar rule so that the play information can go into a hash for each player.
    use strict;
    use Parse::RecDescent;
    use Data::Dumper;

    # ORL 13 Sauls kicked off 71 yards from the ORL30
    my $grammar = q{
        <autotree>
        file     : play(s)
        play     : side number player action
        action   : kick | run
        kick     : 'kicked off' number 'yards from the' sideyard
        run      : 'ran' number 'yards'
        side     : /\w+/
        player   : /\w+/
        number   : /\d+/
        sideyard : /\w+\d+/
    };

    # new() returns undef if the grammar fails to compile
    my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";
    my $ret = $parser->file("ORL 13 Sauls kicked off 71 yards from the ORL30");
    print Data::Dumper->Dump([$ret], [qw(ret)]);

Approved by NovMonk