smist has asked for the wisdom of the Perl Monks concerning the following question:
|
Re: parsing question
by BrowserUk (Patriarch) on Dec 07, 2006 at 04:01 UTC
As the file is in several distinct sections, each with its own format, first break the file into those sections. As it's also a fairly small file, slurp it and split it on the section separator:
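The code from this post is not reproduced in this copy of the thread. As a stand-in, here is a minimal sketch of the slurp-and-split step. It assumes the separator is a run of 103 hyphens (the pattern quoted in a later reply); the helper name split_sections and the demo text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split already-slurped text into sections.
# The pattern -{103} is the one mentioned later in this thread.
sub split_sections {
    my ($text) = @_;
    # Discard pieces that contain nothing but whitespace.
    return grep { /\S/ } split /-{103}/, $text;
}

# Typical use on a real file (left as a comment so the demo needs no input):
#   my @sections = do { local $/; split_sections( scalar <> ) };

# In-memory demonstration with invented section bodies.
my $sep      = '-' x 103;
my $demo     = join "\n", 'HEADER', $sep, 'TEAM STATS', $sep, 'PLAYER STATS', '';
my @sections = split_sections($demo);

printf "%d sections\n", scalar @sections;
```

Each element of @sections still carries the newlines that surrounded the separator line, so trim it before further parsing if that matters.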
Once you have the sections separated, you can treat each one differently. Rather than having to count all the spaces and manually construct the unpack formats for dealing with them--which is a PIA, and they might also change--notice that each section of stats is preceded by a header line, and that the left-hand edge of each column header forms a left-edge limit for the data in that column. Also notice that although some of the column titles are multiple words, every title is preceded by at least two spaces, whilst the multi-word titles themselves contain only single spaces. That information allows you to use the header lines to construct the unpack formats programmatically. The following subroutine takes a header line, uses it to discover the column boundaries and then uses those to construct a format:
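BrowserUk's subroutine is not reproduced in this copy of the thread. What follows is only a sketch of the idea described above, not the original code: the name header_to_format, the regex, and the sample header and data row are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: derive an unpack template from a header line.
# A title is taken to be one or more words joined by single spaces;
# two or more spaces therefore mark the gap between titles.
sub header_to_format {
    my ($header) = @_;
    my @starts;
    # $-[0] is the offset at which the most recent match began.
    push @starts, $-[0] while $header =~ /\S+(?: \S+)*/g;
    return '' unless @starts;

    # Skip any indentation before the first title.
    my @fields = $starts[0] ? ("x$starts[0]") : ();
    # Each column runs from its title's left edge to the next title's left edge.
    push @fields, 'A' . ( $starts[$_] - $starts[ $_ - 1 ] ) for 1 .. $#starts;
    # The last column takes whatever remains of the line.
    return join ' ', @fields, 'A*';
}

# Invented header and data row, built with sprintf so the columns line up.
my $layout = '%-16s%-5s%-8s%s';
my $header = sprintf $layout, 'PLAYER NAME', 'MIN', 'FG M-A', 'PTS';
my $row    = sprintf $layout, 'Example, A',  '34',  '5-11',   '12';

my $format = header_to_format($header);
my @cols   = unpack $format, $row;

print "$format\n";                 # A16 A5 A8 A*
print join( '|', @cols ), "\n";    # Example, A|34|5-11|12
```

The 'A' format strips trailing spaces from each field, which is why the columns come back already trimmed. Data that starts to the left of its header's left edge would be split in the wrong place, so this relies on the alignment described in the post.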
This can be reused for all the columnised (sub)sections in the file. By way of example, this is how to use it to break down the four subsections of the 'PLAYER GAME STATISTICS':
Parsing the other sections (those with columnised data) is just a repeat of the above. The code all together, as far as I've taken it:
The output from section 2 debugging lines:
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by smist (Acolyte) on Dec 07, 2006 at 13:44 UTC
by joeface (Pilgrim) on Dec 07, 2006 at 16:20 UTC
This sets the input record separator (held in "$/", normally set to "\n") to undef, so that everything read by <> (the files named on the command line, or STDIN if there are none) gets slurped up in one go, split on the pattern "-{103}" (a run of 103 hyphens), and assigned to @sections. A short example... put the following into a script named "test.pl":
Then, put the following into a file called "test.txt":
Then, run the following at the command-line:
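The script, sample file and command from this reply are not reproduced in this copy of the thread. As a stand-in, here is a small self-contained sketch of what undefining "$/" does. It reads from an in-memory string rather than from test.txt, and the sample text is invented.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Default behaviour: $/ is "\n", so a list-context read returns one
# element per line.
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# With $/ undefined (localised to the do block), a single read returns
# everything that is left in the file.
open $fh, '<', \$text or die "cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;

printf "%d lines by default, %d characters in one slurp\n",
    scalar @lines, length $whole;
```

Because of the "local", "$/" goes back to "\n" as soon as the block ends, so later reads elsewhere in the program are unaffected.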
by smist (Acolyte) on Dec 07, 2006 at 17:42 UTC
|
Re: parsing question
by ikegami (Patriarch) on Dec 07, 2006 at 02:54 UTC
|
Re: parsing question
by grep (Monsignor) on Dec 07, 2006 at 03:16 UTC
by ikegami (Patriarch) on Dec 07, 2006 at 06:54 UTC
Copyright doesn't protect statistics. It does protect documents containing statistics (such as the linked text file), but we're not making a copy of the text file[*]. There could be contractual (licensing, Terms of Service) issues, but no Copyright issues.
[*] Well, technically the file is copied numerous times (e.g. from the TCP stream into RAM), but Copyright allows those copies since they are necessary steps to viewing the file.
|
Re: parsing question
by wjw (Priest) on Dec 07, 2006 at 03:36 UTC
...the majority is always wrong, and always the last to know about it...
by smist (Acolyte) on Dec 07, 2006 at 10:47 UTC
|
[OT]Re: parsing question
by japhy (Canon) on Dec 07, 2006 at 05:08 UTC
by smist (Acolyte) on Dec 07, 2006 at 10:43 UTC