Not perfect, but it's a good start:
use strict;
use warnings;
@ARGV = 'GUTINDEX-2004.txt' unless @ARGV;
my $file = shift or die "Usage: $0 GUTINDEX-2001.txt";
my @data;
{
    open IN, '<', $file or die "Error: Can't open $file : $!";
    my $start_parsing;
    local $_;
    while(<IN>){
        next if /^\s*$/;
        last if /^End\s+of\s+\Q$file\E/;
        chomp;
        if( $start_parsing ){
            if(/^(\w+.*?) by (.+?)\s+(\d{5}.?)$/){
                # "Title by Author   12345" starts a new record
                push @data, { id => $3, title => $1, author => $2 };
            } else {
                next unless @data;    # no record yet to attach notes to
                # A [Key: value] note may wrap onto the following lines,
                # so keep appending lines until its closing bracket shows up
                my $line = $_;
                while( $line =~ /\[[^\]]*$/ ){
                    defined( my $more = <IN> ) or last;
                    chomp $more;
                    $line .= " $more";
                }
                # Handle every complete [...] note found on the joined line
                while( $line =~ /\[([^\]]+)\]/g ){
                    my $foo = $1;
                    if( $foo =~ /(\S+):\s*(.+)/s ){
                        $data[$#data]->{$1} = $2;
                    }
                    elsif( $foo =~ /,\s\d+$/s ){
                        $data[$#data]->{Date} = $foo;
                    }
                }
            }
        }
        # elsif( /^Title\s+and\s+Author/ ){
        elsif( /^\Q~ ~ ~ ~ Posting Dates for the below eBooks\E/ ){
            $start_parsing++;
        }
    }
    close IN;
}
use Data::Dumper;
print Dumper( \@data );
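To see what the title/author/id pattern actually captures, here is a small standalone sketch. The sample line is made up for illustration (it is not copied from the real index), and parse_entry is just a hypothetical helper name wrapping the same regex used above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same pattern the script uses for entry lines
sub parse_entry {
    my( $text ) = @_;
    return unless $text =~ /^(\w+.*?) by (.+?)\s+(\d{5}.?)$/;
    return { title => $1, author => $2, id => $3 };
}

# Invented sample line, only shaped like a GUTINDEX entry
my $line = 'An Example Title by A. N. Author                    12345';

my $rec = parse_entry( $line );
print "$rec->{id}: $rec->{title} / $rec->{author}\n";
```

Note that the title is cut at the first " by ", so a title that itself contains " by " would likely be split in the wrong place.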
Update: What I noticed is that the first entry in GUTINDEX-2004.txt doesn't have a closing ] for [Subtitle:, which is a bug in the file.
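If you want to spot that kind of thing, a check along these lines might help. It is only a rough sketch: has_unclosed_bracket is a hypothetical helper, and the sample strings are invented. Since notes can legitimately wrap across lines, a hit on a single line is a hint to look closer, not proof of a broken entry.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the text opens a '[' that never gets its ']'
sub has_unclosed_bracket {
    my( $text ) = @_;
    return $text =~ /\[[^\]]*\z/ ? 1 : 0;
}

# Invented sample strings
for my $sample ( '[Subtitle: something', '[Subtitle: something]' ){
    my $status = has_unclosed_bracket( $sample ) ? 'unclosed' : 'ok';
    print "$status: $sample\n";
}
```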