Not perfect, but it's a good start:
use strict;
use warnings;

# Default to the 2004 index when no file name is given
@ARGV = 'GUTINDEX-2004.txt' unless @ARGV;
my $file = shift or die "Usage: $0 GUTINDEX-2001.txt";
my @data;
{
    # Parenthesized so that "or die" applies to open, not to $file
    open( IN, '<', $file ) or die "Error: Can't open $file : $!";
    my $start_parsing;
    local $_;
    while(<IN>){
        next if /^\s*$/;
        last if /^End\s+of\s+\Q$file\E/;
        chomp;
        if( $start_parsing ){
            # An entry line: "Title by Author    12345"
            if(/^(\w+.*?) by (.+?)\s+(\d{5}.?)$/){
                my( $title, $author, $id ) = ( $1, $2, $3 );
                $data[$#data+1]->{id}   = $id;
                $data[$#data]->{title}  = $title;
                $data[$#data]->{author} = $author;
            } else {
                # Bracketed annotations such as [Subtitle: ...],
                # which may continue on the following lines
                for( /\[(.+?)(\])?/g ){
                    my $foo = $1;
                    unless( $2 ){
                        while(<IN>){
                            if( /(.+?)\]/ ){
                                $foo .= $1;
                                last;
                            } else {
                                # keep the continuation line just read
                                $foo .= $_;
                            }
                        }
                    }
                    if( $foo =~ /\[?(\S+):(.+)\]?/s ){
                        $data[$#data]->{$1} = $2;
                    } elsif( $foo =~ /,\s\d+$/s ){
                        $data[$#data]->{Date} = $foo;
                    }
                }
            }
        }
        # elsif( /^Title\s+and\s+Author/ ){
        elsif( /^\Q~ ~ ~ ~ Posting Dates for the below eBooks\E/ ){
            $start_parsing++;
        }
    }
}
use Data::Dumper;
print Dumper( \@data );
Update:
What I noticed is that the first entry in GUTINDEX-2004.txt doesn't have a closing ] for [Subtitle:, which is a bug in the file.
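One way to spot that kind of problem before parsing is to scan the file for a [ that is never followed by a ] before the next [ or the end of the input. The following is only a rough sketch: find_unclosed_brackets is a hypothetical helper that is not part of the script above, and the sample lines are made up for illustration rather than copied from the real index.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the original script):
# returns the line numbers on which a '[' is opened but never closed
# before the next '[' or the end of the input.
sub find_unclosed_brackets {
    my @lines = @_;
    my @unclosed;
    my $open_line;    # line number of the currently open '[', if any
    my $n = 0;
    for my $line (@lines) {
        $n++;
        # Walk the brackets on this line in order of appearance
        for my $char ( $line =~ /([\[\]])/g ) {
            if ( $char eq '[' ) {
                # A new '[' while one is still open: the earlier one was never closed
                push @unclosed, $open_line if defined $open_line;
                $open_line = $n;
            }
            else {
                undef $open_line;
            }
        }
    }
    push @unclosed, $open_line if defined $open_line;
    return @unclosed;
}

# Made-up sample lines, not taken from GUTINDEX-2004.txt
my @sample = (
    'Some Title, by Some Author          12345',
    ' [Subtitle: An annotation that is never closed',
    ' [Language: French]',
    'Another Title, by Another Author    12346',
    ' [Subtitle: Properly closed]',
);
print "Unclosed '[' on line $_\n" for find_unclosed_brackets(@sample);
```

To check a real index, the same sub could be fed the lines of the file (for example, read into an array first). Note that it treats nested brackets as an error, which may or may not be what you want for this catalog.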



In reply to Re: Parsing Gutenberg Catalog Index by PodMaster
in thread Parsing Gutenberg Catalog Index by hacker
