Gary Yang has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I use XML::Simple to parse XML files and the JSON module to parse JSON files. Now I need to parse a text file (see below). Is there an existing module that works for text like this? I searched Google but did not find a suitable one. Any ideas? Below are the three titles (books) I need to parse.
$VAR1 = '<HTML><BODY><PRE>
Title: Perl and Cgi for the World Wide Web 2ND Edition
Author: Castro, Elizabeth
ISBN: 0201735687
ISBN13: 9780201735680
Binding: TRADE PAPER
Class: USED
Section: Computer Languages-Perl
Price: 3.95
Location: 17
Item: 4
Condition: Standard
Availability: 1 to 3 days
Notes: Pub Date: 20100429

Title: Perl and Cgi for the World Wide Web 2ND Edition
Author: Castro, Elizabeth
ISBN: 0201735687
ISBN13: 9780201735680
Binding: TRADE PAPER
Class: USED
Section: General-General
Price: 13.00
Location: 65
Item: 2
Condition: Standard
Availability: 1 to 3 days
Notes: Free ship: no

Title: Perl and Cgi for the World Wide Web 2ND Edition
Author: Castro, Elizabeth
ISBN: 0201735687
ISBN13: 9780201735680
Binding: TRADE PAPER
Class: USED
Section: Computer Languages-Perl
Price: 8.95
Location: 1
Item: 7
Condition: Standard
Availability: 1 to 3 days
Notes: Pub Date: 20100423
</PRE>';

Replies are listed 'Best First'.
Re: Perl Module to parse text file
by choroba (Cardinal) on Apr 10, 2012 at 08:13 UTC
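    package Parse::Books;
    use strict;
    use warnings;

    # Read "Key: value" lines from a filehandle; each "Title:" line starts a new book.
    sub parse {
        my $FH = shift;
        my @books;
        my $counter = -1;
        while (<$FH>) {
            chomp;
            next if /<.*>/;             # skip the HTML wrapper lines
            $counter++ if /^Title:/;    # a Title line begins a new record
            # Match up to the first colon only, so a value such as
            # "Pub Date: 20100429" is not swallowed into the key.
            if (/^([^:]+): *(.*)/) {
                warn "Duplicate field $1 at # $counter line $.\n"
                    if exists $books[$counter]{$1};
                $books[$counter]{$1} = $2;
            }
        }
        return \@books;
    }

    1;    # a module file must return a true value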
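    For anyone wondering how to call a parser like this, here is a minimal usage sketch. It assumes the text is already in a scalar and reads it through an in-memory filehandle. The sample data is a shortened, hypothetical stand-in for the real listing, and the parser is a condensed variant repeated only so the snippet runs on its own.

```perl
use strict;
use warnings;

package Parse::Books;

# Condensed variant of the parsing loop, repeated so this sketch is runnable.
sub parse {
    my $FH = shift;
    my @books;
    my $counter = -1;
    while (<$FH>) {
        chomp;
        next if /<.*>/;             # skip lines that contain HTML tags
        $counter++ if /^Title:/;    # each Title line starts a new book
        $books[$counter]{$1} = $2 if /^([^:]+): *(.*)/;
    }
    return \@books;
}

package main;

# Hypothetical two-record sample; the real input is the poster's listing.
my $text = <<'END';
<HTML><BODY><PRE>
Title: Perl and Cgi for the World Wide Web 2ND Edition
Price: 3.95

Title: Perl and Cgi for the World Wide Web 2ND Edition
Price: 13.00
</PRE>
END

# Opening a reference to a scalar yields an in-memory filehandle.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $books = Parse::Books::parse($fh);
close $fh;

print "$_->{Title}: $_->{Price}\n" for @$books;
```

    The same call works with a handle opened on a real file; only the open line changes.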
Re: Perl Module to parse text file
by CountZero (Bishop) on Apr 10, 2012 at 09:43 UTC
    If you can clean up your data so that the HTML tags disappear, then the following will work:
    use Modern::Perl;
    use Data::Dump qw/dump/;

    my @books;
    {
        local $/ = "\n\n";    # read one blank-line-separated record at a time
        while (<DATA>) {
            my $book = Book::parse($_);
            push @books, $book if $book->{Title};
        }
    }
    say dump( \@books );

    package Book;

    sub parse {
        # Limit the split to two fields so a value containing a colon
        # (e.g. "Pub Date: 20100429") is kept whole.
        return { map { my ( $key, $value ) = split /:\s*/, $_, 2 } split /\n/, shift };
    }

    package main;

    __DATA__
    Title: Perl and Cgi for the World Wide Web 2ND Edition
    Author: Castro, Elizabeth
    ISBN: 0201735687
    ISBN13: 9780201735680
    Binding: TRADE PAPER
    Class: USED
    Section: Computer Languages-Perl
    Price: 3.95
    Location: 17
    Item: 4
    Condition: Standard
    Availability: 1 to 3 days
    Notes: Pub Date: 20100429

    Title: Perl and Cgi for the World Wide Web 2ND Edition
    Author: Castro, Elizabeth
    ISBN: 0201735687
    ISBN13: 9780201735680
    Binding: TRADE PAPER
    Class: USED
    Section: General-General
    Price: 13.00
    Location: 65
    Item: 2
    Condition: Standard
    Availability: 1 to 3 days
    Notes: Free ship: no

    Title: Perl and Cgi for the World Wide Web 2ND Edition
    Author: Castro, Elizabeth
    ISBN: 0201735687
    ISBN13: 9780201735680
    Binding: TRADE PAPER
    Class: USED
    Section: Computer Languages-Perl
    Price: 8.95
    Location: 1
    Item: 7
    Condition: Standard
    Availability: 1 to 3 days
    Notes: Pub Date: 20100423
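    The clean-up step could look roughly like the sketch below. It assumes the whole response sits in one scalar (called $raw here, a hypothetical name) and that the only markup is the simple wrapper tags shown in the question. A plain regex is enough for that case, but it is not a general HTML parser. The sample data is shortened for illustration.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the string held in $VAR1.
my $raw = <<'END';
<HTML><BODY><PRE>
Title: Perl and Cgi for the World Wide Web 2ND Edition
Author: Castro, Elizabeth
Price: 3.95
Notes: Pub Date: 20100429

Title: Perl and Cgi for the World Wide Web 2ND Edition
Author: Castro, Elizabeth
Price: 13.00
Notes: Free ship: no
</PRE>
END

# Remove anything that looks like a tag, then trim leading and trailing whitespace.
( my $text = $raw ) =~ s/<[^>]+>//g;
$text =~ s/\A\s+|\s+\z//g;

# Split into records on blank lines, then each record into key/value pairs.
my @books;
for my $record ( split /\n{2,}/, $text ) {
    my %book;
    for my $line ( split /\n/, $record ) {
        # Limit of 2 keeps values that themselves contain a colon.
        my ( $key, $value ) = split /:\s*/, $line, 2;
        $book{$key} = $value if defined $value;
    }
    push @books, \%book if exists $book{Title};
}

printf "%s costs %s (%s)\n", @{$_}{qw(Title Price Notes)} for @books;
```

    Once the tags are gone, the same text could equally be written to a file and read in paragraph mode.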

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Perl Module to parse text file
by Anonymous Monk on Apr 10, 2012 at 07:18 UTC

    Sure, here it is:

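    package Gary::Yang;

    sub parse {
        my ($in) = @_;
        my @junk;
        while ( my $line = <$in> ) {
            push @junk, [ split ':', $line, 2 ];
        }
        return @junk;
    }

    package Doodle;
    use Modern::Perl;

    sub bop {
        my ($in) = @_;
        my @dark;       # key/value pairs of the record being read
        my @records;
        while (<$in>) {
            next if /<[^>]+>/;    # skip lines containing HTML tags
            if (/^\s*$/) {        # a blank line ends the current record
                push @records, [@dark] if @dark;
                @dark = ();
            }
            else {
                s/\s+$//;
                push @dark, [ split ':', $_, 2 ];
            }
        }
        # Keep the last record when no blank line follows it.
        push @records, [@dark] if @dark;
        return @records;
    }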