
As already noted, splitting on whitespace is the flaw in your algorithm, because company names can themselves contain whitespace.
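To make the problem concrete, here is a small sketch that applies a plain whitespace split to one line from the sample log. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One line from the sample log.
my $line = 'GOOD Acme Toy Company 2010-01-01 2011-12-31';

# Naive approach: split on runs of whitespace.
my @fields = split /\s+/, $line;

# Four fields were expected, but the company name alone contributes three.
printf "Got %d fields\n", scalar @fields;
print  "Field 1 (expected the company name): $fields[1]\n";
print  "Field 2 (expected the start date):   $fields[2]\n";
```

The split yields six fields instead of four, so every column after the status ends up holding the wrong value.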

This, in my experience, is a common error for someone parsing a log for the first time, so don't feel bad. :-) I prefer to parse logs based on their predictable components. The wilder the potential format, the more complicated the code gets, but for a relatively simple format like the one you describe, I think it's fairly straightforward (assuming you have a basic understanding of Regular Expressions).

You have to craft your Regular Expression to match the data you are expecting. A technique I have become fond of is wrapping the match in an if statement, which has the added benefit of filtering out lines that don't match my preconceived format. I often write those lines out to another file for occasional review, to see whether the parsing routine needs to compensate for previously unknown formats or conditions. I won't do that in this example so we can save space.
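For reference, the reject-capturing variant described above could look roughly like the following. This is a minimal sketch, not part of the script in this post: `sort_lines` is a hypothetical helper, and the demo uses in-memory filehandles in place of a real log file and reject file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as the script in this post, precompiled.
my $LINE_RE = qr/^(\w+)\s+(.+)\s+(\d{4}-\d{2}-\d{2})\s+(\d{4}-\d{2}-\d{2})\s*$/;

# Hypothetical helper: returns one array ref per matching line and
# writes every non-matching line to the reject handle for later review.
sub sort_lines {
    my ($in, $rej) = @_;
    my @parsed;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ $LINE_RE) {
            push @parsed, [$1, $2, $3, $4];
        }
        else {
            print {$rej} "$line\n";
        }
    }
    return @parsed;
}

# Demo data: two good lines and one line in an unexpected format.
my $log = "GOOD Acme Toy Company 2010-01-01 2011-12-31\n"
        . "this line is not in the expected format\n"
        . "BAD XYZZY 1972-01-01 1972-06-18\n";
my $rejects = '';

open my $in,  '<', \$log     or die "Cannot read sample data: $!";
open my $rej, '>', \$rejects or die "Cannot open reject buffer: $!";
my @records = sort_lines($in, $rej);
close $in;
close $rej;

printf "%d parsed, rejects:\n%s", scalar @records, $rejects;
```

In a real run, `$rej` would be opened on an actual file (the name is up to you) so the rejected lines can be reviewed later.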

C:\Steve\Dev\PerlMonks\P-2013-10-27@0838-Log-Parse>type test1.log
GOOD Acme Toy Company 2010-01-01 2011-12-31
BAD XYZZY 1972-01-01 1972-06-18
UGLY Enron 2001-10-01 2011-09-11

C:\Steve\Dev\PerlMonks\P-2013-10-27@0838-Log-Parse>parselog.pl test1.log

Status	Company Name	Start Date	End Date
GOOD	Acme Toy Company	2010-01-01	2011-12-31
BAD	XYZZY	1972-01-01	1972-06-18
UGLY	Enron	2001-10-01	2011-09-11

#!/usr/bin/perl

use strict;
use warnings;

# ---------------------------------------------------------------
# Parse log with following format:
# Status  Company Name Start Date End Date
#
# Assumptions:  Status contains no whitespace
#               Dates are in YYYY-MM-DD format
#               Company names have nothing that looks like a date
# ---------------------------------------------------------------

foreach my $inpfnm (@ARGV)
{
    if (!open INPFIL, '<', $inpfnm)
    {
        # Report on STDERR, with the OS reason, so the HTML on STDOUT stays clean
        print STDERR "ERROR:  Cannot open input file '$inpfnm': $!\n";
    }
    else
    {
        print "<HTML>\n";
        print "<BODY>\n";
        print "<TABLE BORDER>\n";
        print "  <TR>\n";
        print "    <TH>Status</TH>\n";
        print "    <TH>Company Name</TH>\n";
        print "    <TH>Start Date</TH>\n";
        print "    <TH>End Date</TH>\n";
        print "  </TR>\n";
        while (my $inpbuf = <INPFIL>)
        {
            chomp $inpbuf;
            if ($inpbuf =~ /^(\w+)\s+(.+)\s+(\d{4}\-\d{2}\-\d{2})\s+(\d{4}\-\d{2}\-\d{2})\s*$/)
            {
                my $inpsts = $1;
                my $inpnam = $2;
                my $stadat = $3;
                my $enddat = $4;
                print "  <TR>\n";
                print "    <TD>$inpsts</TD>\n";
                print "    <TD>$inpnam</TD>\n";
                print "    <TD>$stadat</TD>\n";
                print "    <TD>$enddat</TD>\n";
                print "  </TR>\n";
            }
        }
        close INPFIL;
        print "</TABLE>\n";
        print "</BODY>\n";
        print "</HTML>\n";
    }
}

exit;

__END__
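The pattern in the script is dense, so here is the same expression spelled out one component per line using the /x modifier. This is only an illustrative sketch: `parse_line` is a hypothetical helper, not something the script defines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (status, company, start, end) on a match,
# or an empty list when the line does not fit the expected format.
sub parse_line {
    my ($line) = @_;
    return $line =~ m{
        ^   (\w+)                    # status: a single word, no whitespace
        \s+ (.+)                     # company name: may contain whitespace
        \s+ (\d{4}-\d{2}-\d{2})      # start date, YYYY-MM-DD
        \s+ (\d{4}-\d{2}-\d{2})      # end date, YYYY-MM-DD
        \s* $                        # optional trailing whitespace
    }x;
}

my @fields = parse_line('GOOD Acme Toy Company 2010-01-01 2011-12-31');
print join(' | ', @fields), "\n";
```

The status is anchored at the start of the line and the two dates at the end, so whatever lies between them is taken as the company name, spaces included. That is why the assumption about company names not containing anything that looks like a date matters.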

In reply to Re: Parsing Text from a File to HTML Table by marinersk
in thread Parsing Text from a File to HTML Table by anupchandu
