Parsing Script

phimtau123 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to create a script that would read the content of a text file and use that content to populate a database. Here is what i got so far...

#!/perl/bin/perl
use strict;
use File::Copy;    # Used to buffer files-to-send
use DBI;
use Switch;

#Main#

#Scan File#
my $scan_dir   = "C:\\Log";
my $error_file = "error_log";
my $filename;
my $data;
my @name;
my @info;
opendir DIR, $scan_dir;
open( ERROR, ">>$error_file" );

#  Search cache directory for csv files

while ( $filename = readdir(DIR) ) {

    if ( $filename =~ /\.txt$/i ) {
        $data = Read_File( $scan_dir, $filename );
    }
}

close(DIR);
close(ERROR);


#Read File#
sub Read_File {
    my ( $scan_dir, $file_name ) = @_;

    print "Read file $scan_dir\\$file_name\n";

        open (CVS, ">>Test.txt") or die "Cannot write CVS\n ";
    open( FILEREAD, "$scan_dir\\$file_name" ) or die "cannot open $fil
+e_name\n";

    my $Test    = <FILEREAD>;
    my @Content = split( / /, $Test );

    my ($EquipmentID, $Chamber, @LotID, $Recipe, $AlarmType, $Sensor, 
+$Signal_Level, $Alarm_Upper_Limit, $Alarm_Lower_Limit);  
    my ($UserCode, $Code, $Date_Submited, $Time_Submited);
    my $number;
    my (@array, @Comment);
    my $count;
    my $regex;
    

    for ( $number = 0 ; $number <= $#Content ; $number++ ) {

        @array= split(/=/,$Content[$number] );
        switch ($array[0])    {
            case (/EQPID/i)     {
                print "EQPID = $array[1]\n";
                $Chamber = substr ($array[1], -1, 1);
                switch ($Chamber){
                    case (/A/i)        {print "Chamber 1\n";}
                    case (/B/i)        {print "Chamber 2\n";}
                    case (/C/i)        {print "Chamber 3\n";}
                    case (/D/i)        {print "Chamber 4\n";}
                }
            }
            case (/AlarmType/i)    {print "AlarmType=$array[1]\n";}
            case (/LotID/i)        {
                print "$array[1]\n";
                @LotID = split( /-/, $array[1] );
                print "LotID = $LotID[0]\n";
                print "Wafer_Flow = $LotID[1]-$LotID[2]\n";
            }
            case (/Comment/i)    {
                @Comment = split( /,/, $array[1] );
            
                for ($count = 0; $count<=$#Comment; $count++)
                {
                    
                    if ($count==1) {                        
                        print "Sensor = $Comment[$count]\n";          
+              
                    }
                    elsif ($count==2) {
                        print "Signal_level = $Comment[$count]\n";
                    }
                    elsif ($count==3) {
                        print "AUL = $Comment[$count]\n";
                    }
                    elsif ($count==4) {
                        print "ALL = $Comment[$count]\n";
                    }
                    elsif ($count==5) {
                        print "Recipe = $Comment[$count]\n";
                    }
                    elsif ($count==6) {
                        print "Wafer_slot = $Comment[$count]\n";
                    }
                    elsif ($count==7) {
                        print "Step_Value = $Comment[$count]\n";
                    }                    
                }            
            }
            case (/Code/i)        {print "Code = $array[1]\n";}
            case (/UserCode/i)    {print "Code = $array[1]\n";}
                    
        }
    }

}
sub Scan_File {
    
    my $dbh = DBI->connect('DBI:ODBC:mwalarm')
      or die "Couldn't connect to database: " . DBI->errstr;
    my $sql =
    "INSERT INTO Alarm_Info(EquipmentID, Chamber, LotID, Wafer_Flow, R
+ecipe, Wafer Slot, AlarmType, Sensor, 
    Signal_level, Alarm_Upper_Limit, Alarm_Lower_Limit, UserCode, Code
+, Date_Submited, Time_Submited, False Positive) 
     VALUES     () ";
    my $sth = $dbh->prepare($sql)
      or die "Couldn't prepare statement: " . $dbh->errstr;

    $sth->execute() or die "cant execute" . $dbh->errstr;

    $sth->finish;
}
[download]

I'm trying to figure out how to populate the databse with content from the Read_File sub. And how can i integrate the Scan_File sub into my program so that instead of those print line in the Read_File sub it populate the databse.

Comment on Parsing Script Download Code

Replies are listed 'Best First'.
Re: Parsing Script by blazar (Canon) on May 24, 2007 at 19:26 UTC
I'm trying to figure out how to populate the databse with content from the Read_File sub. And how can i integrate the Scan_File sub into my program so that instead of those print line in the Read_File sub it populate the databse. Your code is not huge, but it is still too much for me to really want to read it in detail. At a cursory glance, however, I notice some issues: no warnings; unchecked open's and opendir's; lots of variables not declared as close as possible to their usage; unnecessary C-style `for` loops; a clumsy, long sequence of `elsif`'s; etc. However you have two orthogonal issues here: parsing a text file; push the gathered data into a database. Once you figure out how to do both, you can "link" them together: gathering data for putting it into the database won't be any different than doing so for printing it and putting that stuff into the database won't be terribly different than putting into it any other kind of data.	[reply] [d/l] [select]
Re^2: Parsing Script by phimtau123 (Novice) on May 24, 2007 at 19:45 UTC
pardon my clumsy code...this is my first script with perl, and i've only got 1 programming class like 3 year ago...so hehe i kindna suck at it...... but i'm really lost in how to push the data i got from the text file into the database.....can u direct me to a link where i can read up on it? also when u said unnecessary for loop, wat would be a better way of doing it? C-style is the only way i have been expose to. OR how else can i do the if-else statement to make it more efficient?	[reply]
Re^3: Parsing Script by blazar (Canon) on May 24, 2007 at 20:00 UTC
pardon my clumsy code...this is my first script with perl, and i've only got 1 programming class like 3 year ago...so hehe i kindna suck at it...... No problem, we're all here to learn. but i'm really lost in how to push the data i got from the text file into the database..... I'm not a database guru, but you use DBI yourself, so you should really read its docs. However I suppose you simply want some `INSERT`'s. OTOH you may also want to use some higher level tool, like Class::DBI or DBIx::Class... also when u said unnecessary for loop, wat would be a better way of doing it? C-style is the only way i have been expose to. Perl-style is documented in perldoc perlsyn and is almost always appropriate in Perl: `for (1..5) { Do::Something::with($_); }` [download] OR how else can i do the if-else statement to make it more efficient? Did I talk about "efficiency"? It's not a matter of efficiency. It's a matter of expressiveness. However, lotsa ways, also depending on the actual situation. For example hash lookup (possibly in the form of a lookup table) or, in your case and still at a quick glance, I would say even array lookup.	[reply] [d/l] [select]
Re: Parsing Script by FunkyMonk (Bishop) on May 24, 2007 at 21:17 UTC
A general maxim in programming is don't repeat yourself (usually abbreviated to DRY). I find this particularly relevant in Perl. Blazar has already commented on your use of unnecessary C-style for loops and a clumsy, long sequence of elsif's. You can replace `for ($count = 0; $count<=$#Comment; $count++) { #clumsy, long sequence of elsif's }` [download] with `my @fields = qw/ Sensor Signal_level AUL ALL Recipe Wafer_slot Step_Value /; die "I did not expect " . @Comment . " columns!" unless @fields == @Comment; for my $value ( @Comment ) { print shift @fields, " = $value\n"; }` [download]	[reply] [d/l] [select]
Re: Parsing Script by graff (Chancellor) on May 25, 2007 at 02:03 UTC
You didn't mention what sort of database you are connecting to, but if it's any sort of reasonably-well-endowed RDBMS (oracle, postgres, mysql, etc), then it comes with its own command-line tool for loading tab-, comma- or pipe-delimited data files directly into to the database (e.g. oracle's "sqlload", mysql's "mysqlimport", etc). And those vendor-supplied tools are orders of magnitude faster (and usually hard to beat in terms of data validation) when compared to executing a series of "insert" statements from a Perl/DBI script. Also, they are pretty flexible: you can choose what suits you best in terms of record and field delimiters, and how the db server should behave while loading the data. If the data files you are importing are in the hundreds (or maybe few thousands) of records, the speed difference might not count for much. But the fact that the tool already exists (and does excellent error trapping/handling) still makes it worthwhile. For data sets of many thousands of records, you could be looking at a run-time difference of 10 to 1 or worse for Perl/DBI inserts vs. the server's native import tool (i.e. if mysqlimport takes 6 minutes, Perl/DBI inserts will take at least an hour). It looks like you are opening a file for outputting cvs-style records, but you don't seem to ever write to that file handle. Let me suggest that you drop the idea of connecting to the database in this script, and just focus on writing a proper comma- (or better yet, tab-)delimited output file that puts the intended insertion data into a consistent set of plain-text rows. Then read up on your db-server's import tool and work out how to feed it the data file that was written by your perl script. You'll finish your task a lot quicker that way, because in addition to the actual db load being a lot faster, the perl script will be a lot simpler to code, and will go very fast on its own.	[reply]
Re: Parsing Script by princepawn (Parson) on May 24, 2007 at 19:47 UTC
I would recommend DBIx::Simple for database manipulation. I assume you have a properly normalized database schema? Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality	[reply]