hemanth.damecharla has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am attempting to write a perfmon log parser for Windows using ActiveState Perl v5.8.8 with DBI, since the final goal is to let users write a query and retrieve the data. For example, something like:


perflogquery "SELECT AVG([\\server\Processor(_Total)\% User Time]) FROM perfdata.csv WHERE [(PDH-CSV 4.0) (Central Standard Time) (300)] BETWEEN '1/20/2008 12:03:58 AM' AND '1/20/2008 01:27:42 AM'"

The problem I am facing is with the column names. I create a new database handle, connect to the CSV file, and bind (if that is the right word) the file to a table name. When I then prepare the SQL query, I get errors. After a little googling and perldoc reading, I found that, due to the SQL standard, field names cannot contain special characters such as a dot (.). So all of these characters are translated to an underscore (_) when the first line of the CSV file is read; in other words, all field names are `sanitized'. I also found that if we do not want this to happen, we need to set raw_header to a true value. But even setting raw_header to 1 did not help.

Could someone please point me in the right direction on how to rewrite the query string automatically so that it reflects the sanitization that is done to the headings?
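If the driver really does replace each non-word character in a header with an underscore, as described above, one way to keep queries readable is to apply that same transformation to the bracketed column names just before calling prepare(). A minimal sketch, assuming the rule is a per-character s/\W/_/g and that column names are always written inside [...] brackets (sanitize_column and sanitize_query are hypothetical helper names):

```perl
use strict;
use warnings;

# Apply the same per-character sanitization to a column name that the
# driver is described as applying to the CSV header row: every
# non-word character becomes an underscore.
sub sanitize_column {
    my ($name) = @_;
    $name =~ s/\W/_/g;
    return $name;
}

# Rewrite every [...]-bracketed identifier in a query so that it
# matches the sanitized header names.
sub sanitize_query {
    my ($sql) = @_;
    $sql =~ s/\[([^\]]+)\]/'[' . sanitize_column($1) . ']'/ge;
    return $sql;
}

my $query = 'SELECT AVG([\\\\server\\Processor(_Total)\\% User Time]) FROM perfmon';
print sanitize_query($query), "\n";
```

The rewritten string could then be handed to prepare() unchanged, so the user still types the familiar perfmon counter paths.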


Below is a partial code block that I used for testing:

use strict;
use DBI;

my $pc_csv_query_string = shift or die "usage perflogquery <queryString>";
my $pc_csv_file_name    = "perfdata.csv";
my $pc_csv_db_handle;
my $pc_query_stmt_handle;

$@ = "";
eval {
    # Connect to the CSV file, or database if you will.  The attribute
    # hash is the fourth argument to connect(); user and password are
    # unused by DBD::CSV.
    $pc_csv_db_handle = DBI->connect(
        "DBI:CSV:f_dir=.", undef, undef,
        { PrintError => 1, RaiseError => 1, AutoCommit => 0 }
    );

    # Turn on tracing with trace level 2.
    DBI->trace( 2, 'parse_csv_trace.log' );

    # We will be working with individual files for now.  Maybe in the
    # future we can move to using multiple files.  Here we are binding
    # the perfmon table to a CSV file.
    $pc_csv_db_handle->{csv_tables}{perfmon} = {
        file => $pc_csv_file_name,
    };
    #$pc_csv_db_handle->{raw_header} = 1; # not much use even if we use raw_header

    # Prepare the statement for execution.
    $pc_query_stmt_handle = $pc_csv_db_handle->prepare($pc_csv_query_string);

    # Execute the statement.
    $pc_query_stmt_handle->execute();

    # Retrieve the returned rows of data.  Need to work on this.
    while ( my @row = $pc_query_stmt_handle->fetchrow_array() ) {
        print "@row\n";
    }

    # Clean up.
    $pc_query_stmt_handle->finish();
    $pc_csv_db_handle->disconnect();
};
$@ and die "SQL database error: $@";
Hope is a Heuristic Search.

Replies are listed 'Best First'.
Re: Perfmon log parser. Problem with column names having special chars.
by hemanth.damecharla (Initiate) on Jun 11, 2010 at 13:13 UTC

    I think I have found a workaround for this. After looking here I found that SQL::Tokenizer was suitable for my case, and I used the regex from the Tokenizer module to sanitize my query. It needs a little more work, but for now it handles most of my test cases.

    use strict;

    my $insane_sql = shift or die "usage sanitize_sql_query.pl <insane query>";
    print "\n\n$insane_sql\n\n";

    my $sane_sql = "";

    # Match anything inside single quotes.
    # Got it from SQL::Tokenizer.
    # Author: Igor Sutton Lopes
    while ($insane_sql =~ m/'.*?(?:(?:''){1,}'|(?<!['\\])'(?!')|\\'{2})/smxg) {
        my $pre_match  = $`;
        my $match      = $&;
        my $post_match = $';
        print "Match: $match\n";
        $match =~ s/'//g;    # replace the single quotes
        $match =~ s/\W/_/g;  # sanitize the match
        $insane_sql = $pre_match . $match . $post_match;
    }
    print "\n\n$insane_sql\n\n";

    __END__
    perl sanitize_sql_query.pl "SELECT TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)']) AS CAPTUREDATE, AVG(TO_REAL(['\\Server\LogicalDisk(N:)\Avg. Disk sec/Read'])) AS AVG_LOG_SEC_READ FROM 'C:\PerfCSV\Server_01200549.csv' GROUP BY TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)'])"

    SELECT TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)']) AS CAPTUREDATE, AVG(TO_REAL(['\\Server\LogicalDisk(N:)\Avg. Disk sec/Read'])) AS AVG_LOG_SEC_READ FROM 'C:\PerfCSV\Server_01200549.csv' GROUP BY TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)'])

    Match: '(PDH-CSV 4.0) (Central Standard Time)(360)'
    Match: '\\Server\LogicalDisk(N:)\Avg. Disk sec/Read'
    Match: 'C:\PerfCSV\Server_01200549.csv'
    Match: '(PDH-CSV 4.0) (Central Standard Time)(360)'

    SELECT TO_DATE([_PDH_CSV_4_0_Central_Standard_Time_360_]) AS CAPTUREDATE, AVG(TO_REAL([_Server_LogicalDisk_N_Avg_Disk_sec_Read])) AS AVG_LOG_SEC_READ FROM C_PerfCSV_Server_01200549_csv GROUP BY TO_DATE([_PDH_CSV_4_0_Central_Standard_Time_360_])
    Edit: $match =~ s/\W+/_/g; should actually have been $match =~ s/\W/_/g;, since we need every special character to be converted to its own underscore.
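    The loop above can also be collapsed into a single s///ge pass, which avoids rebuilding the string from $`, $& and $' on every iteration. A sketch using the same SQL::Tokenizer regex and the corrected per-character s/\W/_/g (sanitize_sql is a hypothetical helper name):

```perl
use strict;
use warnings;

# One-pass variant of the loop above: rewrite each single-quoted
# token in place with s///ge.  The pattern is the quoted-string
# regex from SQL::Tokenizer, written out under /x.
sub sanitize_sql {
    my ($sql) = @_;
    $sql =~ s{
        ( ' .*? (?: (?:''){1,}' | (?<!['\\])'(?!') | \\'{2} ) )  # quoted token
    }{
        my $tok = $1;
        $tok =~ s/'//g;      # drop the quotes
        $tok =~ s/\W/_/g;    # sanitize like the header
        $tok;
    }gsmxe;
    return $sql;
}

print sanitize_sql(q{SELECT ['(PDH-CSV 4.0)'] FROM 'perfdata.csv'}), "\n";
```

    Since the replacement text no longer contains quotes, the /g pass simply continues after each rewritten token and cannot rematch it.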
    Hope is a Heuristic Search.