hemanth.damecharla has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am attempting to write a perfmon log parser for Windows using ActiveState Perl v5.8.8 with DBI, since the final goal is to let users write a query and retrieve the data. For example, something like:


perflogquery "SELECT AVG([\\server\Processor(_Total)\% User Time]) FROM perfdata.csv WHERE [(PDH-CSV 4.0) (Central Standard Time) (300)] BETWEEN '1/20/2008 12:03:58 AM' AND '1/20/2008 01:27:42 AM'"

The problem I am facing is with the column names. I create a new database handle, connect to the CSV file, and bind (if that is the right word) the file to a table name. When I then prepare the SQL query, I get errors. After a little googling and perldoc reading, I found that, due to the SQL standard, field names cannot contain special characters such as a dot (.). So all of these characters are translated to an underscore (_) when the first line of the CSV file is read; in other words, all field names are `sanitized'. I also found that if we do not want this to happen, we need to set raw_header to a true value. But even setting raw_header to 1 did not help.

Could someone please point me in the right direction on how to rewrite the query string automatically so that it reflects the sanitization that is done to the headings?
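If the driver really does replace each non-word character in a header with an underscore, as described above, one way to keep queries readable is to apply that same transformation to the bracketed column names just before calling prepare(). A minimal sketch, assuming the rule is a per-character s/\W/_/g and that column names are always written inside [...] brackets (sanitize_column and sanitize_query are hypothetical helper names):

```perl
use strict;
use warnings;

# Apply the same per-character sanitization to a column name that the
# driver is described as applying to the CSV header row: every
# non-word character becomes an underscore.
sub sanitize_column {
    my ($name) = @_;
    $name =~ s/\W/_/g;
    return $name;
}

# Rewrite every [...]-bracketed identifier in a query so that it
# matches the sanitized header names.
sub sanitize_query {
    my ($sql) = @_;
    $sql =~ s/\[([^\]]+)\]/'[' . sanitize_column($1) . ']'/ge;
    return $sql;
}

my $query = 'SELECT AVG([\\\\server\\Processor(_Total)\\% User Time]) FROM perfmon';
print sanitize_query($query), "\n";
```

The rewritten string could then be handed to prepare() unchanged, so the user still types the familiar perfmon counter paths.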


Below is a partial code block that I used for testing:

use strict;
use DBI;

my $pc_csv_query_string = shift or die "usage perflogquery <queryString>";
my $pc_csv_file_name    = "perfdata.csv";
my $pc_csv_db_handle;
my $pc_query_stmt_handle;

$@ = "";
eval {
    # Connect to the CSV file, or database if you will.  The attribute
    # hash is the fourth argument to connect(); user and password are
    # unused by DBD::CSV.
    $pc_csv_db_handle = DBI->connect(
        "DBI:CSV:f_dir=.", undef, undef,
        { PrintError => 1, RaiseError => 1, AutoCommit => 0 }
    );

    # Turn on tracing with trace level 2.
    DBI->trace( 2, 'parse_csv_trace.log' );

    # We will be working with individual files for now.  Maybe in the
    # future we can move to using multiple files.  Here we are binding
    # the perfmon table to a CSV file.
    $pc_csv_db_handle->{csv_tables}{perfmon} = {
        file => $pc_csv_file_name,
    };
    #$pc_csv_db_handle->{raw_header} = 1; # not much use even if we use raw_header

    # Prepare the statement for execution.
    $pc_query_stmt_handle = $pc_csv_db_handle->prepare($pc_csv_query_string);

    # Execute the statement.
    $pc_query_stmt_handle->execute();

    # Retrieve the returned rows of data.  Need to work on this.
    while ( my @row = $pc_query_stmt_handle->fetchrow_array() ) {
        print "@row\n";
    }

    # Clean up.
    $pc_query_stmt_handle->finish();
    $pc_csv_db_handle->disconnect();
};
$@ and die "SQL database error: $@";
Hope is a Heuristic Search.

Replies are listed 'Best First'.
Re: Perfmon log parser. Problem with column names having special chars.
by hemanth.damecharla (Initiate) on Jun 11, 2010 at 13:13 UTC

    I think I have found a workaround for this. After looking here I found that SQL::Tokenizer was suitable for my case, and I used the regex from the Tokenizer module to sanitize my query. It needs a little more work, but for now it handles most of my test cases.

    use strict;

    my $insane_sql = shift or die "usage sanitize_sql_query.pl <insane query>";
    print "\n\n$insane_sql\n\n";

    my $sane_sql = "";

    # Match anything inside single quotes.
    # Got it from SQL::Tokenizer.
    # Author: Igor Sutton Lopes
    while ($insane_sql =~ m/'.*?(?:(?:''){1,}'|(?<!['\\])'(?!')|\\'{2})/smxg) {
        my $pre_match  = $`;
        my $match      = $&;
        my $post_match = $';
        print "Match: $match\n";
        $match =~ s/'//g;    # replace the single quotes
        $match =~ s/\W/_/g;  # sanitize the match
        $insane_sql = $pre_match . $match . $post_match;
    }
    print "\n\n$insane_sql\n\n";

    __END__
    perl sanitize_sql_query.pl "SELECT TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)']) AS CAPTUREDATE, AVG(TO_REAL(['\\Server\LogicalDisk(N:)\Avg. Disk sec/Read'])) AS AVG_LOG_SEC_READ FROM 'C:\PerfCSV\Server_01200549.csv' GROUP BY TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)'])"

    SELECT TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)']) AS CAPTUREDATE, AVG(TO_REAL(['\\Server\LogicalDisk(N:)\Avg. Disk sec/Read'])) AS AVG_LOG_SEC_READ FROM 'C:\PerfCSV\Server_01200549.csv' GROUP BY TO_DATE(['(PDH-CSV 4.0) (Central Standard Time)(360)'])

    Match: '(PDH-CSV 4.0) (Central Standard Time)(360)'
    Match: '\\Server\LogicalDisk(N:)\Avg. Disk sec/Read'
    Match: 'C:\PerfCSV\Server_01200549.csv'
    Match: '(PDH-CSV 4.0) (Central Standard Time)(360)'

    SELECT TO_DATE([_PDH_CSV_4_0_Central_Standard_Time_360_]) AS CAPTUREDATE, AVG(TO_REAL([_Server_LogicalDisk_N_Avg_Disk_sec_Read])) AS AVG_LOG_SEC_READ FROM C_PerfCSV_Server_01200549_csv GROUP BY TO_DATE([_PDH_CSV_4_0_Central_Standard_Time_360_])
    Edit: $match =~ s/\W+/_/g; should actually have been $match =~ s/\W/_/g;, since we need every special character to be converted to its own underscore.
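    The loop above can also be collapsed into a single s///ge pass, which avoids rebuilding the string from $`, $& and $' on every iteration. A sketch using the same SQL::Tokenizer regex and the corrected per-character s/\W/_/g (sanitize_sql is a hypothetical helper name):

```perl
use strict;
use warnings;

# One-pass variant of the loop above: rewrite each single-quoted
# token in place with s///ge.  The pattern is the quoted-string
# regex from SQL::Tokenizer, written out under /x.
sub sanitize_sql {
    my ($sql) = @_;
    $sql =~ s{
        ( ' .*? (?: (?:''){1,}' | (?<!['\\])'(?!') | \\'{2} ) )  # quoted token
    }{
        my $tok = $1;
        $tok =~ s/'//g;      # drop the quotes
        $tok =~ s/\W/_/g;    # sanitize like the header
        $tok;
    }gsmxe;
    return $sql;
}

print sanitize_sql(q{SELECT ['(PDH-CSV 4.0)'] FROM 'perfdata.csv'}), "\n";
```

    Since the replacement text no longer contains quotes, the /g pass simply continues after each rewritten token and cannot rematch it.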
    Hope is a Heuristic Search.