comment on

People often tell me that Perl resembles line noise and they don't want to use it as a result. I've written a small script to analyze some SQL statements and I've written "line noise" and am soliciting advice on how to clean the code.

I have a sample of SQL from the MS-SQL profiler. I need to identify all unique "INSERT INTO Photo..." statements. My thought was to read the file, select lines that have the approriate sql and break them into a three part array with the middle element containing the field names. Then I push a reference to that array onto another array. Next, I sort that array on the field names and print unique SQL inserts (tracking them by stuffing the first instance of each combination of field names in a hash). Here's the code:

use warnings;
use strict;

my ( @results, %fields );
while(<>){
    s/\0//g;
    if (/(?<!from )INSERT INTO Photo/) {
        my @s = ( /([^(]+)(\([^)]+\))(.*)/ );
        push @results, \@s;
    }
}
my @final = sort{ $a->[1] cmp $b->[1]  } @results;
open OUT, ">results.txt" or die $!;
foreach( @final ) {
    my $fieldNames = $_->[1];
    if ( ! exists $fields{ $fieldNames } ) {
        $fields{ $fieldNames } = 1;
        print OUT @{ $_ }
    }
}
close OUT;
[download]

Here's some sample input (heavily edited because the SQL statements are HUGE):

exec new_int_id 'documents', 'did'
go
INSERT documents ( did, DocID, Template ) VALUES ( 3093, '000000000000
+000000003093', 'Photo' )
go
SELECT * FROM documents WHERE did=3093
go
INSERT INTO Photo (AccessLevelID, ANRNumber, ColorID, ContactSheetID) 
+VALUES (1, '225058', 1, 4)
go
exec new_int_id 'Photo', 'PhotoID'
go
exec new_str_id 'documents', 'DocId'
go
INSERT INTO Photo (QualityID, Remark, UniqueID, VolNum) VALUES (1, 'St
+. Nicolaasfeest -v. Moorsel', '211357', '1')
go
[download]

My code works fine, but it's a great example of the "line noise" problem. Unfortuanately, I really can't work my mind around the problem of cleaning the code to the point where it's comprehensible to someone with a low level of Perl knowledge. I could break the regex out with the /x modifier, but that's only a partial solution. Any advice would be appreciated.

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

In reply to Trying to avoid line noise (brain cramp) by Ovid

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.