in reply to Up for Critique
To answer your question - yes, there are a number of places you could optimize for speed (and maintainability!).
### FILL ACHROMS TABLE ########################################### # Take in whole records separated by five dashes instead of a # newline character. $/ = "-----"; while($record = <INFILE>) { # Print out a counter to show how far the filling has progressed. # print "$n\n"; # $n++; @newrecord = (); # Send the record to have any blank lines removed. @newrecord = CleanData( $record ); ----- ### FILL ACHROMS TABLE ########################################### # Take in whole records separated by five dashes instead of a # newline character. $/ = "-----"; while(my $record = <INFILE>) { # Print out a counter to show how far the filling has progressed. # print "$n\n"; # $n++; # Send the record to have any blank lines removed. my @newrecord = CleanData( $record );
There, $newrecord is a reference to an array. You'd get your stuff by using $newrecord->[5] instead of $newrecord[5]. If your arrays are large, this is a signficant savings.@newrecord = CleanData( $record ); ---- my $newrecord = CleanData($record);
That is much faster and makes much clearer what you're doing. The $/ is the newline separator for your system. It makes your script portable to Windows or Mac (for example).sub CleanData { my ($record, @cleanrecord, $line, @record); $record = $_[0]; ### Separate the record by newlines. @record = split /\n/, $record; ### Remove empty lines if they exist. foreach $line ( @record ) { chomp( $line ); next if ( $line =~ m/^\s*$/ ); push @cleanrecord, $line; } return ( @cleanrecord ); } ---- sub CleanData { my $record = shift; my @cleanrecord = grep { /\S/ } chomp(split($/, $record)); return \@cleanrecord; }
They're faster as well as more readable.$sth_update_exon -> execute( @exondata[1,2,3,4], $geneid, $exondata[0] );
Now, this assumes that all your stuff is going to be in the right order. If you can't, don't use an array - use a hash.sub ParseAChroms { my $achromref = shift; my $matches = join '|', ( 'BAC:', 'LENGTH:', 'OSOME:', 'CHR_START:', 'CHR_END:', 'BAC_START:', 'BAC_END:', 'ORIENTATION:', ); my @achrom; my $line = join '', @$achromref; push @achrom, $line =~ /\b([BY]AC\b/; push @achrom, $line =~ /(?:$matches)\s*(.*?)\s*/g; return \@achrom; }
Now, what's even cooler is that you can access stuff by name, not array index. This is important if you ever plan on changing your input file stuff, maybe adding a new thing you want to track?sub ParseAChroms { my $achromref = shift; my $matches = join '|', ( 'BAC:', 'LENGTH:', 'OSOME:', 'CHR_START:', 'CHR_END:', 'BAC_START:', 'BAC_END:', 'ORIENTATION:', ); my $line = join '', @$achromref; my %achrom = $line =~ /($matches)\s*(.*?)\s*/og; $achrom{TYPE} = $line =~ /\b([BY]AC\b/o; return \%achrom;
You could even make that regex ahead of time by using qr//, but I'll leave that as an exercise to the reader. And, yes, ParseGenes(), ParseExon(), and ParseSubExon() should be treated the same way.
my $match_achrom_statement = << STH_MATCH; SELECT ac_id FROM achroms WHERE ac_id = ? STH_MATCH my $sth_match_achrom = $dbh->prepare($match_achrom_statement);
------
We are the carpenters and bricklayers of the Information Age.
Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Re: Up for Critique
by biograd (Friar) on Mar 23, 2002 at 06:52 UTC |