Fast morphing of SYBASE data to flat-file

singjoh has asked for the wisdom of the Perl Monks concerning the following question:

Our system uses SYBASE and SAS, and we want to take the output from a SYBASE query and load them up into a SAS dataset. Now there is a SAS/DB module that costs $n 000s, where n is a good deal more than we want to spend :o) Our 'fastest' method to date involves SYBASE isql to export a fixed (and delimited) file, a sed command to reformat into something SAS would like to consume, and then the SAS load-up. However, we think that Perl DBLIB would cut out some of the file i/o and would enable a quicker system. It doesn't - though I'm guessing this is inexperience rather than a fact ... so please help.

sub main {

        my $gdb = clsDatabase2->new(datasource => clsEnvironment2->new
+->datasources->{"DBSOURCE"});

        my $outfile = "temp.out";

        my $sql = " ... Some Query ..." ;

        my @rowrefs = $gdb->nsql($sql, "ARRAY") ;

                $gmod->log("Building new file.");
                open(FILE, ">$outfile") || die "$0: Cannot open file '
+$outfile' for writing: $!\n";
                foreach my $rowref (@rowrefs) {

                        my $field0 = sasify($rowref->[0]);
                        my $field1 = sasify($rowref->[1]);
                        my $field2 = sasify($rowref->[2]);
                        my $field3 = sasify($rowref->[3]);

                        my $line = sasify("$field0|$field1|$field2|$fi
+eld3");
                        print FILE "$line\n";

                }

        $runok = 1 ;
}

sub sasify() {
        my ($in) = @_;
        $in =~ s/^\| *//g; # remove end marker and trailing spaces (SY
+BASE)
        $in =~ s/ *\|$//g;
        $in =~ s/NULL//g; # destroys nulls (possible effect on names i
+ncluding NULL, shame :o)  )
        $in =~ s/\| */|/g; # left trim fields
        $in =~ s/ *\|/|/g; # right trim fields
        $in =~ s/^\|/.|/g; # Check first field (if empty then write to
+ .)
        $in =~ s/\|{2}/|.|/g;  # Write Empty fields to . (double pass)
        $in =~ s/\|{2}/|.|/g;
        $in =~ s/\|$/|./g; # Finally check the last field  (if its emp
+ty the write to .)
        return $in;
}
[download]

Suffice to say $gdb is just a SYBASE database handle.

Comment on Fast morphing of SYBASE data to flat-file Download Code

Replies are listed 'Best First'.
Re: Fast morphing of SYBASE data to flat-file by davis (Vicar) on Mar 13, 2003 at 16:57 UTC
Well, you're `sasify`'ing each of the fields individually, then `sasify`'ing them all again when you put them into the line (although this could be necessary, you appear to be stripping the delimiting pipe symbols for a reason). I'd do something like: `my $fields_string = join "\|", @$rowref; my $line = sasify($fields_string);` [download] That's one call to the subroutine rather than 5. Another comment: you're asking Perl to compile each of the regular expressions again every time the subroutine `sasify` is called. You can avoid this with the "/o" operator (stick an 'o' next to the 'g' at the end of each regex). This tells Perl to compile the regex only once for the entire course of the program. IIRC the tr operator - see `perldoc perlop` or perlop - is reported to be faster for delete operations. You may wish to try something like (untested; I'm not too hot on the tr operator): `$in =~ tr/^\\| //d;` [download] Hope that Helps davis Is this going out live? No, Homer, very few cartoons are broadcast live - it's a terrible strain on the animator's wrist Update:* Minor typographical changes	[reply] [d/l] [select]
Re (2): Fast morphing of SYBASE data to flat-file by VSarkiss (Monsignor) on Mar 13, 2003 at 17:40 UTC
you're asking Perl to compile each of the regular expressions again every time the subroutine sasify is called. You can avoid this with the "/o" operator This is a common misconception. In fact, the regular expressions are compiled once, at the same time as the rest of the program. The `/o` modifier makes a difference only when the RE includes a variable interpolation: in that case, it means the RE will contain the value of the variable at its first use, and will not change, regardless of whether the variable changes or not. In this case, since the REs are static strings, there is no re-compilation involved.	[reply] [d/l]
Re^3: Fast morphing of SYBASE data to flat-file (say "no" to /o) by tye (Sage) on Mar 13, 2003 at 19:06 UTC
I'd like to add to this, "Just don't use /o!" It was useful in Perl4. In Perl5 it is not very useful (the performance gain it provides is small and not as good as that provided by better alternatives) and it is confusing to the point of causing bugs. If you haven't identified a performance problem, then don't worry about it. If you have, then use qr// (if you have regular expression that contain variables). - tye	[reply]
Re: Fast morphing of SYBASE data to flat-file by VSarkiss (Monsignor) on Mar 13, 2003 at 17:53 UTC
It looks like you're using Sybase::DBlib or Sybase::CTlib. The `nsql` routine in those is very convenient, but it's not terribly efficient, because it pulls the entire result set into memory. You may be able to reduce your memory requirement quite a bit by processing a row at a time. Naturally, whether this is a win or not depends on the size of the result set (row width and number of rows). I'd also suggest consolidating your regular expressions, and where possible, shift some of the work to Sybase. For instance, it looks like you're trying to change NULL and empty columns in your result set to . (dot). Instead of using several regular expressions, you may able to just do something like: `coalesce(trim(colname),'.')` [download] in your SQL. (By the way, you can control whether the Sybase modules will map NULL values to `undef` or the string `NULL` with an attribute. I'm not working with Sybase ATM, so I don't have the docs handy.) HTH	[reply] [d/l] [select]
Re: Fast morphing of SYBASE data to flat-file by runrig (Abbot) on Mar 13, 2003 at 17:53 UTC
Whenever you start using variables like field0, field1, etc., it should be a red flag that you should probably be using arrays. The code could be more like this: `foreach my $rowref (@rowrefs) { my @sassified = map { sassify($_) } @$rowref; print FILE join("\|", @sassified),"\n"; }` [download] You could even do that in one line if you really wanted to: `print FILE join("\|", map { sassify($_) } @$_), "\n" for @rowrefs;` [download] And in your sassify function, if you have alot of s///'s to perform on a variable, its more perlish to put it in a for loop: `for ($in) { s/this/that/; s/here/there/; etc. }` [download] Disclaimer: Code above is not tested, and I'm sorry to say that none of this will help much with the speed.	[reply] [d/l] [select]
Re: Fast morphing of SYBASE data to flat-file by Abigail-II (Bishop) on Mar 13, 2003 at 23:31 UTC
I don't know SAS, so I can't comment on that part, but Sybase also as a BCP utility program (and associated library calls - I don't know whether DBI gives you access to them). If you have lots of data, you might want to use BCP to get the data out of Sybase. BCP extracts or load complete tables, IIRC, but you can always create a temporary table with your query, and BCP that out. Whether this will be faster, I cannot judge, but it's something you might want to consider. Abigail	[reply]
Re: Re: Fast morphing of SYBASE data to flat-file by mpeppler (Vicar) on Mar 17, 2003 at 16:44 UTC
And recent versions of Sybase will let you bcp from a view, which pretty much lets you extract any data you want from a table (or collection of tables) into a bcp file. Michael	[reply]