in reply to Discovering minimal relational data types on large volumes

It does what I want (I think!) but runs unacceptably slowly on large volumes. I suspect that my regular expression is not helping.

Check out that suspicion before you start hacking on code.

If you're really getting data via DBI, particularly if you're going over the network, the overhead to get a row of data is very likely to dwarf any per-row computation you're doing. You can check that by measuring the time it takes to fetch that large volume of data without processing. Then compare that to the time you've measured for fetch+processing.
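A minimal sketch of that comparison, in case it's useful. The connection details, query, and process_row below are placeholders, not anything from the original post; Time::HiRes does the timing:

    use strict;
    use warnings;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    # Placeholder connection details -- substitute your own DSN.
    my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pass',
                           { RaiseError => 1 });
    my $sql = 'SELECT * FROM my_table';    # placeholder query

    sub time_pass {
        my ($process) = @_;
        my $t0  = [gettimeofday];
        my $sth = $dbh->prepare($sql);
        $sth->execute;
        while (my $row = $sth->fetchrow_arrayref) {
            $process->($row) if $process;
        }
        return tv_interval($t0);
    }

    # Pass 1: fetch only. Pass 2: fetch plus the per-row work
    # under suspicion (process_row stands in for the real code).
    my $fetch_only = time_pass(undef);
    my $fetch_plus = time_pass(\&process_row);
    printf "fetch only: %.2fs  fetch+process: %.2fs\n",
           $fetch_only, $fetch_plus;

    sub process_row { }    # replace with the real per-row processing

If the two numbers are close, the per-row computation isn't your bottleneck and the regex is off the hook.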


Re: Re: Discovering minimal relational data types on large volumes
by dbush (Deacon) on Nov 20, 2002 at 22:15 UTC

    Hi dws,

    You make a very fair point: I shouldn't optimise before checking where the time actually goes, and the overhead of fetching the data from the database may well be significant. To check this I have profiled the execution of my script. Here is a clip from the results of dprofpp:

        > dprofpp
        Total Elapsed Time = -82.1369 Seconds
          User+System Time = 497.6071 Seconds
        Exclusive Times
        %Time ExclSec CumulS #Calls sec/call Csec/c Name
         35.9   179.0 175.93 105289   0.0002 0.0002 DBI::st::execute
         33.9   168.9 137.38 105266   0.0000 0.0000 main::GetMinimalDataType
        snip...
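    For anyone following along: a profile like this is typically produced by running the script under Devel::DProf and then summarising the tmon.out file it writes (script.pl below is a stand-in for the actual script name):

        perl -d:DProf script.pl
        dprofpp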

    As you can see, the majority of the time is spent in DBI::st::execute, but a significant portion is also taken by the GetMinimalDataType function. Any savings there would be very useful.
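    One way to chase those savings, sketched below: benchmark candidate implementations of the type check against each other with the standard Benchmark module before committing to one. The sample data and both checks are made up for illustration (the real GetMinimalDataType isn't shown in this thread), and note that the two checks don't accept exactly the same inputs:

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);
        use Scalar::Util qw(looks_like_number);

        # Made-up sample values; substitute rows from the real data.
        my @samples = ('12345', '3.14', '20-NOV-2002', 'free text') x 1000;

        cmpthese(-2, {
            # Anchored regex: integers and simple decimals only.
            regex => sub { my $n = grep { /^-?\d+(?:\.\d+)?$/ } @samples },
            # looks_like_number: accepts more formats (exponents etc.).
            lln   => sub { my $n = grep { looks_like_number($_) } @samples },
        });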

    Many thanks for the feedback.

    Regards,
    Dom.

    Update: Please note that this profile is on a small test set of data and not on the real volume.