http://qs1969.pair.com?node_id=441078


in reply to Re^5: Muy Large File
in thread Muy Large File

Wow UK, you are da Monk! I hope you are a well-paid professor or architect somewhere, because you are obviously knowledgeable and helpful. Your last response must have taken quite some time to write. Many, many thanks. Your tome has already helped in the following way.

Bigger is certainly not always better!
I cut the original test file in half, to 4G, for this test. I then changed the buffer size from the original 2**30 to the sizes listed below (the #N after each command is the exponent, so #18 means 2**18). As the timings show, going from 2**19 to 2**18 is dramatic: real time drops from 3m46s to 1m16s. Performance is best in the range 2**18 to 2**15.
Using 2**18, I went back and tested with the original 8G file, which now runs in an amazing 2m33s as opposed to the original 10m. 2**18 looks like the sweet spot for least amount of work on this particular server. I am starting to understand better all of the data buckets between the HD controller, IO bus, OS, RAM and the code. Very interesting indeed. Seeing the smaller buffer size work faster shatters the myths that I have held for many years. A sincere thanks to you and the others on this. For my part, I will evangelize this when the opportunity arises.
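For anyone who wants to repeat the buffer-size comparison, here is a minimal sketch of a timing harness. It is not the test.pl used for the numbers in this post: the sub name `timed_pass`, the file name in the usage comment, and the choice of counting NUL bytes with tr are all assumptions for illustration. It also only reads, so it measures the read side and not the write-back.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Read $path once in chunks of 2**$exp bytes and count NUL bytes with tr.
# The NUL byte is a stand-in (assumption) for whatever the real tr targets.
# Returns (elapsed wall-clock seconds, bytes read, NUL count).
sub timed_pass {
    my ( $path, $exp ) = @_;
    my $bufsize = 2**$exp;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my ( $bytes, $hits ) = ( 0, 0 );
    my $start = Time::HiRes::time();
    while (1) {
        my $got = sysread( $fh, my $buf, $bufsize );
        die "read $path: $!" unless defined $got;
        last unless $got;                 # 0 means end of file
        $bytes += $got;
        $hits  += ( $buf =~ tr/\0// );    # empty replacement: count only
    }
    close $fh;
    return ( Time::HiRes::time() - $start, $bytes, $hits );
}

# Usage (hypothetical file name):
# for my $exp ( reverse 8 .. 24 ) {
#     my ( $secs, $bytes ) = timed_pass( 'bigfile.dat', $exp );
#     printf "2**%-2d  %8.2fs  %d bytes\n", $exp, $secs, $bytes;
# }
```

Keep in mind that the OS file cache can make later passes over the same file faster than the first, so it is worth running each size more than once before drawing conclusions.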

Regarding your threaded code, I will be playing with it over the next few days and will post my findings when complete. To be honest, it will take me some time to dissect and understand, so my apologies if the reply seems delayed, as I am sometimes a smacktard.

One question about the regular (non-threaded) code. One of my requirements is to create a log that indicates which records the tr actually modified. Any ideas on how to do this whilst retaining the performance? Looping through each buffer record by record would seem to make sense, except that the fixed-width records will not align perfectly with the buffer size in most cases.
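One possible approach to the alignment problem, as a rough sketch: pick a buffer size that is a whole multiple of the record length (for example `int(2**18 / $reclen) * $reclen`), so no record straddles two reads, and only scan for individual records when tr reports that the buffer contains something to change. Everything specific here is an assumption: the sub name `translate_and_log`, the NUL-to-space translation, and the record length passed in are placeholders for the real values.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Translate $path in place and print the 1-based number of every record that
# was changed to the filehandle $log. $bufsize must be a multiple of $reclen.
# The NUL-to-space tr is a placeholder (assumption) for the real translation.
# Returns the number of records modified.
sub translate_and_log {
    my ( $path, $log, $reclen, $bufsize ) = @_;
    die "bufsize must be a multiple of reclen" if $bufsize % $reclen;
    open my $fh, '+<:raw', $path or die "open $path: $!";
    my ( $offset, $modified ) = ( 0, 0 );
    while (1) {
        my $got = sysread( $fh, my $buf, $bufsize );
        die "read $path: $!" unless defined $got;
        last unless $got;
        # Cheap count-only tr first: clean buffers skip all the extra work.
        if ( $buf =~ tr/\0// ) {
            while ( $buf =~ /\0/g ) {
                my $rec = int( ( $offset + $-[0] ) / $reclen );   # 0-based
                print {$log} $rec + 1, "\n";
                $modified++;
                # Jump to the next record so each record is logged once.
                my $next = ( $rec + 1 ) * $reclen - $offset;
                last if $next >= length $buf;
                pos($buf) = $next;
            }
            $buf =~ tr/\0/ /;
            sysseek( $fh, $offset, SEEK_SET ) or die "seek $path: $!";
            my $wrote = syswrite( $fh, $buf );
            die "write $path: $!"
                unless defined $wrote && $wrote == length $buf;
        }
        $offset += $got;
    }
    close $fh or die "close $path: $!";
    return $modified;
}

# Usage (hypothetical names and record length):
# open my $log, '>', 'modified.log' or die $!;
# translate_and_log( 'bigfile.dat', $log, 100, int( 2**18 / 100 ) * 100 );
```

The per-record scan only runs on buffers that actually contain a target character, so the cost of logging should scale with the number of dirty buffers rather than with file size. I have not measured this against the 8G file.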

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #24

real 4m8.87s
user 0m53.57s
sys 0m8.68s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #20

real 4m25.99s
user 0m53.58s
sys 0m7.56s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #19

real 3m46.35s
user 0m53.61s
sys 0m7.97s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #18

real 1m16.36s
user 0m41.76s
sys 0m32.58s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #18

real 1m16.45s
user 0m41.64s
sys 0m32.61s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #18 (8G)

real 2m33.92s
user 1m22.58s
sys 1m6.50s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #17

real 1m17.21s
user 0m41.64s
sys 0m32.98s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #16

real 1m18.92s
user 0m40.60s
sys 0m35.95s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #16

real 1m19.06s
user 0m41.74s
sys 0m34.87s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #15

real 1m20.50s
user 0m41.34s
sys 0m36.93s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #14

real 1m25.35s
user 0m41.45s
sys 0m41.11s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #13

real 1m33.98s
user 0m42.82s
sys 0m48.49s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #12

real 1m56.25s
user 0m47.20s
sys 1m6.11s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #11

real 2m25.52s
user 0m54.13s
sys 1m28.47s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #10

real 3m24.98s
user 1m4.65s
sys 2m16.84s

time /apps/p_dm200/ndm_ip_pull/tmp/test.pl #8

real 9m1.04s
user 2m9.46s
sys 6m44.87s