
Are you allowed to show a little glimpse of what a shell and a holding file look like?

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re^4: Speeding up data lookups
by suaveant (Parson) on Sep 22, 2005 at 18:56 UTC
    Well, the holding file certainly. For the shell I will put up a line with altered data; since it is financial data, who knows exactly what I am allowed to show.

    Both are fixed width, and both are sorted on the security identifier (in this case 31283GZY). These are the only two files used in a run; everything is crammed into the shell as needed ahead of time.

    Holding sample:

    CUSIP 31283GZY HB31283GZY 200409162300 BDABS ABS US +D 16:37:05 S1

    Matching shell line:

    31283GZY6 FHLMC MV DATA COMB 56 324-s234 5.670 +20290201 20170301 9.000000 19830201 DFG + + C V1 + + + + + + + + + + + + + + + + + 20050231M 11 2002091100100010553400100000210435 +5200130000109986500000000026940000000000056960000000000004910000 + + + + + 20071121M 20080321000 +010102900099000000104331349000000104562599000000005014126000000005002 +242000000004190337

                    - Ant
                    - Some of my best work - (1 2 3)

      If I understand it correctly, there is one huge shell file with lots of records, each keyed by the (unique) security number.

      Next you have several holdings files, each line/record of which names a security number. For each such line you have to pull some data from the shell record keyed by that security number, do some work on the returned record(s), and produce a report.

      Described that way, it still looks to me like a typical database-centric problem.
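
      To make the database-centric idea concrete, here is a minimal sketch only. It is written in Python with the standard sqlite3 module rather than against whatever database you would actually use, and the column offsets (identifier in the first 8 characters of a shell line, and in characters 7 to 14 of a holding line) are guesses from the two sample lines above, not the real layout. The names load_shell and lookup are made up for illustration.

```python
import sqlite3

# Assumed column offsets, guessed from the two sample lines in this thread.
SHELL_KEY = slice(0, 8)      # shell line starts with the identifier
HOLDING_KEY = slice(6, 14)   # holding line: "CUSIP " and then the identifier


def load_shell(conn, shell_lines):
    """Store each raw shell record under its security identifier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shell (secid TEXT PRIMARY KEY, record TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO shell (secid, record) VALUES (?, ?)",
        ((line[SHELL_KEY], line.rstrip("\n")) for line in shell_lines),
    )
    conn.commit()


def lookup(conn, holding_line):
    """Return the shell record matching one holding line, or None."""
    row = conn.execute(
        "SELECT record FROM shell WHERE secid = ?",
        (holding_line[HOLDING_KEY],),
    ).fetchone()
    return row[0] if row else None


# Shortened, altered sample data in the spirit of the lines posted above.
conn = sqlite3.connect(":memory:")
load_shell(conn, ["31283GZY6 FHLMC MV DATA COMB 56\n", "99999AAA1 OTHER ISSUE\n"])
print(lookup(conn, "CUSIP 31283GZY HB31283GZY 200409162300 BDABS ABS US"))
```

      The shell file is loaded once, and every holdings line then costs one indexed lookup on the primary key instead of a scan through the shell file.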

      Given that you have a multi-processor computer, perhaps rather than thinking of having the processors run in parallel, i.e. multiple processors doing the same work on different data, what about putting them in a pipeline?

      Say you have one processor handling the database queries (that would be the database engine/server) and another parsing the holdings files and querying the database server for the relevant shell records. This second script does no processing or parsing of the returned records; it just collates them, packages them together with the necessary information from the holdings file, and hands the package over to another script (running on another processor). That script extracts the info you need for the report from the "data" package and either writes the report itself or hands that job to a report-writing script on yet another processor.

      You will need some sort of queue to manage the hand-over between the scripts/processors, but that same queue will probably also let you have multiple scripts/processors feeding from it.
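
      A rough sketch of the pipeline described above, again in Python and under loud assumptions: the SHELL dictionary is a hypothetical stand-in for the database server, the column offsets are guessed from the sample lines, and the stage names (collate, extract, write_report) are invented. It uses threads and queue.Queue to keep the example short; to actually spread the stages over separate processors you would swap in multiprocessing.Process and multiprocessing.Queue, or separate scripts with some external queue.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream

# Hypothetical stand-in for the database server: identifier -> shell record.
SHELL = {"31283GZY": "31283GZY6 FHLMC MV DATA COMB 56"}


def collate(holding_lines, out_q):
    """Stage 1: read holdings, fetch the shell record, package both unparsed."""
    for line in holding_lines:
        secid = line[6:14]  # assumed offset of the identifier
        out_q.put((line, SHELL.get(secid)))
    out_q.put(DONE)


def extract(in_q, out_q):
    """Stage 2: pull the fields needed for the report out of each package."""
    while (item := in_q.get()) is not DONE:
        holding, shell = item
        if shell is not None:
            # Assumed layout: description follows the 9-character identifier.
            out_q.put((holding[6:14], shell[10:].strip()))
    out_q.put(DONE)


def write_report(in_q, report):
    """Stage 3: format report lines."""
    while (item := in_q.get()) is not DONE:
        report.append("%s: %s" % item)


def run_pipeline(holding_lines):
    """Wire the three stages together with bounded queues and run them."""
    q1, q2, report = queue.Queue(maxsize=100), queue.Queue(maxsize=100), []
    stages = [
        threading.Thread(target=collate, args=(holding_lines, q1)),
        threading.Thread(target=extract, args=(q1, q2)),
        threading.Thread(target=write_report, args=(q2, report)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return report


print(run_pipeline(["CUSIP 31283GZY HB31283GZY 200409162300 BDABS ABS US"]))
```

      The bounded queues keep a fast stage from running far ahead of a slow one. If several workers feed from the same queue, each of them needs its own DONE sentinel (or some other shutdown signal).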

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law