in reply to Re^5: String Search
in thread String Search

Hi Marshall, thanks a lot for your kind help. But I had already thought of that approach too. The logic I had in mind was: find the data and insert it into the table. But some problems arise with that:

1. The database we are using is an Oracle database.
2. The table I will insert the data into is very big, and there will be lakhs of files, each of which may have 1,000 records, so it will take a huge amount of time to load into the database through inserts.
3. I think the better approach is SQL*Loader, because Perl is very useful for formatting the data into a ","-separated file and SQL*Loader is always very fast at loading data into a table (see the control-file sketch below).

Your help is required here.
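Just to illustrate the SQL*Loader side of that plan, the load is driven by a small control file along these lines. This is a minimal sketch; the file, table, and column names are only placeholders, not anything from this thread:

-- load.ctl -- all names below are placeholders
LOAD DATA
INFILE 'call_records.csv'
APPEND
INTO TABLE call_records
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
( calling_number, called_number, start_time, duration )

It would then be run with something like: sqlldr userid=user/pass control=load.ctl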

Re^7: String Search
by Marshall (Canon) on Aug 31, 2009 at 14:14 UTC
    Great! We are making some progress!
    1. You do have a professional grade Database.
    2. It can definitely do SQL stuff.

    From your description of "lots of files with up to 1,000 records", I figure we are talking about a lot less than 1 million records.

    I don't understand the "problem". You are getting this stuff from an Oracle DB and it is going into another Oracle DB. Oracle has tools that can do this very efficiently. One thing to think about: what the heck, even if this takes 12 hours, you would already be done in the time it's taken to discuss this on Monks!

    As far as generating a CSV (Comma Separated Value) file goes, your Oracle DB can do that itself. The command for this report will probably be complex, but the DB can do it. Even more efficient, I suspect, will be the merging tools that Oracle provides, and I think there are some very good reasons for suspecting that.

    At this point, I would ask any Monk who knows about Oracle to give advice. Perl may play a role in this DB merge, but I don't think that it will be the "star" player.

      Marshall,

      I got an idea for this. Please find the approach below:

      1> We can declare a structure in C matching the structure of the data.

      2> Then create a linked list of that structure.

      3> Map the data onto the structure, allocating memory dynamically for each structure.

      4> Search the data and extract it from leaf to root.

      Please tell me whether I am right or wrong.

      Regards,

        You seem determined to use this text dump from the DB and make a CSV file for import again. I still recommend other approaches, but here are some more thoughts for you:

        Your thinking appears to be way too complex for the job at hand! You are making a "one off" thing. Usually the objective is to just get this one-off thing done and out of your hair. Think simple and take advantage of the details of this specific situation. Don't worry about being "general purpose". I wouldn't worry about "elegant" or "fast" either, although simple approaches are often very fast. And to me, "straightforward" is its own kind of elegance!

        As far as creating a complex structure in either C or Perl goes, that appears to be overkill. You are going towards a "flat" one-line-per-record format. The variable names that you want are unique between "sections" (i.e., if you know the variable name, then you know what kind of sub-section it came from, and the vars look like they can only appear once per call record). Take advantage of that! Your code doesn't appear to have any need to understand the multi-level nature of the input data.

        Nothing says that you can't do this in multiple scripts or steps. This is often a good way, as it eases the debug process. If the code isn't "optimally efficient", don't worry about it! The idea is to set up a series of "filters" that progressively work towards your goal.

        So as a "first parsing step", I would do something like the code below. This makes a intermediate file that has all of the "var : value" things in each call record in a "flat" format. Fiddle with regex until you have what you need at this step.

        Then write code such that, for each call record, you initialize a hash table with the default values for each var that will go into the output line. Then for each var line in the file's CDR record, if that name tag exists in the hash, override it with the value from the file. Then at the end of the record, print the CSV line. A record starts with something that matches CME20CP6.CallDataRecord and ends with a blank line. Nothing is wrong with you adding a blank line manually to the end of the intermediate file to make the termination condition easy.
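        A sketch of that second pass, assuming the intermediate format produced above. The var names in @csv_order are placeholders; substitute the real ones, in the order they should appear in the CSV:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # @csv_order holds the var names in CSV column order; these names
        # and the empty-string defaults are placeholders for illustration.
        my @csv_order = qw(callingNumber calledNumber startTime duration);
        my %defaults  = map { $_ => '' } @csv_order;

        my %record;
        my $in_record = 0;
        while ( my $line = <> ) {
            chomp $line;
            if ( $line =~ /CME20CP6\.CallDataRecord/ ) {    # start of a call record
                %record    = %defaults;                     # fresh defaults per record
                $in_record = 1;
            }
            elsif ( $in_record and $line =~ /^(\w+) : (.*)$/ ) {
                $record{$1} = $2 if exists $record{$1};     # override the default
            }
            elsif ( $in_record and $line =~ /^\s*$/ ) {     # blank line ends the record
                print join( ',', @record{@csv_order} ), "\n";
                $in_record = 0;
            }
        }

        If any values can themselves contain commas, use Text::CSV for the output line instead of a plain join.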

        Update: Just an example of how to implement the above strategy. @csv_order is the var names in the order that they should appear in the CSV. Now if you need, say, these "ChargingNumbers", I would make up a new name for that and "squish" them into one value in the intermediate file format, the way you want them to appear in the output CSV file. Anyway, these two scripts will run in just a few seconds even for a million records.
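        To illustrate the "squish" idea (both tag names here are invented for the example), a small filter over the intermediate file could combine repeated chargingNumber lines into one made-up var per record:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Collect repeated "chargingNumber : ..." lines (invented tag name)
        # and emit them as a single "allChargingNumbers : a|b|c" line at
        # the end of each record; everything else passes through unchanged.
        my @charging;
        while ( my $line = <> ) {
            if ( $line =~ /^chargingNumber : (.+)$/ ) {
                push @charging, $1;                  # hold instead of printing
            }
            elsif ( $line =~ /^\s*$/ ) {             # blank line = end of record
                print "allChargingNumbers : ", join( '|', @charging ), "\n" if @charging;
                @charging = ();
                print "\n";
            }
            else {
                print $line;                         # pass all other lines through
            }
        }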

        Hi,

        Can you give any help on this?

        Regards