in reply to Re^6: String Search
in thread String Search

Great! We are making some progress!
1. You do have a professional-grade database.
2. It can definitely do SQL stuff.

From your description of "lots of files with up to 1,000 records", I figure we are talking about a lot less than 1 million records.

I don't understand the "problem". You are getting this stuff from an Oracle DB and it is going into another Oracle DB. Oracle has tools that can do this very efficiently. One thing to think about: what the heck, even if this takes 12 hours, you would already be done in the time it's taken to discuss this on Monks!

As far as generating a CSV (Comma-Separated Value) file goes, your Oracle DB can do that itself. The command for this report will probably be complex, but the DB can do it. Even more efficient, I suspect, will be the merging tools that Oracle provides, and I think there are some very good reasons for suspecting that.

At this point, I would ask any Monk who knows about Oracle to give advice. Perl may play a role in this DB merge, but I don't think that it will be the "star" player.

Re^8: String Search
by kallol.chakra (Initiate) on Sep 01, 2009 at 06:17 UTC

    Marshall,

    I have an idea for this. Please find the approach below:

    1> We can declare a structure in C with the same layout as the data.

    2> Then create a linked list of that structure.

    3> Map the data onto the structure and allocate memory dynamically for each structure.

    4> Search the data and extract it from leaf to root.

    Please tell me whether I am right or wrong.

    Regards,

      You seem determined to use this text dump from the DB and make a CSV file for import again. I still recommend other approaches, but here are some more thoughts for you:

      Your thinking appears to be way too complex for the job at hand! You are making a "one off" thing. Usually the objective is just to get this one-off thing done and out of your hair. Think simple and take advantage of the details in this specific situation. Don't worry about "general purpose". I wouldn't worry about "elegant" or "fast", although simple approaches are often very fast. And to me, "straightforward" is its own kind of elegance!

      As far as creating a complex structure in either C or Perl goes, this appears to be "overkill". You are going towards a "flat" one-line-per-record format. The variable names that you want are unique between "sections" (i.e. if you know the variable name, then you know what kind of sub-section it came from, and the vars look like they can only appear once per call record). Take advantage of that! Your code doesn't appear to have any need to understand the multi-level nature of the input data.

      Nothing says that you can't do this in multiple scripts or steps. This is often a good way to go, as it eases the debugging process. If the code isn't "optimally efficient", don't worry about it! The idea is to set up a series of "filters" that progressively work towards your goal.

      So as a "first parsing step", I would do something like the code below. This makes an intermediate file that has all of the "var : value" things in each call record in a "flat" format. Fiddle with the regex until you have what you need at this step.
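
      A minimal sketch of what that first pass might look like (only the CME20CP6.CallDataRecord start tag and the blank-line record separator come from this discussion; the "name : value" layout and the regexes are assumptions you will have to adjust to your dump):

          #!/usr/bin/perl
          # First pass: flatten the raw CDR text dump into "var : value" lines,
          # with a blank line after each call record.
          use strict;
          use warnings;

          my $infile = shift or die "Usage: $0 rawdump.txt > intermediate.txt\n";
          open my $in, '<', $infile or die "Cannot open $infile: $!";

          my $first = 1;
          while ( my $line = <$in> ) {
              chomp $line;

              # a new call record starts here -- separate it from the previous one
              if ( $line =~ /CME20CP6\.CallDataRecord/ ) {
                  print "\n" unless $first;
                  $first = 0;
                  next;
              }

              # skip structural noise: brace-only lines and empty lines
              next if $line =~ /^\s*[{}]\s*$/;
              next if $line =~ /^\s*$/;

              # keep only "var : value" style lines, flattened onto one line each
              if ( $line =~ /^\s*(\S+)\s*:\s*(.+?)\s*$/ ) {
                  print "$1 : $2\n";
              }
          }
          print "\n";    # trailing blank line so the last record is terminated too
          close $in;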

      Then write code such that, for each call record, you initialize a hash table with the default values for each var that will go into the output line. Then, for each var line in the file's CDR record, if that name tag exists in the hash, override it with the value from the file. At the end of the record, print the CSV line. A record starts with something that matches CME20CP6.CallDataRecord and ends with a blank line. Nothing is wrong with adding a blank line manually to the end of the intermediate file to make the termination condition easy.

      Update: Just an example of how to implement the above strategy. @csv_order is the list of var names in the order that they should appear in the CSV. Now, if you need, say, these "ChargingNumbers", I would make up a new name for that and "squish it" into one value in the intermediate file format, the way you want it to appear in the output CSV file. Anyway, these 2 scripts will run in just a few seconds, even for a million records.
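
      A rough sketch of that second script, using the @csv_order idea (the field names shown are placeholders, not real CDR vars, and there is no CSV quoting -- use Text::CSV if your values can contain commas):

          #!/usr/bin/perl
          # Second pass: turn the flat "var : value" intermediate file into one CSV line per call record.
          use strict;
          use warnings;

          # columns, in the order they should appear in the CSV (placeholder names only)
          my @csv_order = qw( callingNumber calledNumber dateForStartOfCharge chargeableDuration );

          my $infile = shift or die "Usage: $0 intermediate.txt > records.csv\n";
          open my $in, '<', $infile or die "Cannot open $infile: $!";

          print join( ',', @csv_order ), "\n";              # header line
          my %record = map { $_ => '' } @csv_order;         # default (empty) values

          while ( my $line = <$in> ) {
              chomp $line;

              if ( $line =~ /^\s*$/ ) {                     # blank line = end of this call record
                  print join( ',', @record{@csv_order} ), "\n";
                  %record = map { $_ => '' } @csv_order;    # reset defaults for the next record
                  next;
              }

              my ( $var, $value ) = $line =~ /^\s*(\S+)\s*:\s*(.*?)\s*$/ or next;
              $record{$var} = $value if exists $record{$var};    # override the default if we want this var
          }
          close $in;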

        Dear Marshall,

        Thanks a lot for your code. It is working fine, but it has been very complex to follow.

        Can you please help me with two things? 1. How can I run a loop that takes the data between one CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord and the next CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord, without the braces, and stores it in an array? After collecting all the data I have to convert it to decimal or binary according to the CDR logic. 2. What is the fastest way to insert a row into a table using DBI?

      Hi,

      Can you give any help on this?

      Regards