in reply to Re^8: String Search
in thread String Search
Your thinking appears to be way too complex for the job at hand! You are making a "one-off" thing. Usually the objective is just to get the one-off thing done and out of your hair. Think simple and take advantage of the details of this specific situation. Don't worry about "general purpose". I wouldn't worry about "elegant" or "fast" either, although simple approaches are often very fast. And to me, "straightforward" is its own kind of elegance!
As far as creating a complex structure in either C or Perl goes, this appears to be overkill. You are going towards a "flat" one-line-per-record format. The variable names that you want are unique across "sections" (i.e. if you know the variable name, then you know what kind of sub-section it came from), and the vars look like they can appear only once per call record. Take advantage of that! Your code doesn't appear to have any need to understand the multi-level nature of the input data.
Nothing says that you can't do this in multiple scripts or steps. This is often a good way to go, as it eases the debug process. If the code isn't "optimally efficient", don't worry about it! The idea is to set up a series of "filters" that progressively work towards your goal.
So as a "first parsing step", I would do something like the code below. This makes an intermediate file that has all of the "var : value" things in each call record in a "flat" format. Fiddle with the regex until you have what you need at this step.
Then write code such that for each call record, you initialize a hash table with the default values for each var that will go into the output line. Then for each var line in the file's CDR record, if that name tag exists in the hash, override the default with the value from the file. Then at the end of the record, print the CSV line. A record starts with a line that matches CME20CP6.CallDataRecord and ends with a blank line. Nothing is wrong with manually adding a blank line to the end of the intermediate file to make the termination condition easy.
#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
   print "\n$_" if m/CME20CP6.CallDataRecord/;
   next if /^\s*\[/;   #skip stuff like [1] : '011351'H
   print "$1 : $2\n" if m/^\s*(\S+)\s+:\s+(\S+)\s*$/;
}
#Prints:
#
#CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord
#callIdentificationNumber : '6CBFD7'H
#exchangeIdentity : "DWLCCN6"
#gSMCallReferenceNumber : '9103770001'H
#switchIdentity : '0001'H
#recordSequenceNumber : '39D42E'H
#date : '1409071F'H
#serviceFeatureCode : '0002'H
#timeForEvent : '131A01'H
#
#CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord
#callIdentificationNumber : '6CC99C'H
#exchangeIdentity : "DWLCCN6"
#switchIdentity : '0001'H
#recordSequenceNumber : '39D42F'H
#date : '1409071F'H
#serviceFeatureCode : '0002'H
#timeForEvent : '131A20'H

#note: fiddle with the regex to suit your needs
#change to, say,
#   print "$1 : $2\n" if m/^\s*(\S+)\s+:\s+(.*)\s*$/;
#if you want, say,
#   chargePartySingle : 'aPartyToBeCharged (0)'
#to appear

__DATA__
CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord
{
  sCFChargingOutput
  {
    callIdentificationNumber : '6CBFD7'H
    exchangeIdentity : "DWLCCN6"
    gSMCallReferenceNumber : '9103770001'H
    switchIdentity : '0001'H
    recordSequenceNumber : '39D42E'H
    date : '1409071F'H
  }
  eventModule
  {
    iNServiceDataEventModule
    {
      chargePartySingle : 'aPartyToBeCharged (0)'
      genericChargingDigits
      {
        [0] : '2000'H
        [1] : '011351'H
        [2] : '223A941400'H
        [3] : '233A940209'H
        [4] : '043A2000'H
        [5] : '0542'H
        [6] : '2600'H
        [7] : '2700'H
        [8] : '080290701391620122'H
        [9] : '2A02'H
        [10] : '72000000000000000000000000'H
        [11] : '730000000000000000041F'H
        [12] : '7400000000'H
        [13] : '3502'H
      }
      genericChargingNumbers
      {
        [0] : '0003136985138324'H
        [1] : '010413198935930920'H
        [2] : '0203136985138324'H
        [3] : '038290905893701402'H
        [4] : '0B000002000000'H
      }
      serviceFeatureCode : '0002'H
      timeForEvent : '131A01'H
    }
  }
}
CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord
{
  sCFChargingOutput
  {
    callIdentificationNumber : '6CC99C'H
    exchangeIdentity : "DWLCCN6"
    switchIdentity : '0001'H
    recordSequenceNumber : '39D42F'H
    date : '1409071F'H
  }
  eventModule
  {
    iNServiceDataEventModule
    {
      chargePartySingle : 'bPartyToBeCharged (1)'
      genericChargingDigits
      {
        [0] : '2002'H
        [1] : '010359'H
        [2] : '023A8207'H
        [3] : '033A8207'H
        [4] : '043A0000'H
        [5] : '0506'H
        [6] : '2600'H
        [7] : '2704'H
        [8] : '080290701391622322'H
        [9] : '2A02'H
        [10] : '72000000000000000000000000'H
        [11] : '730000000000000000001F'H
        [12] : '3500'H
      }
      genericChargingNumbers
      {
        [0] : '0003138935167173'H
        [1] : '028210850000'H
        [2] : '0303138935167173'H
        [3] : '06041319'H
      }
      serviceFeatureCode : '0002'H
      timeForEvent : '131A20'H
    }
  }
}
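The note about loosening the value pattern can be tried on single lines before touching the whole file. This is only a sketch: `flatten_line` is a hypothetical helper name, not something from the script above, and it uses a non-greedy `(\S.*?)` rather than the plain `(.*)` suggested in the note so that trailing whitespace does not end up in the captured value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): returns the flat
# "name : value" text for a "var : value" line, or undef for any other line.
sub flatten_line {
    my ($line) = @_;
    return if $line =~ /^\s*\[/;    # skip stuff like [1] : '011351'H
    # (\S.*?) lets the value contain spaces; the trailing \s*$ trims the end
    return "$1 : $2" if $line =~ /^\s*(\S+)\s+:\s+(\S.*?)\s*$/;
    return;
}

for my $line ("    date : '1409071F'H\n",
              "    chargePartySingle : 'aPartyToBeCharged (0)'  \n",
              "    [1] : '011351'H\n",
              "    {\n") {
    my $flat = flatten_line($line);
    print "$flat\n" if defined $flat;
}
```

With this pattern the chargePartySingle line is kept, while the bracketed `[n]` entries and the brace lines are still dropped.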
#!/usr/bin/perl -w
use strict;

my @csv_order = qw ( exchangeIdentity callIdentificationNumber);
my %defaults = map {$_ => ""}@csv_order;
my %curr_record=%defaults;

while (<DATA>)
{
   if (/CME20CP6.CallDataRecord/.../^\s*$/)
   {
      if ( my ($var,$val) = ($_ =~ m/^\s*(\S+)\s+:\s+(\S+)\s*$/) )
      {
         $curr_record{$var}=$val if exists ($curr_record{$var});
      }
      if (/^\s*$/)   #remember to add a blank line at end of file
      {
         dump_csv_line();
         %curr_record=%defaults;
      }
   }
}

sub dump_csv_line
{
   print join (",",map{$curr_record{$_}}@csv_order)."\n";
}
__END__
Prints:
"DWLCCN6",'6CBFD7'H
"DWLCCN6",'6CC99C'H
,'699999'H
__DATA__
CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord
callIdentificationNumber : '6CBFD7'H
exchangeIdentity : "DWLCCN6"
gSMCallReferenceNumber : '9103770001'H
switchIdentity : '0001'H
recordSequenceNumber : '39D42E'H
date : '1409071F'H
serviceFeatureCode : '0002'H
timeForEvent : '131A01'H

CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord
callIdentificationNumber : '6CC99C'H
exchangeIdentity : "DWLCCN6"
switchIdentity : '0001'H
recordSequenceNumber : '39D42F'H
date : '1409071F'H
serviceFeatureCode : '0002'H
timeForEvent : '131A20'H

CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord
callIdentificationNumber : '699999'H

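If hand-adding the trailing blank line gets tiresome, one possible variation is to flush the record both on a blank line and at end of input. This is a sketch only, under the same assumptions as the script above (flat "var : value" lines, records starting with a CME20CP6.CallDataRecord line); `records_to_csv` and the `$flush` closure are hypothetical names introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @csv_order = qw(exchangeIdentity callIdentificationNumber);
my %defaults  = map { $_ => "" } @csv_order;

# Hypothetical helper: takes flat "var : value" lines, returns CSV lines.
sub records_to_csv {
    my (@lines) = @_;
    my @csv;
    my %rec       = %defaults;
    my $in_record = 0;

    # Emit the current record (if one is open) and reset to the defaults.
    my $flush = sub {
        push @csv, join(",", map { $rec{$_} } @csv_order) if $in_record;
        %rec       = %defaults;
        $in_record = 0;
    };

    for my $line (@lines) {
        if ($line =~ /CME20CP6\.CallDataRecord/) {
            $flush->();    # a new header also closes any open record
            $in_record = 1;
        }
        elsif ($line =~ /^\s*$/) {
            $flush->();    # blank line ends the record
        }
        elsif ($in_record && $line =~ /^\s*(\S+)\s+:\s+(\S+)\s*$/) {
            $rec{$1} = $2 if exists $rec{$1};    # only the wanted vars
        }
    }
    $flush->();            # end of input closes the last record
    return @csv;
}

my @demo = (
    "CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord\n",
    "callIdentificationNumber : '6CBFD7'H\n",
    "exchangeIdentity : \"DWLCCN6\"\n",
    "\n",
    "CME20CP6.CallDataRecord.uMTSGSMPLMNCallDataRecord\n",
    "callIdentificationNumber : '699999'H\n",    # no trailing blank line
);
print "$_\n" for records_to_csv(@demo);
```

The trade-off is a few more lines of state handling in exchange for not depending on how the intermediate file happens to end. For a true one-off, the manual blank line is just as good.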