PerlMonks  

Searching for text in string and comparing to xml, if not found print text

by helpneeded (Initiate)
on Feb 14, 2013 at 05:57 UTC ( [id://1018689]=perlquestion )

helpneeded has asked for the wisdom of the Perl Monks concerning the following question:

Hi there guys. I am not too new to Perl, but I am struggling with this as I have limited time (2 hours left) before my deadline. I have a 35 MB flat file that contains events that occurred. The fields are separated by commas. I have to find an event ID in each line without stripping the string down. I managed to do that like this:

    $file = 'Events.txt';
    open $info, $file;
    while ($line = <$info>) {
        @get_data = split ',', $line;
        @alert_ID = @get_data[5];    # No need to do this, but showing you guys where my ID is located and how I find it.
    }

So that is pretty straightforward. The thing now is, I have a 2 GB XML file which will have similar alert IDs under the tag <ALERT_ID>12345</ALERT_ID>. I want to search for the @alert_ID in the XML file. If it is found, just say success; if it is not found, the script should go back to the text file and copy the full @get_data string to another file. This way I can see which IDs were not successfully processed. It needs to do this for each ID, and there are 900-odd thousand.

This is a 3-line extract of the 900,000-line text file:

254368,1127,254368,PLMN-PLMN/BSC-396576/BCF-1411,2G_RB_Boardwalk_MN1_KZN,13201275,1,0,2,24-01-2013 00:00:04,24-01-2013 02:13:28,system,0,24-01-2013 23:56:06,cleanup,7706,1,55753,-1,-1,0,BTS O&M LINK FAILURE,24-01-2013 00:00:04,24-01-2013 00:00:11,,0,,,,FF FF FF FF FF FF,0,1.01E+17,0,1.01E+17,1.01E+17,1.01E+17,0,0,24-JAN-13 09.56.10.484000000 PM,0,396576
264616,1127,1127,PLMN-PLMN/BSC-396576/PCM-324,2G_Kwambonambi_KZN,13201274,1,0,1,24-01-2013 00:00:04,24-01-2013 02:16:57,system,0,24-01-2013 23:56:06,cleanup,2915,1,9760,-1,-1,0,FAULT RATE MONITORING,24-01-2013 00:00:04,24-01-2013 00:00:11,,0,,,ET 324d 00 ,,0,1.01E+17,0,1.01E+17,1.01E+17,1.01E+17,0,0,24-JAN-13 09.56.10.488000000 PM,0,396576
276160,1130,1130,PLMN-PLMN/BSC-397139/PCM-304,2G_Kingscliffe_Smarket_MTN_KZN,13201278,1,0,3,24-01-2013 00:00:11,24-01-2013 00:00:52,system,0,24-01-2013 00:00:56,WITHCANCEL,2909,3,7206,-1,-1,0,AIS RECEIVED,24-01-2013 00:00:11,24-01-2013 00:00:15,,0,,,ET 304d 00 ,,0,1.01E+17,0,1.01E+17,1.01E+17,1.01E+17,0,0,23-JAN-13 10.00.56.328000000 PM,0,397139

Here is an extract of the XML:

<MF_NOTIF_IND>
  <DOMAIN>NARANTC</DOMAIN>
  <RELEASE>R1</RELEASE>
  <OMC>158</OMC>
  <MOC>
    <MOCEntry value="PPTT"/>
  </MOC>
  <MOI>
    <RDN id="PLMN" value="PLMN"/>
    <RDN id="RNC" value="705"/>
    <RDN id="WBTS" value="38142"/>
    <RDN id="FTM" value="1"/>
    <RDN id="PPTT" value="1-1"/>
  </MOI>
  <EVENTTYPE>COMMUNICATIONS_ALARM</EVENTTYPE>
  <EVENTTIME>20130123235946</EVENTTIME>
  <EVENTINFO>
    <ProbableCause>
      <Value>INDETERMINATE</Value>
    </ProbableCause>
    <SpecificProblems>
      <SpecificProblemsItem>
        <Value>61152</Value>
      </SpecificProblemsItem>
    </SpecificProblems>
    <PerceivedSeverity>MAJOR</PerceivedSeverity>
    <NotificationIdentifier>21733559</NotificationIdentifier>
    <AdditionalText>RDI on unit 1 interface 1.</AdditionalText>
    <AdditionalInfo/>
    <UserAdditionalInfo/>
    <DiagnosticInfo>EMPTY</DiagnosticInfo>
    <ALARM_ID>21733559</ALARM_ID>
    <COMMENTS>CEN RAN</COMMENTS>
    <BACK_UP_OBJECT>3G_BE05_BES_CEN</BACK_UP_OBJECT>
    <BACKED_UP_STATUS>38142-Eiland_CEN</BACKED_UP_STATUS>
    <MONITORED_ATTRIBUTES/>
    <ALARM_LIST_ALIGNMENT_REQUIREMENT>EMPTY</ALARM_LIST_ALIGNMENT_REQUIREMENT>
    <SERVICE_USER>EMPTY</SERVICE_USER>
    <SERVICE_PROVIDER>EMPTY</SERVICE_PROVIDER>
    <SECURITY_ALARM_DETECTOR>EMPTY</SECURITY_ALARM_DETECTOR>
    <STATE_CHANGE_DEFINITION>EMPTY</STATE_CHANGE_DEFINITION>
    <VENDOR_SPECIFIC_ALARM_TYPE>EMPTY</VENDOR_SPECIFIC_ALARM_TYPE>
    <ACK_TIME>EMPTY</ACK_TIME>
    <ACK_SYSTEM_ID>EMPTY</ACK_SYSTEM_ID>
    <ACK_USER_ID>EMPTY</ACK_USER_ID>
    <ACK_STATE>EMPTY</ACK_STATE>
    <THRESHOLD_INFO>24745</THRESHOLD_INFO>
    <TREND_INDICATION>EMPTY</TREND_INDICATION>
    <STATE_CHANGE_DEFINITION>EMPTY</STATE_CHANGE_DEFINITION>
    <PROPOSED_REPAIR_ACTIONS>EMPTY</PROPOSED_REPAIR_ACTIONS>
    <CORRELATED_NOTIFICATIONS>EMPTY</CORRELATED_NOTIFICATIONS>
    <REASON>EMPTY</REASON>
    <CLEAR_USER_ID>EMPTY</CLEAR_USER_ID>
    <CLEAR_SYSTEM_ID>EMPTY</CLEAR_SYSTEM_ID>
    <SYSTEM_DN>SubNetwork=Nokia-1,ManagementNode=OMC-1,IRPAgent=1</SYSTEM_DN>
  </EVENTINFO>
  <USERLABEL>1-1</USERLABEL>
  <EventFeed>NARANTC</EventFeed>
</MF_NOTIF_IND>

Please, some help would be much appreciated!

Replies are listed 'Best First'.
Re: Searching for text in string and comparing to xml, if not found print text
by choroba (Cardinal) on Feb 14, 2013 at 10:09 UTC
    Maybe a bit late, but one has to sleep sometimes. Here is the Proper Way™, i.e. using a pull-parser to process the huge XML:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML::Reader;

    open my $TXT, '<', '1.txt' or die $!;
    my %ids;
    while (<$TXT>) {
        undef $ids{ (split /,/)[5] };
    }
    close $TXT;

    my $xml = XML::LibXML::Reader->new( location => '1.xml' )
        or die "Cannot open xml\n";

    while ($xml->nextElement('ALERT_ID')) {
        $xml->read;    # Go to the text value.
        my $id = $xml->value;
        if (exists $ids{$id}) {
            delete $ids{$id};
        } else {
            warn "Not found in csv: $id\n";
        }
    }

    print "$_\n" for keys %ids;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Searching for text in string and comparing to xml, if not found print text
by cLive ;-) (Prior) on Feb 14, 2013 at 06:13 UTC

    Hack way: regex. Proper way: parse the XML.

    if ($line =~ m|<ALERT_ID>(\d+)</ALERT_ID>|) { my $alert_id = $1; }

    I'll leave the XML solution to you + CPAN + searching...

Re: Searching for text in string and comparing to xml, if not found print text
by tobyink (Canon) on Feb 14, 2013 at 08:16 UTC

    My general technique would be to start by extracting a list of all ALARM_ID strings from the XML file. Because of the size, assuming the XML is fairly predictable, it might be best to do this with a simple regular expression. Then use map to convert that list to a hash where the ALARM_ID strings are the keys and the values are all "1".

    Then loop through the text file, line by line. Extract the alarm ID from the line; if it's in the hash described in the previous paragraph, then print "Success\n" as output; otherwise, print the line as output.
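
A minimal sketch of that technique, not a definitive implementation. It assumes the IDs sit inside <ALARM_ID> tags as in the XML extract (the question itself says <ALERT_ID>), that each element is on a single line, and that the ID is the sixth comma-separated field. The sub names read_xml_ids and report_missing and the in-memory sample data are illustrative only; real code would open the text and XML files instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every ID found between <ALARM_ID> tags on the given filehandle
# and return a hash reference keyed by ID (assumed tag name, see above).
sub read_xml_ids {
    my ($xml_fh) = @_;
    my @ids;
    while (my $line = <$xml_fh>) {
        # In list context with /g, the match returns every captured ID.
        push @ids, $line =~ m{<ALARM_ID>(\d+)</ALARM_ID>}g;
    }
    # Convert the list to a hash: each ID becomes a key with the value 1.
    my %seen = map { $_ => 1 } @ids;
    return \%seen;
}

# Print "Success" for IDs present in the hash; copy every other line,
# unchanged, to the output handle. Returns the number of lines copied.
sub report_missing {
    my ($txt_fh, $seen, $out_fh) = @_;
    my $missing = 0;
    while (my $line = <$txt_fh>) {
        chomp(my $copy = $line);
        my $id = (split /,/, $copy)[5];    # sixth field holds the ID
        next unless defined $id;           # skip blank or short lines
        if ($seen->{$id}) {
            print "Success\n";
        }
        else {
            print {$out_fh} $line;
            $missing++;
        }
    }
    return $missing;
}

# Demo on in-memory data (hypothetical sample, two text lines).
my $xml = "<ALARM_ID>13201275</ALARM_ID>\n<ALARM_ID>21733559</ALARM_ID>\n";
my $txt = "a,b,c,d,e,13201275,x\na,b,c,d,e,13201274,y\n";
open my $xml_fh, '<', \$xml or die $!;
open my $txt_fh, '<', \$txt or die $!;
my $report = '';
open my $out_fh, '>', \$report or die $!;
my $missing = report_missing($txt_fh, read_xml_ids($xml_fh), $out_fh);
close $out_fh;
print "Missing: $missing\n", $report;
```

Each file is read exactly once, and every lookup is a single hash access, so the run time grows with the size of the two files rather than with their product.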

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
      Can you give me a short example? I am not too familiar with the map function, I have not yet used it.
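
A short illustration of the map step on its own, offered as a sketch rather than as anyone's posted answer. The sample IDs and the variable names are made up for the example.

```perl
use strict;
use warnings;

my @ids = (21733559, 13201275, 21733559);    # sample IDs, one repeated

# map runs the block once per element, with $_ set to that element,
# and returns everything the block produced as one flat list.
# Here each ID yields a (key, value) pair, and the pairs fill the hash.
my %seen = map { $_ => 1 } @ids;

# The same result written as an explicit loop:
my %seen_loop;
$seen_loop{$_} = 1 for @ids;

print join(',', sort keys %seen), "\n";
print exists $seen{13201275} ? "found\n" : "not found\n";
```

Because hash keys are unique, the repeated ID collapses into a single key, which is what makes the hash convenient for "have I seen this ID?" checks.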
Re: Searching for text in string and comparing to xml, if not found print text
by ww (Archbishop) on Feb 14, 2013 at 20:21 UTC
    "I have limited time (2 hours left) to deadline"

    By me, that gets an almost-automatic -- for trying to set a time limit on the free help you're seeking. Suggest you start by reading PerlMonks FAQ.

    This is NOT code-a-matic. This is not a free code-writing service. This is a spot where those who offer help greatly appreciate some sign that you've made some effort yourself, and are interested in learning, as opposed to merely tapping their brains without compensation... to keep your job, grade-point average, or whatever is imposing your deadline.


    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: Searching for text in string and comparing to xml, if not found print text
by karlgoethebier (Abbot) on Feb 14, 2013 at 14:03 UTC

    OK, this is also too late.

    But I just discovered this: XML::TreePuller

    I didn't use it yet but it looks good. And you don't need libxml2

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      I didn't use it yet but it looks good. And you don't need libxml2

      Except that you do; it's basically Twig for LibXML, but without the Twig API.

      I'd stick with twig

        Thanks. Didn't realize for unknown reason ;-) that it has stream/tree mode.

        Best regards

        «The Crux of the Biscuit is the Apostrophe»

Re: Searching for text in string and comparing to xml, if not found print text
by nagalenoj (Friar) on Feb 14, 2013 at 07:30 UTC
    This could solve your purpose.
    my $file = 'Events.txt';
    open my $info,   "<", $file          or die "Couldn't open events file: $!";
    open my $xml,    "<", 'XML_FILE.xml' or die "Couldn't open xml file: $!";
    open my $output, ">", 'OUTPUT.txt'   or die "Couldn't open output file: $!";

    my $found = 0;    # Flag for indication
    while (my $line = <$info>) {
        my @get_data = split ',', $line;
        my $alert_ID = $get_data[5];

        seek $xml, 0, 0;    # Seek to the beginning, to scan from beginning for each ID.
        while (my $xml_line = <$xml>) {
            if ($xml_line =~ m|<ALERT_ID>$alert_ID</ALERT_ID>|) {
                $found = 1;
                last;
            }
        }
        if ($found == 1) {
            print "$alert_ID Found.\n";
            $found = 0;
        }
        else {
            print $output $line;    # Write the original line unchanged, commas intact.
        }
    }

    close $info;
    close $xml;
    close $output;

    If you don't need anything else from the XML file, you can remove the other lines (using either sed or grep) before executing the script. That would reduce the processing time significantly.
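
The pre-filtering step could also be done in Perl instead of sed or grep. The following is only a sketch: the sub name filter_id_lines is hypothetical, the tag name is passed in because the question says ALERT_ID while the XML extract shows ALARM_ID, and it assumes each ID element sits on one line.

```perl
use strict;
use warnings;

# Copy only the lines that contain an ID element from one handle to
# another, and return how many lines were kept.
sub filter_id_lines {
    my ($in_fh, $out_fh, $tag) = @_;
    my $kept = 0;
    while (my $line = <$in_fh>) {
        # \Q...\E quotes the tag name so it is matched literally.
        next unless $line =~ m{<\Q$tag\E>\d+</\Q$tag\E>};
        print {$out_fh} $line;
        $kept++;
    }
    return $kept;
}

# Demo on an in-memory sample; real code would open the files instead.
my $sample = "<DOMAIN>NARANTC</DOMAIN>\n"
           . "<ALARM_ID>21733559</ALARM_ID>\n"
           . "<OMC>158</OMC>\n";
open my $in_fh, '<', \$sample or die $!;
my $filtered = '';
open my $out_fh, '>', \$filtered or die $!;
my $kept = filter_id_lines($in_fh, $out_fh, 'ALARM_ID');
close $out_fh;
print "Kept $kept line(s):\n$filtered";
```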

      Argh, no! Don't do this! There's no need to re-read the whole XML file for every line of the text file (unless you expect the XML file to be constantly changing). Read it into a hash once at the start of the script.

      package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

Node Type: perlquestion [id://1018689]
Approved by davido
Front-paged by toolic