grizzley has asked for the wisdom of the Perl Monks concerning the following question:
We have minimized the script to the following one:

    #!perl -l
    use XML::Twig;
    use threads;
    use Thread;

    $t= XML::Twig->new(twig_roots => {managedObject => \&handle_fasade});
    $t->parsefile('inputFiles/wcel3g.xml');

    sub handle_fasade{
        my $currentTh = Thread->new( \&thrsub );
        $currentTh->join;
    }

    sub thrsub{
    }

If I comment out join or parsefile, or even replace my $currentTh = Thread->new( \&thrsub ); with my $currentTh = Thread->new( { return 0 } ); the error does not occur. What is wrong in this code?
Actually the longer I prepare this node the clearer it is to me that this approach is senseless. Is it even possible to parse XML in parallel? I would rather say XML parsing must be done in one thread and afterwards processing data can be done in parallel. Am I right?
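A minimal sketch of the split described above: the main thread does all the XML parsing and hands plain strings to worker threads through a queue. It assumes a perl built with ithreads and the core Thread::Queue module. The worker count, the inline sample XML, the queue names and the "processed" stand-in for real work are all hypothetical, and this is not claimed to be a fix for the error in the minimized script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use XML::Twig;

my $WORKERS = 2;                     # hypothetical worker count
my $work    = Thread::Queue->new;    # plain strings: main thread -> workers
my $done    = Thread::Queue->new;    # plain strings: workers -> main thread

# Start the workers before any XML::Twig object exists, so that no
# parser object is cloned into the new threads.
my @pool = map {
    threads->create(sub {
        # An undef item is the "no more work" marker for this worker.
        while (defined(my $dist_name = $work->dequeue)) {
            # Stand-in for the real per-object processing.
            $done->enqueue("$dist_name processed");
        }
    });
} 1 .. $WORKERS;

# Hypothetical sample input; the real script would use parsefile().
my $xml = <<'XML';
<blah>
  <managedObject distName="MNE-PET/FLF-1000"/>
  <managedObject distName="MNE-PET/FLF-1001"/>
  <managedObject distName="MNE-PET/FLF-1002"/>
</blah>
XML

# Parsing stays in the main thread; the handler only extracts a plain
# string and queues it, it never creates a thread itself.
my $twig = XML::Twig->new(
    twig_roots => {
        managedObject => sub {
            my ($t, $element) = @_;
            $work->enqueue($element->att('distName'));
            $t->purge;               # free the element once queued
        },
    },
);
$twig->parse($xml);

# One undef per worker tells each of them to leave its loop.
$work->enqueue(undef) for @pool;
$_->join for @pool;

# Workers finish in no fixed order, so sort before printing.
my @results;
while (defined(my $r = $done->dequeue_nb)) {
    push @results, $r;
}
@results = sort @results;
print "$_\n" for @results;
```

Only plain scalars cross the thread boundary here; the twig and its elements never leave the main thread.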
update: fake input xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <blah version="2.1" xmlns="blah.xsd">
      <someData type="actual" name="ActualConfiguration" id="1">
        <header>
          <log dateTime="2012-05-08T10:10:10" action="export"/>
        </header>
        <managedObject class="NOKFLF:FLF" distName="MNE-PET/FLF-1000" id="6666666000000093362" timeStamp="2012-04-16T18:17:50" vendor="XXX" version="S14">
          <extension name="system_parameters">
            <p name="$modifier">UNAUTHENTICATED</p>
            <p name="$state">operational</p>
          </extension>
          <list name="FLFOptions">
            <p>0</p> <p>1</p> <p>2</p> <p>3</p> <p>4</p> <p>5</p> <p>7</p>
            <p>8</p> <p>10</p> <p>12</p> <p>13</p> <p>16</p> <p>17</p> <p>20</p>
            <p>24</p> <p>25</p> <p>29</p> <p>31</p> <p>32</p> <p>34</p> <p>35</p>
            <p>36</p> <p>37</p> <p>41</p> <p>42</p> <p>45</p> <p>46</p> <p>47</p>
            <p>48</p> <p>50</p> <p>51</p> <p>54</p> <p>56</p> <p>61</p> <p>62</p>
            <p>68</p> <p>69</p> <p>72</p> <p>73</p> <p>74</p> <p>88</p> <p>96</p>
            <p>107</p> <p>108</p> <p>109</p> <p>117</p> <p>118</p> <p>120</p> <p>123</p>
          </list>
          <p name="name1">31</p>
          <p name="name2">31</p>
          <p name="name">BRLE8</p>
          <p name="name4">25</p>
          <p name="name5">50</p>
          <p name="name6">10</p>
          <p name="name7">80</p>
          <p name="name8">20</p>
          <p name="name9">100</p>
          <p name="nameA">20</p>
          <p name="nameB">2</p>
          <p name="xyz">1</p>
          <p name="dbf">0</p>
          <p name="battery1">30</p>
          <p name="cpu2">150</p>
          <p name="FLFType">10</p>
          <p name="lower">40</p>
          <p name="upper">60</p>
          <p name="releaseLimit">4</p>
          <p name="delay">5</p>
          <p name="connection1">14</p>
          <p name="connection2">7</p>
          <p name="connection3">12</p>
          <p name="connection4">12</p>
          <p name="connection5">14</p>
          <p name="disableExt">0</p>
          <p name="disableInt">0</p>
          <p name="frPenalty">3</p>
          <p name="emerC">1</p>
          <p name="extraXLSNumber">6</p>
          <p name="extraBSW">64</p>
          <p name="RelPri">1</p>
          <p name="epHoUse">0</p>
          <p name="frTchim">30</p>
          <p name="freeDowngrade">95</p>
          <p name="freeUpgrade">4</p>
          <p name="freqMeas">30</p>
          <p name="xCalc">0</p>
          <p name="param1">4</p>
          <p name="param2">5</p>
          <p name="param3">0</p>
          <p name="param4">0</p>
          <p name="param5">30</p>
          <p name="param6">0</p>
          <p name="param7">10</p>
          <p name="param8">127</p>
          <p name="param9">1</p>
          <p name="param10">0</p>
          <p name="param20">255</p>
          <p name="param30">0</p>
          <p name="dparam1">150</p>
          <p name="dparam4">100</p>
          <p name="dparam6">186</p>
          <p name="dparam8">512</p>
          <p name="dparam10">30</p>
          <p name="cparam3">120</p>
          <p name="cparam5">50</p>
          <p name="cparam7">50</p>
          <p name="cparam9">384</p>
          <p name="cparam11">384</p>
          <p name="sparam1">21</p>
          <p name="sparam2">26</p>
          <p name="sparam3">30</p>
          <p name="sparam4">20</p>
          <p name="sparam5">25</p>
          <p name="sparam6">30</p>
          <p name="sparam7">24</p>
          <p name="sparam8">29</p>
          <p name="sparam9">120</p>
          <p name="sparam0">60</p>
          <p name="sparama">60</p>
          <p name="sparams">240</p>
          <p name="sparamd">4</p>
          <p name="sparamgf">1</p>
          <p name="sparamh">255</p>
          <p name="sparamh">10</p>
          <p name="sparamj">30</p>
          <p name="sparamk">3</p>
          <p name="sparami">18</p>
          <p name="sparamu">0</p>
          <p name="sparamy">8</p>
          <p name="sparamt">0</p>
          <p name="sparamr">1</p>
          <p name="sparamer">1</p>
          <p name="sparame">9</p>
          <p name="sparamw">7</p>
          <p name="somanyparams1">10</p>
          <p name="somanyparams2">90</p>
          <p name="somanyparams3">10</p>
          <p name="somanyparams4">70</p>
          <p name="somanyparams5">90</p>
          <p name="somanyparams5">20</p>
          <p name="somanyparams6">20</p>
          <p name="somanyparams7">1</p>
          <p name="somanyparams0">1</p>
          <p name="somanyparams8">1</p>
          <p name="somanyparams9">20</p>
          <p name="somanyparamsa">120</p>
          <p name="somanyparamss">120</p>
          <p name="somanyparamsd">1</p>
          <p name="somanyparamsf">400</p>
          <p name="somanyparamsg">100</p>
          <p name="somanyparamsh">200</p>
          <p name="somanyparamsj">25</p>
          <p name="somanyparamsk">1</p>
          <p name="somanyparamsl">66947</p>
          <p name="somanyparamso">66947</p>
          <p name="somanyparamsi">66947</p>
          <p name="somanyparamsu">8</p>
          <p name="somanyparamsy">0</p>
          <p name="somanyparamst">65535</p>
          <p name="somanyparamsr">5</p>
          <p name="anotherparam1">0</p>
          <p name="anotherparam2">5</p>
          <p name="anotherparam3">3</p>
          <p name="anotherparam4">1</p>
          <p name="anotherparam5">5</p>
          <p name="anotherparam6">3</p>
          <p name="anotherparam7">1</p>
          <p name="anotherparam8">3</p>
          <p name="anotherparam9">2</p>
          <p name="anotherparam0">4</p>
          <p name="anotherparamq">3</p>
          <p name="anotherparamw">12</p>
          <p name="anotherparame">6</p>
          <p name="anotherparamr">3</p>
          <p name="anotherparamt">6</p>
          <p name="anotherparamy">9</p>
          <p name="anotherparamu">12</p>
          <p name="anotherparami">20</p>
          <p name="anotherparamo">10</p>
          <p name="anotherparamp">5</p>
          <p name="anotherparama">20</p>
          <p name="anotherparams">10</p>
          <p name="anotherparamd">5</p>
          <p name="anotherparamf">20</p>
          <p name="anotherparamg">10</p>
          <p name="anotherparamh">5</p>
          <p name="anotherparamj">20</p>
          <p name="anotherparamk">10</p>
          <p name="anotherparaml">5</p>
          <p name="anotherparamz">20</p>
          <p name="anotherparamx">10</p>
          <p name="anotherparamc">5</p>
          <p name="anotherparqamb">30</p>
          <p name="anotherparqamn">0</p>
          <p name="anotherparqamm">0</p>
          <p name="anotherparqamas">4152</p>
          <p name="anotherparqams">0</p>
          <p name="anotherparqamd">15</p>
          <p name="anotherparqamf">1</p>
          <p name="anotherparqamg">10</p>
          <p name="anotherparqamh">5</p>
          <p name="anotherparqamj">128</p>
          <p name="anotherparqamk">0</p>
          <p name="anotherparqaml">127</p>
        </managedObject>
      </someData>
    </blah>

A one-liner that prepares a huge fake input XML (replace the number 100 with an appropriate value):

    perl -p0e "s/<managedObject.*?>.*<\/managedObject>/$& x 100/se" testinsmall.xml > testin.xml

Script doing the job (to be optimized):

    use XML::Twig;

    $inputFile = 'testin.xml';
    $outputFile = 'testout.xml';
    $loop = 563;
    $netType = "MNE-1v1";
    $mx2G = "002";
    $my2G = "02";
    $objID = 1;
    $bID = $firstElementID = 1;
    $managedObjectsAmount = 0;
    $someID = 0;
    $segmentID = 0;

    $header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE raml SYSTEM 'blah.dtd'>\n<blah version=\"2.1\" xmlns=\"blah.xsd\">\n<someData type=\"actual\" name=\"ActualConfiguration\" id=\"1\">\n<header>\n<log dateTime=\"2012-05-08T10:10:10\" action=\"export\"/>\n<log dateTime=\"2012-05-08T10:10:10\" action=\"ConfigurationHeaderBackup.id\">1</log>\n<log dateTime=\"2012-05-08T10:10:10\" action=\"ConfigurationHeaderBackup.name\">ActualConfiguration</log>\n</header>\n";
    $root = "<managedObject class=\"CommonStuff:ABCD\" version=\"1.0\" distName=\"$netType\" id=\"12341234\" vendor=\"XXX\" timeStamp=\"2012-04-26T15:18:07\">\n<defaults name=\"System\" id=\"2\"/>\n<extension name=\"system_parameters\">\n<p name=\"\$state\">operational</p>\n</extension>\n</managedObject>";
    $ending = "\n</someData>\n</blah>";

    # Start time
    my ($sec,$min,$hour,$day,$month,$yr19,@rest) = localtime(time);

    open(OUT, ">", $outputFile) or die "cannot open $outputFile: $!";
    print OUT $header;
    print OUT $root;

    # Parse the same input $loop times; each pass writes renumbered objects
    for $i(1 .. $loop) {
        $t= XML::Twig->new( twig_roots => { managedObject => \&handle_managedObject});
        $t->parsefile($inputFile);
        print "\nIteracja: $i / $loop \t-> OK\n";   # "Iteracja" = iteration
        $bID++;
        $someID = 0;
    }

    print OUT $ending;
    close (OUT);

    print "\n----------------\nObjects managed: $managedObjectsAmount \n\n";

    # End time
    my ($sec2,$min2,$hour2,$day2,$month2,$yr192,@rest2) = localtime(time);
    printStartTime();
    printEndTime();

    sub handle_managedObject {
        my ($t, $element) = @_;
        @fields = split(/\//, $element->{'att'}->{'distName'});

        # distName="MNE-PET/*" - OK
        if ($fields[0] ne $netType) {
            $fields[0] = $netType;
        }

        # distName="MNE-PET/FLF-1000..1064" - OK
        if ($fields[1] =~ /^FLF/) {
            $fields[1] = "FLF-".$bID;
            if (!$fields[2]) {
                $element->first_child('p[@name="name"]')->set_text($fields[1]);
            }
        }

        # distName="MNE-PET/FLF-*/WTF-1..65" -> / FLF
        if ($fields[2] =~ /^WTF-\w+/) {
            $fields[2] = "WTF-".$someID;
            if (!$fields[3]) {
                $fields[2] = "WTF-".++$someID;
                $element->first_child('p[@name="name"]')->set_text($fields[2]);
            }
        }

        # distName="MNE-PET/FLF-*/WTF-*/XLS-1..6" -> /WTF
        if (($fields[3] =~ /^XLS-\w+/) && (!$fields[4])) {
            @fieldsFLF = split(/-/, $fields[1]);
            @fieldsWTF = split(/-/, $fields[2]);
            @fieldsXLS = split(/-/, $fields[3]);
            $cId = $fieldsWTF[1].$fieldsXLS[1];
            $element->first_child('p[@name="name"]')->set_text($fields[3]);
            $element->first_child('p[@name="cId"]')->set_text($cId);
            $element->first_child('p[@name="locAreaId1"]')->set_text($fieldsFLF[1]);
            $element->first_child('p[@name="locAreaId2"]')->set_text($mx2G);
            $element->first_child('p[@name="locAreaId3"]')->set_text($my2G);
            if ($fieldsXLS[1] == 1) {
                $element->first_child('p[@name="masterWTF"]')->set_text(1);
                $element->first_child('p[@name="segmentId"]')->set_text(++$segmentID);
            } else {
                $element->first_child('p[@name="masterWTF"]')->set_text(0);
                $element->first_child('p[@name="segmentId"]')->set_text($segmentID);
            }
        }

        $element->{'att'}->{'distName'} = join ('/',@fields);
        $element->{'att'}->{'id'} = $objID++;
        $element->set_pretty_print( 'indented');
        $element->print(\*OUT) or die "Failed to write managedObject to output XML file:$!\n";
        $managedObjectsAmount++;
    }

    # Never called in this script; it relies on a global $element
    sub printToFile {
        $element->set_pretty_print( 'indented');
        $element->flush(\*OUT) or die "Failed to write element output XML file:$!\n";
    }

    sub printStartTime {
        print "START Time:\t".sprintf("%02d",$hour).":".sprintf("%02d",$min).":".sprintf("%02d",$sec);###To print the current time
        print "\t$day-".++$month. "-".($yr19+1900)."\n"; ####To print date format as expected
    }

    sub printEndTime {
        print "END Time:\t".sprintf("%02d",$hour2).":".sprintf("%02d",$min2).":".sprintf("%02d",$sec2);###To print the current time
        print "\t$day2-".++$month2. "-".($yr192+1900)."\n"; ####To print date format as expected
    }

Replies are listed 'Best First'.

Re: XML::Twig and threads
by BrowserUk (Patriarch) on Nov 26, 2012 at 12:58 UTC
by grizzley (Chaplain) on Nov 26, 2012 at 16:15 UTC
by BrowserUk (Patriarch) on Nov 26, 2012 at 16:36 UTC
by remiah (Hermit) on Nov 27, 2012 at 09:01 UTC

Re: XML::Twig and threads
by mirod (Canon) on Nov 26, 2012 at 12:52 UTC
by grizzley (Chaplain) on Nov 26, 2012 at 15:59 UTC
by BrowserUk (Patriarch) on Nov 26, 2012 at 16:11 UTC
by grizzley (Chaplain) on Nov 28, 2012 at 09:57 UTC
by BrowserUk (Patriarch) on Nov 28, 2012 at 13:44 UTC

Re: XML::Twig and threads
by roboticus (Chancellor) on Nov 26, 2012 at 13:09 UTC

Re: XML::Twig and threads
by zentara (Cardinal) on Nov 26, 2012 at 12:46 UTC
by grizzley (Chaplain) on Nov 26, 2012 at 15:52 UTC

Re: XML::Twig and threads
by remiah (Hermit) on Nov 26, 2012 at 16:05 UTC
by grizzley (Chaplain) on Nov 28, 2012 at 10:04 UTC
by remiah (Hermit) on Nov 29, 2012 at 00:31 UTC
by BrowserUk (Patriarch) on Nov 29, 2012 at 00:36 UTC
by remiah (Hermit) on Nov 29, 2012 at 02:18 UTC

Re: XML::Twig and threads
by Anonymous Monk on Nov 26, 2012 at 14:08 UTC
by BrowserUk (Patriarch) on Nov 26, 2012 at 14:19 UTC