pratikpooja has asked for the wisdom of the Perl Monks concerning the following question:

Hello Members, I am working on a Perl program and I am new to the language. I have to update an XML file (an OSM file) with values taken from a CSV file. My CSV file contains:

"latitude"; "longitude"; "name"
53.58464144;8.560693391;"AK2_PKW_A_001"
53.5777080210388;8.56104893106605;"AK2_PKW_A_002"
My XML file (the OSM file) is:
<?xml version='1.0' encoding='UTF-8'?>
<osm version='0.6' upload='false' generator='JOSM'>
  <bounds minlat='53.560255' minlon='8.5117942' maxlat='53.5974042' maxlon='8.6024315' origin='CGImap 0.6.1 (11651 thorn-03.openstreetmap.org)' />
  <node id='-1643794' lat='53.58464144' lon='8.560693391'>
    <tag k='name' v='ASC_A_KEI_001' />
  </node>
  <way id='-1653459' action='modify'>
    <nd ref='-1652714' />
    <nd ref='-1652766' />
    <nd ref='-1652768' />
    <tag k='highway' v='service' />
    <tag k='lanes:backward' v='1' />
    <tag k='lanes:forward' v='1' />
    <tag k='name' v='FW_W1_52' />
    <tag k='vehicle' v='private' />
  </way>
</osm>
I have to update the name tag in the XML file with the name from the CSV file, matching on the lat and lon values in both files. I have written code to get the values from the XML. Can you please help me with how to proceed? Thank you very much.
#!/usr/bin/perl
use strict;
use warnings 'all';
use feature 'say';
use Getopt::Long;
use Getopt::Std;
use Getopt::Long qw(GetOptions);
use XML::LibXML;
use Data::Dumper;

my $filenameMap = 'test.osm';
my($dom) = XML::LibXML->load_xml(location => $filenameMap);
my $doc = $dom->documentElement;
my @children = $doc->childNodes;
my $countChildren = @children;
my @toUpdateNodes = ();
foreach my $child (@children) {
    my @childFromNodes = $child->childNodes;
    my $countChildFromNodes = @childFromNodes;
    foreach my $childNode (@childFromNodes) {
        #print $childNode->nodeType, "\n" ;
        if($childNode->nodeType == XML_ELEMENT_NODE && $childNode->hasAttribute("k")) {
            my $nodeVal = $childNode->getAttribute("v");
            print "NodeValue: ", $childNode->getAttribute("v"), "\n";
        }
    }
}

Replies are listed 'Best First'.
Re: Update XML Values using two primary keys
by choroba (Cardinal) on Jan 05, 2021 at 18:24 UTC
    As usual, TIMTOWTDI. For example, you can pre-hash the names from the CSV by their coordinates, then process the XML node by node, picking the new name from the hash:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use Text::CSV_XS;
    use XML::LibXML;

    my %name;
    my $csv = 'Text::CSV_XS'->new({allow_whitespace => 1,
                                   auto_diag => 1,
                                   binary => 1});
    open my $in, '<:encoding(UTF-8)', 'test.csv' or die $!;
    $csv->header($in);
    while (my $row = $csv->getline($in)) {
        $name{$row->[0]}{$row->[1]} = $row->[2];
    }

    my $dom = 'XML::LibXML'->load_xml(location => 'test.osm');
    for my $node ($dom->findnodes('/osm/node')) {
        my $tag = $node->findnodes('tag[@k="name"]')->[0];
        if (my $name = $name{ $node->{lat} }{ $node->{lon} }) {
            $tag->{v} = $name;
        } else {
            warn "No name found for $tag->{v}: @$node{qw{ lat lon }}\n";
        }
    }
    $dom->toFile('new.osm');

    Or, you can load the XML into a DOM, then read the CSV line by line, searching the DOM for the corresponding node and renaming it:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use Text::CSV_XS;
    use XML::LibXML;

    my $dom = 'XML::LibXML'->load_xml(location => 'test.osm');

    my $csv = 'Text::CSV_XS'->new({allow_whitespace => 1,
                                   auto_diag => 1,
                                   binary => 1});
    open my $in, '<:encoding(UTF-8)', 'test.csv' or die $!;
    $csv->header($in);
    my $xpath_template = '/osm/node[@lat="%"][@lon="%"]';
    while (my $row = $csv->getline($in)) {
        my $i = 0;
        my $xpath = $xpath_template =~ s/%/$row->[$i++]/gr;
        my $node = $dom->findnodes($xpath)->[0];
        if ($node) {
            my $tag = $node->findnodes('tag[@k="name"]')->[0];
            $tag->{v} = $row->[2];
        } else {
            warn "@$row missing in XML\n";
        }
    }
    $dom->toFile('new.osm');

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thank you very much. I did apply the code and it is working as expected.

      What if the CSV file contains some values that need to be added to the XML file as new nodes, with the maximum node id assigned? For example:

      "latitude";"longitude";"name";"id"
      59.58464;8.56069;A13_PRA_KEI;-1643794
      59.58464;8.56069;A13_RAT_KEI;-1643795
      59.59465;8.57070;A14_PQR_KEI;-1643796

      The first two rows have the same lat and lon values, but the second row is the one that needs to be added to the XML file because it has the maximum node id. Thank you.

        When you build the hash, you can store both values (name and id) and each time compare to the existing id (a hash of hashes). If the new id isn't larger then just discard that row. You can then delete items from the hash as you go through the XML and add any left in the hash at the end.
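
        A minimal sketch of that idea, not a definitive implementation. The inline sample data stands in for the real CSV and OSM files, and the helper id_is_larger is hypothetical. One loud assumption: "larger" is taken to mean larger absolute value, because the example above treats -1643795 as the maximum; swap the comparison if plain numeric order is wanted. Existing nodes keep their own id here and only get the new name.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

# Hypothetical sample data standing in for the CSV and OSM files.
my @rows = (
    [ '59.58464', '8.56069', 'A13_PRA_KEI', -1643794 ],
    [ '59.58464', '8.56069', 'A13_RAT_KEI', -1643795 ],
    [ '59.59465', '8.57070', 'A14_PQR_KEI', -1643796 ],
);
my $dom = XML::LibXML->load_xml(string => <<'XML');
<osm version='0.6'>
  <node id='-1643794' lat='59.58464' lon='8.56069'>
    <tag k='name' v='OLD_NAME' />
  </node>
</osm>
XML

# ASSUMPTION: "larger" means larger absolute value (ids are negative here).
sub id_is_larger {
    my ($new, $old) = @_;
    return abs($new) > abs($old);
}

# Hash of hashes keyed by lat, then lon; keep only the row with the
# largest id for each coordinate pair.
my %wanted;
for my $row (@rows) {
    my ($lat, $lon, $name, $id) = @$row;
    my $seen = $wanted{$lat}{$lon};
    next if $seen && !id_is_larger($id, $seen->{id});
    $wanted{$lat}{$lon} = { name => $name, id => $id };
}

# Rename nodes already in the XML, deleting each used entry from the hash.
for my $node ($dom->findnodes('/osm/node')) {
    my $lat = $node->getAttribute('lat');
    my $lon = $node->getAttribute('lon');
    next unless exists $wanted{$lat};
    my $entry = delete $wanted{$lat}{$lon} or next;
    my ($tag) = $node->findnodes('tag[@k="name"]');
    $tag->setAttribute('v', $entry->{name}) if $tag;
}

# Whatever is left in the hash was not in the XML: append it as new nodes.
my $root = $dom->documentElement;
for my $lat (sort keys %wanted) {
    for my $lon (sort keys %{ $wanted{$lat} }) {
        my $entry = $wanted{$lat}{$lon};
        my $node  = $dom->createElement('node');
        $node->setAttribute('id',  $entry->{id});
        $node->setAttribute('lat', $lat);
        $node->setAttribute('lon', $lon);
        my $tag = $dom->createElement('tag');
        $tag->setAttribute('k', 'name');
        $tag->setAttribute('v', $entry->{name});
        $node->appendChild($tag);
        $root->appendChild($node);
    }
}

print $dom->toString(1);
```

        With real files, the @rows loop would be replaced by the Text::CSV_XS reading loop shown earlier in the thread, and the result written out with toFile.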


        🦛

Re: Update XML Values using two primary keys
by GrandFather (Saint) on Jan 06, 2021 at 00:36 UTC

    Do you need to deal with fuzzy matching for the location points (accept a match if the points are within 100 m of each other, for example)?

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      I don't know much about fuzzy matching but the idea was to just update the XML values with the CSV values by taking the lat and long as keys. Thank you for your support.
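
      For readers unfamiliar with the term, fuzzy matching here would mean treating two points as the same when they lie within some distance of each other, instead of requiring identical lat and lon strings. A rough sketch using the haversine formula follows; the function names distance_m and is_near, the spherical Earth radius, and the 100 m threshold are illustrative assumptions, not something the thread settled on.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $PI             = 4 * atan2(1, 1);
my $EARTH_RADIUS_M = 6_371_000;    # mean radius; assumes a spherical Earth

# Great-circle distance in metres between two points given in degrees.
sub distance_m {
    my ($lat1, $lon1, $lat2, $lon2) = map { $_ * $PI / 180 } @_;
    my $dlat = $lat2 - $lat1;
    my $dlon = $lon2 - $lon1;
    my $h = sin($dlat / 2) ** 2
          + cos($lat1) * cos($lat2) * sin($dlon / 2) ** 2;
    return 2 * $EARTH_RADIUS_M * atan2(sqrt($h), sqrt(1 - $h));
}

# True if the two points are no more than $limit_m metres apart.
sub is_near {
    my ($lat1, $lon1, $lat2, $lon2, $limit_m) = @_;
    return distance_m($lat1, $lon1, $lat2, $lon2) <= $limit_m;
}

# The two sample points from the CSV in the question.
my @p = (53.58464144, 8.560693391);
my @q = (53.5777080210388, 8.56104893106605);

printf "distance: %.1f m\n", distance_m(@p, @q);
print is_near(@p, @q, 100) ? "match\n" : "no match\n";
```

      With exact matching, as in the replies above, none of this is needed; the sketch only shows what the 100 m test would involve.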
Re: Update XML Values using two primary keys
by karlgoethebier (Abbot) on Jan 06, 2021 at 11:00 UTC

    BTW, Getopt::Long should be enough. See also.
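
    As a small illustration of relying on Getopt::Long alone, the file names could be taken from the command line like this. The option names --osm, --csv and --out and the helper parse_args are hypothetical; the defaults are the file names used elsewhere in this thread.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use Getopt::Long qw( GetOptionsFromArray );

# Parse the given argument list and return a hash reference of options.
# The option names are illustrative, not taken from the original script.
sub parse_args {
    my @args = @_;
    my %opt = (osm => 'test.osm', csv => 'test.csv', out => 'new.osm');
    GetOptionsFromArray(
        \@args,
        'osm=s' => \$opt{osm},
        'csv=s' => \$opt{csv},
        'out=s' => \$opt{out},
    ) or die "Usage: $0 [--osm FILE] [--csv FILE] [--out FILE]\n";
    return \%opt;
}

my $opt = parse_args(@ARGV);
print "Reading $opt->{osm} and $opt->{csv}, writing $opt->{out}\n";
```

    Neither Getopt::Std nor a second import of Getopt::Long is needed for this.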

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Update XML Values using two primary keys
by jcb (Parson) on Jan 06, 2021 at 03:24 UTC

    Other monks have given you help towards a solution, but I noticed a misunderstanding that will not help you: you do not have two primary keys, you have one primary key that happens to be a 2-tuple of (lat, lon) values. While you will have to match it piecewise, it is conceptually a single datum.

      In the context of database-like storage they are indeed two primary keys, even though they represent one point. That point may be associated with multiple entries in the "database" (assuming of course it represents a point on a two dimensional substantially unfolded surface).

      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

        They could both be keys but I don't see how they could both be primary unless combined into a single primary key as jcb suggested. Do you happen to use or know of a database that supports multiple primary keys on the same table?


        🦛

      Thank you for the information. Since I am new to Perl, I don't know whether a tuple data type can be used, but thank you anyway.

        There is no explicit tuple data type in Perl; you will need to either track both pieces in separate variables or use an array, but note that there are no builtin operators for comparing arrays elementwise: == on arrays compares their lengths (because it imposes scalar context on its operands) and == on array references checks for object identity (true if both references refer to the same object).
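
        A short sketch of those points. The helper same_point and the ';' separator for the combined hash key are illustrative choices; the coordinates are the sample values from the question, kept as strings the way they would arrive from the files.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my @p = ('53.58464144',      '8.560693391');
my @q = ('53.5777080210388', '8.56104893106605');
my @r = ('53.58464144',      '8.560693391');    # same values as @p

# == imposes scalar context, so this compares the lengths (2 == 2),
# not the contents: it is true even though @p and @q differ.
my $same_length = (@p == @q) ? 1 : 0;

# == on references compares identity: @p and @r hold equal values
# but are different arrays, so this is false.
my $same_object = (\@p == \@r) ? 1 : 0;

# Elementwise comparison has to be written by hand (string comparison
# here, matching how the hash lookups in the replies above behave).
sub same_point {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        return 0 if $x->[$i] ne $y->[$i];
    }
    return 1;
}

# Alternatively, fold the pair into one string and use it as a hash key.
my %name;
$name{ join ';', @p } = 'AK2_PKW_A_001';

print "same length: $same_length\n";
print "same object: $same_object\n";
print "p equals r:  ", same_point(\@p, \@r), "\n";
print "p equals q:  ", same_point(\@p, \@q), "\n";
print "name at r:   ", $name{ join ';', @r }, "\n";
```

        The nested hash used in the first reply ($name{lat}{lon}) is another way of expressing the same two-part key.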

        However, while the 2-tuple nature of your primary key will not be directly reflected in your program, properly understanding it will help you to maintain clear thinking and accurate reasoning about your code.