@roboticus: Thanks so much for your help. I've tried this and it works. I didn't know we could specify multiple search paths. So the search stops at the first occurrence of "design". The only problem is that it may or may not be the right design, so I'll have to find out whether I can specify other co-occurring terms along with it.

Hopefully I won't run into those situations, but if I do, I'll be extracting the wrong value if I go by "design" alone.

Regards, Madbee

Re^3: Get Node Value from irregular XML
by roboticus (Chancellor) on Jun 29, 2013 at 19:25 UTC

    madbee:

    If multiple searches could yield different results and you have an algorithm to determine which one is "better", then instead of stopping the search when you find the first one, call your function to score the result and stow it away. Then, once you do all the searches, you can choose the best one. Something like:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use autodie;
    use XML::LibXML;
    use Data::Dumper;

    for my $FName (qw(1041480.1 1041480.2)) {
        print "----------- SEARCHING DOCUMENT $FName ---------\n";
        my $dom = XML::LibXML->load_xml(location=>$FName);
        my @hits;
        for my $search ('/root/part/sect/header', '/root/para') {
            print "----- searching: $search\n";
            my $nodeset = $dom->find($search);
            my $text = join('', map { $_->string_value } $nodeset->get_nodelist);
            if ($text =~ /design/i) {
                # found a match! score it and store it
                my $score = goodness_evaluator($text);
                push @hits, [ $score, $text ];
            }
        }
        if (@hits) {
            # sort by score, highest first, so the best hit is $hits[0]
            @hits = sort {$b->[0] <=> $a->[0]} @hits;
            my ($score, $text) = @{$hits[0]};
            print "$FName: Best match: score=$score, text=$text\n";
        }
        else {
            print "$FName: no matches found\n";
        }
    }

    # placeholder scorer: sums the character codes of the text
    sub goodness_evaluator {
        my $t = shift;
        my $score = 0;
        $score += ord($_) for $t=~m/(.)/g;
        return $score;
    }
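
The goodness_evaluator above is only a stand-in. Since the question was about co-occurring terms, here is a minimal sketch of a scorer that rewards a hit for each extra term that appears alongside "design". The name cooccurrence_score, the terms, and their weights are all hypothetical; they are not from the thread and would need to be chosen to suit the actual documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical co-occurring terms and weights (assumptions, not from the thread).
my %WEIGHT = (
    specification => 3,
    review        => 2,
    document      => 1,
);

# Return 0 if "design" is absent; otherwise 1 plus the weight of each
# listed term that also appears in the text (case-insensitive, counted once).
sub cooccurrence_score {
    my ($text) = @_;
    return 0 unless $text =~ /design/i;
    my $score = 1;
    for my $term (keys %WEIGHT) {
        $score += $WEIGHT{$term} if $text =~ /\b\Q$term\E\b/i;
    }
    return $score;
}

# Made-up sample texts, only to show how the scores differ.
for my $candidate ('Interior design tips', 'Design specification review') {
    printf "%d  %s\n", cooccurrence_score($candidate), $candidate;
}
```

A scorer like this can be dropped in wherever goodness_evaluator is called, as long as a higher score means a better match when the hits are sorted.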

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.