THuG has asked for the wisdom of the Perl Monks concerning the following question:

I have proprietary XML that contains a "formula". An example of this:

*IF *VALUE ManagedSystem.Product *EQ NT *AND *VALUE ManagedSystem.Status *EQ '*OFFLINE'

I want to replace all of the words that begin with an asterisk with some readable text. Also, the AttributeGroup.Attribute names listed in the string should be replaced with their correct display text:

If the value of Product Code is equal to NT and the value of Status is equal to 'OFFLINE'

I can do this with regex, and this isn't too cumbersome with the asterisk words since there aren't that many of them. But the number of AttributeGroup.Attribute names passes two thousand. We have scraped them out of the appropriate files and put them into a database, hoping we could use that as a lookup, but I'm not sure how.

It is possible a templating system could help us, but the XML is not something we generate, so we'd have to add template markup, and at that point, we might as well just do the work.

Any ideas for a not-so-hard way to do this?

  • Comment on Search and Replace with a Large Dictionary

Replies are listed 'Best First'.
Re: Search and Replace with a Large Dictionary
by RMGir (Prior) on Oct 08, 2008 at 21:18 UTC
    Well, you could try something fancy, but first I'd make sure that a "dumb" approach isn't fast enough.
    #!/usr/bin/perl -w
    use strict;

    my %fieldDescs=(
        'ManagedSystem.Product', 'Product Code',
        'ManagedSystem.Status', 'Status',
        # and many more, in actuality loaded from your database
    );
    my %keywords=(
        EQ => 'is equal to',
        OFFLINE => 'OFFLINE',   # the formula already supplies the surrounding quotes
        IF => 'if',
        VALUE => 'the value of',
        # And your other keywords
    );
    my $text="*IF *VALUE ManagedSystem.Product *EQ NT *AND *VALUE ManagedSystem.Status *EQ '*OFFLINE'";

    # Substitute *FOO for the equivalent from %keywords, if it exists
    $text=~s/\*(\w+)\b/(exists $keywords{$1})?$keywords{$1}:$1/eg;

    # Substitute any word for its description from %fieldDescs, if the word exists there
    $text=~s/\b([A-Za-z_\.0-9]+)\b/(exists $fieldDescs{$1})?$fieldDescs{$1}:$1/eg;

    print "New text is\n\t$text\n";
    It's not algorithmically very advanced, and it will be somewhat slow, but it's mostly proportional to the number of words in your formulas, not the words in your Attribute names... It's worth a try.

    Mike
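
A possible single-pass variant of the same idea, offered as a sketch rather than a drop-in replacement: one substitution matches either an asterisk keyword or a Group.Attribute token, so text produced by one lookup is never scanned again by the other. The hash names and the `translate` helper are made up for illustration, the sample entries stand in for data that would really be loaded from the database, and the `//` operator assumes Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the real descriptions would be loaded from the database.
my %field_desc = (
    'ManagedSystem.Product' => 'Product Code',
    'ManagedSystem.Status'  => 'Status',
);
my %keyword = (
    IF      => 'If',
    VALUE   => 'the value of',
    EQ      => 'is equal to',
    AND     => 'and',
    OFFLINE => 'OFFLINE',
);

# Translate one formula; anything not found in either hash is left unchanged.
sub translate {
    my ($text) = @_;
    $text =~ s{
        \*(\w+)             # capture 1: an asterisk keyword such as *IF
      | \b(\w+\.\w+)\b      # capture 2: a Group.Attribute name
    }{
        defined $1 ? ( $keyword{$1} // "*$1" ) : ( $field_desc{$2} // $2 )
    }gex;
    return $text;
}

my $formula = "*IF *VALUE ManagedSystem.Product *EQ NT *AND *VALUE ManagedSystem.Status *EQ '*OFFLINE'";
print translate($formula), "\n";
# prints: If the value of Product Code is equal to NT and the value of Status is equal to 'OFFLINE'
```

Each token costs one hash lookup, so the running time still depends on the length of the formula and not on the two thousand or so attribute names.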
Re: Search and Replace with a Large Dictionary
by mr_mischief (Monsignor) on Oct 08, 2008 at 22:58 UTC
    I hope you have the power to reconsider the approach. The first thing you might want to tell us is what sort of thing you want Tivoli EMS to do for you. Are you just wanting to document what the statements mean by translating the filters?

    A recursive-descent parser might be overkill here, but don't discount it completely yet. A naive left-to-right token match and replacement may actually work here, though. The language used is pretty simple.

    The left-to-right replacements would be simple with a hash. You could use a different top-level hash for each attribute group or you could use a hash of hashes. Two thousand entries is not that large. You could use YAML or XML for the specification if you didn't want to make constant hashes on disk. Just grab a token, see if it exists as a hash key, and replace it with its value. The code for this is simple and often seen (more often than it should be, given how many times the wheel has been reinvented). The below is a simplified place to start, but you probably want to populate the full data structure in some other way for clarity.

    use strict;
    use warnings;

    my $string = q(*IF *VALUE ManagedSystem.Product *EQ NT *AND *VALUE ManagedSystem.Status *EQ '*OFFLINE');

    my %dict = (
        'ManagedSystem' => {
            'Product' => 'Product Code',
            'Status'  => 'Status',
        },
        '*IF'    => 'If',
        '*VALUE' => 'the value of',
        '*EQ'    => 'is equal to',
        '*AND'   => 'and',
        q('*OFFLINE') => q('OFFLINE'),
    );

    my @parts = split /\s+/, $string;
    foreach my $p ( @parts ) {
        if ( $p =~ m/\./ ) {
            my ( $attr_grp, $attr ) = split /\./, $p;
            $p = $dict{ $attr_grp }{ $attr } if exists $dict{ $attr_grp }{ $attr };
        }
        else {
            $p = $dict{ $p } if exists $dict{ $p };
        }
        print $p . ' ';
    }
    print "\n";

    If using a database, just make a table in the DB for each attribute group. Then make a char column for the attribute and make an index on it. Make another column for the replacement text. Then something simple like "select replacement from ManagedSystem where attribute = 'Product';" will get you the right replacement text. DBI, Rose::DB, Class::DBI, or DBIx::Class would each be able to help you with that in its own way.
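
One way to populate the hash of hashes "in some other way" is to build it from (group, attribute, display text) rows. The sketch below hard-codes two hypothetical rows so it runs on its own; with DBI the same shape of data could come from a call such as `$dbh->selectall_arrayref(...)` against whatever table holds the scraped names. The names `@rows`, `%display_for` and `display_name` are invented for this example, and `//` assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical rows of (group, attribute, display text). In practice these
# would be fetched from the database rather than written out by hand.
my @rows = (
    [ 'ManagedSystem', 'Product', 'Product Code' ],
    [ 'ManagedSystem', 'Status',  'Status' ],
);

# Build the hash of hashes once, up front.
my %display_for;
$display_for{ $_->[0] }{ $_->[1] } = $_->[2] for @rows;

# Look up one Group.Attribute token; unknown tokens come back unchanged.
# Checking the group first avoids autovivifying empty entries in %display_for.
sub display_name {
    my ($token) = @_;
    my ( $group, $attr ) = split /\./, $token, 2;
    return $token unless defined $attr && exists $display_for{$group};
    return $display_for{$group}{$attr} // $token;
}

print display_name('ManagedSystem.Product'), "\n";
print display_name('Unknown.Thing'), "\n";
```

Loading everything in one query and doing the lookups in memory keeps the database out of the inner loop, which seems reasonable for a couple of thousand entries.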

      Wow, October. I started this process way back in October and am just now getting a chance to try and tackle this problem. Just goes to show how low a priority documentation is.

      Thank you guys for your responses. I like the idea of a hash, I just couldn't visualize how to do it. These examples help.

      Mr Mischief, these are ITM Situations. We just want to generate documentation we can post to a wiki so users can quickly figure out what the cryptic page means. I have no problem walking the XML and generating the rest of the document. The problem was just how to make this single line easier to read.

Re: Search and Replace with a Large Dictionary
by kyle (Abbot) on Oct 08, 2008 at 21:12 UTC

    If your AttributeGroup.Attribute mapping is getting out of hand, you could use DBM::Deep instead of a straight hash. If you're committed to a database, you could still tie a hash to an object that will query the database as appropriate. Then those replacements are as easy as

    s/((?:[A-Z][a-z]*)+\.(?:[A-Z][a-z]*)+)/$name_of{$1}/g;
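
A minimal sketch of the tied-hash idea, assuming a hypothetical `LazyLookup` package: each key is resolved by a callback on first use and cached afterwards. Here the callback reads from an ordinary hash so the example is self-contained; in real use it would run the database query instead. The substitution also checks `exists` first, so unknown names are left as they are rather than replaced with an empty string.

```perl
use strict;
use warnings;

# Hypothetical read-only tied hash: resolves each key through a callback
# the first time it is requested and caches the answer (including misses).
package LazyLookup;

sub TIEHASH {
    my ( $class, $lookup ) = @_;
    return bless { lookup => $lookup, cache => {} }, $class;
}

sub FETCH {
    my ( $self, $key ) = @_;
    $self->{cache}{$key} = $self->{lookup}->($key)
        unless exists $self->{cache}{$key};
    return $self->{cache}{$key};
}

sub EXISTS {
    my ( $self, $key ) = @_;
    return defined $self->FETCH($key);
}

package main;

# Stand-in for the database; the callback below would normally query it.
my %backing = (
    'ManagedSystem.Product' => 'Product Code',
    'ManagedSystem.Status'  => 'Status',
);
my $queries = 0;
tie my %name_of, 'LazyLookup', sub { $queries++; return $backing{ $_[0] } };

my $text = 'ManagedSystem.Product ManagedSystem.Status ManagedSystem.Product Other.Thing';
$text =~ s/\b(\w+\.\w+)\b/(exists $name_of{$1}) ? $name_of{$1} : $1/ge;
print "$text\n";
print "lookups performed: $queries\n";
```

Because misses are cached too, a name that is not in the database costs one lookup no matter how often it appears.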
Re: Search and Replace with a Large Dictionary
by gone2015 (Deacon) on Oct 08, 2008 at 21:10 UTC

    A hash will allow you to map one string to another straightforwardly enough. So, is the problem identifying the possible candidates? If so, what is the nature of that problem?