PerlMonks  

regex & variable substitution

by kiz (Monk)
on Jul 11, 2005 at 09:20 UTC ( [id://473876]=perlquestion )

kiz has asked for the wisdom of the Perl Monks concerning the following question:

Basically, I'm trying to write a snippet of code (to be included in a larger script) to do Unicode entity substitution. The basic idea is to search for any entities, then look them up in a hash (in practice, created from an external file). If the entity is present, replace it with the Unicode decimal code; otherwise leave the entity alone. Here is my current state of progress:
#!/usr/bin/perl

# Unicode entity text => Unicode decimal number
%lookup = ("Adieresis" => 196,
           "Aring"     => 197,
           "Ccedilla"  => 199,
           "Eacute"    => 201,
           "Ntilde"    => 209,
           "Odieresis" => 214
          );

# A few lines of text to test.
# Only elements 4 and 6 should match
@source = ("fred",
           "adieresis",
           "Adieresis",
           "&adieresis;",
           "&Adieresis;",
           "",
           "fr&Adieresis;ed"
          );

foreach (@source) {
  # regexp: [^;]+? matches 1+ characters which are not a semi-colon
  # ([^;]+?) Remember it (in $1)
  # &([^;]+?);? basically matches a (pseudo) entity
  s/&([^;]+);?/"&#".eval(exists $lookup{\1} ? $lookup{\1} : \1).";"/e;
  print "($1) $_\n";
}
This outputs:
() fred
() adieresis
() Adieresis
(adieresis) &#;
(Adieresis) &#;
(Adieresis) 
(Adieresis) fr&#;ed
which is not what I'm after. It's finding the matches, but the eval part isn't replacing them with anything. I'm sure that substitution (either with s/// or tr///) should be able to do it, but I'm just not getting it. :-( Has anyone got anything that works? (Am I barking up the wrong tree?) (Should I even be barking here?) In hope...


-- Ian Stuart
A man depriving some poor village, somewhere, of a first-class idiot.

Replies are listed 'Best First'.
Re: regex & variable substitution
by holli (Abbot) on Jul 11, 2005 at 09:34 UTC
    s/&([^;]+);?/"&#".($lookup{$1} ? $lookup{$1} : $1).";"/e;
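
A minimal sketch of why this version behaves differently from the one in the question, as far as I can tell: with /e the replacement is ordinary Perl code, and in code `\1` is a reference to the constant 1 rather than the first capture, so the hash key becomes a string such as SCALAR(0x...) and never matches. `$1` is the capture. The helper name `replace_entity` and the trimmed-down `%lookup` are only for illustration.

```perl
use strict;
use warnings;

# Trimmed-down copy of the lookup table from the question (illustrative).
my %lookup = (Adieresis => 196, Aring => 197);

# Hypothetical helper: applies the substitution above to one string.
sub replace_entity {
    my ($text) = @_;
    $text =~ s/&([^;]+);?/"&#" . ($lookup{$1} ? $lookup{$1} : $1) . ";"/e;
    return $text;
}

# In code context, \1 is a reference to a constant, not a backreference.
my $ref_key = "" . \1;
print "\\1 stringifies as: $ref_key\n";

# Known entity, entity inside a word, no entity, unknown entity.
print replace_entity($_), "\n"
    for ('&Adieresis;', 'fr&Adieresis;ed', 'fred', '&adieresis;');
```

Note that an unknown name such as `&adieresis;` comes out as `&#adieresis;` here, because the "&#" prefix is added whether or not the key was found.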


    holli, /regexed monk/
      Excellent - that's spot on! OK, so I now have a question: doesn't the plain
      $lookup{$1}
      create the key (and assign a value of null) if the key does not exist? Is it not better to do either "defined" or "exists" for the key? Anyway, my failing was the braces - getting that right made it work. Many thanks...


      -- Ian Stuart
      A man depriving some poor village, somewhere, of a first-class idiot.
        No, the key will only autovivify when you assign a value to it.

        Proof:
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my %h;
        print $h{key} ? $h{key} : "nokey";
        print "\n", Dumper (\%h);
        See? The hash stays empty.
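
One caveat worth sketching, assuming a nested structure (which the lookup hash in this thread is not): reading through more than one level does create the intermediate key, even though nothing is assigned. The variable names below are made up for the example.

```perl
use strict;
use warnings;

# Single level: a plain read leaves the hash untouched.
my %flat;
my $flat_value   = $flat{key};
my $flat_created = exists $flat{key} ? 1 : 0;

# Two levels: reading $nested{outer}{inner} has to dereference
# $nested{outer}, so Perl creates that key with an empty hash ref.
my %nested;
my $nested_value  = $nested{outer}{inner};
my $outer_created = exists $nested{outer} ? 1 : 0;

print "flat key created:  $flat_created\n";   # 0
print "outer key created: $outer_created\n";  # 1
```

For a flat hash like %lookup this does not matter, so the truth test, defined and exists are all safe to use without creating keys.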

        Update: exists, defined and the check for truth do different things. Consider:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my %h = ( a => "", b => undef, c => "true");

        for ( qw(a b c d) ) {
            print "key: $_ is " . ($h{$_} ? "" : "not ") . "true\n";
            print "key: $_ is " . (defined $h{$_} ? "" : "not ") . "defined\n";
            print "key: $_ is " . (exists $h{$_} ? "" : "not ") . "existing\n";
        }

        #key: a is not true
        #key: a is defined
        #key: a is existing
        #key: b is not true
        #key: b is not defined
        #key: b is existing
        #key: c is true
        #key: c is defined
        #key: c is existing
        #key: d is not true
        #key: d is not defined
        #key: d is not existing


        holli, /regexed monk/
Re: regex & variable substitution
by neniro (Priest) on Jul 11, 2005 at 09:29 UTC
    s/&([^;]+);?/"&#".exists $lookup{$1} ? $lookup{$1} : $1.";"/e; should work the way you want it?!
      Hmmm... I'm sure I tried this too... definitely better than what I was getting, but still not right: I get the value inserted if the key exists, but the match is not left in place if there is no key in the hash. Also, I'm not getting the bracketing text. Still, I'm further than I was! :-)
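
The behaviour described here looks like an operator precedence effect: `.` binds tighter than `?:`, so the whole of `"&#".exists $lookup{$1}` becomes the condition. That string is never empty, so the first branch is always taken and the "&#" and ";" never reach the result. A small sketch comparing the expression as posted with a parenthesised one; the sub names and the one-entry `%lookup` are illustrative only.

```perl
use strict;
use warnings;

my %lookup = (Adieresis => 196);

# The replacement exactly as posted: the condition is "&#" . exists(...),
# which is always a true string, so $lookup{$1} is always returned.
sub as_posted {
    my ($text) = @_;
    no warnings 'uninitialized';   # a missing key yields undef here
    $text =~ s/&([^;]+);?/"&#".exists $lookup{$1} ? $lookup{$1} : $1.";"/e;
    return $text;
}

# Parentheses make the exists test alone the condition.
sub parenthesised {
    my ($text) = @_;
    $text =~ s/&([^;]+);?/"&#" . (exists $lookup{$1} ? $lookup{$1} : $1) . ";"/e;
    return $text;
}

for my $entity ('&Adieresis;', '&adieresis;') {
    print "$entity: posted='", as_posted($entity),
          "' parenthesised='", parenthesised($entity), "'\n";
}
```

As posted, a known entity becomes the bare number and an unknown one becomes an empty string, which matches what is reported above.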


      -- Ian Stuart
      A man depriving some poor village, somewhere, of a first-class idiot.
Re: regex & variable substitution
by kiz (Monk) on Jul 11, 2005 at 11:28 UTC
    Further investigation showed a flaw in the regexp. I think it should be:
    s/&([^;\s]+);/..
    i.e., we avoid any words that start with an ampersand but don't end in a semicolon. The question marks, for non-greedy matching, become superfluous...
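
Putting the tightened pattern together with an exists test gives something like the sketch below. This is one possible reading of the original goal, not code posted in the thread: the sub name `entities_to_decimal`, the /g flag and the choice to rebuild unknown entities as "&name;" are assumptions made for the example.

```perl
use strict;
use warnings;

# Lookup table from the original question.
my %lookup = (
    Adieresis => 196, Aring  => 197, Ccedilla  => 199,
    Eacute    => 201, Ntilde => 209, Odieresis => 214,
);

# Hypothetical helper: replace every known entity with its decimal
# reference and put unknown entities back exactly as they were.
sub entities_to_decimal {
    my ($text) = @_;
    # /g (an addition here) handles several entities in one string.
    $text =~ s/&([^;\s]+);/exists $lookup{$1} ? "&#$lookup{$1};" : "&$1;"/ge;
    return $text;
}

print entities_to_decimal($_), "\n"
    for ('fred', '&adieresis;', '&Adieresis;', 'fr&Adieresis;ed', 'this & that; more');
```

With the whitespace exclusion and the mandatory semicolon, a stray ampersand followed by a space is not treated as an entity at all.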


    -- Ian Stuart
    A man depriving some poor village, somewhere, of a first-class idiot.

Approved by marto