in reply to formatting my html output

Sorry for my poor style, but I'm new and confused! Anyway, by fetching data from the db (making use of Unicode modules) I get something like this:
 ʾà-da-um-=TÚG-:2 1 AKTUM-=TÚG 
(Don't worry, it is a dead language, but intelligible.) Well, if I use a regex with a-z, it of course doesn't match the small ʾ (or the accented vowels).
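
To make the mismatch concrete, here is a minimal sketch (the variable names are only illustrative, not from the original code) comparing an ASCII range with the Unicode letter property on the first syllable of the sample:

```perl
use strict;
use warnings;

# First syllable of the sample: U+02BE followed by U+00E0 (a with grave).
my $word = "\x{2be}\x{e0}";

# [a-z] only covers the 26 ASCII lowercase letters, so neither character matches.
my $ascii_hit = $word =~ /[a-z]/ ? 1 : 0;

# \p{L} matches any Unicode letter, including accented and modifier letters.
my $letter_hit = $word =~ /^\p{L}+$/ ? 1 : 0;

printf "ascii=%d letter=%d\n", $ascii_hit, $letter_hit;
```

Run as is, this should report no match for the ASCII range and a match for `\p{L}`.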

Re^2: formatting my html output
by almut (Canon) on May 02, 2008 at 15:16 UTC

    Here's a rough sketch of how you might go about doing it:

    # your sample string
    my $orig = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";

    my $s = $orig;
    $s =~ s/(\p{Ll}+)/<i>$1<\/i>/g;   # lower --> italic
    $s =~ s/(\p{Lu}+)/lc($1)/ge;      # upper --> lower

    open my $fh, ">:utf8", "sample.html" or die $!;
    print $fh qq|<html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    </head>
    <body>
    $orig<br />
    $s
    </body>
    </html>
    |;
    close $fh;

    Then load sample.html in your browser; the second line should be the modified string. Except for the ʾ, it appears to work. I'm not sure what the ʾ (\x{2be}) is. It doesn't seem to be treated as a lowercase character (the Unicode database lists it among the "Spacing Modifier Letters")... I'm afraid you'll have to figure that one out yourself :)
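
    One possible way around the ʾ problem, offered only as a sketch: U+02BE has the general category Lm (modifier letter) rather than Ll, so \p{Ll} skips it. Assuming you want modifier letters italicized together with the lowercase letters (that is a guess about the desired output, not something stated in the question), the first substitution could use a character class covering both properties:

    ```perl
    use strict;
    use warnings;

    my $orig = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";

    # U+02BE is a modifier letter (Lm), not a lowercase letter (Ll).
    my $is_ll = "\x{2be}" =~ /\p{Ll}/ ? 1 : 0;
    my $is_lm = "\x{2be}" =~ /\p{Lm}/ ? 1 : 0;

    my $s = $orig;
    $s =~ s/([\p{Ll}\p{Lm}]+)/<i>$1<\/i>/g;   # lower or modifier --> italic
    $s =~ s/(\p{Lu}+)/lc($1)/ge;               # upper --> lower

    binmode STDOUT, ":encoding(UTF-8)";        # avoid "Wide character" warnings
    print "is_ll=$is_ll is_lm=$is_lm\n$s\n";
    ```

    With this change the ʾ should end up inside the same <i>...</i> pair as the à that follows it, while the uppercase runs are still lowercased as before.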