Re: formatting my html output

Sorry for my poor style... but I'm new and confused! Anyway, fetching data from the database (using the Unicode modules), I get something like this:

 ʾà-da-um-=TÚG-:2 1 AKTUM-=TÚG

(Don't worry, it is a dead language, but intelligible.) Well, if I use a regex with [a-z], it of course doesn't match the small ʾ (nor the accented vowels).
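A minimal sketch of the problem. It assumes the database values have already been decoded into Perl character strings (for example with decode('UTF-8', ...) from the core Encode module), so that Unicode properties apply. The ASCII class [a-z] only sees the plain letters. The Unicode property \p{Ll} (lowercase letter) also matches à. Neither one matches ʾ.

```perl
use strict;
use warnings;

# The sample string from above, written with \x{...} escapes
my $s = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";

my @ascii   = $s =~ /([a-z]+)/g;    # ASCII lowercase only: skips the accented letters
my @unicode = $s =~ /(\p{Ll}+)/g;   # any Unicode lowercase letter (general category Ll)

binmode STDOUT, ':encoding(UTF-8)';
print "[a-z]+  : @ascii\n";      # da um
print "\\p{Ll}+ : @unicode\n";   # à da um
```

The leading ʾ is missing from both lists, because it is not a lowercase letter in Unicode terms.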

Re^2: formatting my html output by almut (Canon) on May 02, 2008 at 15:16 UTC

Here's a rough sketch of how you might go about doing it:

use strict;
use warnings;

# your sample string
my $orig = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";
my $s = $orig;

$s =~ s/(\p{Ll}+)/<i>$1<\/i>/g;  # lowercase runs --> italic
$s =~ s/(\p{Lu}+)/lc($1)/ge;     # uppercase runs --> lowercase

# write the file as UTF-8, matching the charset declared in the <meta> tag
open my $fh, ">:encoding(UTF-8)", "sample.html" or die "Cannot open sample.html: $!";
print $fh qq|<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
$orig<br />
$s
</body>
</html>
|;
close $fh;

Then load sample.html in your browser; the second line should be the modified string. Except for the ʾ, it appears to work. The ʾ (\x{2be}) is not treated as a lowercase character: the Unicode database lists it in the "Spacing Modifier Letters" block as MODIFIER LETTER RIGHT HALF RING, with general category Lm (modifier letter), so \p{Ll} doesn't match it... I'm afraid you'll have to decide how you want to handle that one yourself :)
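One possible way to handle it is to add \p{Lm} to the character class. This assumes that modifier letters such as U+02BE should be italicized together with the lowercase letters next to them, which is a guess about your transliteration, not a rule. The sketch also prints the character's official name via charnames::viacode from the core charnames module.

```perl
use strict;
use warnings;
use charnames ();    # loaded only for charnames::viacode

my $orig = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";

# What is this character? viacode returns its Unicode name.
my $name  = charnames::viacode(0x2BE);
my $is_ll = "\x{2be}" =~ /\p{Ll}/ ? 1 : 0;   # lowercase letter?
my $is_lm = "\x{2be}" =~ /\p{Lm}/ ? 1 : 0;   # modifier letter?

# Assumption: modifier letters belong to the lowercase run
my $s = $orig;
$s =~ s/([\p{Ll}\p{Lm}]+)/<i>$1<\/i>/g;   # lowercase + modifier --> italic
$s =~ s/(\p{Lu}+)/lc($1)/ge;              # uppercase runs --> lowercase

binmode STDOUT, ':encoding(UTF-8)';
print "$name: Ll=$is_ll Lm=$is_lm\n";
print "$s\n";
```

With this change the first run becomes <i>ʾà</i> instead of ʾ<i>à</i>. If only this one sign matters, the narrower class [\p{Ll}\x{2be}] would avoid pulling in every other modifier letter.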
