formatting my html output

frasco has asked for the wisdom of the Perl Monks concerning the following question:

Re: formatting my html output by Fletch (Bishop) on May 02, 2008 at 13:31 UTC
Well it's obvious the problem is on line 17. See How (Not) To Ask A Question. Also look at `[:upper:]` and `[:lower:]` in perlre which should be locale-aware given the proper setup. The cake is a lie. The cake is a lie. The cake is a lie.
Re: formatting my html output by almut (Canon) on May 02, 2008 at 13:44 UTC
Could you post some sample data together with the code that you've tried? What you need to do should in principle also work with Unicode/UTF-8. For example, you can use `\p{Lu}` to match a character with the Unicode property "Letter, uppercase" (for a detailed list see perlunicode, in particular the section "Effects of Character Semantics"). Uppercasing and lowercasing should work as well... Update: to avoid unnecessary confusion, it's maybe worth mentioning that for a number of `\p{...}` expressions, there's the alternative `[[:...:]]` form. E.g. `[[:upper:]]` is the same as `\p{IsUpper}`. The `\p{...}` style is the more generic form, i.e. not all `\p{...}` expressions have a `[[:...:]]` form. BTW, the `"Is"` prefix is optional, and you can use short or long forms. For example `\p{IsLu}` is equivalent to `\p{Lu}` or `\p{UppercaseLetter}` or `\p{IsUppercaseLetter}`.
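To make the equivalence concrete, here is a minimal sketch. The sample word is made up for illustration (it is not the poster's data), and it is written with `\x{...}` escapes so the example does not depend on the encoding of the source file:

```perl
use strict;
use warnings;

# Hypothetical sample: U+0160 (S with caron, uppercase) followed by ASCII "a".
my $word = "\x{160}a";

# Short property name and POSIX-style class both match the uppercase letter.
my $by_property = $word =~ /^\p{Lu}/      ? 1 : 0;
my $by_posix    = $word =~ /^[[:upper:]]/ ? 1 : 0;

# Long form and "Is"-prefixed form of the same property.
my $by_long     = $word =~ /^\p{UppercaseLetter}/ ? 1 : 0;
my $by_is       = $word =~ /^\p{IsLu}/            ? 1 : 0;

# Case mapping works on the non-ASCII letter as well.
my $lower = lc $word;   # U+0161 followed by "a"
my $upper = uc $word;   # U+0160 followed by "A"
```

All four matches succeed here because the string contains a code point above 255, so Perl applies Unicode rules to it.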
Re: formatting my html output by mr_mischief (Monsignor) on May 02, 2008 at 13:55 UTC
Do `[[:lower:]]` and `[[:upper:]]` not work on UTF8 text? I got the impression from perlre that they do. What I read in perlretut just now seems to reinforce this. Is there some problem in the docs? Maybe some code that's not working would help us find your problem so we can help you turn it into working code. Miss Cleo seems to be on vacation.
Re: formatting my html output by frasco (Beadle) on May 02, 2008 at 14:04 UTC
Sorry for my poor style... but I'm new and confused! Anyway, by fetching data from the db (making use of Unicode modules) I have something like this: ʾà-da-um-=TÚG-:2 1 AKTUM-=TÚG . (Don't worry, it is a dead language, but intelligible.) Well, if I use a regex with a-z it of course doesn't match the small ʾ (or the accented vowels).
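The difference between an ASCII range and a Unicode property can be seen directly on that sample. This is only a sketch: it assumes the database hands back decoded character strings, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# The sample from the post, written with escapes:
# U+02BE (the half ring), U+00E0 (a with grave), U+00DA (U with acute).
my $sample = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";

# ASCII-only range: finds "da" and "um", skips the accented vowel.
my @ascii_lower = $sample =~ /([a-z]+)/g;

# Unicode "Letter, lowercase": additionally finds U+00E0.
my @unicode_lower = $sample =~ /(\p{Ll}+)/g;
```

Neither pattern picks up the leading U+02BE, which matches the observation that `a-z` misses it; `\p{Ll}` at least recovers the accented vowels.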
Re^2: formatting my html output by almut (Canon) on May 02, 2008 at 15:16 UTC
Here's a rough sketch of how you might go about doing it:

```perl
# your sample string
my $orig = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";

my $s = $orig;
$s =~ s/(\p{Ll}+)/<i>$1<\/i>/g;   # lower --> italic
$s =~ s/(\p{Lu}+)/lc($1)/ge;      # upper --> lower

# write the result as UTF-8 so the browser renders it correctly
open my $fh, ">:utf8", "sample.html" or die $!;
print $fh qq|<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
$orig<br />
$s
</body>
</html>
|;
close $fh;
```

Then load sample.html in your browser; the second line should be the modified string. Except for the ʾ, it appears to work. I'm not sure what the ʾ (`\x{2be}`) is. It doesn't seem to be treated as a lowercase character (the Unicode database lists it in the "Spacing Modifier Letters" block)... I'm afraid you'll have to figure that one out yourself :)
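One possible way to deal with the ʾ: the Unicode database gives U+02BE the general category Lm ("Letter, modifier") rather than Ll, so `\p{Ll}` skips it. The sketch below adds `\p{Lm}` to the italic class. Whether the half ring should really be italicised together with the lowercase signs is an assumption about the desired output, not something stated in the thread.

```perl
use strict;
use warnings;

my $orig = "\x{2be}\x{e0}-da-um-=T\x{da}G-:2 1 AKTUM-=T\x{da}G";

my $s = $orig;
# Assumption: modifier letters (Lm) are treated like lowercase letters.
$s =~ s/([\p{Ll}\p{Lm}]+)/<i>$1<\/i>/g;   # lower or modifier --> italic
$s =~ s/(\p{Lu}+)/lc($1)/ge;              # upper --> lower

# Encode on output so the non-ASCII characters print cleanly.
binmode STDOUT, ':encoding(UTF-8)';
print "$s\n";
```

Note that `\p{Lm}` covers every modifier letter, not only U+02BE; if that is too broad for the data, listing the code point explicitly (`[\p{Ll}\x{2be}]`) is a narrower alternative.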