A function that adjusts quotation marks and dashes (Gedankenstrich) in text for HTML use so that they follow correct German typography.

The quotation marks can be set to either of the two forms used in German text: „quote“/‚quote‘ or »quote«/›quote‹. The dash (Gedankenstrich) is set to a Halbgeviertstrich (en dash).

This function is useful for web-based CMSs. Users don't have to worry about consistent typography and can keep typing the quick but typographically wrong characters: " ' -.

Comments and enhancements are welcome.

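#!/usr/bin/perl
use strict;
use integer;
use warnings;
use HTML::Entities;

# typoadjust
# in:  text, quotation-style (1 = german, other = french)
# out: adjusted text
sub typoadjust {
    # normalize everything to entities first; the patterns below match on entities
    # (local keeps the caller's $_ untouched)
    local $_ = encode_entities(decode_entities(shift));

    # fix quotations
    if (shift == 1) {
        s/(^|\s)(&quot;|&ldquo;|&raquo;|&laquo;)/$1&bdquo;/g;
        s/(&quot;|&laquo;|&raquo;|&rdquo;)($|[\s\-])/&ldquo;$2/g;
        s/(^|\s)(&#39;|&lsquo;|&lsaquo;|&rsaquo;)/$1&sbquo;/g;
        s/(&#39;|&lsaquo;|&rsaquo;|&rsquo;)($|[\s\-])/&lsquo;$2/g;
    }
    else {
        s/(^|\s)(&quot;|&bdquo;|&ldquo;|&laquo;)/$1&raquo;/g;
        s/(&quot;|&ldquo;|&rdquo;|&raquo;)($|[\s\-])/&laquo;$2/g;
        s/(^|\s)(&#39;|&lsquo;|&sbquo;|&lsaquo;)/$1&rsaquo;/g;
        s/(&#39;|&lsquo;|&rsquo;|&rsaquo;)($|[\s\-])/&lsaquo;$2/g;
    }

    # fix dash
    s/(\s)-(\s)/$1&ndash;$2/g;

    return decode_entities($_);
}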
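
To illustrate the opening/closing rules the function relies on, here is a minimal sketch that applies them to plain Unicode characters instead of entities. The helper name adjust_chars is hypothetical and not part of the code above; it assumes the input is an already decoded character string and only handles straight quotes and the spaced hyphen.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # this source contains literal UTF-8 characters
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper, for illustration only: same rules as typoadjust,
# but on characters rather than on entity-encoded text.
sub adjust_chars {
    my ($text, $style) = @_;
    my ($dopen, $dclose, $sopen, $sclose) =
        $style == 1 ? ('„', '“', '‚', '‘')     # 1 = „quote“ / ‚quote‘
                    : ('»', '«', '›', '‹');    # other = »quote« / ›quote‹

    $text =~ s/(^|\s)"/$1$dopen/g;         # quote at start or after whitespace opens
    $text =~ s/"($|[\s\-])/$dclose$1/g;    # quote before whitespace, hyphen or end closes
    $text =~ s/(^|\s)'/$1$sopen/g;         # same two rules for single quotes
    $text =~ s/'($|[\s\-])/$sclose$1/g;
    $text =~ s/(\s)-(\s)/$1–$2/g;          # hyphen between whitespace becomes an en dash
    return $text;
}

print adjust_chars(q{Er sagte "Hallo" - und ging.}, 1), "\n";
# prints: Er sagte „Hallo“ – und ging.
print adjust_chars(q{Er sagte "Hallo" - und ging.}, 2), "\n";
# prints: Er sagte »Hallo« – und ging.
```

Note that a closing quote followed directly by punctuation (for example "Hallo".) is not matched by these rules, since they only look for whitespace, a hyphen, or the end of the string after the quote.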

Re: Adjust German HTML Typography
by shenme (Priest) on Oct 02, 2004 at 00:12 UTC
    I wonder if this might be useful in the Lingua series of CPAN modules? Hmmm, it seems strange that there are only three Lingua-DE modules so far.
Re: Adjust German HTML Typography
by Beechbone (Friar) on Oct 07, 2004 at 23:06 UTC
    You might want to add:

    s/(\S)---?(\S)/$1 &ndash; $2/g;

    to catch English-style dashes, and maybe even:

    s/(\s)&ndash;\s+([^.]+?)\s+&ndash;(\s)/$1&ndash;&nbsp;$2&nbsp;&ndash;$3/g;

    to prevent things like "Dann ging er -- so wie er war -- nach Hause." from breaking like this:

    Dann ging er --
    so wie er war
    -- nach Hause.
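
A small sketch of how these two substitutions behave on entity-encoded text. It assumes the second replacement uses non-breaking spaces (&nbsp;) next to the dashes, which is what keeping the aside on one line with its dashes requires; the wrapper name fix_dashes is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the two suggested substitutions.
# Expects text that is already entity-encoded.
sub fix_dashes {
    my ($text) = @_;

    # "--" or "---" directly between two non-space characters
    # becomes an en dash with a space on each side
    $text =~ s/(\S)---?(\S)/$1 &ndash; $2/g;

    # tie a dash-enclosed aside to its dashes with non-breaking spaces
    $text =~ s/(\s)&ndash;\s+([^.]+?)\s+&ndash;(\s)/$1&ndash;&nbsp;$2&nbsp;&ndash;$3/g;

    return $text;
}

print fix_dashes('Dann ging er--so wie er war--nach Hause.'), "\n";
# prints: Dann ging er &ndash;&nbsp;so wie er war&nbsp;&ndash; nach Hause.
```

The first pattern only fires when the hyphens touch the neighbouring words, so a "--" that already has spaces around it is left alone by this sketch.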

    Search, Ask, Know