Re: Memory Leak with XS but not pure C

Re^2: Memory Leak with XS but not pure C by Marshall (Canon) on Mar 29, 2025 at 04:53 UTC
This looks like a weird situation. The ß is a kind of funky "s", and there is no uppercase version of this single lowercase letter. It is normally translated to two uppercase "S" symbols: straße => STRASSE. This creates a pronunciation exception in German: the "a" preceding the "s" is pronounced differently depending upon whether one or two consonants follow it. The weird part is that the string gets longer when capitalized. I am not sure what uc() does. Anyway, I was thinking that this might have something to do with more memory being allocated and perhaps lost.
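To make the length change concrete, here is a minimal Perl sketch. It assumes the source file is saved as UTF-8, and the helper name `sizes` is made up for illustration. It compares character and UTF-8 byte counts before and after `uc`:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                      # the literal below is UTF-8 in the source file
use Encode qw(encode);

# Hypothetical helper: (character count, UTF-8 byte count) of a decoded string.
sub sizes {
    my ($str) = @_;
    return (length($str), length(encode('UTF-8', $str)));
}

my $lower = 'straße';
my $upper = uc $lower;

binmode(STDOUT, ':encoding(UTF-8)');
printf "%s: %d chars, %d bytes\n", $lower, sizes($lower);   # straße: 6 chars, 7 bytes
printf "%s: %d chars, %d bytes\n", $upper, sizes($upper);   # STRASSE: 7 chars, 7 bytes
```

For this word the character count grows while the UTF-8 byte count happens to stay the same, because ß takes two bytes. That is a property of this example, not a general guarantee, so C or XS code should probably not assume that uppercased output fits in a buffer sized for the input.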
Re^3: Memory Leak with XS but not pure C by karlgoethebier (Abbot) on Mar 29, 2025 at 09:20 UTC
“… there is no uppercase version of this single lowercase letter …” `Unicode Character “ẞ” (U+1E9E) - Latin Capital Letter Sharp S`. In ISO/IEC 10646 since June 24, 2008. «The Crux of the Biscuit is the Apostrophe»
Re^4: Memory Leak with XS but not pure C by cavac (Prior) on Apr 01, 2025 at 12:36 UTC
This letter was created in 2008 and standardized in the German language in 2017, but usage is optional, with "SS" the standard for uppercasing ß. Which makes that new letter the only letter of the German language that is not available on standard German keyboards. Great. Just great. Another perfect use of my taxpayer money. And no, that special letter is currently not fully supported in my commercial software either, because the font I use for printing invoices on thermal paper doesn't support it.¹ Oh well, that's the German language. To misquote Kennedy: "We speak German, not because it is easy, but because it is hard. Because that challenge is one that we are forced to accept, one we are unable to postpone, and one we intend to fail at miserably." And we do. Only a fraction of native German speakers actually speak German. Most (including me) speak a dialect of German. Especially in Austria, my home. When people from different Austrian states meet, it is an awesome thing to listen in. Everybody speaks a completely different dialect, and somehow we mostly manage to understand one another. (If someone joins who has only learned German as a second language, they might be in for a truly baffling experience, though.) Sidenote: I have seen a few Austrian movies played on German TV stations with German subtitles (notably: "Hinterholz 8"), which was one of the funniest experiences ever. ¹ It's astonishingly hard to find a readable, modern fixed-width font that looks good and can be scaled down well enough that you can print all required text on an invoice when you only have 512 pixels in width, on paper that's only 80mm (3.1 inch) wide. And that is still readable if you scale the image down to 384 pixels (50mm / 1.9 inch) for printing on a mobile Bluetooth printer. Searching for a font that can do all that and that supports some special letter that nobody uses anyway is an exercise for another decade...
PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP. Also check out my sister's artwork and my weekly webcomics.
Re^5: Memory Leak with XS but not pure C by karlgoethebier (Abbot) on Apr 01, 2025 at 17:02 UTC
Re^6: Memory Leak with XS but not pure C by cavac (Prior) on Apr 03, 2025 at 13:01 UTC
(OT) Re^3: Memory Leak with XS but not pure C by afoken (Chancellor) on Mar 29, 2025 at 19:59 UTC
“The ß is kind of funky "s"” It is actually a ligature of s and z, or at least, it started as one. That also gave it its name, Eszett: s-z. It is way more obvious in Fraktur, where you have two different forms of the lower case s: the "short" s that looks familiar and is generally used at the end of syllables, and the long s that is generally used at the beginning or in the middle of a syllable. The long s looks more or less like an f without the horizontal line. For the sharp s (which is also an alternative name for the Eszett), the s was doubled, depending on time and font, either as two long s or two short s or a long and a short s. The combination of long s and short s was alternatively printed as long s and z, which were merged in a ligature. In the following years, ß and ss became slightly different, annoying generations of students. The 1996 orthography reform attempted to get rid of ß in many places. “and there is no uppercase version of this single lowercase letter.” There are reasons: The upper case s was always S, for both long s and short s. The sharp s, written as ss (two longs, two shorts, or one long and one short), would always be written in upper case as SS. No extra rules or letters needed. The alternative form sz, printed as a ligature of long s and z, would be written in upper case as SZ. Again, no extra rules or letters needed. But then, people started to treat the s-z ligature as a new and unique letter and forgot that it was a ligature. That caused the "strange" rule of "converting" ß to SS when converting to upper case, except where misunderstandings may happen; in that case, ß should be "converted" to what it represents, SZ. That rule is rarely used; most times, context is sufficient. Maße (measurements) and Masse (mass) are a classic example: both can be written as MASSE, but if misunderstandings may happen, Maße must be written as MASZE. At this point, rules for converting to upper case become really hard for computers.
And so, ß was finally treated as a regular letter instead of a ligature and got its own dedicated upper case form (see Re^3: Memory Leak with XS but not pure C). The allocation in Unicode is a little bit far away from ß and the other glyphs used in German, and keyboard support sucks (Shift-ß gives ?, not the upper case ß), but at least there is an upper case ß, now that the new orthography tried to eliminate it. Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^4: Memory Leak with XS but not pure C by karlgoethebier (Abbot) on Apr 01, 2025 at 10:19 UTC
Die Verlegenheit hat Erik Spiekermann beschrieben, der Grandseigneur unter den deutschen Schriftgestaltern: „Ich mag die Idee eines großen ß, aber ich habe noch nirgendwo eine überzeugende Form gesehen.“ Das dürfte der Grund dafür sein, dass bislang nur sehr wenige der gängigen Schriftarten überhaupt über einen ß-Großbuchstaben verfügen. In der Regel sitzt man nämlich vor seiner Tastatur, tippt Shift, AltGr und ß und sieht: nichts. Man erinnere sich dann an die letzte Scrabble-Partie und an Friedrich Forssman: „Tiefes Lesen geht nur, wenn der Text unsichtbar ist.“
In English: Erik Spiekermann, the grand seigneur among German typeface designers, has described the predicament: "I like the idea of a capital ß, but I haven't seen a convincing form anywhere." This is probably the reason why only very few of the current fonts have a capital ß at all. As a rule, you sit in front of your keyboard, type Shift, AltGr and ß and see: nothing. Then remember the last game of Scrabble and Friedrich Forssman: "Deep reading is only possible if the text is invisible."
Sources: Buchstabe ẞ: Formprobleme der deutschen Sprache; Erik Spiekermann. «The Crux of the Biscuit is the Apostrophe»
Re^3: Memory Leak with XS but not pure C by FrankFooty (Novice) on Mar 29, 2025 at 13:03 UTC
Yes Marshall, German is a great language, eh?
(OT) Re^4: Memory Leak with XS but not pure C by afoken (Chancellor) on Mar 29, 2025 at 21:01 UTC
“German is a great language eh?” Sure it is. There are so many crazy rules to learn that it is only beaten by the complete mismatch of language and spelling in English, the ridiculous amount of completely silent extra letters at the end of French words, and the number of inflection rules in Latin. It's so hard that even native speakers can have a hard time using it properly. Examples? Refer to a young girl as "Mädchen". That's the diminutive form of "Magd" (maid), but that is generally long forgotten. You can still see it by the "-chen" suffix. Because it is a diminutive, the grammatical gender changes from feminine to neuter. That's just grammatical, with no implications about biological, social, or cultural gender. And so, if you want to refer to that "Mädchen" in the next sentence, you must use the neuter pronoun "es", not the feminine pronoun "sie". If you use "sie", you are doing it wrong. That error is quite common, even for native speakers, even for professional speakers (like the presenters of the Tagesschau). Comparisons: If the amount is the same, you use "wie": "A hat genau so viele Äpfel wie B". If the amount is less or more, you use "als": "A hat mehr Orangen als B", "B hat weniger Orangen als A". Same as in English: "A has as many apples as B", "A has more oranges than B", "B has fewer oranges than A". But getting "wie" and "als" right is hard, because of regional differences. Many native speakers can't get their head around using "als" when comparing. They always use "wie", and professional speakers make that error more and more often, too. Grinding vs. looping: Grinding, reducing the thickness of some material with abrasive tools, is "schleifen", past participle "geschliffen". Grinding to cut material, intentionally or not, is "durchschleifen", past participle "durchgeschliffen" (strong inflection). Tie a cable across a rough, spinning wheel, add some sand and water, and the cable will be cut through in no time. The cable is "durchgeschliffen".
A loop is "Schleife". Forming a loop, especially when handling electrical signals, e.g. into one device, then out of that device and into the next device in a chain, is "durchschleifen". Same letters and same sound as the grinding process, but a completely different base and a completely different meaning. The past participle is "durchgeschleift" (weak inflection). You can still see the "Schleife" in that word. Professionals started to intentionally use the wrong conjugation "durchgeschliffen" for fun, and many other people picked up that wrong form, not even knowing about the loop. Professional speakers rarely get that one wrong. Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^5: Memory Leak with XS but not pure C by karlgoethebier (Abbot) on Mar 30, 2025 at 14:49 UTC
Re^6: Memory Leak with XS but not pure C by choroba (Cardinal) on Mar 30, 2025 at 16:08 UTC
Re^2: Memory Leak with XS but not pure C by FrankFooty (Novice) on Mar 29, 2025 at 12:57 UTC
Thanks Nerdvana for your fast and very helpful reply! Using newSVpvn in place of newSVpv does indeed solve the complaint from Valgrind. You were of course also correct about the extraneous u8_strlen. I'm the epitome of confusion regarding Perl and Unicode. Your tip to add "use utf8" was very useful as I'm passing literal strings, and I added a check to the XS code prior to getting the string from the SV: `if(!SvUTF8(sv)) { sv = sv_mortalcopy(sv); sv_utf8_upgrade(sv); } s = SvPVutf8(sv, len);` As Marshall says below, the Eszett is a strange character (well, it is German) as it uppercases to 'SS'. This happens with many characters of other languages too. The standard Perl uc just leaves it there when uppercasing. The libunistring library (not mine!) does it correctly. Thanks again for the help!
Re^3: Memory Leak with XS but not pure C by hippo (Archbishop) on Mar 29, 2025 at 14:25 UTC
“The standard Perl uc just leaves it there when uppercasing.” Any currently supported perl should uppercase it correctly:
    $ cat uct.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use utf8;
    my $string = 'straße';
    my $ucstring = uc $string;
    print "Uppercase $string is $ucstring\n";
    $ perl uct.pl
    Uppercase straße is STRASSE
    $
Are you running an old version? 🦛
Re^4: Memory Leak with XS but not pure C by ikegami (Patriarch) on Mar 29, 2025 at 17:15 UTC
`uc` suffers from The Unicode Bug when the `unicode_strings` feature isn't enabled. It works correctly (giving `SS` for `ß`) when the `unicode_strings` feature is enabled. It works correctly (giving `SS` for `ß`) when the string is stored in the UTF8=1 format. It works incorrectly (`ß` unchanged) otherwise.
    use open ":std", ":locale";
    use feature qw( say );

    my $ss = "\xDF";
    utf8::upgrade(   my $ss_u = $ss );
    utf8::downgrade( my $ss_d = $ss );

    {
        no feature qw( unicode_strings );
        say uc( $ss_d );   # ß
        say uc( $ss_u );   # SS
    }

    {
        use feature qw( unicode_strings );
        say uc( $ss_d );   # SS
        say uc( $ss_u );   # SS
    }
Re^5: Memory Leak with XS but not pure C by FrankFooty (Novice) on Mar 31, 2025 at 10:21 UTC
Re^6: Memory Leak with XS but not pure C by syphilis (Archbishop) on Mar 31, 2025 at 12:29 UTC
Re^6: Memory Leak with XS but not pure C by ikegami (Patriarch) on Mar 31, 2025 at 13:20 UTC
Re^6: Memory Leak with XS but not pure C by ikegami (Patriarch) on Mar 31, 2025 at 13:18 UTC
Re^3: Memory Leak with XS but not pure C by jo37 (Curate) on Mar 29, 2025 at 14:42 UTC
“the Eszett is a strange character (well, it is German) as it uppercases to 'SS'. This happens with many characters of other languages too. The standard Perl `uc` just leaves it there when uppercasing.” I suspect you use uppercasing for case-insensitive comparison, which is not correct. Case folding would be the way to go, as `fc "ß"` is indeed `"ss"`. Otherwise, maybe `uc fc` produces your desired result? Greetings, 🐻 `$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$`
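As a rough illustration of why folding is preferable to uppercasing for comparisons, here is a small Perl sketch. The helper names `same_folded` and `same_upper` are invented for this example, and the source file is assumed to be UTF-8. It compares "straße" against both the "SS" spelling and the capital ẞ (U+1E9E) spelling:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                      # literals below are UTF-8 in the source file
use feature qw( fc );          # fc is available from Perl 5.16 on

# Hypothetical helper: case-insensitive equality via Unicode case folding.
sub same_folded {
    my ($left, $right) = @_;
    return fc($left) eq fc($right);
}

# Hypothetical helper: the naive variant that compares uppercased strings.
sub same_upper {
    my ($left, $right) = @_;
    return uc($left) eq uc($right);
}

binmode(STDOUT, ':encoding(UTF-8)');
for my $pair ( [ 'straße', 'STRASSE' ], [ 'straße', 'STRAẞE' ] ) {
    my ($x, $y) = @$pair;
    printf "%s vs %s: fc says %s, uc says %s\n", $x, $y,
        same_folded($x, $y) ? 'equal' : 'different',
        same_upper($x, $y)  ? 'equal' : 'different';
}
```

Both helpers agree on the first pair. On the second pair only the folded comparison matches, because `uc` leaves the already-uppercase ẞ alone while turning ß into SS.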
Re^3: Memory Leak with XS but not pure C by NERDVANA (Priest) on Mar 29, 2025 at 17:55 UTC
If you were feeding the 'uc' operator a string of UTF-8 bytes from your editor which perl had not been informed was intended as Unicode, then perl would apply ASCII uppercasing rules to that string of bytes. Now that you have the "use utf8" in your file, I think you'll find that 'uc' works properly on that string. But you'll also find that perl warns you if you try to print that string, because in the default configuration the output streams expect bytes as input. You can either use `binmode(STDOUT, ':encoding(UTF-8)')` to declare that you intend to always write Unicode to the file handle, or remember to encode the string before printing. Full Unicode support exists in perl, but yeah, it's kind of a learning curve to find it :-( But that's the price we pay for full multi-decade back-compat.
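The two output options can be sketched roughly as follows. This assumes a UTF-8 source file and a terminal that expects UTF-8; the helper name `to_utf8_bytes` is hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                      # literals below are UTF-8 in the source file
use Encode qw(encode);

my $upper = uc 'straße';       # a character string, not bytes

# Option 1: encode by hand just before printing to a byte-oriented handle.
# (Hypothetical helper, named for this example only.)
sub to_utf8_bytes {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}
print to_utf8_bytes("encoded by hand:  straße => $upper\n");

# Option 2: declare the encoding once; later prints take character strings.
binmode(STDOUT, ':encoding(UTF-8)');
print "encoded by layer: straße => $upper\n";
```

Mixing the two on the same handle is the usual trap: once the layer is in place, printing already-encoded bytes would encode them a second time.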
Re^4: Memory Leak with XS but not pure C by FrankFooty (Novice) on Mar 30, 2025 at 07:31 UTC
Yeah, you are right. This will be part of a bigger XS thing. Is there a macro I can use for uppercasing?
Re^5: Memory Leak with XS but not pure C by NERDVANA (Priest) on Mar 31, 2025 at 16:49 UTC
Re^6: Memory Leak with XS but not pure C by FrankFooty (Novice) on Apr 01, 2025 at 12:59 UTC