If newSVpv is given a length of zero, it calls strlen on that pointer to work out the length. You probably want newSVpvn, which uses the length exactly as given. If your u8 library was given an actual string of length 0, then it may have called malloc with a length of zero, and malloc is permitted to return either NULL or a unique pointer that must not be dereferenced. Either way you can't legally read the first byte of it, and strlen does exactly that, which seems like what valgrind is complaining about.
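
To make the difference concrete, here is a minimal plain-C sketch of the two length conventions. The names copy_guess_len and copy_exact_len are hypothetical stand-ins for the behaviour of newSVpv and newSVpvn respectively; they are not the perl API and only illustrate why a length of zero is dangerous in the first convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the newSVpv convention: len == 0 means
 * "measure it with strlen", so src must be a valid NUL-terminated string. */
char *copy_guess_len(const char *src, size_t len)
{
    if (len == 0)
        len = strlen(src);   /* reads src[0]: invalid for a malloc(0) result */
    char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}

/* Hypothetical stand-in for the newSVpvn convention: len is trusted as
 * given, so a zero-length source is never dereferenced. */
char *copy_exact_len(const char *src, size_t len)
{
    assert(src != NULL || len == 0);
    char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;
    if (len > 0)
        memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

With the second convention, a zero-length buffer (even a NULL one) yields an empty copy without touching the source at all.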

Also, why does your code call u8_strlen when you already have the length?

For the rest (33K mallocs, 24K frees), I'm guessing perl doesn't bother to deeply free every data structure as it exits, since that would just waste time when the OS will clean it up anyway.

Update

Also, I suspect you should be using SvPVutf8 instead of SvPVbyte. The comment that "strings from my editor are already utf8 encoded" probably means that perl sees your strings as a string of bytes which *happen* to be utf-8 byte sequences, and that probably isn't what you want. Unicode handling in Perl can get very confusing because Perl requires the programmer to keep track of which strings are bytes and which are unicode, and also keep track of which APIs expect to receive bytes or strings of unicode. If you want to type unicode string literals and have perl understand them as unicode text, you should declare "use utf8;" at the top of your script. Otherwise you have declared an array of bytes, and if you pass that to an API expecting unicode, your strings could get double-encoded.
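
As a rough illustration of the double-encoding problem, the sketch below treats every input byte as a code point in the range 0 to 255 and encodes it as UTF-8, which is approximately what happens when bytes that are already UTF-8 get handed to something that expects characters. The function name bytes_as_chars_to_utf8 is made up for this example and is not part of perl or any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: treat each input byte as a code point 0..255 and
 * write its UTF-8 encoding to out. out must have room for 2 * len bytes.
 * Returns the number of bytes written. */
size_t bytes_as_chars_to_utf8(const unsigned char *in, size_t len,
                              unsigned char *out)
{
    assert(out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[n++] = b;                 /* ASCII stays one byte */
        } else {
            out[n++] = 0xC0 | (b >> 6);   /* lead byte of a 2-byte sequence */
            out[n++] = 0x80 | (b & 0x3F); /* continuation byte */
        }
    }
    return n;
}
```

Feeding it the single byte 0xE9 (the code point of "é") gives the correct UTF-8 bytes C3 A9. Feeding it those two bytes again gives C3 83 C2 A9, which displays as "Ã©": the classic sign of a double-encoded string.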

Back to your XS method, using SvPVbyte means that your XS library API needs to document that it operates on "byte strings which are expected to be a valid utf-8 encoding". Maybe this is what you want? But it will die with a "Wide character" error if a user passes it a string that perl understands as unicode and that contains any character above 0xFF, e.g. uppercase_utf8_2("\x{100}").
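
For intuition about why "\x{100}" fails, here is a simplified sketch of the kind of downgrade a byte-oriented accessor has to attempt on a string that is stored internally as UTF-8. The function try_downgrade_to_bytes is hypothetical and much cruder than what perl actually does; it only shows that every character must fit in one byte or the conversion cannot succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-8 input to one byte per character.
 * Returns 1 on success and stores the byte count in *outlen.
 * Returns 0 if any character is above 0xFF or the input is malformed.
 * out must have room for len bytes. */
int try_downgrade_to_bytes(const unsigned char *in, size_t len,
                           unsigned char *out, size_t *outlen)
{
    assert(out != NULL && outlen != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[n++] = b;                         /* ASCII */
        } else if ((b & 0xE0) == 0xC0 && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            unsigned cp = ((unsigned)(b & 0x1F) << 6) | (in[i + 1] & 0x3F);
            if (cp < 0x80 || cp > 0xFF)
                return 0;                         /* overlong or "wide" */
            out[n++] = (unsigned char)cp;
            i++;                                  /* skip continuation byte */
        } else {
            return 0;   /* 3- or 4-byte sequence (always > 0xFF) or malformed */
        }
    }
    *outlen = n;
    return 1;
}
```

The character U+00E9 (UTF-8 bytes C3 A9) downgrades to the single byte E9, but U+0100 (UTF-8 bytes C4 80) has no single-byte form, so the helper reports failure. Perl raises an error at the corresponding point.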

And finally, I'm curious how this library is an improvement over perl's own 'uc' operator. Does perl incorrectly handle some cases?


In reply to Re: Memory Leak with XS but not pure C by NERDVANA
in thread Memory Leak with XS but not pure C by FrankFooty
