blakew has asked for the wisdom of the Perl Monks concerning the following question:

I'm reading data from disk using an external C library that loads numeric data into a void * buffer (I also have to know the total size of the buffer), and I want to transform this data into an array of Perl scalars in XS code.

The problem is that the data can be stored with a varying number of bits per sample (bps) - 8, 16, or 32 - and can be either integer or floating point (or unknown). Due to my lack of C-fu I'm creating Perl SVs and dereferencing C pointers using dumb if statements:

// buffer <- void* pre-loaded data
/* SAMPLEFORMAT
 * 1 = unsigned integer data
 * 2 = two’s complement signed integer data
 * 3 = IEEE floating point data [IEEE]
 * 4 = undefined data format
 */
if ( fmt == 3 && bps == 32 ) {
    // Floating-point
    if ( av_store( array, i, newSVnv( (double)(((float*)buffer)[i]) ) ) == NULL )
        ok = 0;
}
else if ( bps == 8 ) {
    // Char
    if ( av_store( array, i, newSViv( (int)(((char*)buffer)[i]) ) ) == NULL )
        ok = 0;
}
// Int
else if ( bps == 16 ) {
    if ( av_store( array, i, newSViv( (int)(((uint16*)buffer)[i]) ) ) == NULL )
        ok = 0;
}
else {
    if ( av_store( array, i, newSViv( (int)(((int*)buffer)[i]) ) ) == NULL )
        ok = 0;
}
...

Is there a better way of doing this? I feel like my code is very inelegant and bound to break.

Update: added 'unpack' to the title.

Re: C types and SV's
by Corion (Patriarch) on Jul 18, 2010 at 18:52 UTC

    Personally, I would keep just the "dumb memory" in C and use pack/unpack to fetch the data from it in Perl. But that may well be because I feel more confident with Perl than with C. Also see Tie::Array::PackedC for inspiration on how to make your buffer appear as an array of integers from the outside.

      Thanks Corion. If you don't mind could you show me an example of pack/unpacking a 36-byte char* buffer into an array of integers, and/or: How would I move the buffer from C to Perl? (I'm aware there's documentation, which I will read, but I do much better with examples.)

        First you make the C data buffer available to Perl as a PV, I think via newSVpvn. Check the library's documentation on who owns the memory: if the library frees the buffer itself, Perl must not free it as well, so see the Perl API documentation on how to tell Perl not to free your PV.

        Then, you can use unpack to get at the raw numbers. This will likely use the same semantics as your C compiler, which should in this case be "good enough". Beware that this might break if you transport data between machines with differing endianness or word size:

        my $bps = 32;
        my $fmt = 3;
        my %template = (
            "32,3" => 'f*',   # single-precision floats in native format
            "32,1" => 'I*',   # unsigned integers in native format
            "32,2" => 'i*',   # signed integers in native format
            # ... add the other formats as you need
        );
        my $rawdata = get_raw_data_from_C_library();
        my $t = $template{ "$bps,$fmt" }
            or die "Couldn't find an unpack template for $bps BPS, format $fmt";
        # The trailing * in each template unpacks every value in the buffer
        my @samples = unpack $t, $rawdata;

      The problem here is that, in the general case (which the OP may not care about, I'll admit) using plain ol' pack/unpack isn't very portable, as you have to tell it how the structure is laid out in memory. And C makes very few guarantees about that. Merely by looking at the data you can't tell, for example, whether you have a 16-, 32- or 64-bit value; you can't tell whether a word is signed or unsigned; you can't tell whether a structure has been padded with empty space so that its members are on word boundaries; and so on.

      I'd take a look at Convert::Binary::C, and in particular the ccconfig script that comes with it.

        I'm aware of that, so my translation mostly mirrored what the C code said. The C code is not portable for exactly the reason you cite: the word size may vary again, as it did with the advent of 64-bit architectures. If I were really implementing a TIFF reader, I would go back to the format documentation, or to existing implementations, to make sure the code reads the right number of bytes and interprets them correctly. I'm sure that TIFF defines its endianness, and then one could switch to the N/n or V/v unpack templates instead of the architecture-dependent I/i templates.
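The same fixed-byte-order idea can be sketched on the C side. This is only an illustration: the function names are hypothetical, and it assumes the first two bytes of a TIFF file are "II" for little-endian or "MM" for big-endian data. The readers assemble each value byte by byte, so the result does not depend on the host's endianness or on the size of int, much like unpack's v/V (little-endian) and n/N (big-endian) templates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 = little-endian ("II"), 0 = big-endian ("MM"),
   -1 = no recognisable byte-order mark. */
int tiff_is_little_endian(const unsigned char *header)
{
    assert(header != NULL);
    if (header[0] == 0x49 && header[1] == 0x49) return 1;   /* "II" */
    if (header[0] == 0x4D && header[1] == 0x4D) return 0;   /* "MM" */
    return -1;
}

/* Read a 16-bit unsigned value in the given byte order
   (like unpack 'v' when little_endian is true, 'n' otherwise). */
uint16_t read_u16(const unsigned char *p, int little_endian)
{
    return little_endian
        ? (uint16_t)(p[0] | (p[1] << 8))
        : (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 32-bit unsigned value in the given byte order
   (like unpack 'V' when little_endian is true, 'N' otherwise). */
uint32_t read_u32(const unsigned char *p, int little_endian)
{
    return little_endian
        ? ((uint32_t)p[0]       | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24)
        : ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3]);
}
```

For example, the bytes 0x78 0x56 0x34 0x12 read as 0x12345678 in little-endian order and as 0x78563412 in big-endian order, whatever machine runs the code.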

Re: C types and SV's and unpack
by ikegami (Patriarch) on Jul 19, 2010 at 02:47 UTC
    Based on how you do the casting, the entire buffer is of the same type. You can take advantage of that.
    SV* new_sv_from_float_buffer(void* buffer, int i) {
        return newSVnv( (double)(((float*)buffer)[i]) );
    }

    ...

    {
        SV* (*new_sv_from_buffer)(void*, int);
        if ( fmt == 3 && bps == 32 ) {
            new_sv_from_buffer = &new_sv_from_float_buffer;
        }
        ...

        ... Call new_sv_from_buffer repeatedly ...
    }
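A fuller sketch of the same function-pointer approach, written so that it compiles without the Perl headers. Everything here is an assumption made for illustration: the names are hypothetical, only a few (fmt, bps) combinations are covered, and a plain double stands in for the SV* that newSVnv() or newSViv() would build in real XS code. The point is that the format is inspected once, before the loop, instead of once per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader type: returns element i of buffer as a double.
   In XS this would return an SV* instead. */
typedef double (*sample_reader)(const void *buffer, size_t i);

double read_float32(const void *buffer, size_t i)
{
    return (double)((const float *)buffer)[i];
}

double read_uint8(const void *buffer, size_t i)
{
    return (double)((const uint8_t *)buffer)[i];
}

double read_uint16(const void *buffer, size_t i)
{
    return (double)((const uint16_t *)buffer)[i];
}

double read_uint32(const void *buffer, size_t i)
{
    return (double)((const uint32_t *)buffer)[i];
}

/* Choose the reader once. Returns NULL for combinations this sketch
   does not handle (signed integers, undefined data, other widths). */
sample_reader choose_reader(int fmt, int bps)
{
    if (fmt == 3 && bps == 32) return read_float32;
    if (fmt == 1 && bps == 8)  return read_uint8;
    if (fmt == 1 && bps == 16) return read_uint16;
    if (fmt == 1 && bps == 32) return read_uint32;
    return NULL;
}

/* Convert count samples into out. Returns 1 on success, 0 if the
   format is unsupported. */
int convert_samples(const void *buffer, size_t count,
                    int fmt, int bps, double *out)
{
    sample_reader reader = choose_reader(fmt, bps);
    size_t i;

    assert(buffer != NULL && out != NULL);
    if (reader == NULL)
        return 0;
    for (i = 0; i < count; i++)
        out[i] = reader(buffer, i);
    return 1;
}
```

Adding another format then means writing one small reader and one more line in choose_reader, and the conversion loop stays untouched.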
Re: C types and SV's
by afoken (Chancellor) on Jul 18, 2010 at 18:58 UTC

    Tell us what library you are using. Perhaps someone has some experience with it. Apart from that, I would at least change the magic numbers to constants, and use switch and case instead of an if-then-else cascade.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
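The suggestion above, named constants plus switch/case, might look roughly like the following. It is a sketch under assumptions: the SF_ names and the function are hypothetical (libtiff's tiff.h provides its own SAMPLEFORMAT_* macros, which real code would use instead), and the result is written to a double rather than wrapped in an SV so that the block compiles without the Perl headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the SAMPLEFORMAT values listed in the question. */
enum sample_format {
    SF_UINT   = 1,  /* unsigned integer data */
    SF_INT    = 2,  /* two's complement signed integer data */
    SF_IEEEFP = 3,  /* IEEE floating point data */
    SF_VOID   = 4   /* undefined data format */
};

/* Store element i of buffer in *out. Returns 1 on success and 0 for a
   format/width combination that is not handled. */
int sample_to_double(const void *buffer, size_t i,
                     int fmt, int bps, double *out)
{
    assert(buffer != NULL && out != NULL);

    switch (fmt) {
    case SF_UINT:
        switch (bps) {
        case 8:  *out = ((const uint8_t  *)buffer)[i]; return 1;
        case 16: *out = ((const uint16_t *)buffer)[i]; return 1;
        case 32: *out = ((const uint32_t *)buffer)[i]; return 1;
        default: break;
        }
        break;
    case SF_INT:
        switch (bps) {
        case 8:  *out = ((const int8_t  *)buffer)[i]; return 1;
        case 16: *out = ((const int16_t *)buffer)[i]; return 1;
        case 32: *out = ((const int32_t *)buffer)[i]; return 1;
        default: break;
        }
        break;
    case SF_IEEEFP:
        if (bps == 32) {
            *out = ((const float *)buffer)[i];
            return 1;
        }
        break;
    default:
        break;
    }
    return 0;  /* unsupported combination */
}
```

Unlike the original cascade, this distinguishes signed from unsigned integers and reports an unsupported combination instead of silently treating it as a native int.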
      Tell us what library you are using.

      libtiff

        Are you trying to reinvent the wheel? Imager and Image::GeoTIFF::Tiled already implement interfaces to libtiff.

        Alexander
