Thanks starbolin, wazzuteke, and tye. All of your posts are very helpful. For simplicity, I'm going to respond to all of them at once.

First, a little more about the code I'm adapting. It calculates measures of 'translation quality' for human language translations. The measures can be applied to actual human-performed translations, but we use them to test machine translation systems.

(In case you are interested, there are six measures to choose from, and only one is computed on each run of the original program: IBM BLEU, NIST, NIST's variation on IBM BLEU, PER -- position-independent error rate, WER -- Levenshtein-distance-based error rate, and the average of WER and PER. Each of these measures is quite compute intensive -- a pure Perl version we have here at work takes at least 30 minutes to run the IBM BLEU calculation. Thus the motivation to make a Perl extension in C++. I would have used C, but I'm not very experienced with C/C++ and we already have this C++ implementation.)
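To give a feel for the kind of inner loop involved, here is a minimal sketch of a word-level WER calculation. It is not the implementation we have at work; the function names (editDistance, wordErrorRate) and the choice to normalize by reference length are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Word-level Levenshtein distance between a hypothesis and a reference.
// Uses two rolling rows instead of the full (hyp x ref) table.
std::size_t editDistance(const std::vector<std::string>& hyp,
                         const std::vector<std::string>& ref) {
    std::vector<std::size_t> prev(ref.size() + 1), cur(ref.size() + 1);
    for (std::size_t j = 0; j <= ref.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= hyp.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= ref.size(); ++j) {
            std::size_t subst = prev[j - 1] + (hyp[i - 1] == ref[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[ref.size()];
}

// Assumed definition: edit distance divided by reference length.
// An empty reference yields 0.0 for an empty hypothesis, else 1.0.
double wordErrorRate(const std::vector<std::string>& hyp,
                     const std::vector<std::string>& ref) {
    if (ref.empty()) return hyp.empty() ? 0.0 : 1.0;
    return static_cast<double>(editDistance(hyp, ref)) / ref.size();
}
```

The cost is proportional to the product of the two sentence lengths, repeated for every sentence and every reference, which is why this sort of thing is painful in pure Perl.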

As for abandoning the int main() interface: 1) it needs to return a double, not an int, when called in scalar context; 2) I didn't want to deal with the typemap headache of translating an array ref on the Perl side into a char ** to feed to main(), which is not a straightforward proposition; and 3) making slots in the interface for each kind of information makes for a much cleaner, or at least easier to grok, interface than just having a char ** with random stuff in it.

Wazzuteke's second point, about the grokability of manipulating the stack too far down, is exactly what I was worried about but hadn't been able to think through clearly enough on my own. I definitely agree that setting the return values too far down is a bad idea. I was just worried about the alternative: the maintenance difficulty of managing six copies of the same code. Which one is worse? Then I looked more closely at my six wrapper XSUBs and realized that three were nearly identical to each other and the other three were also nearly identical to each other, so I could take advantage of XS's ALIAS keyword and reduce the number of XSUBs to two. Since the prospect of maintaining two copies of the same thing is much less daunting, I've definitely moved away from my original idea. I now have a design very similar to the pseudo-example below. The XS part is nearly the same as my real code; the non-XS part is greatly simplified, but I've preserved enough that I think you can get an idea of what's going on.

double computeScore(int calculationType, char *translationFile,
                    AV *referenceFiles, AV *report, char *extra1, int extra2) {
    double score = 0.0, variance = 0.0, lengthPenalty = 0.0;
    ifstream trans(translationFile);
    vector<ifstream *> refs;
    // av_len() returns the highest index, not the element count
    I32 numRefs = av_len(referenceFiles) + 1;
    for (I32 i = 0; i < numRefs; i++) {
        SV **name = av_fetch(referenceFiles, i, 0);
        if (name) refs.push_back(new ifstream(SvPV_nolen(*name)));
    }

    // figure out what the score should be
    switch (calculationType) {
        case 0:
            // do calculation type number 0
            break;
        case 1:
            // do calculation type 1
            break;
        case ...
            // yadayada up through type 5
        default:
            // complain I don't know what to do
            break;
    }
    if (report) {
        // a length of 0 tells newSVpv() to use strlen()
        av_push(report, newSVpv("score", 0));
        av_push(report, newSVnv((NV)score));
        av_push(report, newSVpv("variance", 0));
        av_push(report, newSVnv((NV)variance));
        av_push(report, newSVpv("length penalty", 0));
        av_push(report, newSVnv((NV)lengthPenalty));
        //etc., etc.
    }
    // release the reference file streams opened above
    for (size_t i = 0; i < refs.size(); i++) delete refs[i];
    return score;
}


MODULE = blah     PACKAGE = blah
    
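void
type1(char *hypFile, AV *refFiles, int extra = 0)
    ALIAS:
        type5 = 5
        type4 = 4
    PREINIT:
        AV *report = 0;
        bool wantarray = false;
        double score;
    PPCODE:
        wantarray = (GIMME_V == G_ARRAY);
        /* mortalize the report so it is freed after the XSUB returns */
        if (wantarray) report = (AV *)sv_2mortal((SV *)newAV());
        /* ix is 0 when called as type1, otherwise the ALIAS value */
        score = computeScore((ix ? ix : 1), hypFile, refFiles, report, 0, extra);
        if (wantarray) {
            /* av_len() is the highest index, so the count is av_len() + 1 */
            I32 count = av_len(report) + 1;
            EXTEND(SP, count);
            /* push copies; the original elements still belong to report */
            for (I32 i = 0; i < count; i++)
                PUSHs(sv_mortalcopy(*av_fetch(report, i, 0)));
        }
        else {
            PUSHs(sv_2mortal(newSVnv((NV)score)));
        }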

void
type0(char *hypFile, AV *refFiles, char *extra = 0)
    ALIAS:
        type2 = 2
        type3 = 3
    PREINIT:
        AV *report = 0;
        bool wantarray = false;
        double score;
    PPCODE:
        wantarray = (GIMME_V == G_ARRAY);
        /* mortalize the report so it is freed after the XSUB returns */
        if (wantarray) report = (AV *)sv_2mortal((SV *)newAV());
        score = computeScore(ix, hypFile, refFiles, report, extra, 0);
        if (wantarray) {
            /* av_len() is the highest index, so the count is av_len() + 1 */
            I32 count = av_len(report) + 1;
            EXTEND(SP, count);
            /* push copies; the original elements still belong to report */
            for (I32 i = 0; i < count; i++)
                PUSHs(sv_mortalcopy(*av_fetch(report, i, 0)));
        }
        else {
            PUSHs(sv_2mortal(newSVnv((NV)score)));
        }

I like the point that pretty much everyone made one way or another: do the Perl stuff in Perl and the other stuff in XS/C/C++. Trying to do too much Perl stuff in C/XS is kind of taking the scenic route, and it forces the next person who looks at your code to take the scenic route too. As for doing the hard/slow stuff in Perl, well, avoiding that is the whole point of XS.
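One way to push that separation a step further would be to keep the scoring core free of any Perl API calls and let only the XS glue touch AVs and SVs. The following is a rough sketch of that idea, not my actual code: Metric, ScoreReport, and computeScoreCore are hypothetical names, and the calculations themselves are left as placeholders.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metric identifiers, mirroring the ALIAS index values 0..5.
enum class Metric { Type0 = 0, Type1, Type2, Type3, Type4, Type5 };

// Plain C++ result: no AV* or SV* anywhere in the core.
struct ScoreReport {
    double score = 0.0;
    // Ordered name/value pairs for the list-context report.
    std::vector<std::pair<std::string, double>> details;
};

// Core entry point: plain C++ types in, plain C++ types out.
ScoreReport computeScoreCore(int metricIndex,
                             const std::string& translationFile,
                             const std::vector<std::string>& referenceFiles) {
    if (metricIndex < 0 || metricIndex > 5)
        throw std::invalid_argument("unknown metric index");
    (void)translationFile;   // unused in this placeholder sketch
    (void)referenceFiles;    // unused in this placeholder sketch

    ScoreReport report;
    switch (static_cast<Metric>(metricIndex)) {
        case Metric::Type0:
        case Metric::Type1:
        case Metric::Type2:
        case Metric::Type3:
        case Metric::Type4:
        case Metric::Type5:
            // the real calculation for each metric would go here
            break;
    }
    report.details.emplace_back("score", report.score);
    return report;
}
```

With something like this, the XSUB would convert the Perl array ref into a vector of file names, call the core, and then push either the score or the name/value pairs onto the stack, so the core could also be tested without Perl at all.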

Please feel free to comment on the design above. I'm still not sure I've set things up the best way, especially in terms of how I've separated the Perl from the C++/XS.

Thanks again,

--DrWhy

"If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

In reply to Re: XS design question by DrWhy
in thread XS design question by DrWhy
