Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm finishing a site that uses HTML::Mason as its templating system. Before building it, I defined functions such as __() that mark their argument as translatable. Now I'm looking for a way to extract the strings that are subject to translation from the Mason components. Some of my components contain code like this:
% my $c = __('Activity report'); #needs extracting
<div>
<%perl>
$c = __('Another string'); #needs extracting
</%perl>
<% __('Browsing report:') %> <!-- needs extracting -->
<% $c %>
</div>
It's nice that I didn't use here-document syntax for the strings. As you can see from the sample, writing a parser that extracts strings from components would be non-trivial. Has anybody done anything like this before? Do you have any ideas on how to implement it? Googling does not help much. Thanks in advance for your suggestions!

Replies are listed 'Best First'.
Re: how to extract strings from HTML::Mason components?
by Corion (Patriarch) on Dec 12, 2008 at 09:04 UTC

    For a quick start, what's wrong with:

    my @translatables = ($sourcecode =~ /\b__\((.*?)\)/g);

    This will extract all tokens that need translation, at least from the specification you've shown. If you do more fancy stuff like nested parentheses etc., things get hairier, but as you're the one writing the code, I recommend you Just Don't Do That.
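A slightly stricter variant of the same idea, as a sketch only: the pattern above captures everything between the parentheses, quotes included, and stops at the first ")" even when it sits inside the string. The version below assumes every call passes exactly one single- or double-quoted literal. The names $call and @translatables and the sample text are illustrative, not part of any Mason API.

```perl
use strict;
use warnings;

# Sample component text, taken from the question above.
my $sourcecode = join "\n",
    q{% my $c = __('Activity report');},
    q{<div>},
    q{<%perl>},
    q{$c = __('Another string');},
    q{</%perl>},
    q{<% __('Browsing report:') %>},
    q{<% $c %>},
    q{</div>};

# Match __( followed by one quoted literal and a closing parenthesis.
# Backslash-escaped characters inside the literal are skipped over, so a
# ")" or an escaped quote inside the string does not end the match early.
my $call = qr/
    (?<!\w) __ \s* \( \s*
    (?: ' ( (?: [^'\\] | \\. )* ) '
      | " ( (?: [^"\\] | \\. )* ) "
    )
    \s* \)
/xs;

my @translatables;
while ($sourcecode =~ /$call/g) {
    # Exactly one of the two capture groups is defined per match.
    push @translatables, defined $1 ? $1 : $2;
}

print "$_\n" for @translatables;
```

The captured text is the raw source between the quotes, so escape sequences such as \' are not processed, and calls that pass variables, concatenations or here-documents are not matched at all.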

      Hmm, nice idea! It's a hack, of course, but it should work with my code, since I wrote the calls to __() very carefully (the argument fits on one line, no nested comments, etc.). Thanks!!
Re: how to extract strings from HTML::Mason components?
by Fletch (Bishop) on Dec 12, 2008 at 14:34 UTC

    Less hacky might be to run the components through a different implementation of sub __ which simply logs the current component and the argument.
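A minimal sketch of such a logging replacement, assuming __() takes a single string argument. The %seen hash and the use of caller to record the call site are illustrative choices. What caller reports under Mason depends on how the components are compiled, so treat the location as a hint.

```perl
use strict;
use warnings;

my %seen;   # string => list of "file:line" call sites

# Stand-in for the real __(): records the argument and where it was called
# from, then returns the string unchanged so the page still renders.
sub __ {
    my ($string) = @_;
    my (undef, $file, $line) = caller;
    push @{ $seen{$string} }, "$file:$line";
    return $string;
}

# Ordinary calls, as they would appear in component code.
my $c = __('Activity report');
$c = __('Another string');

# Dump every string that was seen, with its call sites.
for my $string (sort keys %seen) {
    print "$string\t@{ $seen{$string} }\n";
}
```

This only records calls that actually execute, so strings in branches that are never reached during the run are missed.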

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: how to extract strings from HTML::Mason components?
by jeffa (Bishop) on Dec 12, 2008 at 15:41 UTC

    Why don't you have __() handle the strings as they are accessed? You did not discuss what happens inside __(), so I realize that my suggestion might not be easy or even possible. But I would think that the process would be:

    1. call __()
    2. look up the translation of the given string for the given language
    3. return the translated version if found
    4. if not there, add it to the database and flag it for the translators to add a translation
    5. return the string as is if no translated version is available
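The steps above could be sketched roughly as follows. Everything here is an assumption made for illustration: two in-memory hashes (%translations and %pending) stand in for the database, $language is a hypothetical current-language setting, and the "[de] ..." value is a placeholder rather than a real translation.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the translation database.
my %translations = (
    de => { 'Activity report' => '[de] Activity report' },  # placeholder text
);
my %pending;            # language => { string => 1 }, flagged for translators
my $language = 'de';    # hypothetical current language

sub __ {                                              # step 1: call __()
    my ($string) = @_;
    my $found = $translations{$language}{$string};    # step 2: look it up
    return $found if defined $found;                  # step 3: translated
    $pending{$language}{$string} = 1;                 # step 4: flag it
    return $string;                                   # step 5: fall back
}

print __('Activity report'), "\n";
print __('Another string'), "\n";
print "needs translation: $_\n" for sort keys %{ $pending{$language} };
```

As with any runtime approach, a string is only flagged once the code path that calls __() with it has actually run.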

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Hi, thank you for your answer. There are too many branches in the logic, so reaching every line of code that calls __() would take too much time.