toro has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to extract LaTeX from between dollar signs. Practically, for a report. Can I do so using the XPath module? Or do I have to use a regex?

Example text:

We find that the generical scale behavior of structure functions in the inertial range is not self-similar $S_n(\ell)\propto \ell^{\zeta_n}$ but includes an “exponential self-similar” behavior $S_n(\ell) \propto \exp[\zeta_n\alpha^{-1} \ell^{\alpha}]$ where $\alpha$ is a parameter proportional to the inverse of the logarithm of the Reynolds number.

Regexing this is hard. I wrote a loop that counts up dollar signs and kills text to the left of odd \$'s. That seems garish. This is what XPath was made for, no? But $xp->find( "/html/body/p/$/text()" ) is wrong.

Thank you!

PS It would also be great if I could convert e.g. Schr\"odinger to Schrödinger. The LaTeX package Google showed me is v0.01, so I can't use that.

ANSWER: The following one-liner, due to the monks, is enough to display equations on your web site (if you don't use MathJax): $text =~ s#\$([^\$]+)\$#<img src="http://latex.codecogs.com/gif.latex?\\large%20\\dpi{150}%20\\bg_white%20$1" />#gm;. Cheers.
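
For anyone who wants to try the substitution on a sample string, here is a minimal sketch. It keeps the closing dollar sign outside the capture group and refers to the capture as $1. The helper name latex_to_img and the sample text are made up for illustration, and the formula is passed to the URL without any URL-encoding, which may not be enough for every equation.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $...$ span with a codecogs <img> tag.
sub latex_to_img {
    my ($text) = @_;
    # Single quotes keep the backslashes literal.
    my $base = 'http://latex.codecogs.com/gif.latex?\large%20\dpi{150}%20\bg_white%20';
    # $1 holds the LaTeX between the two dollar signs (dollars excluded).
    $text =~ s{\$([^\$]+)\$}{<img src="$base$1" />}g;
    return $text;
}

print latex_to_img('where $\alpha$ is a parameter'), "\n";
```

Double dollars ($$...$$) and escaped dollars (\$) are not handled by this sketch.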

Replies are listed 'Best First'.
Re: LaTeX: regex or xpath?
by Corion (Patriarch) on Jun 09, 2011 at 20:09 UTC

    XPath is for XML. LaTeχ is not XML.

    Don't use Google, use a CPAN search for LaTeX. I have no experience with any of the modules, but I have written some LaTeχ myself.

    To extract "all text between $ signs" (that make up LaTeχ mathematical sections), I'd simply use a regular expression:

    my @equations = /\$([^$]+)\$/;
      my @equations = /\$([^$]+)\$/;

      To get all matches, you'd need the /g option.  Also, $] is a special variable, so the dollar sign must be escaped here.
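
The two corrections suggested in this reply can be sketched as follows: the dollar sign inside the character class is escaped, and /g in list context returns every captured span rather than only the first. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Single quotes so that $E and $p are not interpolated.
my $text = 'Energy $E = mc^2$ and momentum $p = mv$.';

# /g in list context returns all captures, not just the first one.
# The $ inside the character class is escaped so it cannot be read as a variable.
my @equations = $text =~ /\$([^\$]+)\$/g;

print "$_\n" for @equations;
```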

      Your regex won't work because of greediness. I think using /g as in page 210 of the cookbook might make a regex work.

      LaTeX::Parser from your suggested search is in version 0.01 and buggy. That's the same package I meant above.

      EDIT: Ah, you mean LaTeX::TOM! Perfect, I'll try that.

        Greediness isn't an issue here, because the character class [^\$] doesn't allow a $ to be part of the captured fragments.

        Alternatively, you could use a non-greedy pattern: /\$(.+?)\$/g

        use feature 'say';

        my $latex = q|We find that the generical scale behavior of structure functions in the inertial range is not self-similar $S_n(\ell)\propto \ell^{\zeta_n}$ but includes an “exponential self-similar” behavior $S_n(\ell) \propto \exp[\zeta_n\alpha^{-1} \ell^{\alpha}]$ where $\alpha$ is a parameter proportional to the inverse of the logarithm of the Reynolds number.|;
        my @equations = $latex =~ /\$(.+?)\$/g;
        say for @equations;

        Output:

        S_n(\ell)\propto \ell^{\zeta_n}
        S_n(\ell) \propto \exp[\zeta_n\alpha^{-1} \ell^{\alpha}]
        \alpha
Re: LaTeX: regex or xpath?
by choroba (Cardinal) on Jun 10, 2011 at 07:50 UTC
    Does your LaTeX source contain equations of this form:
    $$ \sum_{x=1}^{\infty}e^x $$
    i.e. double dollars?
      Luckily not. I can see how that would pose additional problems. I guess the ANSWER I shared is not general.
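
If display math did occur, one possible extension is to try the double-dollar form first in an alternation. This is only a sketch under simple assumptions: the sub name extract_math is hypothetical, and escaped dollar signs (\$) or dollars inside comments and verbatim blocks are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of $$...$$ and $...$ spans in order.
sub extract_math {
    my ($text) = @_;
    my @found;
    # Try $$...$$ first so that a display equation is not misread as inline math.
    # /s lets a display equation span several lines.
    while ($text =~ /\$\$(.+?)\$\$|\$([^\$]+)\$/gs) {
        # Exactly one of the two capture groups is defined per match.
        push @found, defined $1 ? $1 : $2;
    }
    return @found;
}

print "[$_]\n" for extract_math('Inline $a+b$ and display $$ \sum_{x=1}^{\infty}e^x $$ here.');
```

Putting the double-dollar alternative first matters: the regex engine tries alternatives left to right at each position, so the longer delimiter gets the first chance to match.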