in reply to LaTeX: regex or xpath?

XPath is for XML. LaTeχ is not XML.

Don't use Google; search CPAN for LaTeX. I have no experience with any of the modules, but I have written some LaTeχ myself.

To extract "all text between $ signs (that make up LaTeχ mathematical sections)", I'd simply use a regular expression:

my @equations = /\$([^$]+)\$/;

Re^2: LaTeX: regex or xpath?
by Eliya (Vicar) on Jun 09, 2011 at 20:25 UTC
    my @equations = /\$([^$]+)\$/;

    To get all matches, you'd need the /g option.  Also, $] is a special variable, so the dollar sign must be escaped here.
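
    Putting both fixes together, a minimal sketch might look like this (the sample string is made up for illustration, and the match is bound to a named variable rather than $_):

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample input; not taken from the original question.
    my $text = 'Let $a+b$ and $c^2$ be given.';

    # \$ inside the class stops Perl from reading "$]" as the special variable;
    # /g in list context returns every captured group, not just the first.
    my @equations = $text =~ /\$([^\$]+)\$/g;

    print "$_\n" for @equations;
    ```

    With that input, @equations should hold a+b and c^2.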

Re^2: LaTeX: regex or xpath?
by toro (Beadle) on Jun 09, 2011 at 20:26 UTC

    Your regex won't work because of greediness. I think using /g as in page 210 of the cookbook might make a regex work.

    LaTeX::Parser from your suggested search is in version 0.01 and buggy. That's the same package I meant above.

    EDIT: Ah, you mean LaTeX::TOM! Perfect, I'll try that.

      Greediness isn't an issue here, because the character class [^\$] doesn't allow a $ to be part of the captured fragments.
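
      A small sketch of the difference, using a made-up sample string: a greedy dot would run past the inner $ signs, while the negated class stops at each one.

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample input with two math sections.
      my $text = 'First $x$ then $y$.';

      # Greedy dot: runs on to the last "$", swallowing everything in between.
      my @greedy = $text =~ /\$(.+)\$/g;

      # Negated class: cannot cross a "$", so each section is captured separately.
      my @negated = $text =~ /\$([^\$]+)\$/g;

      print "greedy:  @greedy\n";
      print "negated: @negated\n";
      ```

      Here @greedy ends up as one long fragment and @negated as the two separate sections.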

      Alternatively, you could use a non-greedy pattern: /\$(.+?)\$/g

      use feature 'say';  # say needs Perl 5.10 or later

      # Sample text; q|...| does not interpolate, so $ and \ stay literal.
      my $latex =
          q|We find that the generical scale behavior of structure functions | .
          q|in the inertial range is not self-similar | .
          q|$S_n(\ell)\propto \ell^{\zeta_n}$ but includes an | .
          q|“exponential self-similar” behavior | .
          q|$S_n(\ell) \propto \exp[\zeta_n\alpha^{-1} \ell^{\alpha}]$ | .
          q|where $\alpha$ is a parameter proportional to the inverse of | .
          q|the logarithm of the Reynolds number.|;

      # Non-greedy match with /g collects every $...$ section.
      my @equations = $latex =~ /\$(.+?)\$/g;
      say for @equations;

      Output:

      S_n(\ell)\propto \ell^{\zeta_n}
      S_n(\ell) \propto \exp[\zeta_n\alpha^{-1} \ell^{\alpha}]
      \alpha

        Thank you, Eliya. That code is clear and now my problem is solved.