LaTeX: regex or xpath?

toro has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to extract LaTeX from between dollar signs. Practically, for a report. Can I do so using the XPath module? Or do I have to use a regex?

Example text:

We find that the generical scale behavior of structure functions in the inertial range is not self-similar $S_n(\ell)\propto \ell^{\zeta_n}$ but includes an “exponential self-similar” behavior $S_n(\ell) \propto \exp\zeta_n\alpha^{-1} \ell^{\alpha}$ where $\alpha$ is a parameter proportional to the inverse of the logarithm of the Reynolds number.

Regexing this is hard. I wrote a loop that counts dollar signs and discards the text to the left of each odd-numbered `$`. That seems garish. This is what XPath was made for, no? But `$xp->find( "/html/body/p/$/text()" )` is wrong.

Thank you!

PS It would also be great if I could convert e.g. Schr\"odinger to Schrödinger. The LaTeX package Google showed me is v0.01, so I can't use that.
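For the accent conversion asked about in the PS, one possible approach is a small hand-rolled mapping rather than a CPAN parser. The sketch below is only an illustration: the helper name `latex_accents_to_unicode` and the `%combining` table are hypothetical, and the table covers just five accent commands applied to a single ASCII letter. It relies on the core module Unicode::Normalize to compose the letter and the combining mark into one character.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# LaTeX accent command => Unicode combining mark (small illustrative subset).
my %combining = (
    '"' => "\x{0308}",   # diaeresis
    "'" => "\x{0301}",   # acute
    '`' => "\x{0300}",   # grave
    '^' => "\x{0302}",   # circumflex
    '~' => "\x{0303}",   # tilde
);

# Hypothetical helper: rewrites \"o and \"{o} style accents as single characters.
sub latex_accents_to_unicode {
    my ($text) = @_;
    # $2 holds the letter in the braced form, $3 in the unbraced form.
    $text =~ s/\\(["'`^~])(?:\{([A-Za-z])\}|([A-Za-z]))/NFC((defined $2 ? $2 : $3) . $combining{$1})/ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print latex_accents_to_unicode('Schr\"odinger'), "\n";
```

Commands such as `\ss`, `\c{c}` or `\o` are not handled here; a fuller table or a real LaTeX parser would be needed for those.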

ANSWER: The following one-liner, due to the monks, is all the script you need to display equations on your web site (if you don't use MathJax): `$text =~ s#\$([^\$]+)\$#<img src="http://latex.codecogs.com/gif.latex?\\large%20\\dpi{150}%20\\bg_white%20$1" />#gm;` Cheers.

Re: LaTeX: regex or xpath? by Corion (Patriarch) on Jun 09, 2011 at 20:09 UTC
XPath is for XML. LaTeχ is not XML. Don't use Google, use LaTeX. I have no experience with any of the modules, but I have written some LaTeχ myself. To extract all text between `$` signs (which delimit LaTeχ mathematical sections), I'd simply use a regular expression: `my @equations = /\$([^$]+)\$/;`
Re^2: LaTeX: regex or xpath? by Eliya (Vicar) on Jun 09, 2011 at 20:25 UTC
`my @equations = /\$([^$]+)\$/;` To get all matches, you'd need the `/g` option. Also, `$]` is a special variable, so the dollar sign must be escaped here.
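With both corrections applied, the extraction might look like the minimal sketch below. The variable `$text` and the sample sentence are illustrative and not taken from the thread.

```perl
use strict;
use warnings;

# Sample input; single-quoted so Perl does not interpolate the dollar signs.
my $text = 'Energy $E = mc^2$ and momentum $p = mv$.';

# \$ inside the character class avoids the special variable $],
# and /g in list context returns every captured group.
my @equations = $text =~ /\$([^\$]+)\$/g;

print "$_\n" for @equations;
```

Run as is, this prints `E = mc^2` and `p = mv` on separate lines.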
Re^2: LaTeX: regex or xpath? by toro (Beadle) on Jun 09, 2011 at 20:26 UTC
Your regex won't work because of greediness. I think using `/g` as in page 210 of the cookbook might make a regex work. LaTeX::Parser from your suggested search is in version `0.01` and buggy. That's the same package I meant above. EDIT: Ah, you mean LaTeX::TOM! Perfect, I'll try that.
Re^3: LaTeX: regex or xpath? by Eliya (Vicar) on Jun 09, 2011 at 20:31 UTC
Greediness isn't an issue here, because the character class `[^\$]` doesn't allow a `$` to be part of the captured fragments. Alternatively, you could use a non-greedy pattern: `/\$(.+?)\$/g`
`my $latex = q|We find that the generical scale behavior of structure functions in the inertial range is not self-similar $S_n(\ell)\propto \ell^{\zeta_n}$ but includes an “exponential self-similar” behavior $S_n(\ell) \propto \exp[\zeta_n\alpha^{-1} \ell^{\alpha}]$ where $\alpha$ is a parameter proportional to the inverse of the logarithm of the Reynolds number.|; my @equations = $latex =~ /\$(.+?)\$/g; say for @equations;`
Output:
`S_n(\ell)\propto \ell^{\zeta_n}`
`S_n(\ell) \propto \exp[\zeta_n\alpha^{-1} \ell^{\alpha}]`
`\alpha`
Re^4: LaTeX: regex or xpath? by toro (Beadle) on Jun 09, 2011 at 22:14 UTC
Re: LaTeX: regex or xpath? by choroba (Cardinal) on Jun 10, 2011 at 07:50 UTC
Does your LaTeX source contain equations of this form: `$$ \sum_{x=1}^{\infty}e^x $$` i.e. double dollars?
Re^2: LaTeX: regex or xpath? by toro (Beadle) on Jun 12, 2011 at 06:07 UTC
Luckily not. I can see how that would pose additional problems. I guess the ANSWER I shared is not general.
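If double-dollar display math did need handling, one way to extend the regex approach is to try the `$$...$$` form before the single-dollar form. This is a sketch only: the helper name `extract_math` and the sample string are hypothetical, and escaped dollars (`\$`) or other math environments such as `\[...\]` are deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of every $...$ and $$...$$ span.
sub extract_math {
    my ($text) = @_;
    my @equations;
    # Try $$...$$ first so a display equation is not mistaken for inline math.
    while ($text =~ /\$\$(.+?)\$\$|\$([^\$]+)\$/gs) {
        my $body = defined $1 ? $1 : $2;
        $body =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        push @equations, $body;
    }
    return @equations;
}

my $sample = 'Inline $\alpha$, then display $$ \sum_{x=1}^{\infty}e^x $$ and $\beta$.';
print "$_\n" for extract_math($sample);
```

The order of the alternatives matters: with the single-dollar branch first, the opening `$$` could never match, because that branch requires at least one non-dollar character after the first `$`.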