EigenFunctions has asked for the wisdom of the Perl Monks concerning the following question:

First off, I love Perl! The language itself is wonderful, but, no matter what I want to do, there always seems to be a package to make it easier.

Long live Perl!

I'm trying to search through ODT files (Apache OpenOffice text) but I'm having problems with RegEx modifiers. When I use qr/faith/i as in

$doc->selectElementsByContent(qr/faith/i)

I get the cryptic message:
Can't locate object method "getTextDescendants" via package "Regexp" at D:/Apps (x86)/Perl32/site/lib/OpenOffice/OODoc/Text.pm line 452.
It's not clear to me what the problem is or how to fix it.

Can anyone help?

Thanks,
  EigenFunctions
  OpSys: Win7 Professional/Home Premium x64 Service Pack 1

Re: RegEx for OpenOffice::OODoc as in $doc->selectElementsByContent()
by choroba (Cardinal) on Jun 03, 2016 at 15:07 UTC
    The function doesn't seem to be documented. But maybe looking at its source can give you a clue? ref qr// returns true, so it's considered "context" instead of the pattern. Try specifying the regex as a string, not as an object.
    $doc->selectElementsByContent('(?i:faith)');
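
A small sketch of the two points above, using only core Perl and no OpenOffice::OODoc calls: it shows that ref reports a qr// object as a Regexp reference while a plain string is not a reference, and that the inline (?i:...) group makes a string pattern case-insensitive. The variable names and sample words are illustrative only.

```perl
use strict;
use warnings;

# qr// yields a blessed reference; a plain string does not.
my $rx  = qr/faith/i;
my $str = '(?i:faith)';

print "ref of qr// object:  '", ref($rx),  "'\n";
print "ref of plain string: '", ref($str), "'\n";

# The inline (?i:...) group carries the case-insensitive flag inside
# the string itself, so no trailing /i is needed when it is used as a pattern.
for my $text ('Faith', 'FAITHFUL', 'fate') {
    my $verdict = $text =~ /$str/ ? 'matches' : 'does not match';
    print "$text $verdict\n";
}
```

Any other inline modifier that would normally go after the closing delimiter can be embedded in the string the same way.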

    Update: Code sample added.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Hey choroba, I think the documentation for the methods in the Text library is simply separate from the code.

      Without being able to test, I came to the same conclusion... stringify the regex:

      $doc->selectElementsByContent('faith');

      However, I can't figure out how to include the case-insensitive modifier (/i) in such a string. Can you shed some light?

        > Can you shed some light?

        I've updated the answer.

Re: RegEx for OpenOffice::OODoc as in $doc->selectElementsByContent()
by Anonymous Monk on Jun 04, 2016 at 09:16 UTC

    Note that you can probably keep on using the qr// if you coerce it to plain string:

    $doc->selectElementsByContent("". qr/faith/i);
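
If the coercion is needed in several places, it could be wrapped in a small helper. This is only a sketch: pattern_as_string is a hypothetical name, not part of OpenOffice::OODoc, and the assumption that selectElementsByContent() wants a plain string comes from this thread rather than from the module's documentation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of OpenOffice::OODoc): return a plain
# string for any pattern, stringifying qr// objects and passing
# ordinary strings through unchanged.
sub pattern_as_string {
    my ($pattern) = @_;
    return ref($pattern) eq 'Regexp' ? "$pattern" : $pattern;
}

my $filter = pattern_as_string(qr/faith/i);
print ref($filter) ? "still a reference\n" : "plain string: $filter\n";

# The stringified form keeps the /i flag, so it still matches
# case-insensitively when used as a pattern.
print "match: $&\n" if 'UnFaItHfUl' =~ $filter;

# Intended use, per the thread (not run here; needs an open document in $doc):
# my @elements = $doc->selectElementsByContent(pattern_as_string(qr/faith/i));
```

The exact text of the stringified pattern depends on the Perl version, so the sketch prints it rather than assuming a particular form.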

      Thanks for the tip! I like using qr//i better because it is easier to understand. With that construct, months from now, I'll look at the code and understand how it works (or how to modify it for another task).

      As usual, it's not evident to me why the need for null concatenation, but it's not too important since it works.

      Thanks,
        EigenFunctions

        > ... it's not evident to me why the need for null concatenation ...

        I haven't looked at the docs or source at all, but apparently the selectElementsByContent() method does not like to be passed a reference. The qr// operator returns a Regexp reference. In essence, the "". qr/faith/i expression replicates the approach of choroba above by stringizing this reference before passing it as an argument, while still allowing all the facility of the qr// operator to form the pattern object in the first place: all the i-s and t-s get properly dotted and crossed, respectively. (You crotted an i in the solution you attempted here.) I feel it's almost always better to use qr// to build patterns than to try to express the patterns as strings using string operators that are almost, but not quite, the same.

        Here's an example of conversion of a  Regexp object to a string and its subsequent use as a string in a match.

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $rx = qr{ (?i) f a i t h }xms;
         print 'qr// object: ', ref_or_not($rx);
         print $rx;
         ;;
         my $rx_stringized = '' . $rx;
         print 'stringization: ', ref_or_not($rx_stringized);
         print qq{'$rx_stringized'};
         ;;
         my $s = 'UnFaItHfUl';
         $s =~ $rx_stringized;
         print qq{'$&'};
         ;;
         sub ref_or_not {
           my $scalar = shift;
           return ref $scalar ? ref $scalar : 'not a ref';
           }
        "
        qr// object: Regexp
        (?^msx: (?i) f a i t h )
        stringization: not a ref
        '(?^msx: (?i) f a i t h )'
        'FaItH'

        Note that the  =~ operator is quite happy to use a pure string as a match pattern.


        Give a man a fish:  <%-{-{-{-<