monsieur_champs has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks
I was building a little regexp-intensive program and had a feeling that maybe I should tell perl to study something in the code. I couldn't remember when study() is useful, so I turned back to the good old perl documentation. From perldoc -f study, I found this (my bold print, to stress my point):

study SCALAR
study
Takes extra time to study SCALAR ($_ if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of patterns you are searching on, and on the distribution of character frequencies in the string to be searched (...)

This makes me think: is there any easy, newbie-proof criteria to decide when to use study? Or, in other words: is there any simple way to decide whether I need to study a scalar before doing pattern matching?

One could, of course, stop programming, hunt down some samples of the expected input data, and spend a whole day benchmarking to decide based on one's own experience. But this is time-consuming and requires knowledge not always available to our less-skilled brothers. In such cases, even a poor criterion is better than stopping work for a long time to do (human) study and off-topic knowledge acquisition.

Maybe, together, the mighty and wise Monks of this monastery could provide the whole community with something easy to use and precise enough to satisfy the needs of our little brothers, at least until they have the skills to do it on their own.

Thank you all for your wise advice and considerations on this topic.

Update: Oh, I have read Why study SCALAR?, and need to stress that I'm not asking what study() does, but when to use it and, more importantly, how to decide when to use this powerful resource.


"In a few words, translating PerlMonks documentation and the best articles into other languages is like building a bridge to join other Perl communities to the PerlMonks family. This makes the family bigger, the knowledge greater, the parties better and life easier." -- monsieur_champs

Re: Help on decide when study
by davido (Cardinal) on Nov 13, 2003 at 19:04 UTC
    Given that study's benefits are difficult to predict, it might be best to just benchmark with and without, to see which version is quicker.

    The Owl book (Mastering Regular Expressions, 1st Edition) says:

    • Don't use study when the target string is short.
    • Don't use study when you plan only a few matches against the target string.
    • Don't use study when Perl has no literal text cognizance for the regular expressions that you intend to benefit from the study. Without a known character that must appear in any match, study is useless.

    Study is most useful when you are matching a large string many times and your regular expressions contain literal text that must be found within the string.

    Study is also known to contain bugs in older versions of Perl, so use with caution.

    The best advice I can give a novice is to ignore study and look for other ways to optimize your code. If you can't find any design solution that is fast enough for your needs, then try invoking study and benchmark to see if it helps or not. But in general, don't expect a miracle.
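    A with/without comparison of the kind suggested above can be set up with the core Benchmark module. This is only a sketch: the sample text, the word list, and the helper name scan are made up for illustration, so substitute your real input and patterns. Note also that current perlfunc documents study as doing nothing since Perl 5.16, so on a modern perl the two variants should come out roughly even.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: a long string built from a few repeating words.
my @words = qw(lorem ipsum dolor perl monk);
my $text  = join ' ', map { $words[ $_ % @words ] } 1 .. 50_000;

# Hypothetical literal patterns; each gives perl a fixed string to look for.
my @patterns = map { qr/\Q$_\E/ } qw(perl monk study regexp);

# Count all matches of @patterns in a fresh copy of $text,
# optionally calling study() on the copy first.
sub scan {
    my ($use_study) = @_;
    my $copy = $text;    # fresh copy, so both variants pay the same copy cost
    study $copy if $use_study;
    my $hits = 0;
    for my $re (@patterns) {
        $hits++ while $copy =~ /$re/g;
    }
    return $hits;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    with_study    => sub { scan(1) },
    without_study => sub { scan(0) },
} );
```

    Whatever the table says, it only holds for this data and these patterns; rerun it with your own input before drawing conclusions.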


    Dave


    "If I had my life to live over again, I'd be a plumber." -- Albert Einstein

      Dear davido
      First of all, thank you for answering so fast. (:

      Now, the points that are still obscure to me:

      Don't use study when the target string is short.
      I have several years of experience coding perl, and this is still an obscure point to me: what should I consider a short string? I'm sure that 20 chars is short; that's obvious. But how about 400, or even 2000 chars? Is that short?

      Don't use study when you plan only a few matches against the target string.
      Again, I lack a precise criterion to rely on: what should I consider "a few matches"? I know for sure that one or two is obviously "a few". But how many more can I still consider "a few"?

      Don't use study when Perl has no literal text cognizance for the regular expressions that you intend to benefit from the study. Without a known character that must appear in any match, study is useless.
      Sorry, I don't know what "literal text cognizance" is. Can you please explain it to me? (many thanks in advance!!)

      Once more, thank you very much for your care and your answer, and for sharing your knowledge.

      May the gods bless you.



        I was afraid someone might ask what "short" and "long" mean in this context, as well as "a few" and "many".... Those are ambiguous quantifiers. It reminds me of my economics classes, when professors talked about the "short run" and the "long run".

        I think that the experts (such as Friedl, in MRE) intentionally avoid defining what is short, long, few, or many. I won't try to second-guess their caution about defining thresholds. But I think it's safe to say that, in the context of study, a few thousand characters is pretty short. However, the only way to be sure is to benchmark it. And as I implied, bothering with study at all should be a last resort, after exhausting other design options.
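        To put a number on "short" for a particular workload, a with/without comparison can be repeated over several string lengths. The lengths below are the ones from the question plus one large value; the filler sentence, the pattern list, and the helper name count_hits are hypothetical stand-ins for real data. Keep in mind that study is documented as a no-op from Perl 5.16 on, so on such perls any difference is noise.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical literal patterns to search for.
my @patterns = map { qr/\Q$_\E/ } qw(needle haystack perl);

# Count matches of all @patterns in $str. $str is a private copy,
# so study() never touches the caller's string.
sub count_hits {
    my ($str, $use_study) = @_;
    study $str if $use_study;
    my $hits = 0;
    for my $re (@patterns) {
        $hits++ while $str =~ /$re/g;
    }
    return $hits;
}

# Hypothetical filler text, repeated and trimmed to each target length.
my $filler = 'a needle in a haystack, ';

for my $len (20, 400, 2_000, 100_000) {
    my $str = substr( $filler x ( 1 + int( $len / length($filler) ) ), 0, $len );
    print "\nlength $len:\n";
    cmpthese( -1, {
        study    => sub { count_hits($str, 1) },
        no_study => sub { count_hits($str, 0) },
    } );
}
```

        The same loop can be turned around to vary the number of patterns instead of the length, which is one way to probe the "a few matches" question.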

        As for "literal text cognizance", the very next sentence defines it: "Without a known character that must appear in any match, study is useless." Literal text cognizance means that Perl knows some literal text that must appear in any match; unless your regexps look for literal text within the string (as opposed to containing only "wildcard" matches), study is useless.

        .....in other words, if your RE contains ONLY "wildcard" matching constructs such as ". \w \d \s \S \W \D", etc., and no literal text, you're wasting your time with study.

        ...for example (warning: silly examples):

        m/\w+\b.?\d*$/;   # wouldn't benefit from study.
        m/abc/;           # may benefit from study.
        m/\d+abc\W.+/;    # may benefit from study.
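        On Perl 5.10 and later (well after this thread), the core re module exports regmust, which returns the longest anchored and floating fixed strings the regex optimizer found, i.e. literal text that must appear in any match. That gives a quick way to check whether a pattern offers perl any literal text at all. The helper name has_required_literal and the sample patterns (modeled on the examples above) are illustrative assumptions, and regmust only reports what your particular perl's optimizer thinks.

```perl
use strict;
use warnings;
use 5.010;             # for // and regmust
use re qw(regmust);    # core module

# True if the optimizer found an anchored or floating fixed string,
# i.e. literal text that must appear somewhere in any match.
sub has_required_literal {
    my ($qr) = @_;
    my ($anchored, $floating) = regmust($qr);
    return scalar grep { defined($_) && length($_) } $anchored, $floating;
}

# Hypothetical patterns modeled on the examples above.
for my $qr ( qr/\w+\s\d+/, qr/abc/, qr/\d+abc\W.+/ ) {
    my ($anchored, $floating) = regmust($qr);
    printf "%-20s anchored='%s' floating='%s' -> %s\n",
        $qr, $anchored // '', $floating // '',
        has_required_literal($qr) ? 'has literal text' : 'no literal text';
}
```

        A wildcard-only pattern should report no fixed string, while the two patterns containing "abc" should report it as anchored or floating text.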


        Dave


Re: Help on decide when study
by sauoq (Abbot) on Nov 13, 2003 at 19:46 UTC
    This makes me think: is there any easy, newbie-proof criteria to decide when to use study?

    Yes... "Never".

    It is so rarely useful that there is really no point in looking for places to use it.

    -sauoq
    "My two cents aren't worth a dime.";