Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

In the example below, I want to select the word 'hello' from the first three strings but not from the fourth:
1.hello> 2.<hello 3.hello <hello>
I have the regex \bhello\b(?![ ^\]\w:-]*?>) but this doesn't work for "hello>". Please help.

Replies are listed 'Best First'.
Re: regec to select text ather than remove HTML tags
by Anonymous Monk on Jan 23, 2012 at 12:08 UTC

    You could maybe use  /^\d+\..*?hello.*$/m

    It means

    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr/^(\d+\..*?hello.*)$/m )->explain;
    __END__
    The regular expression:

    (?m-isx:^(\d+\..*?hello.*)$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?m-isx:                 group, but do not capture (with ^ and $
                             matching start and end of line) (case-
                             sensitive) (with . not matching \n)
                             (matching whitespace and # normally):
    ----------------------------------------------------------------------
      ^                        the beginning of a "line"
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
        \.                       '.'
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more
                                 times (matching the least amount
                                 possible))
    ----------------------------------------------------------------------
        hello                    'hello'
    ----------------------------------------------------------------------
        .*                       any character except \n (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of a
                               "line"
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

Re: regec to select text ather than remove HTML tags
by locked_user sundialsvc4 (Abbot) on Jan 23, 2012 at 13:38 UTC

    Not being too much of a “golfer,” I tend to solve such problems in two steps:   first, I look for the string-structure that I am looking for, then I look for “hello...” within that string.

    One issue that you should consider is that, right now, you have no clearly defined beginning/ending delimiter: where does the string begin, and where does it end? In such a case, the less-than/greater-than characters are the only reliable anchor points that you have, so split() and pos() become your friends (along with the i and g modifiers of a regex). You might be able to construct the argument (and therefore a program) which says that what you really have here is a string that is “split by” either of these two characters. You iterate through the string, looking for these characters and noting their positions. You decide if a string-of-interest could be “beginning” or “ending,” and you extract the pieces for a closer look with substr().
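
    One possible reading of the position-based approach described above, as a minimal sketch rather than a definitive implementation. The helper name find_untagged and the rule "skip only when both < and > are present" are assumptions made for illustration; they do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not code from the thread): return the
# start offsets of every 'hello' that is NOT wrapped as a complete <hello> tag.
sub find_untagged {
    my ($text) = @_;
    my @offsets;

    # /g in scalar context walks the string one match at a time;
    # pos() then reports the offset just past the current match.
    while ($text =~ /\bhello\b/gi) {
        my $end   = pos($text);
        my $start = $end - length('hello');

        # Look at the characters on either side with substr().
        my $before = $start > 0           ? substr($text, $start - 1, 1) : '';
        my $after  = $end < length($text) ? substr($text, $end, 1)       : '';

        # Skip only when both delimiters are present.
        next if $before eq '<' && $after eq '>';
        push @offsets, $start;
    }
    return @offsets;
}

my $sample = "1.hello> 2.<hello 3.hello <hello>";
print join(',', find_untagged($sample)), "\n";
```

    For the sample string this reports the first three occurrences and skips the one inside the complete tag.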

    Really, the true challenge of this kind of algorithm is “ruggedly and completely defining it.” It probably will be a two-part solution. (“First, find the strings, then, see if they’re interesting.”) After you have used perldoc and then maybe a few experimental programs to confirm in your own mind how these various Perl tools work, spend some serious thought-time defining your algorithm. It might not be entirely trivial. I would go so far as to recommend constructing a series of test cases with test strings, and building a Test::More test suite to actually and completely test it. You could easily construct a subtly flawed algorithm, bang on it a few times, say, “yep, it seems to work,” and find that you are totally wrong when your code goes into production. It happens. (A lot.) And, it’s not pretty or fun. The “extra” time needed to “prove it!!” will be worthwhile.
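
    A small Test::More suite along these lines might look like the sketch below. The predicate wants_hello and its two-step rule are assumptions chosen only to give the tests something to exercise. The last case shows the kind of decision a test suite forces you to write down: what should happen to a line that contains both a bare hello and a complete <hello> tag?

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical predicate under test (an assumption, not code from the thread).
# Step 1 rejects any string containing the complete tag; step 2 looks for the word.
sub wants_hello {
    my ($line) = @_;
    return 0 if $line =~ /<hello>/;
    return $line =~ /\bhello\b/ ? 1 : 0;
}

# Each case: input string, expected result, test name.
my @cases = (
    [ 'hello>',        1, 'trailing > only' ],
    [ '<hello',        1, 'leading < only' ],
    [ 'hello',         1, 'bare word' ],
    [ '<hello>',       0, 'complete tag is rejected' ],
    [ 'othello',       0, 'substring of another word' ],
    [ 'hello <hello>', 0, 'mixed line is rejected under this definition' ],
);

for my $case (@cases) {
    my ($input, $expected, $name) = @$case;
    is( wants_hello($input), $expected, $name );
}

done_testing();
```

    If the mixed-line case should pass instead, the definition has to change, and the failing test makes that visible before the code reaches production.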

Re: regec to select text ather than remove HTML tags
by JavaFan (Canon) on Jan 23, 2012 at 14:21 UTC
    Untested:
    !/<hello>/ and /(hello)/ and print $1
      I need a plain regular expression that can be used as a condition. What I have come up with is: \bhello\b(?! ^\\w:-]*?>) Please help.
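
      A single-pattern condition could be sketched with a lookbehind and a lookahead joined by alternation. This assumes the requirement is "reject hello only when it is preceded by < and also followed by >"; the variable name $re is illustrative, and the pattern is not taken from any reply in the thread.

```perl
use strict;
use warnings;

# Assumed interpretation: 'hello' is acceptable when it is not preceded by '<'
# OR not followed by '>', so only the fully wrapped <hello> fails both branches.
my $re = qr/(?<!<)\bhello\b|\bhello\b(?!>)/;

for my $s ('hello>', '<hello', 'hello', '<hello>') {
    printf "%-8s %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```

      The first three strings match and <hello> does not. Note that, unlike the two-condition version, this pattern judges each occurrence on its own, so a line containing both hello and <hello> still matches.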
Re: regec to select text ather than remove HTML tags
by Veer (Initiate) on Jan 23, 2012 at 12:34 UTC
    That did not work. I want the following combinations to be selected: <hello, hello>, and hello, but not <hello>. Thanks for your help.

        The code seemed to work for me.

        Using pm_txt.txt as input for pm_regex.pl:

        pm_txt.txt

        1.hello>
        2.<hello
        3.hello
        <hello>

        pm_regex.pl

        use strict;
        use warnings;

        # Take the input file name from the command line.
        my $filename = shift or die "Usage: $0 FILENAME\n";
        open my $fh, '<', $filename or die "Could not open '$filename': $!\n";

        while (my $line = <$fh>) {
            chomp $line;
            # Match lines that start with digits and a dot and contain 'hello'.
            if ($line =~ /^\d+\..*?(hello).*$/) {
                print "In $line $1 matches\n";
            }
            else {
                print "$line doesn't match\n";
            }
        }
        close $fh;

        Running perl pm_regex.pl pm_txt.txt produced the output:

        In 1.hello> hello matches

        In 2.<hello hello matches

        In 3.hello hello matches

        <hello> doesn't match

Re: regec to select text ather than remove HTML tags
by Veer (Initiate) on Jan 23, 2012 at 12:33 UTC
    That did not work. I want to select all the following combinations: <hello, hello, and hello>, but not <hello>. Thanks for your help.