Erosia has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise men.

I am struggling with regular expressions. I know the fundamentals, but when I want to add two conditions within one "/.../", I have no idea how to do that.

What I want is for the program to print only the text between ">" and "<" on each line, but only if there is no other > or < inside that pair. For instance, >knowledge< is good and >knowled<ge< is bad. My while loop looks like this:
while (<FILE>) { />(.+) and (^\w+)</; print "$1 "};
The (^\w+) part should match only strings that start with one or more word characters (letters, digits or underscore), but it doesn't work as hoped.

I appreciate any help you experts may provide.

Re: Multiple conditions
by ikegami (Patriarch) on Mar 13, 2010 at 21:35 UTC
    while (<$fh>) { print "$_\n" for />([^><]+)</g }
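A minimal self-contained sketch of ikegami's one-liner, run on two made-up lines. The sub name bracketed_runs is hypothetical. Because [^><]+ cannot cross another > or <, the "bad" line is not skipped; it still yields its first clean segment, knowled:

```perl
use strict;
use warnings;

# ikegami's pattern wrapped in a sub (the sub name is hypothetical).
# [^><]+ can never cross another ">" or "<", and //g in list context
# returns every captured run on the line.
sub bracketed_runs {
    my ($line) = @_;
    return $line =~ />([^><]+)</g;
}

# Made-up sample lines: prints "one | two", then "knowled".
for my $line ('a >one< b >two< c', 'hello >knowled<ge< goodbye') {
    print join(' | ', bracketed_runs($line)), "\n";
}
```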
      Thanks.
Re: Multiple conditions
by 7stud (Deacon) on Mar 14, 2010 at 08:15 UTC
    I am struggling with regular expressions. I know the fundamentals..

    />(.+) and (^\w+)</

    Apparently not. Are the characters 'a', 'n', and 'd' special regex characters? If not, what do those characters match when inside a regular expression?
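To make the point concrete, a small sketch (the sub name op_pattern_matches is hypothetical). Inside a pattern, " and " is matched as five literal characters, and ^ without /m only matches at the very start of the string, so once a > has been consumed before it, the pattern from the question cannot match anything:

```perl
use strict;
use warnings;

# The pattern from the question, wrapped in a hypothetical sub.
# " and " is literal text, not a logical operator, and (^\w+) demands
# the start of the string after ">" has already been consumed,
# so this never matches.
sub op_pattern_matches {
    my ($s) = @_;
    return $s =~ />(.+) and (^\w+)</ ? 1 : 0;
}

# Both lines print 0, even the one that contains the literal " and ".
for my $s ('>knowledge<', 'x >knowledge and more< y') {
    print $s, ': ', op_pattern_matches($s), "\n";
}
```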

    use strict;
    use warnings;
    use 5.010;

    my @strings = (
        'hello',
        'hello >knowled<ge< goodbye',
        'hello >knowledge< goodbye',
        'hello >!knowledge< goodbye',
    );

    for (@strings) {
        # greedy: everything from the first ">" to the last "<"
        if ( />(.+)</ ) {
            my $word = $1;
            # must start with a word character and contain no > or <
            if ( $word =~ /^\w[^><]*$/ ) {
                say $word;
            }
        }
    }

    --output:--
    knowledge

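    None of the solutions posted so far will yield the same results. For example, />([^><]+)</g prints knowled for 'hello >knowled<ge< goodbye' and !knowledge for 'hello >!knowledge< goodbye'.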

    You could write:

    if ( />(.+)/ and /(^\w+)</ ) {
        # ...do something
    }

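    But those regexes don't fully express the match you are looking for: each one is tested against the whole line on its own, and /(^\w+)</ only succeeds if the line itself starts with word characters followed immediately by a <.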
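For comparison, a sketch that folds both checks from the @strings example into a single pattern, which is what the original question asked for. This combined regex is an illustration, not something posted in the thread, and bracketed_word is a hypothetical name. The lookahead (?=\w) acts as an extra "and" condition without consuming anything, and the anchors pin the capture between the first > and the last < on the line. On the same four strings it should print only knowledge:

```perl
use strict;
use warnings;

# Both conditions in one pattern (illustrative, not from the thread):
#   ^[^>]*>    skip to the first ">" on the line
#   (?=\w)     condition 1: the captured text starts with a word character
#   ([^><]+)   condition 2: the captured text contains no ">" or "<"
#   <[^<]*$    the closing "<" must be the last "<" on the line
sub bracketed_word {
    my ($line) = @_;
    return $line =~ /^[^>]*>(?=\w)([^><]+)<[^<]*$/ ? $1 : undef;
}

my @strings = (
    'hello',
    'hello >knowled<ge< goodbye',
    'hello >knowledge< goodbye',
    'hello >!knowledge< goodbye',
);

for (@strings) {
    my $word = bracketed_word($_);
    print "$word\n" if defined $word;
}
```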

      Apparently not. Every man can make mistakes.
      Your @strings example seems fine. But I am actually reading from a corpus, so the whole program looks like this (with ikegami's suggestion):
      use warnings;
      open FILE, "parole.sgm" or die "Couldn't open file.";
      while (<FILE>) {
          print "$_ " for />([^<>]+)</g;
      }
      close FILE;
      So, can you adapt your @strings solution to work with my filehandle program?
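One possible adaptation, sketched under the assumption that one line of the corpus is checked at a time, exactly as the @strings loop does. The sub name bracketed_words is hypothetical; the file name parole.sgm comes from the question. So that the sketch runs on its own, an in-memory filehandle stands in for the real file:

```perl
use strict;
use warnings;

# The two-step check from the @strings example, applied line by line
# to any open filehandle (the sub name is hypothetical).
sub bracketed_words {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        # Step 1: grab everything between the first ">" and the last "<".
        next unless $line =~ />(.+)</;
        my $word = $1;
        # Step 2: keep it only if it starts with a word character
        # and contains no other ">" or "<".
        push @words, $word if $word =~ /^\w[^><]*$/;
    }
    return @words;
}

# For the real corpus you would write:
#   open my $fh, '<', 'parole.sgm' or die "Couldn't open parole.sgm: $!";
# Here an in-memory filehandle stands in for the file.
my $sample = "hello\n"
           . "hello >knowled<ge< goodbye\n"
           . "hello >knowledge< goodbye\n"
           . "hello >!knowledge< goodbye\n";
open my $fh, '<', \$sample or die "Couldn't open in-memory file: $!";
print "$_ " for bracketed_words($fh);
print "\n";
close $fh;
```

The three-argument open with a lexical filehandle and $! in the error message is a common Perl idiom; the bareword FILE from the question works too.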

      Furthermore, ikegami's suggestion almost fixes my little program, except that every comma, period, etc. ends up with a space on each side instead of only one after it, as it should. Is that doable?
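A sketch of one way to do this, assuming that punctuation marks arrive as separate tokens between > and <, like words do. The markup in the sample line below is hypothetical, and the real parole.sgm format may differ. The idea is to collect the tokens of a line, join them with single spaces, and then delete the space in front of each punctuation mark (join_tokens is a hypothetical name):

```perl
use strict;
use warnings;

# Join tokens with single spaces, then drop the space in front of
# punctuation, so "Hello , world ." becomes "Hello, world."
# The punctuation set [,.;:!?] is an assumption; extend it as needed.
sub join_tokens {
    my @tokens = @_;
    my $text = join ' ', @tokens;
    $text =~ s/ (?=[,.;:!?])//g;
    return $text;
}

# Hypothetical sample line; prints "Hello, world."
my $line = '<w>Hello</w><c>,</c><w>world</w><c>.</c>';
print join_tokens($line =~ />([^<>]+)</g), "\n";
```

In the corpus loop, print join_tokens(/>([^<>]+)</g), "\n" would then replace print "$_ " for ...; note that tokens are only joined within one line, so a space before punctuation at the start of the next line is not handled.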

Re: Multiple conditions
by Anonymous Monk on Mar 13, 2010 at 21:35 UTC
Re: Multiple conditions
by Anonymous Monk on Mar 14, 2010 at 01:27 UTC
    Try this, using a pipe:
    while (<FILE>) { />(.+)|(^\w+)</; print "$1 "};
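A small sketch of what the pipe does in this pattern (the sub name which_branch is hypothetical). | is alternation, i.e. "or", and it has the lowest precedence inside a pattern, so />(.+)|(^\w+)</ means ">(.+)" or "(^\w+)<". The trailing < belongs only to the right-hand branch, which is why the "bad" input >knowled<ge< is accepted whole, and when the right branch matches, $1 is undef:

```perl
use strict;
use warnings;

# Which alternative of />(.+)|(^\w+)</ matched? After a successful match
# exactly one of $1 and $2 is defined. The sub name is hypothetical.
sub which_branch {
    my ($line) = @_;
    return 'none' unless $line =~ />(.+)|(^\w+)</;
    return defined $1 ? "left: $1" : "right: $2";
}

# Prints "left: knowled<ge<", "right: abc", "none".
print which_branch($_), "\n" for '>knowled<ge<', 'abc<', 'hello';
```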