in reply to string matching

Something like this?

#! perl -slw
use strict;

while( <DATA> ) {
    chomp;
    # Keep the fixed prefix in $1 and rework the list in $2:
    # tr/// maps '*' to '#' and (via /d) deletes spaces, parens and '.';
    # the s/// then drops the comma before "and".
    s[^(\QThe seller has the following *fruit*\E types:)(.+)$]{
        my( $const, $rep ) = ( $1, $2 );
        $rep =~ tr[* ().][#]d;
        $rep =~ s[,\s*and][and];
        "$const*($rep)*."
    }e;
    print;
}

__DATA__
The seller has the following *fruit* types: ( *apples* , *oranges* , *pears* , *berries* , and *mangoes* ).
The seller has the following *fruit* types: *apples* , *oranges* , *pears* , *berries* , and *mangoes* .
The seller has the following *fruit* types: ( *apples* , *oranges* , *pears* , *berries* and *mangoes* ).
The seller has the following *fruit* types: *apples* , *oranges* , *pears* , *berries* and *mangoes* .

Output:

c:\test>785401.pl
The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.
The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.
The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.
The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP PCW

Re^2: string matching
by newbio (Beadle) on Aug 01, 2009 at 18:55 UTC
    Thank you, Monks, for the suggestions. They help, but not fully. The problem is this: the sentence above can itself take several forms. The idea is to catch the following pattern in a sentence; here is the pseudo regular expression (a space is shown as ' '):

        \(? \*.+?\* , \*.+?\* ... \*.+?\* , and \*.+?\* \)?

    I tried to match this pattern using something like:

        $line =~ /\(?\s(\*.+?\*\s(\,\s)?)+and\*.+?\*\s\)?/;

    but it does not work.
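One visible problem with that attempt: `and\*` requires the asterisk to follow "and" immediately, whereas the input has a space there ("and *mangoes*"), so the pattern can never match. A minimal sketch comparing the attempt with a copy that only adds `\s` after `and`; the sample line is the first input from the earlier post, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'The seller has the following *fruit* types: '
         . '( *apples* , *oranges* , *pears* , *berries* , and *mangoes* ).';

# The attempt from the post, unchanged: "and\*" wants the asterisk
# directly after "and", but the input has a space there.
my $attempt = qr/\(?\s(\*.+?\*\s(\,\s)?)+and\*.+?\*\s\)?/;

# The same pattern with only a \s added after "and".
my $fixed = qr/\(?\s(\*.+?\*\s(\,\s)?)+and\s\*.+?\*\s\)?/;

printf "attempt: %s\n", $line =~ $attempt ? 'match' : 'no match';
printf "fixed:   %s\n", $line =~ $fixed   ? 'match' : 'no match';
```

It reports no match for the attempt and a match for the repaired copy. The repaired pattern is still loose, though: `.+?` can run across token boundaries, so on this line the match starts just before *fruit* and swallows "types: (" into the first item. A token class that excludes whitespace and asterisks, such as `[^*\s]+`, is a safer building block.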

    The pattern can appear anywhere in a sentence: at the start, in the middle, or at the end. There can also be several such patterns in a single sentence. For example (a comma may or may not be present before 'and'):

        *q(53)* and *s7* are related with: ( *r:72* , *p/93* , and *s8* ) , and they are also related with *s:2* , *u6* , *g78* but not *s8* .

    In this example I want to cluster and tag "*q(53)* and *s7*", "*r:72* , *p/93* , and *s8*", and "*s:2* , *u6* , *g78*" in place, so that the sentence looks like this:

        *q(53)#and#s7* are related with: *(#r:72#,#p/93#,#and#s8#)* , and they are also related with *s:2#,#u6#,#g78* but not *s8* .

    Note that when 'but not' appears, only the part of the pattern before it is selected.
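The clustering described above can be sketched in Perl under two assumptions: a starred token contains no spaces and no asterisks, and only " , ", " and " and " , and " join tokens into a list. `tag_lists`, `squash`, `$tok`, `$sep` and `$list` are hypothetical names, not from the thread. Because "but not" is not a separator, a list simply ends in front of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a starred token holds no whitespace and no asterisk,
# e.g. *q(53)*, *r:72*, *p/93*.
my $tok = qr/\*[^*\s]+\*/;

# Assumption: tokens are joined only by " , ", " and " or " , and ".
# "but not" is not listed, so a list ends in front of it.
my $sep = qr/\s+(?:,\s+and|,|and)\s+/;

# A list is two or more tokens joined by separators.
my $list = qr/$tok(?:$sep$tok)+/;

# Turn every "star, spaces, star" joint into one '#':
# "*a* , and *b*" becomes "*a#,#and#b*".
sub squash {
    my ($text) = @_;
    $text =~ s/\*?\s+\*?/#/g;
    return $text;
}

# Tag each list in a line. A parenthesised list "( *a* , *b* )" becomes
# "*(#a#,#b#)*"; a bare list "*a* and *b*" becomes "*a#and#b*".
sub tag_lists {
    my ($line) = @_;
    $line =~ s{\(\s*($list)\s*\)|($list)}{
        defined $1
            ? '*(#' . substr(squash($1), 1, -1) . '#)*'
            : squash($2)
    }ge;
    return $line;
}

my $in = '*q(53)* and *s7* are related with: ( *r:72* , *p/93* , and *s8* ) , '
       . 'and they are also related with *s:2* , *u6* , *g78* but not *s8* .';
print tag_lists($in), "\n";
```

Run on the example sentence, this prints the target line given above. A lone token such as *fruit* or the final *s8* is left untouched, because a list needs at least two tokens.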

    Thanks very much.

      Sorry, but I just read your post through 3 times, and I have literally no idea what it is that you are trying to convey.

      Your first post was an (apparently not so) clear definition of possible inputs and required output. The above sounds like you're making it up as you go along. Totally unintelligible.

      My suggestion is that you post some real examples of the actual input, and then matching examples of the output you would require from each of those inputs. If you cannot do that, this is probably not a real problem.

