newbio has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

As part of a larger problem, I am trying to do the following but cannot come up with a really nice solution. I tried both regex and non-regex approaches but could not get it right.

Input sentence to program: The seller has the following *fruit* types: ( *apples* , *oranges* , *pears* , *berries* , and *mangoes* ).

Output: The seller has the following *fruit* types: *(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.

The input has two optional variations: the parentheses may be missing, and/or there may be no comma before 'and'. Also, the number of fruits varies, with a minimum of 2, and the output should change accordingly.

Thanks very much in advance.

Raj

Re: string matching
by BrowserUk (Patriarch) on Aug 01, 2009 at 02:23 UTC

    Something like this?

    #! perl -slw
    use strict;

    while( <DATA> ) {
        chomp;
        s[^(\QThe seller has the following *fruit*\E types:)(.+)$]{
            my( $const, $rep ) = ( $1, $2 );
            $rep =~ tr[* ().][#]d;
            $rep =~ s[,\s*and][and];
            "$const*($rep)*."
        }e;
        print;
    }

    __DATA__
    The seller has the following *fruit* types: ( *apples* , *oranges* , *pears* , *berries* , and *mangoes* ).
    The seller has the following *fruit* types: *apples* , *oranges* , *pears* , *berries* , and *mangoes* .
    The seller has the following *fruit* types: ( *apples* , *oranges* , *pears* , *berries* and *mangoes* ).
    The seller has the following *fruit* types: *apples* , *oranges* , *pears* , *berries* and *mangoes* .

    Output:

    c:\test>785401.pl
    The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.
    The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.
    The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.
    The seller has the following *fruit* types:*(#apples#,#oranges#,#pears#,#berries#and#mangoes#)*.
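    The work is done by tr[* ().][#]d: with the d modifier, search-list characters that have no counterpart in the replacement list are deleted, so '*' becomes '#' while spaces, parentheses and the full stop vanish. A minimal sketch tracing the two clean-up steps on the list part of the first input line (the names $step1 and $step2 are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The list part of the first input line, as captured into $2.
my $rep = ' ( *apples* , *oranges* , *pears* , *berries* , and *mangoes* ).';

# '*' maps to '#'; ' ', '(', ')' and '.' have no partner in the
# replacement list, so the /d modifier deletes them.
(my $step1 = $rep) =~ tr[* ().][#]d;
print "$step1\n";    # #apples#,#oranges#,#pears#,#berries#,and#mangoes#

# Drop the optional comma before 'and'.
(my $step2 = $step1) =~ s[,\s*and][and];
print "$step2\n";    # #apples#,#oranges#,#pears#,#berries#and#mangoes#
```

    Because every space is deleted first, the later s[,\s*and][and] only has to cope with an optional comma, which is what makes both comma variants produce the same output.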

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thank you, Monks, for the suggestions. They help, but not fully. The problem is this: the sentence above can itself take several forms. The idea is to catch the following pattern in a sentence; here is the pseudo regular expression (a space is shown as ' '): \(? \*.+?\* , \*.+?\* ... \*.+?\* , and \*.+?\* \)?

      I tried to match this pattern using something like: { $line=~/\(?\s(\*.+?\*\s(\,\s)?)+and\*.+?\*\s\)?/; } but it does not work.
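      The attempt above cannot match the example inputs: and\* requires an asterisk immediately after "and", but the inputs always have a space there, and each .+? is free to run across item boundaries because '.' also matches '*'. A minimal sketch of a regex for the pseudo-pattern, assuming each item is *...* with no whitespace or asterisk inside it and that the only separators are ",", "and" and ", and" (the names $item, $sep, $list and find_lists are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: an item is *...* with no whitespace or asterisk inside it.
my $item = qr/\*[^*\s]+\*/;
# Assumption: a separator is ",", "and" or ", and", with optional spaces.
my $sep  = qr/\s*(?:,\s*(?:and\s+)?|and\s+)/;
# Two or more items, optionally wrapped in ( ... ).
my $list = qr/(?:\(\s*)?$item(?:$sep$item)+(?:\s*\))?/;

# Return every list found in the line, left to right.
sub find_lists {
    my ($line) = @_;
    return $line =~ /($list)/g;
}

my $line = '*q(53)* and *s7* are related with: ( *r:72* , *p/93* , and *s8* ) ,'
         . ' and they are also related with *s:2* , *u6* , *g78* but not *s8* .';
print "$_\n" for find_lists($line);
```

      Excluding whitespace from items is the key design choice: it stops a match from starting on a closing asterisk and swallowing text such as "* and *" as if it were an item.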

      This pattern can appear anywhere in a sentence: at the start, in the middle, or at the end. A single sentence can also contain several such patterns. For example (a comma may or may not be present before 'and'):

      *q(53)* and *s7* are related with: ( *r:72* , *p/93* , and *s8* ) , and they are also related with *s:2* , *u6* , *g78* but not *s8* .

      In this example I want to cluster and tag "*q(53)* and *s7*", "*r:72* , *p/93* , and *s8*" and "*s:2* , *u6* , *g78*" so that the sentence looks like:

      *q(53)#and#s7* are related with: *(#r:72#,#p/93#,#and#s8#)* , and they are also related with *s:2#,#u6#,#g78* but not *s8* .

      Note that when 'but not' appears, only the part of the pattern before it is selected.
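      Reading the example above, the rule seems to be: keep a list's outer asterisks, turn each separator into #,# or #and# or #,#and#, and rewrite a parenthesized list as *(#...#)*. This keeps the comma before 'and', whereas the first post drops it. A minimal sketch of that inferred rule, assuming items are *...* with no whitespace or asterisk inside and that separators are only ",", "and" and ", and" (tag_lists and tag_one are hypothetical names). 'but not' needs no special case: it is not a separator, so a list simply ends at the item before it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: an item is *...* with no whitespace or asterisk inside it.
my $item = qr/\*[^*\s]+\*/;
# Assumption: a separator is ",", "and" or ", and", with optional spaces.
my $sep  = qr/\s*(?:,\s*(?:and\s+)?|and\s+)/;

# Rewrite one matched list; $open and $close hold the parentheses, if any.
sub tag_one {
    my ($open, $list, $close) = @_;
    $list =~ s/\*\s*,\s*and\s+\*/#,#and#/g;    # "* , and *" -> "#,#and#"
    $list =~ s/\*\s*,\s*\*/#,#/g;               # "* , *"     -> "#,#"
    $list =~ s/\*\s*and\s+\*/#and#/g;           # "* and *"   -> "#and#"
    if (defined $open && defined $close) {
        # Move the outer asterisks outside the parentheses.
        return '*(#' . substr($list, 1, -1) . '#)*';
    }
    # An unbalanced or absent parenthesis is put back unchanged.
    return ($open // '') . $list . ($close // '');
}

# Tag every list of two or more items in the line.
sub tag_lists {
    my ($line) = @_;
    $line =~ s{
        (\(\s*)?                      # optional opening parenthesis
        ( $item (?: $sep $item )+ )   # two or more items
        (\s*\))?                      # optional closing parenthesis
    }{tag_one($1, $2, $3)}gex;
    return $line;
}

my $in = '*q(53)* and *s7* are related with: ( *r:72* , *p/93* , and *s8* ) ,'
       . ' and they are also related with *s:2* , *u6* , *g78* but not *s8* .';
print tag_lists($in), "\n";
```

      Run as-is, it prints the tagged sentence requested above. The first post's parenthesized sentence without a comma before 'and' also comes out as requested there; the other variants follow this inferred rule instead.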

      Thanks very much.

        Sorry, but I just read your post through 3 times, and I have literally no idea what it is that you are trying to convey.

        Your first post was an (apparently not so) clear definition of possible inputs and required output. The above sounds like you're making it up as you go along. Totally unintelligible.

        My suggestion to you is that you post some real examples of the actual input. And then some matching examples of what output you would require from each of those inputs. If you cannot do that, this is probably not a real problem.


Re: string matching
by halfcountplus (Hermit) on Aug 01, 2009 at 04:14 UTC
    @BrowserUk: you forgot some enclosing *

    Another similar, but different way:
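    #!/usr/bin/perl -w
    use strict;

    # Reads the same __DATA__ lines as BrowserUk's script (not repeated here).
    while (<DATA>) {
        # Capture the fixed prefix and the list; the parentheses are optional.
        $_ =~ /(The seller has the following \*fruit\* types: )\(?([a-z ,*]+)\)?\./;
        my ($const, $rep) = ($1, $2);
        $rep =~ s/\s?\*\s?/#/g;    # each '*' plus at most one space on either side -> '#'
        $rep =~ s/,? and/and/;     # drop the optional comma before 'and'
        print "$const*($rep)*.\n";
    }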
    Same input, same output (but with those extra *)
Re: string matching
by Anonymous Monk on Aug 01, 2009 at 02:55 UTC
    Show what you have