Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have an array of words that I need to sort. I want to include only the words that start with a letter, contain no more than 1 hyphen in a row, and end in a letter. So:

brian0, brian-, -brian, bri--an would all not be valid words. Bria-n, br-i-an, brian would be valid words

update (broquaint): shortened title (was: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter) and added some formatting


Replies are listed 'Best First'.
Re: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter
by BrowserUk (Patriarch) on Nov 10, 2002 at 22:45 UTC

    No attempt at golf, but clear, I think.

    #! perl -sw
    use strict;

    while (<DATA>) {
        chomp;
        print sprintf('%-20s', $_),
            (/^[a-z][a-z-]+?[a-z]$/i and not /--/) ? ' is ' : ' is not ',
            'a valid word according to the rules specified.', $/;
    }
    __DATA__
    brian0
    brian-
    -brian
    bri--an
    Bria-n
    br-i-an
    brian

    P.S. There's no need to put the whole question in the title:)


    Nah! You're thinking of Simon Templar, originally played (on UKTV) by Roger Moore and later by Ian Ogilvy

      This will fail on valid edge-case words like those in 'Oh no I no go OK', as your minimum valid word length is 3 chars. A small modification will cover all cases:

      my @data = sort grep { /^[a-z][a-z-]*[a-z]$/i && ! /--/ || /^[a-z]$/i } <DATA>;

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Sorting words while excluding some.
by sauoq (Abbot) on Nov 11, 2002 at 00:06 UTC

    I'm sure at least one of the solutions offered thus far works but I'm offering the following regex because it matches exactly what you want rather than trying to exclude things you don't want.

    my @sorted = sort grep /^[a-z](?:[^-]|-(?!-))*(?<=[a-z])$/i, @words;

    Here's the explanation:

    /^[a-z]         # Start with a letter.
     (?:            # Start grouping.
        [^-]        # Anything other than a hyphen
        |           # Or
        -(?!-)      # A hyphen not followed by another hyphen
     )*             # End group. Match 0 or more of those groups.
     (?<=[a-z])$    # The end, as long as it is preceded by a letter.
    /ix             # Match case insensitively and allow comments.

    I almost posted this with a different regex:

    /^[a-z](?:[^-]|-(?!-))*[a-z]$/i
    but that one requires that the words are at least two characters (actually letters due to other constraints) long, so I changed it to a positive look-behind assertion.
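
To see the difference between the two patterns concretely, here is a small sketch that runs both against a few sample words. The variable names and the word list are made up for illustration; only the two regexes come from the post above.

```perl
use strict;
use warnings;

# The two patterns from the post; the variable names are illustrative.
my $with_lookbehind = qr/^[a-z](?:[^-]|-(?!-))*(?<=[a-z])$/i;
my $two_or_more     = qr/^[a-z](?:[^-]|-(?!-))*[a-z]$/i;

# Hypothetical sample words, including single-letter ones.
my @samples = qw( a I brian br-i-an bri--an brian- );

my @kept_lookbehind = grep { /$with_lookbehind/ } @samples;
my @kept_two        = grep { /$two_or_more/ }     @samples;

print "look-behind version keeps: @kept_lookbehind\n";
print "two-letter version keeps:  @kept_two\n";
```

Only the look-behind version keeps the single-letter words a and I; both versions reject bri--an and brian-.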

    -sauoq
    "My two cents aren't worth a dime.";
    

      Y%o*u a|\r/'e of co(ur)se t!ot^&a*lly co&^><,.rre~###Ct.

      Much better:^)


Re: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter
by atcroft (Abbot) on Nov 10, 2002 at 22:39 UTC

    If you have them in an array, then it seems all you would need to do is:

    1. filter out any words that don't start with a letter (/^\w/, though \w also matches digits and underscore, so /^[a-z]/i is stricter)
    2. take the above array, then filter out words not ending with a letter (/\w$/, with the same caveat)
    3. take the resultant array, and filter out any words containing characters other than word characters (\w) or hyphen (! /[^\w-]/ maybe)
    4. take the resulting array, and filter out words containing 2 or more hyphens in a row (! /\-{2,}/ maybe).
    I would suggest multiple uses of grep as a possibility (but that is just me; I am sure others will come up with better solutions).
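
The four steps listed above could be sketched as a chain of greps, one per rule. This is only an illustration, not the code the poster removed: the array names are invented, the word list is borrowed from the question, and [a-z] with /i is used in place of \w because \w also matches digits and underscore.

```perl
use strict;
use warnings;

# Hypothetical input, taken from the examples in the question.
my @words = qw( brian0 brian- -brian bri--an Bria-n br-i-an brian );

my @step1 = grep { /^[a-z]/i }   @words;    # 1. starts with a letter
my @step2 = grep { /[a-z]$/i }   @step1;    # 2. ends with a letter
my @step3 = grep { !/[^a-z-]/i } @step2;    # 3. only letters and hyphens
my @step4 = grep { !/-{2,}/ }    @step3;    # 4. no run of two or more hyphens

my @sorted = sort @step4;
print "@sorted\n";    # prints: Bria-n br-i-an brian
```

Keeping each rule in its own grep makes it easy to print the intermediate arrays and see which rule rejects a given word.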

    I thought up about 6 lines of code to do each step, but thought the code (untested and untried) was not so good, so I removed it from being visible....

    Update: It sounds like, if you're using grep or a match, you need to anchor the match to test how a word begins: either /^\w/ to match a leading word character, or /^\-/ to match a leading dash.

      See, I'm new to this and I don't understand how to check if a word begins with a hyphen. I've got my sort down to the point where it just keeps words with letters and hyphens. Making sure there's a letter, then a hyphen, then a letter is my problem.
Re: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter
by graff (Chancellor) on Nov 10, 2002 at 22:52 UTC
    @sorted = sort grep { /^[a-z].*[a-z]$/i and not /-{2,}/ } @orig
    The grep expression tests two separate regex matches: (1) must begin and end with a letter, (2) must not have two or more consecutive hyphens.

    update: In case you want a case-folded sort (so that "amen" can be placed between "Amen" and "Amenable") add this block between the "sort" and "grep" above (this is copied from "perldoc -f sort"):

    {uc($a) cmp uc($b)}
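
Put together, the filter and the case-folded sort might look like the sketch below. The word list is hypothetical, and the `or $a cmp $b` tie-breaker is an addition that is not in the post: without it, "Amen" and "amen" compare as equal and their relative order is left to the sort implementation.

```perl
use strict;
use warnings;

# Hypothetical input mixing valid and invalid words.
my @orig = qw( brian Amenable bri--an amen -brian Amen );

# Keep words that begin and end with a letter and have no double hyphen,
# then sort ignoring case, falling back to a plain comparison on ties.
my @sorted = sort { uc($a) cmp uc($b) or $a cmp $b }
             grep { /^[a-z].*[a-z]$/i and not /-{2,}/ } @orig;

print "@sorted\n";    # prints: Amen amen Amenable brian
```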
Re: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter
by Ido (Hermit) on Nov 10, 2002 at 22:52 UTC
    Sort is simply sort, the rest is a regex.
    @newarray=sort grep /^[a-zA-Z]([a-zA-Z]*|-[a-zA-Z])*$/,@array;
    HTH
    Update: It's unclear whether you want words like 'b0n' (with characters that aren't letters in positions other than the beginning and end). If you do, change the regex to:  /^[a-zA-Z]([^-]*|-[^-])*(?<=[a-zA-Z])$/
      this is my code, can i not do this? it doesn't seem to filter any words out when i do.
      foreach $word (split){
          if($word =~ /^[a-zA-Z]([a-zA-Z]*|-[a-zA-Z])*$/){
              $count{$word};
          }
      }
        Try $count{$word}=1;....
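
A bare `$count{$word};` only reads the hash element and never creates the key, so nothing ends up in the hash. A minimal sketch of the suggested fix, assuming a hypothetical input line in `$_` and a `%count` hash like the one in the snippet above:

```perl
use strict;
use warnings;

my %count;

# Hypothetical input line; split with no arguments splits $_ on whitespace.
$_ = 'brian0 brian- -brian bri--an Bria-n br-i-an brian';

foreach my $word (split) {
    if ($word =~ /^[a-zA-Z]([a-zA-Z]*|-[a-zA-Z])*$/) {
        $count{$word} = 1;    # assignment actually stores the key
    }
}

my @kept = sort keys %count;
print "@kept\n";    # prints: Bria-n br-i-an brian
```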
Re: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter
by Thelonius (Priest) on Nov 10, 2002 at 22:59 UTC
    @out = sort grep !/^[^a-z]|[^a-z]\Z|--/i, @in;
Re: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter
by pg (Canon) on Nov 10, 2002 at 23:36 UTC
    use strict;

    sub do_it_for_anonymous_monk {
        my @words = @_;
        my @results;
        foreach my $word (@words) {
            # Must begin and end with a letter and contain no double hyphen.
            if (($word =~ m/^[a-zA-Z].*[a-zA-Z]$/) && ($word !~ /--/)) {
                push @results, $word;
            }
        }
        return @results;
    }

    my @words = ("br-ian", "bri--an", "br-i-an", "brian", "brian0", "briano", "2sdha");
    my @good = do_it_for_anonymous_monk(@words);
    print "@good\n";
Re: Sorting words, keeping only words that start with a letter and contain only letter characters, and hypens, and end in a letter
by DamnDirtyApe (Curate) on Nov 11, 2002 at 06:38 UTC

    This appears to work as requested, and it's a whole lot easier to read than some of the other options. :-)

    #! /usr/bin/perl
    use strict ;
    use warnings ;

    my @words = qw( brian0 brian- -brian bri--an Bria-n br-i-an brian ) ;

    print map { "[$_]" } @words ;
    print "\n" ;

    my @good = grep { /^[[:alpha:]]/     # Starts with a letter
                   && /[[:alpha:]]$/     # Ends with a letter
                   && ! /--/             # No consecutive hyphens
                    } @words ;

    print map { "[$_]" } @good ;
    print "\n" ;

    _______________
    DamnDirtyApe
    Those who know that they are profound strive for clarity. Those who
    would like to seem profound to the crowd strive for obscurity.
                --Friedrich Nietzsche