in reply to golf anyone? (taking first field)

28 but it works.

#        1         2
#1234567890123456789012345678
map/(^\s*[^:]*[^:\s])/,@list

Specifically, remove blank lines, truncate each line at the first : (if present), and get rid of trailing whitespace after the xxxxx part (including the possible "\n").

Of all the solutions that were submitted, only this one and blokhead's (36) took into account that the 'xxxxxx' portion might itself contain whitespace. (Well, to be fair, busunsl's did too, but it didn't properly remove newlines. Update: So did Arien's as he kindly pointed out to me but his breaks by being too liberal in what it accepts/returns.)

This is what I used to test:

#!/usr/bin/perl -w
use strict;

my @T = (
    '1 :yyyyy blah blah',
    '2 : yyyyy blah blah',
    '3 : ',
    '4 :',
    '5:yyyyy blah blah',
    '6: yyyyy blah blah',
    '7: ',
    '8:',
    ' : yyyyy blah blah',
    ' :yyyyy blah blah',
    ': yyyyy blah blah',
    ':yyyyy blah blah',
    '9 yyyyy blah blah',
    '10 yyyyy blah blah',
    '11 ',
    '12',
    ' : ',
    ' :',
    ': ',
    ':',
    ' ',
    '',
    '13 andmore:',
    '14 andmore : blah',
    ' 15 : foo',
    ' 16 andmore : foo',
);

@T = map {($_,$_."\n")} @T;

sub by { print "--- By @_ -------------\n"; }

my @list;
#                                                   1         2         3
#                                          1234567890123456789012345678901234567
@list=@T;by 'sauoq';    print "($_)\n" for map/(^\s*[^:]*[^:\s])/,@list;
@list=@T;by 'blokhead'; print "($_)\n" for map{/([^:]*?)\s*(:|$)/;$1||()}@list;
# Broken
@list=@T;by 'busuns1';  print "($_)\n" for map{s/\s*(:.*|$)//;$_||()}@list;
@list=@T;by 'Arien';    print "($_)\n" for map/(.+?)\s*(?>:|$)/,@list;
@list=@T;by 'Arien';    print "($_)\n" for map/(.+?)\b\s*(?>:|$)/,@list;
@list=@T;by 'Aristotle';print "($_)\n" for map/^([^:\s]+)/,@list;
@list=@T;by 'Aristotle';print "($_)\n" for map/^\s*([^:\s]+)/,@list;
@list=@T;by 'blokhead'; print "($_)\n" for map{/([^:]*?)(\s*\n|\s*:)/&&$1}@list;
@list=@T;by 'blokhead'; print "($_)\n" for map{/([^:]*?)\s*(\n|:)/&&$1}@list;
@list=@T;by 'jmcnmara'; print "($_)\n" for map{(split)[0]}@list;
@list=@T;by 'CountZero';print "($_)\n" for map/(.*?)\s+:/,@list;

This is the output:

--- By sauoq -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) (9 yyyyy blah blah) (9 yyyyy blah blah) (10 yyyyy blah blah) (10 yyyyy blah blah) (11) (11) (12) (12) (13 andmore) (13 andmore) (14 andmore) (14 andmore) ( 15) ( 15) ( 16 andmore) ( 16 andmore)
--- By blokhead -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) (9 yyyyy blah blah) (9 yyyyy blah blah) (10 yyyyy blah blah) (10 yyyyy blah blah) (11) (11) (12) (12) (13 andmore) (13 andmore) (14 andmore) (14 andmore) ( 15) ( 15) ( 16 andmore) ( 16 andmore)
--- By busuns1 -------------
(1) (1 ) (2) (2 ) (3) (3 ) (4) (4 ) (5) (5 ) (6) (6 ) (7) (7 ) (8) (8 ) ( ) ( ) ( ) ( ) (9 yyyyy blah blah) (9 yyyyy blah blah) (10 yyyyy blah blah) (10 yyyyy blah blah) (11) (11) (12) (12) ( ) ( ) ( ) ( ) (13 andmore) (13 andmore ) (14 andmore) (14 andmore ) ( 15) ( 15 ) ( 16 andmore) ( 16 andmore )
--- By Arien -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) ( ) ( ) ( ) ( ) (: yyyyy blah blah) (: yyyyy blah blah) (:yyyyy blah blah) (:yyyyy blah blah) (9 yyyyy blah blah) (9 yyyyy blah blah) (10 yyyyy blah blah) (10 yyyyy blah blah) (11) (11) (12) (12) ( ) ( ) ( ) ( ) (:) (:) (:) (:) ( ) ( ) (13 andmore) (13 andmore) (14 andmore) (14 andmore) ( 15) ( 15) ( 16 andmore) ( 16 andmore)
--- By Arien -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) ( : yyyyy blah blah) ( : yyyyy blah blah) ( :yyyyy blah blah) ( :yyyyy blah blah) (: yyyyy blah blah) (: yyyyy blah blah) (:yyyyy blah blah) (:yyyyy blah blah) (9 yyyyy blah blah) (9 yyyyy blah blah) (10 yyyyy blah blah) (10 yyyyy blah blah) (11) (11) (12) (12) (13 andmore) (13 andmore) (14 andmore) (14 andmore) ( 15) ( 15) ( 16 andmore) ( 16 andmore)
--- By Aristotle -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) (9) (9) (10) (10) (11) (11) (12) (12) (13) (13) (14) (14)
--- By Aristotle -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) (9) (9) (10) (10) (11) (11) (12) (12) (13) (13) (14) (14) (15) (15) (16) (16)
--- By blokhead -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) () () () () () () () () () (9 yyyyy blah blah) () (10 yyyyy blah blah) () (11) () (12) () () () () () () () () () () () () (13 andmore) (13 andmore) (14 andmore) (14 andmore) ( 15) ( 15) ( 16 andmore) ( 16 andmore)
--- By blokhead -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5) (5) (6) (6) (7) (7) (8) (8) () () () () () () () () () (9 yyyyy blah blah) () (10 yyyyy blah blah) () (11) () (12) () () () () () () () () () () () () (13 andmore) (13 andmore) (14 andmore) (14 andmore) ( 15) ( 15) ( 16 andmore) ( 16 andmore)
--- By jmcnmara -------------
(1) (1) (2) (2) (3) (3) (4) (4) (5:yyyyy) (5:yyyyy) (6:) (6:) (7:) (7:) (8:) (8:) (:) (:) (:yyyyy) (:yyyyy) (:) (:) (:yyyyy) (:yyyyy) (9) (9) (10) (10) (11) (11) (12) (12) (:) (:) (:) (:) (:) (:) (:) (:) (13) (13) (14) (14) (15) (15) (16) (16)
--- By CountZero -------------
(1) (1) (2) (2) (3) (3) (4) (4) () () () () () () () () (14 andmore) (14 andmore) ( 15) ( 15) ( 16 andmore) ( 16 andmore)
-sauoq
"My two cents aren't worth a dime.";

Re: Re: golf anyone? (taking first field)
by John M. Dlugosz (Monsignor) on Jan 08, 2003 at 02:46 UTC
    Very impressive!

    It's amazing how people make their own additional constraints and then blame someone for not writing a clear specification.

    Your solution

    map/(^\s*[^:]*[^:\s])/,@list
    is beautifully clear and simple, not just short because of fancy tricks. Read all the leading whitespace, read up to the first : or to the end, and finally back off any trailing whitespace. (Actually, it will read over consecutive :'s, not the first. But it's the first occurrence.)
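
    Those three steps can be spelled out with the /x modifier. This is only an illustrative sketch: the sample string and the variable names are made up, and the pattern is the same one as in the one-liner, just spread over several lines with comments.

    ```perl
    use strict;
    use warnings;

    # Same pattern as the 28-character one-liner, written with /x so each
    # step can carry a comment. $line is a made-up sample input.
    my $line = "  14 andmore  : blah\n";

    my ($field) = $line =~ /
        (            # capture everything matched below
          ^ \s*      # leading whitespace (kept in the capture)
          [^:]*      # greedily take everything up to the first colon or the end
          [^:\s]     # ...but the last character must not be whitespace or a colon,
                     # so the engine backs off over trailing blanks
        )
    /x;

    print "($field)\n";   # prints "(  14 andmore)"
    ```

    The leading blanks stay in the result, which matches the ( 15) and ( 16 andmore) entries in the test output above.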

    In general, the x*y regex idiom, where x is the union of y and w, will take internal w but not trailing w. The greedy x* swallows any trailing w as well; y then fails to match, which triggers backtracking to literally "back off" if it happened to end in w.
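
    The same idiom carries over to other character classes. Here is a small sketch, assuming a made-up helper name (trimmed_prefix) and made-up inputs: x is "anything but a comma", w is whitespace, and y is x with w taken out.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper, for illustration only: x = [^,], y = [^,\s], w = \s.
    # x* runs greedily to the first comma (or the end), then y forces the match
    # to end on a non-blank character, so trailing blanks are given back.
    sub trimmed_prefix {
        my ($text) = @_;
        my ($prefix) = $text =~ /^([^,]*[^,\s])/;
        return $prefix;    # undef when nothing but blanks precedes the comma
    }

    for my $sample ("red apple   , green", "no comma here  ", "   , empty") {
        my $got = trimmed_prefix($sample);
        printf "%-22s => %s\n", "'$sample'", defined $got ? "'$got'" : "(no match)";
    }
    ```

    The first two samples come back as 'red apple' and 'no comma here' (the internal blank survives, the trailing ones do not), and the third does not match at all.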

    Furthermore, it works with only the most common regex features, not using rare backslash chars or extensions. It can be grokked by anyone with a little regex experience.

    The fact that the blank lines and blank-after-truncating lines are purged without special-case logic means that the algorithm to "find the interesting part" matches neatly what the defining characteristic of that interesting part is. The desired behavior for blanks and empty lines is a natural consequence of that fundamental idea, not arbitrary rules designed to make it harder (like a putt-putt course's windmill?).
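
    The mechanism behind that is Perl's list-context match: a successful match returns its captures and a failed one returns the empty list, so map simply gets nothing for a line with no interesting part. A minimal sketch with made-up input lines:

    ```perl
    use strict;
    use warnings;

    # Made-up sample lines: two with a usable field, three without.
    my @lines = ("alpha : one\n", "\n", "   : two\n", "beta gamma\n", ":\n");

    # In list context a successful match returns its captures and a failed
    # match returns the empty list, so map needs no explicit filter here.
    my @fields = map /(^\s*[^:]*[^:\s])/, @lines;

    print scalar(@lines), " lines in, ", scalar(@fields), " fields out\n";
    print "($_)\n" for @fields;
    ```

    Five lines go in and two fields come out, (alpha) and (beta gamma); the blank line and the two lines with nothing before the colon vanish without any extra test.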

    Well done. I think you hit the sweet spot on that one.

    —John