Re: Re: VarStructor II -- Abbreviation tool

Replies are listed 'Best First'.
3Re: VarStructor II -- Abbreviation tool by jeffa (Bishop) on Jun 01, 2004 at 15:31 UTC
No. There is every need to speculate with your code, Wassercrats. My code was not meant to replace yours, or even reproduce 100% of the functionality. It was to show you that you can do what you are trying to do with much less code! How about this instead: it truncates the string to the first two letters (lowercased) and adds incremented numbers for each collision. And this time i won't require you to use a module! `my (%symbol,@old,@new); while (<DATA>) { chomp($_ = substr lc($_),0,2); push @old, $_; $symbol{$_}++; push @new, $_ . $symbol{$_}; } print "$_\n" for @new; __DATA__ Line one Line two xxxxxxxxxxx Another line xxxxxxxxxxx Lines end here not xxxxxxxxxxx Fourth line (used to be) Line five Lines end here Lines end here too` [download] Of course, while i find this to be quite silly -- it is more intuitive than what you have. Besides, you already have your arguable better way. jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)	[reply] [d/l]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Re: Re: VarStructor II -- Abbreviation tool by jdporter (Paladin) on Jun 02, 2004 at 14:42 UTC
It's not clear to me why you have the lines `Line one Line two Line five` [download] reduce to `L1 L2 L3` [download] but the lines `Lines end here not Lines end here Lines end here too` [download] reduce to `Line_o Line_t Line_f` [download] I mean, I understand that the latter are treated differently because they're identical within the `Max_Length` span, whereas the shorter ones aren't. But still... why should that matter? Wouldn't it have been acceptable -- aye, preferable -- to have the latter reduce to (e.g.) `L4 L5 L6` [download] ??? Thanks, jdporter	[reply] [d/l] [select]
Re: Re: Re: Re: VarStructor II -- Abbreviation tool by Wassercrats (Initiate) on Jun 02, 2004 at 15:17 UTC
You got the output example backwards. Line one Line two Line five reduces to: Line_o Line_t Line_f and: Lines end here not Lines end here Lines end here too reduces to: L1 L2 L3 The script is intended to create the shortest possible unique string, with a given maximum number of characters from the beginning of the string. When two strings are identical, a number must be appended and since the text is of no use in distinguishing those strings, and the shortest possible unique string contains just one (actually, zero) characters, that's how many characters are kept. Whether sticking to that produces the most useful function is another issue, but it does produce the shortest string and puts value on the inclusion of text when the text isn't identical. I will be adding an option to have identical text that's appended with a number contain the maximum number of characters instead of just one.	[reply]
Re: Re: Re: Re: Re: VarStructor II -- Abbreviation tool by jdporter (Paladin) on Jun 02, 2004 at 15:51 UTC
You got the output example backwards. So I did. Ack! But you got the jist of my question. Whether sticking to that produces the most useful function is another issue, but it does produce the shortest string and puts value on the inclusion of text when the text isn't identical. Ah. Interesting. It sounds like there is essentially a trade-off to be made, between shortest possible and inclusion of meaningful text. I guess I would tend to weight the latter more heavily than you did. Either that, or not at all, which is (I believe) what Text::Abbrev does. Here's my solution which gives absolute priority to inclusion of meaningful text. It doesn't maximize compression of strings such as `xxxxxxxxxx` the way yours does, but I'm viewing those as unlikely input. For "meaningful" text such as `Four score...`, the results are satisfactory (IMHO). my @values = <DATA>; print "\nValues:\n", @values; chomp @values; sub max_len { my $len = -1; for ( @_ ) { $len < length $_ and $len = length $_; } $len } sub min { $_[0] < $_[1] ? $_[0] : $_[1] } my $max_length = min( 10, max_len(@values) ); # normally, we only want to allow valid ID characters: my @idents = map { s/\W+/_/g; $_ } @values; print "\nIdentifiers (raw):\n", map "$_\n", @idents; print "\n\n\n\n"; my %mapping; # contains the final result { # this is the part that does the real work: my %h; for ( @idents ) { push @{ $h{ substr $_, 0, $max_length } }, $_; } for ( keys %h ) { my $n = @{ $h{$_} }; my $n_width = length $n; if ( $n == 1 ) { $mapping{$_} = $h{$_}[0]; } else { my $prefix = substr $_, 0, $max_length - $n_width; my $oneup; for ( @{ $h{$_} } ) { my $k; do { $k = $prefix . sprintf '%0'.$n_width.'d', ++$oneup; } while exists $mapping{$k}; $mapping{$k} = $_; } } } } # end of main algorithm. for ( sort keys %mapping ) { print "$_ => '$mapping{$_}'\n"; } __DATA__ line one line two lines end here not lines end here too yyyyyyy yyyyyyy xxxxxxxxxx xxxxxxxxxx zzzzzzzzzzzzz zzzzzzzzzzzzz r r rr rr r1 r1 r12 r12 Four score and seven years ago, our fathers... Four score and seven years ago, our mothers... Four score and seven years ago, our brothers... [download]	[reply] [d/l] [select]