Dear fellow Monks,
The standard *nix command 'strings' prints the sequences of printable characters found in an input file.
I want to make an enhanced version of that command that returns only the words (not whole strings) appearing more than once, together with their counts.

I came up with this code:
#!/usr/bin/perl -n
END{print map{$s{$_}>1?"$s{$_}\t$_$/":""}keys%s}$s{$1}++while/(\w{4,}\b)/g
76 characters. Not bad for a short script. Too long if I want to make it a one-liner.
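For readers who find the golfed line hard to parse, here is a spelled-out sketch of the same logic. It assumes the intended pattern is (\w{4,}\b) and drops the \b, which is redundant after a greedy \w{4,}. The sub names count_words and report_lines are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/perl
# Spelled-out sketch of the golfed script: count every run of 4+ word
# characters, then print "count<TAB>word" for words seen more than once.
use strict;
use warnings;

# Hypothetical helper: tally each match of /\w{4,}/ across all given lines.
sub count_words {
    my %count;
    for my $line (@_) {
        # \w{4,} is greedy, so each match already ends at a word boundary.
        $count{$1}++ while $line =~ /(\w{4,})/g;
    }
    return %count;
}

# Hypothetical helper: format only the words whose count is above 1.
sub report_lines {
    my %count = @_;
    return map { "$count{$_}\t$_\n" } grep { $count{$_} > 1 } keys %count;
}

# Read every line from the files named on the command line, or from STDIN.
my @lines = <>;
print report_lines(count_words(@lines));
```

It is invoked the same way as the original, e.g. perl script.pl `which perl` | sort -rn, and the output order is still arbitrary until sorted.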
I gave myself a few rules:
The above script, executed as
perl strings.pl `which perl` | sort -rn
will return:
15 DynaLoader
6 Usage
4 UWVS
2 dl_unload_file
2 dl_undef_symbols
2 version
2 dl_install_xsub
2 GLIBC_2
2 boot_DynaLoader
2 dl_find_symbol
2 filename
2 linux
2 dl_error
2 dl_load_file
Any suggestions to shorten it?

TIA

Update: Please note that the output of 'strings' is a stream of printable character sequences, each of which may be a single word or several space-separated words, while I am interested only in the individual words.
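To make the distinction concrete: a //g match in list context with no capture groups returns every match in order, so one line of 'strings' output can yield several words. The sample line below is made up for illustration, not taken from a real binary.

```perl
use strict;
use warnings;

# A made-up line of the kind 'strings' might print: one "string", several words.
my $line = "Usage: DynaLoader::dl_load_file(filename, flags=0)";

# List-context //g with no capture groups returns all matches in order.
my @words = $line =~ /\w{4,}/g;

# Prints Usage, DynaLoader, dl_load_file, filename, flags (one per line).
print "$_\n" for @words;
```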
 _  _ _  _  
(_|| | |(_|><
 _|   

Re: enhanced 'strings' command
by jmcnamara (Monsignor) on May 03, 2002 at 10:50 UTC

    I can't see a way to make it dramatically shorter, but here is an alternative:

    strings `which perl` | sort | uniq -c | sort -rn | awk '$1>1'

    The output isn't exactly as you requested but it is close. To include tabs instead of spaces you could modify the *cough* awk program.

    Update: gmax is looking for individual words rather than whole strings, so something like the following is needed after the first pipe (at which point you might as well do the whole thing in Perl):

    perl -lane 'BEGIN{$,="\n"} print @F'
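    Taking "do the whole thing in Perl" literally, here is a sketch that replaces the entire pipeline (split into words, count, keep counts above 1, sort by count descending) with one script, so no external sort is needed. Breaking ties alphabetically is a choice made here, not something the shell pipeline guarantees, and the sub name ranked_report is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: the whole strings | sort | uniq -c | sort -rn | awk job in Perl.
use strict;
use warnings;

# Hypothetical helper: count 4+ character words and return the repeated
# ones as "count<TAB>word" lines, highest count first, ties alphabetical.
sub ranked_report {
    my %count;
    for my $line (@_) {
        $count{$_}++ for $line =~ /\w{4,}/g;
    }
    return map  { "$count{$_}\t$_\n" }
           sort { $count{$b} <=> $count{$a} or $a cmp $b }
           grep { $count{$_} > 1 } keys %count;
}

# Read the files named on the command line (e.g. the perl binary) or STDIN.
print ranked_report(<>);
```

    Like gmax's script, it can read the binary directly: perl script.pl `which perl`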

    --
    John.

Re: enhanced 'strings' command
by japhy (Canon) on May 03, 2002 at 14:46 UTC
    Here's a shot:
    #!/usr/bin/perl -n
    $s{$&}++while/\w{4,}/g}{print map"$s{$_}\t$_\n"x($s{$_}>1),keys%s
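    japhy's version relies on two tricks. Under -n, perl wraps the program in a while (<>) { ... } loop, so the }{ in the middle closes that loop early and opens a bare block that runs once after all input has been read (the so-called "Eskimo greeting"). And "..." x ($s{$_}>1) repeats the string once or zero times, because a true comparison yields 1 and a false one yields a value that is 0 numerically, so words seen only once contribute an empty string. The sketch below spells both out; it is a hand-written equivalent, not B::Deparse output, and the %count name and line_if_repeated helper are hypothetical.

```perl
#!/usr/bin/perl
# Hand-written equivalent of japhy's -n / "}{" one-liner (not Deparse output).
use strict;
use warnings;

# Hypothetical helper: the repetition trick on its own. ($n > 1) is 1 or
# false (numerically 0), so the line is repeated once or not at all.
sub line_if_repeated {
    my ($word, $n) = @_;
    return "$n\t$word\n" x ($n > 1);
}

my %count;

# What -n supplies: a loop over each line of the named files or STDIN.
while (<>) {
    $count{$&}++ while /\w{4,}/g;   # $& holds the text of the last match
}

# What "}{" opens: a block that runs once, after the loop has finished.
{
    print map { line_if_repeated($_, $count{$_}) } keys %count;
}
```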

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a (from-home) job
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;