mhearse has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse a file for strings which start with a dollar sign. I want to push those into an array. I then want to print only the unique values from that array. Why isn't this working?
#!/bin/perl
while (<>) {
    chomp;
    $_ =~ (/\b^\$\b/);
    push @in, $1;
}
$prev = 'nonesuch';
@iner = grep($_ ne $prev && ($prev = $_), @in);
foreach $value (@iner) {
    print $value,"\n";
}

Replies are listed 'Best First'.
Re: parse for string, then print unique
by welchavw (Pilgrim) on Dec 16, 2003 at 04:49 UTC

    It isn't working for a multiplicity of reasons. First off, your regex captures nothing in $1, so you shouldn't be using that variable. In the same regex, your ^ is misused: ^ should be placed so that it anchors the front of the line, and it cannot sanely be preceded by \b. Your push onto @in is unconditional, which is also wrong; you only want to push if the regex matches. Further, you should be using a hash to collect unique values, not an array. I don't even begin to understand what that grep is trying to do.

    This code should be a good start...

    #!/usr/bin/perl
    use Data::Dumper;
    while (<DATA>) {
        $in{$1} = 1 if ($_ =~ (/^\s*(\$\w+.*)/));
    }
    @a = keys %in;
    print Dumper \@a;
    __DATA__
    $first = 1;
    $first = 1;
    $second = 2;
    #some comment
    # $third = 3;

    Please also read this link Perl Idioms Explained - keys %{{map{$_=>1}@list}}.

Re: parse for string, then print unique
by Zaxo (Archbishop) on Dec 16, 2003 at 04:56 UTC

    You're not getting $1 set because there is no capture done in the match (misplaced parens). If there were, you'd still be misusing $1, since it will just repeat from the previous match if the current try fails.
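
    The stale $1 behaviour can be seen in a minimal sketch like the one below. The two input strings are made up for the demonstration, and the pattern is only one plausible reading of "strings which start with a dollar sign".

```perl
use strict;
use warnings;

# Two made-up input lines: the first contains a variable, the second does not.
my $with    = 'my $foo = 1;';
my $without = 'no variable here';

my @in;

$with =~ /(\$[A-Za-z_]\w*)/;      # succeeds and sets $1
push @in, $1;

$without =~ /(\$[A-Za-z_]\w*)/;   # fails, so $1 is left untouched
push @in, $1;                     # pushes the stale value a second time

print scalar(@in), " entries pushed unconditionally: @in\n";

# Guarding the push on the match result avoids the stale value.
my @guarded;
for my $line ($with, $without) {
    push @guarded, $1 if $line =~ /(\$[A-Za-z_]\w*)/;
}
print scalar(@guarded), " entry pushed conditionally: @guarded\n";
```

    The unconditional version ends up with the same name twice even though only one line contained a variable, which is why the push should depend on the match succeeding.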

    By "strings which start with a dollar sign", what do you mean? What ends the strings you want? I'll assume you want to pick out perl variables of the ordinary kind (no punctuation vars or other symbol table stuff).

    my @in;
    while (<>) {
        push @in, $1 if /(\$[A-Za-z_]\w*)/;
    }
    That only catches the first instance in a line. If you want to catch them all there is a neater notation,
    while (<>) { push @in, /(\$[A-Za-z_]\w*)/g; }
    which takes advantage of placing the match in list context.

    Once you have your array, uniqueness is gotten by the usual trick:

    my %hsh;
    @hsh{@in} = ();
    print $_, $/ for keys %hsh;

    Update: Seeing welchavw's way of doing it all at once, I'd rewrite as

    my %hsh;
    @hsh{ /(\$[A-Za-z_]\w*)/g } = () while <>;
    delete $hsh{''}; # tidy up
    print $_, $/ for keys %hsh;
    Note that using /g is not a prerequisite of placing the match directly in list context to get the values. The push @in, $1 if /(\$[A-Za-z_]\w*)/; line could have been written, push @in, /(\$[A-Za-z_]\w*)/; and pushing an empty list does not grow the array.
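
    A small sketch of the list-context behaviour, using made-up sample lines in place of real input:

```perl
use strict;
use warnings;

# Made-up sample lines: one with two variables, one with none.
my @lines = ('$alpha and $beta', 'nothing to see');

# Without /g, a list-context match returns the captures of the first
# match only, or an empty list when the match fails.
my @first_only;
for my $line (@lines) {
    push @first_only, $line =~ /(\$[A-Za-z_]\w*)/;
}

# With /g, a list-context match returns the captures of every match.
my @all;
for my $line (@lines) {
    push @all, $line =~ /(\$[A-Za-z_]\w*)/g;
}

print "first only: @first_only\n";
print "all:        @all\n";
```

    In both loops the line with no variable contributes nothing, because pushing an empty list leaves the array unchanged.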

    After Compline,
    Zaxo

Re: parse for string, then print unique
by duff (Parson) on Dec 16, 2003 at 04:46 UTC

    For one, your parentheses are outside of the regular expression so you aren't really capturing anything. For two, I don't think your pattern match is quite right anyway. What does "strings that start with dollar sign" mean? Do you want only those lines that begin with dollar signs or do you want any dollar sign followed by a sequence of alphanumerics? The former would look more like /^\$(.*)/ and the latter would look like /(\$\w+)/. Note that the dollar sign has to be escaped, since a bare $ is an end-of-line anchor in a pattern. The ^ in your pattern confuses me. Also, since you want the unique entries, you should stick them in a hash. One of the defining properties of a hash is that the keys are unique. So something like ...

    while (<>) {
        chomp;
        $hash{$1}++ if /(\$\w+)/;   # \$ matches a literal dollar sign
    }
    print map { "$_\n" } keys %hash;

    But again, I'm still not sure what you're really trying to match.

      Slight change to my post. I need to match any alphanumeric string which contains a dollar sign. $ may be anywhere in the string. I would like to grab the entire value, including the $. An example value is "$LEGEND2" (quotes are part of string).
        You mean something like this?
        use strict;
        use warnings;
        use Data::Dumper;
        my $str = '"$a $a $a" ab bc "c$d e$f$" c$d "$a $a $a"';
        my @uniq = keys %{{map {$_ => 1}($str=~m/("[^"]*\$[^"]*")/g)}};
        print Dumper(\@uniq);
        And the output is -
        $VAR1 = [ '"c$d e$f$"', '"$a $a $a"' ];
        Note that I kept the regex simple by not trying to handle escaped quotes inside the strings in the example above. If you want to allow escaped quotes, the regex becomes more complicated.
        ...
        my $str = '"$a $a $a" ab bc "c$d \"e$f$" c$d "$a $a $a"';
        my @uniq = keys %{{map {$_ => 1} ($str=~m/("(?:\\"|.)*?\$(?:\\"|.)*?")/g)}};
        ...
        And the new output -
        $VAR1 = [ '"c$d \\"e$f$"', '"$a $a $a"' ];
        Since dollar signs are neither alphabetic nor numeric characters, what are alphanumeric strings which contain a dollar sign? Are "$$LEGEND2", "L$E$G$E$N$D$2", "LEG!$!END2" valid strings?

        Abigail

        If the quotes are part of the string, what else would you allow to be part of the string? Is more than one $ ok? Based on the ^ in your original try, is what you really want entire lines of the file? If so, what lines don't you want?

        I note that your original posting kept the strings in original order (by filtering out duplicates assuming they were in sorted order). If that is what you want, none of the solutions involving keys will do what you want. A hash approach will still work, but goes like this:

        my %seen;
        @out = grep !$seen{$_}++, @in;   # grep, not map: keep each value the first time it is seen
        or just keep your original grep (unless the strings aren't in sorted order).
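
        To make the difference concrete, here is a minimal sketch comparing the adjacent-duplicate grep from the original post with a %seen hash filter written with grep. The input list is made up and deliberately unsorted.

```perl
use strict;
use warnings;

# Made-up, unsorted input with a repeat that is not adjacent.
my @in = ('$b', '$a', '$a', '$b', '$c');

# The filter from the original post: drops a value only when it equals
# the value immediately before it.
my $prev = 'nonesuch';
my @adjacent = grep { $_ ne $prev && ($prev = $_) } @in;

# Order-preserving filter: keeps a value only the first time it is seen.
my %seen;
my @ordered = grep { !$seen{$_}++ } @in;

print "adjacent: @adjacent\n";
print "ordered:  @ordered\n";
```

        On unsorted input the adjacent filter lets the second '$b' through, while the %seen version keeps only the first occurrence of each value. The adjacent filter also drops any value that is false in Perl (such as '0' or the empty string), because the assignment ($prev = $_) is used as part of the test.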