jjohhn has asked for the wisdom of the Perl Monks concerning the following question:

I have about 5 files, with 10-15 fields each. I would like to identify which fields are null and how many times each null field occurs in the files.
Using a variant of
s/(\||\)/$1/
I believe I could discover how many null fields are in each line, but that will not identify which fields are null. I looked at awk, but I didn't find a variable there to identify which field it is processing. Thank you for your help.

Replies are listed 'Best First'.
Re: identifying null fields in bar delimited records
by holli (Abbot) on May 31, 2005 at 11:21 UTC
    Something along the lines of
    my $line = ";field;;field;\t;field; ;field;field";
    my @line = split ";", $line;
    my @null = grep { defined $_ } map { $line[$_] =~ /^$/ ? $_ : undef } (0..$#line);
    print "line contains ", scalar @null, " null fields at offset(s): @null";
    #line contains 2 null fields at offset(s): 0 2

    ?

    Update: To fit in the OP's specified requirements:

    use strict;
    while ( <DATA> ) {
        my @line = split /\|/, $_;
        my @null = grep { defined $_ } map { $line[$_] =~ /^$/ ? $_ : undef } (0..$#line);
        print "line contains ", scalar @null, " null fields at offset(s): @null\n";
    }
    #line contains 3 null fields at offset(s): 1 3 4
    #line contains 2 null fields at offset(s): 2 4
    #line contains 3 null fields at offset(s): 0 1 4
    #line contains 3 null fields at offset(s): 1 2 4
    __DATA__
    first||third||
    alpha|beta||delta|
    ||c|d|
    one|||four|


    holli, /regexed monk/
      holli++

      Here's a humble for loop:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $line = "a||c|\t| |d||e";
      my @fields = split(/\|/, $line);
      my @null;
      for my $i (0..$#fields){
        push @null, $i unless $fields[$i];
      }
      print "null fields: ", scalar @null, "\n";
      print "at field: ";
      print ++$_, ", " for @null;

      # output
      # null fields: 2
      # at field: 2, 7,
        holli and wfsp, great answers. Now, I'd like to make this snippet of code more general by turning the delimiter into a variable. But doing so doesn't work (I'm struggling to understand regular expressions).
        my $delim = '|';
        my $line = "a||c|\t| |d||e";
        my @fields = split(/\\$delim/, $line);

        # output
        # null fields: 0
        # at field:
        I posted as anon by mistake, which means I can't edit my post to remove the superfluous line containing the variable "line".
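        A note on why the variable delimiter fails: in /\\$delim/ the double backslash is a literal backslash, and $delim then interpolates as a bare |, which is regex alternation. The pattern becomes "a backslash, or nothing", and the empty alternative matches between every character, so no field is ever empty. A minimal sketch of one way around this, assuming the delimiter should always be treated as a literal string, is to quote it with \Q...\E. The helper name null_offsets is hypothetical, not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based indexes of empty fields in $line,
# splitting on the literal string $delim.
# \Q...\E escapes regex metacharacters such as '|' inside the pattern.
sub null_offsets {
    my ($line, $delim) = @_;
    my @fields = split /\Q$delim\E/, $line, -1;   # -1 keeps trailing empty fields
    return grep { $fields[$_] eq '' } 0 .. $#fields;
}

my @null = null_offsets("a||c|\t| |d||e", '|');
print "null fields: ", scalar @null, "\n";
print "at field: ", join(", ", map { $_ + 1 } @null), "\n";
# output
# null fields: 2
# at field: 2, 7
```

        The built-in quotemeta($delim) does the same escaping if you prefer to prepare the pattern in a separate step.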
      I ended up doing this in awk; it could probably be translated to perl by someone smarter than me. For some reason the filename is printed AFTER the results instead of before as I would have expected, but otherwise this seems to work.
      BEGIN{FS="\|";}
      {
        line= substr($0,0,length($0)-1); #peel trailing bar
        n=split($0,a);
        for (i=0;i<n;i++){
          line=line a[i];
          if(a[i] ==""){nulls[i]++};
        }
      }
      END{
        print FILENAME;
        for (i in nulls)
          print "\t field " i": " nulls[i] " nulls";
      }
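      For the Perl translation asked for above, here is a rough sketch rather than a line-by-line port. It assumes, as the awk comment suggests, that the trailing bar terminates the record and is not a separator, and it numbers fields from 1. The function name count_nulls is made up for illustration. Each file is reported separately, with the name printed before its counts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count empty fields per 1-based field number
# for all records read from one filehandle.
sub count_nulls {
    my ($fh) = @_;
    my %nulls;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\|$//;                      # peel the trailing bar (assumed terminator)
        my @fields = split /\|/, $line, -1;    # -1 keeps trailing empty fields
        for my $i (0 .. $#fields) {
            $nulls{ $i + 1 }++ if $fields[$i] eq '';
        }
    }
    return \%nulls;
}

# Report per file: name first, then counts in numeric field order.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $nulls = count_nulls($fh);
    close $fh;
    print "$file\n";
    print "\t field $_: $nulls->{$_} nulls\n" for sort { $a <=> $b } keys %$nulls;
}
```

      Run it with the data files as arguments. Comparing with eq '' instead of testing truthiness means a field holding the single character 0 is not counted as null.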
Re: identifying null fields in bar delimited records
by tchatzi (Acolyte) on May 31, 2005 at 11:05 UTC
    Do you mind if you show us a sample of your files?

    ``The wise man doesn't give the right answers, he poses the right questions.'' TIMTOWTDI
      They look like this:

      first||third||
      alpha|beta||delta|
      ||c|d|
      one|||four|

      I'd like to print something like:
      field1->1
      field2->2
      field3->2
      field4->1

        Build a hash
        #!/usr/bin/perl
        use strict;
        use warnings;

        my %null;
        while (<DATA>){
          my @fields = split(/\|/);
          for my $i (0..$#fields){
            $null{"field$i"}++ unless $fields[$i];
          }
        }
        for my $key (sort keys %null){
          print "$key->$null{$key}\n";
        }
        __DATA__
        first||third||
        alpha|beta||delta|
        ||c|d|
        one|||four|
        output:
        field0->1
        field1->3
        field2->2
        field3->1
        The field numbers start at zero. Also, the second field has 3 compared to 2 in your desired output.

        The third line of your data starts with two bars, isn't that 2 null fields?

        How does "first||third||" fit together with "field1->1"? There are 3 null fields in that line.


        holli, /regexed monk/
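        Whether "first||third||" holds two or three null fields depends partly on how split treats the end of the line. A small sketch of the two behaviours involved, using only core Perl: by default split drops trailing empty fields, while a negative limit keeps them. It also shows that a truthiness test counts a field containing 0 as null, whereas comparing with eq '' does not.

```perl
use strict;
use warnings;

my $record = "first||third||";

my @default = split /\|/, $record;       # trailing empty fields are dropped
my @all     = split /\|/, $record, -1;   # a negative limit keeps them

print scalar(@default), " fields by default\n";      # 3 fields by default
print scalar(@all),     " fields with limit -1\n";   # 5 fields with limit -1

# Truthiness versus an explicit empty-string test.
my @fields     = split /\|/, "0||x", -1;
my $by_truth   = grep { !$_ }       @fields;   # counts "0" and ""
my $by_empty   = grep { $_ eq '' }  @fields;   # counts only ""
print "falsy: $by_truth, empty: $by_empty\n";  # falsy: 2, empty: 1
```

        In the loops posted above the line is not chomped, so the text after the last bar is "\n": that field matches /^$/ (hence the extra offset 4 in the regex version) but is true in a boolean test (hence no extra field in the hash version).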
Re: identifying null fields in bar delimited records
by ghenry (Vicar) on May 31, 2005 at 11:04 UTC

    Update: Misinterpreted the OP's last comments.

    I think you should consider looking for tabs and whitespace on each line too.

    Maybe I'm wrong, but if you could provide some sample data, then I'm sure we can help.
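
    If fields that contain only tabs or spaces should also count as null, one possible sketch is to test each field against /^\s*$/ instead of comparing it with the empty string. The helper name blank_offsets is hypothetical, and whether whitespace-only fields are really "null" in the OP's data is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based offsets of fields that are empty
# or contain only whitespace.
sub blank_offsets {
    my ($line) = @_;
    chomp $line;                              # works on a copy of the argument
    my @fields = split /\|/, $line, -1;       # -1 keeps trailing empty fields
    return grep { $fields[$_] =~ /^\s*$/ } 0 .. $#fields;
}

my @blank = blank_offsets("a||c|\t| |d||e");
print "blank fields at offset(s): @blank\n";
# blank fields at offset(s): 1 3 4 6
```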

    Walking the road to enlightenment... I found a penguin and a camel on the way.....
    Fancy a yourname@perl.me.uk? Just ask!!!