drewk has asked for the wisdom of the Perl Monks concerning the following question:

I am using Perl to find all the PowerPC apps on my newly upgraded Snow Leopard laptop. (Love Perl 5.10 on Snow Leopard btw...)

system_profiler SPApplicationsDataType produces a list of all the applications on OS X. There is an XML output option, but I am working with the text version. system_profiler produces output in text like this:


    SystemUIServer:
      Version: 1.5.5
      Last Modified: 8/23/09 8:22 PM
      Kind: Universal
      Get Info String: SystemUIServer version 1.5.5, Copyright 2000-2009 Apple Computer, Inc.
      Location: /System/Library/CoreServices/SystemUIServer.app

    UserNotificationCenter:
      Version: 3.0.0
      Last Modified: 1/29/07 11:03 PM
      Kind: Universal
      Location: /System/Library/CoreServices/UserNotificationCenter.app

My Perl script to parse this is:

#!/usr/bin/perl

$apps_rep = `system_profiler SPApplicationsDataType 2> /dev/null`;
@apps_lines = split(/\n/,$apps_rep) ;
@apps=();
$count = @apps_lines ;
$i=$j=$k=$p=0;
while ($j<$count) {
    $apps[$i] .= $apps_lines[$j] ;
    $apps[$i] .= "\n" ;
    $i++ if ($apps_lines[$j]) =~ /^\s\s\s\s\S.*:$/;
    $j++;
}
print "$i apps\n" ;
while($k<$i) {
    $_ = $apps[$k++] ;
    if (/Kind: PowerPC/s) {print; $p++} ;
}
print "$i applications, $p PowerPC applications\n\n";

Each application record is delimited by four spaces, the app name, and a colon at the end of the line. The regex /^\s\s\s\s\S.*:$/ captures this delimiter, but I cannot use the regex in split(/^\s\s\s\s\S.*:$/,$apps_rep);. Instead I have to read the output by lines, reassemble the lines into an array, and match /Kind: PowerPC/s on the resulting record.

Any wisdom on why I cannot use the split(/^\s\s\s\s\S.*:$/,$apps_rep); call? The regex works on its own, but not in split. Is there a better way to do this?

Re: Is this the best regex?
by almut (Canon) on Sep 08, 2009 at 22:18 UTC
    The regex /^\s\s\s\s\S.*:$/ captures this delimiter, but I cannot use the regex in split(/^\s\s\s\s\S.*:$/,$apps_rep);

    Try  split(/^\s\s\s\s\S.*:$/m,$apps_rep);   (note the /m — it makes ^ and $ match the start/end of lines within the string)
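
A minimal sketch of the /m point, assuming a made-up two-record sample in place of real system_profiler output ("OldTool" is a hypothetical application name, and the variable names other than $apps_rep are mine):

```perl
use strict;
use warnings;

# Made-up sample shaped like the system_profiler text output;
# "OldTool" is a hypothetical application name.
my $apps_rep = <<'END';
    SystemUIServer:
      Version: 1.5.5
      Kind: Universal
    OldTool:
      Version: 2.0
      Kind: PowerPC
END

# Without /m, ^ matches only at the start of the whole string and $ only
# at its end, so the header pattern never matches a multi-line string and
# split hands back the input as a single chunk.
my @without_m = split /^\s\s\s\s\S.*:$/, $apps_rep;

# With /m, ^ and $ match at every line boundary, so each header line
# acts as a separator.
my @with_m = split /^\s\s\s\s\S.*:$/m, $apps_rep;

printf "without /m: %d chunk(s)\n", scalar @without_m;
printf "with /m:    %d chunk(s)\n", scalar @with_m;
```

Note that the first chunk from the /m version is an empty string, because the very first header matches at the start of the input.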

    You probably also want to put capturing parentheses in the pattern — /^\s\s\s\s(\S.*):$/m — in order to keep what was split on, i.e. you'd then get alternating entries for record title (app name) and record body.
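
To illustrate the capturing-parentheses variant, here is a rough sketch on made-up sample text ("OldTool" is a hypothetical application name; %body_for and @ppc are names chosen for this example only):

```perl
use strict;
use warnings;

# Made-up sample shaped like the system_profiler text output.
my $apps_rep = <<'END';
    SystemUIServer:
      Version: 1.5.5
      Kind: Universal
    OldTool:
      Version: 2.0
      Kind: PowerPC
END

# The parentheses make split return the captured app name between the
# pieces, so the list alternates: name, body, name, body, ...
my @parts = split /^\s\s\s\s(\S.*):$/m, $apps_rep;

# Drop whatever precedes the first header (an empty string in this sample).
shift @parts;

# The alternating name/body list maps directly onto a hash.
my %body_for = @parts;

my @ppc = sort grep { $body_for{$_} =~ /Kind: PowerPC/ } keys %body_for;
print "$_\n" for @ppc;
```

One caveat with the hash: two applications with the same name would overwrite each other, so walking @parts in pairs may be safer on a real machine.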

      Alternative that splits the records without doing additional parsing:
      split(/^(?=\s\s\s\s\S.*:$)/m, $apps_rep);
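
A small sketch of how the lookahead version behaves, again on made-up sample text ("OldTool" is hypothetical; @records and @ppc are illustrative names):

```perl
use strict;
use warnings;

# Made-up sample shaped like the system_profiler text output.
my $apps_rep = <<'END';
    SystemUIServer:
      Version: 1.5.5
      Kind: Universal
    OldTool:
      Version: 2.0
      Kind: PowerPC
END

# The lookahead matches the empty position just before each header line,
# so nothing is consumed: every record keeps its own header (the app name).
my @records = split /^(?=\s\s\s\s\S.*:$)/m, $apps_rep;

# Each element is now a complete record, so one match per record is enough.
my @ppc = grep { /Kind: PowerPC/ } @records;

print for @ppc;
printf "%d applications, %d PowerPC applications\n",
    scalar @records, scalar @ppc;
```

Because the match is zero-width, split does not produce an empty leading element here, unlike the version that consumes the header.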

      Yes -- that fixes split, but my thinking was fuzzy. I got focused on why split did not work and forgot that if I did get it working, the delimiter would be thrown away. In this case, the delimiter is data because it has the program name!

      Thanks for the education on split. I am sure it will come in handy one day.

        Wait. I saw your comment re the capturing parentheses. How do I capture both the fields and the delimiters?
Re: Is this the best regex?
by jwkrahn (Abbot) on Sep 09, 2009 at 01:54 UTC

    Perhaps this will work better for you:

    #!/usr/bin/perl
    use warnings;
    use strict;

    open my $PIPE, '-|', 'system_profiler SPApplicationsDataType 2> /dev/null'
        or die "Cannot open pipe from 'system_profiler' $!";

    my ( $i, $p );
    while ( <$PIPE> ) {
        $i += /\A\s{4}\S.*:\Z/;
        $p += /\A\s+Kind: PowerPC\Z/;
    }

    close $PIPE
        or warn $! ? "Error closing 'system_profiler' pipe: $!"
                   : "Exit status $? from 'system_profiler'";

    print "$i applications, $p PowerPC applications\n\n";

      Yes -- better!

      It is faster and seems easier to understand. I changed it to:

      #!/usr/bin/perl
      use warnings;
      use strict;

      open my $PIPE, '-|', 'system_profiler SPApplicationsDataType 2> /dev/null'
          or die "Cannot open pipe from 'system_profiler' $!";

      my ( $i, $p, @apps );
      while ( <$PIPE> ) {
          $apps[$i] .= $_;
          $i += /\A\s{4}\S.*:\Z/;
          $p += /\A\s+Kind: PowerPC\Z/;
      }

      close $PIPE
          or warn $! ? "Error closing 'system_profiler' pipe: $!"
                     : "Exit status $? from 'system_profiler'";

      foreach (@apps) {
          print if /Kind: PowerPC/s;
      }

      print "$i applications, $p PowerPC applications\n\n";

      Question: is there a difference between /\A\s{4}\S.*:\Z/ and /^\s{4}\S.*:$/?
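
On that question, a hedged sketch rather than a definitive answer: without the /m modifier, ^ behaves like \A and $ behaves like \Z, so on single lines read from the pipe both spellings should match the same strings. They differ only once /m is added, because /m changes ^ and $ but leaves \A and \Z alone. The sample strings and the %result keys below are made up for illustration:

```perl
use strict;
use warnings;

# One line as it would arrive from the pipe, and a multi-line string
# where the header is not the first line. Both are made-up samples.
my $line  = "    OldTool:\n";
my $multi = "      Version: 2.0\n    OldTool:\n      Kind: PowerPC\n";

my %result = (
    # Single line, no /m: the two spellings agree.
    line_AZ     => ( $line =~ /\A\s{4}\S.*:\Z/ ? 1 : 0 ),
    line_caret  => ( $line =~ /^\s{4}\S.*:$/   ? 1 : 0 ),

    # Multi-line string with /m: \A and \Z still mean start and end of
    # the whole string, while ^ and $ now match at each line.
    multi_AZ    => ( $multi =~ /\A\s{4}\S.*:\Z/m ? 1 : 0 ),
    multi_caret => ( $multi =~ /^\s{4}\S.*:$/m   ? 1 : 0 ),
);

printf "%-12s %d\n", $_, $result{$_} for sort keys %result;
```
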
Re: Is this the best regex?
by halfcountplus (Hermit) on Sep 08, 2009 at 22:27 UTC
    Well, a big issue is that your regex begins with ^ and ends with $, which is kind of oxymoronic for a split. If you remove the $, it will probably work, unless you were intending what almut recommends, in which case you could also use '\n' instead of $.

    I'm not sure what you are trying to do, but I would encourage you to go for a more complex data structure here, namely an array of hashes. That means a function which processes each app and returns a hash of its contents. If you want an example, just ask.
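
As a rough idea of what the array-of-hashes approach could look like, here is a sketch on made-up sample text. The helper parse_app, the Name key, the "OldTool" application and its path are all hypothetical, and the lookahead split is borrowed from almut's reply:

```perl
use strict;
use warnings;

# Made-up sample shaped like the system_profiler text output.
my $apps_rep = <<'END';
    SystemUIServer:
      Version: 1.5.5
      Last Modified: 8/23/09 8:22 PM
      Kind: Universal
      Location: /System/Library/CoreServices/SystemUIServer.app
    OldTool:
      Version: 2.0
      Kind: PowerPC
      Location: /Applications/OldTool.app
END

# Hypothetical helper: turn one record (a header line followed by
# "Key: value" lines) into a hash reference.
sub parse_app {
    my ($record) = @_;
    my %app;
    for my $line ( split /\n/, $record ) {
        if ( $line =~ /^\s{4}(\S.*):$/ ) {
            $app{Name} = $1;            # header line carries the app name
        }
        elsif ( $line =~ /^\s+([^:]+):\s*(.*)$/ ) {
            $app{$1} = $2;              # split on the first colon only
        }
    }
    return \%app;
}

# One hash per application.
my @apps = map { parse_app($_) } split /^(?=\s{4}\S.*:$)/m, $apps_rep;

my @ppc = grep { defined $_->{Kind} && $_->{Kind} eq 'PowerPC' } @apps;

printf "%d applications, %d PowerPC applications\n",
    scalar @apps, scalar @ppc;
print "$_->{Name} ($_->{Location})\n" for @ppc;
```

With the fields in a hash, later questions such as "which PowerPC apps live outside /Applications" become a one-line grep instead of another regex over raw text.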
      I don't understand. Aren't most split delimiters either a \n or \s?
Re: Is this the best regex?
by toolic (Bishop) on Sep 08, 2009 at 23:24 UTC
    There is an XML output option
    There is a good chance that the XML output would be easier to parse than the plain text output that you have shown since there are many great XML parsers available from CPAN (for example, XML::Twig). If you post a small sample of the XML output, I could try to give you some sample code.