agaffney has asked for the wisdom of the Perl Monks concerning the following question:

As part of a program I'm writing, I need to read some variables from a bash script and do some bash-style interpolation. My current code:

sub get_depend {
    my $ebuildfname = shift;
    my $ebuildcontents;
    my %ebuildvars;
    my $pkgname = $ebuildfname;
    $pkgname =~ s|/usr/portage/||;
    $pkgname =~ s|(.+)/.+/(.+).ebuild|$1/$2|;
    my $pkg = parse_package_name($pkgname);
    $pkg->{version} =~ s/^-//;
    $ebuildvars{PV} = "$pkg->{version}";
    open EBUILD, "< $ebuildfname" or die "Couldn't open '$ebuildfname'\n";
    while (<EBUILD>) {
        $ebuildcontents .= $_;
    }
    close EBUILD;
    while ($ebuildcontents =~ /\b([-A-Z0-9_]+)=\"(.*?)\"{1}?/sgc) {
        $ebuildvars{$1} = $2;
    }
    foreach (keys %ebuildvars) {
        $ebuildvars{$_} =~ s/\$\{?([-A-Z0-9_]+)\}?/$ebuildvars{$1}/gs;
    }
    my $depend = $ebuildvars{'DEPEND'} || '';
    $depend =~ s/(\s+|\n+)/ /gs;
    return $depend;
}

This one-pass interpolation works somewhat, but it doesn't get all the variables (for example: VAR1="something $VAR2" VAR2="test $VAR3" VAR3="anything $VAR4" would give me VAR1="something test $VAR3" VAR2="test $VAR3" VAR3="anything $VAR4"). How can I make this work? If I need to, I can pass it through bash. I just don't know how (with the ability to get the values back).

Re: simulating bash
by japhy (Canon) on May 13, 2004 at 14:59 UTC
    I saw your post on the perl-beginners list, but deleted it, so I was hoping someone else would reply and I could work off that, so I'm glad you posted it here, too.

    Here's my take:

    # parse variable assignments out of the file
    while ($ebuildcontents =~ /\b([-A-Z0-9_]+)=\"(.*?)\"{1}?/sgc) {
        $ebuildvars{$1} = $2;
    }
    I'm curious about your \"{1}? part there. Are quotes optional? I'd need to see how the config file is formatted to help you extract these variables appropriately. But let's move on to the troublesome part:
    foreach (keys %ebuildvars) {
        $ebuildvars{$_} =~ s/\$\{?([-A-Z0-9_]+)\}?/$ebuildvars{$1}/gs;
    }
    The problem is that if VAR1 = "say $VAR2" and VAR2 = "hi $VAR3" and VAR3 = "jeff", you want to be able to expand ALL of these, so you get VAR1 = "say hi jeff", VAR2 = "hi jeff", and VAR3 = "jeff", right?

    If so, let me know, and I'll help you write a solution. It's kind of related to a dependency tree...

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
      Well, the '\"{1}?' part was my attempt to make the match non-greedy, although I think the '(.*?)' right before it already accomplishes that. I just never removed it.

      Quote:
      The problem is that if VAR1 = "say $VAR2" and VAR2 = "hi $VAR3"
      and VAR3 = "jeff", you want to be able to expand ALL of these,
      so you get VAR1 = "say hi jeff", VAR2 = "hi jeff", and
      VAR3 = "jeff", right?
      

      That's exactly what I want to do.
        Ok, then. Here's a code block that demonstrates how to do it by using a dependency structure:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my %vars;
        my %depend;

        while (<DATA>) {
            chomp;
            if (/^\s*(\w+)\s*=\s*(?:"([^"]*)"|(.*))/) {
                my ($name, $val) = ($1, $+);
                $vars{$name} = $val;
                $depend{$name} = [ $vars{$name} =~ /\$(\w+)/g ];
            }
        }

        expand(\%vars, \%depend);

        sub expand {
            my ($varhash, $dephash, $queue) = @_;
            $queue ||= [ keys %$varhash ];

            # for each item in the queue
            for (@$queue) {
                # make sure its dependencies are expanded
                expand($varhash, $dephash, $dephash->{$_});

                # then interpolate the variables in this item
                $varhash->{$_} =~ s/\$(\w+)/$varhash->{$1}/g;
            }
        }

        use Data::Dumper;
        print Dumper(\%vars);

        __DATA__
        VAR1 = "say $VAR2"
        VAR2 = "hi $VAR3"
        VAR3 = jeff
        You can see I parse my variables a little differently. If it's more complex for you, then so be it, but I used simple cases. I used the $+ variable because I want to match either the string INSIDE quotes (not the quotes themselves) or else an un-quoted string, and I don't know in advance which of these will match. $+ gives me the highest-numbered capture group that actually matched. It's like writing defined($2) ? $2 : $3 in this case, but I didn't want to do that.
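
        To see the $+ behaviour described above in isolation, here is a minimal sketch. The helper name parse_assignment is made up for illustration; the regex is the one from the code block above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): returns (name, value)
# for one assignment line, or an empty list if the line does not match.
sub parse_assignment {
    my ($line) = @_;
    if ($line =~ /^\s*(\w+)\s*=\s*(?:"([^"]*)"|(.*))/) {
        # $+ is the highest-numbered capture group that took part in the
        # match: $2 for a quoted value, $3 for a bare one.
        my ($name, $value) = ($1, $+);
        return ($name, $value);
    }
    return;
}

for my $line ('VAR2 = "hi $VAR3"', 'VAR3 = jeff') {
    my ($name, $value) = parse_assignment($line);
    print "$name => [$value]\n";
}
# prints:
# VAR2 => [hi $VAR3]
# VAR3 => [jeff]
```

        The quoted line comes back without its quotes and the bare line comes back as-is, which is the same result defined($2) ? $2 : $3 would give.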

        You can see how expand() looks similar to the post-order traversal code I showed you a day or two ago. It's a similar concept. The function, when called with only the two hashrefs, uses the keys of the %vars hash as its queue. Each recursive call uses the dependent variables of the current one as its queue. I hope it's understandable.
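
        The expand() shown above assumes every referenced variable is defined and that no variable refers back to itself. A hedged variation on the same depth-first idea, for input where that may not hold, might look like the sketch below. The names resolve and expand_all are hypothetical, treating an unknown variable as an empty string is an assumption borrowed from how bash treats unset variables, and the circular-reference check is an addition that the original code does not have. Escaped dollar signs are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the fully expanded value of $name,
# expanding (and caching in %$vars) whatever it depends on first.
sub resolve {
    my ($vars, $name, $state) = @_;

    # Assumption: an unknown variable expands to '' as it would in bash.
    return '' unless exists $vars->{$name};

    my $status = $state->{$name} // '';
    return $vars->{$name} if $status eq 'done';
    die "circular reference involving $name\n" if $status eq 'active';

    $state->{$name} = 'active';
    # Accept both $NAME and ${NAME}; each reference is resolved recursively.
    $vars->{$name} =~ s/\$\{?(\w+)\}?/resolve($vars, $1, $state)/ge;
    $state->{$name} = 'done';

    return $vars->{$name};
}

# Expand every value in the hash in place and return the hash reference.
sub expand_all {
    my ($vars) = @_;
    my %state;
    resolve($vars, $_, \%state) for keys %$vars;
    return $vars;
}

my %vars = (
    VAR1 => 'say $VAR2',
    VAR2 => 'hi ${VAR3}',
    VAR3 => 'jeff',
);
expand_all(\%vars);
print "$_ = $vars{$_}\n" for sort keys %vars;
# prints:
# VAR1 = say hi jeff
# VAR2 = hi jeff
# VAR3 = jeff
```

        Because each variable is marked done after its first expansion, the result does not depend on the order in which keys returns the names.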

Re: simulating bash
by TomDLux (Vicar) on May 13, 2004 at 17:41 UTC
    # Detect NAME="value" in shell script.
    #
    my $findShellVar = qr/\b([-A-Z0-9_]+)=\"(.*?)\"{1}?/s;

    # Detect embedded use of vars: NAME="text $VAR"
    #
    my $embeddedVar = qr/\$\{?([-_[:alnum:]]+)\}?/;

    # For each line of the shell script, find shell variable
    # assignments, substituting the assigned value for any use
    # of other variables, and store result in hash.
    #
    my %vars;
    while ( my $line = <EBUILD> ) {
        if ( $line =~ $findShellVar ) {
            my ( $k, $v ) = ( $1, $2 );
            $v =~ s/$embeddedVar/$vars{$1}/g;
            $vars{$k} = $v;
        }
    }

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

      That does the same thing as my original code. It doesn't work with my example.
Re: simulating bash
by graff (Chancellor) on May 15, 2004 at 02:35 UTC
    If I need to, I can pass it through bash. I just don't know how (with the ability to get the values back).

    You have sparked a fascinating discussion, making me wonder whether "only bash can parse a bash script"...

    Meanwhile, you could get bash to do the work for you, I think, if you try the following steps:

    • Make sure "#!/bin/bash" is at the top of your bash script file (or whatever the appropriate path is for bash on your system).
    • Make sure that the script (or your current shell environment) has PATH set to include the "env" command (this is /usr/bin/env on my BSD-like macosx system), and append a line at the end of the script that is just "env" (i.e. to have the script run that command after it has done everything else).
    • Make the script file executable (chmod +x ...)
    • Run a perl one-liner like the following.
    perl -e '$env = `your_bash_script.file`; print $env,$/'
    Maybe you want to redirect the output of that to a file. Note that the output will include everything in your current shell environment, along with everything declared by the bash script. If you also run a one-liner like this:
    perl -e '$env = `bash -c env`; print $env,$/'
    and compare that output to the one involving your bash script, you'll be able to "subtract out" the "ambient environment", and isolate the stuff that comes from the script.

    There might be other ways to do the same thing, but this was the first one I found (and I'll confess it took several iterations to figure it out).

    UPDATE: Just putting "env" at the end entails a simplistic assumption that the bash script will execute more or less linearly from top to bottom. But if there are multiple points in the script where processing could end under different conditions (and especially if more than one of these exit points qualifies as "success"), you would need to add the "env" command at every such point. Good luck with that...

      That's very interesting. I tried to do something like that, but I couldn't figure out how to get it to only return what variables were added from the script run.
        Right. If you wanted to do it all in one script, something like the following would probably be the cleanest (untested):
        # get the "output" environment from the script into a hash
        my %envhash = map { chomp; split( /=/, $_, 2 ) } `bash.script`;

        # now get the "ambient environment" and delete that from the hash:
        for ( `bash -c env` ) {
            ( my $var, undef ) = split( /=/ );
            delete $envhash{$var};
        }
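
        One caveat with the env approach: env lists only exported variables, so a plain assignment such as DEPEND="..." will not show up unless the script exports it. A possible alternative, sketched here under assumptions, is to have bash source the file and print just the one variable you ask for. The helper name bash_var is hypothetical, and the sketch assumes bash is on the PATH and that it is safe to execute the script. A script that calls exit, or one that relies on functions defined only by the program that normally loads it, may not give a useful result.

```perl
use strict;
use warnings;

# Hypothetical helper: source $script in a child bash process and return the
# value of the shell variable $name ('' if it is unset).
sub bash_var {
    my ($script, $name) = @_;
    die "bad variable name '$name'\n"
        unless $name =~ /^[A-Za-z_][A-Za-z0-9_]*$/;

    # $1 is the script path and $2 the variable name; ${!2} is bash's
    # indirect expansion, i.e. "the value of the variable named by $2".
    my $code = 'source "$1" >/dev/null 2>&1; printf "%s" "${!2}"';

    # List-form pipe open: no intermediate shell, so no quoting problems.
    open my $bash, '-|', 'bash', '-c', $code, 'bash', $script, $name
        or die "Couldn't run bash: $!\n";
    local $/;                 # slurp the whole output at once
    my $value = <$bash>;
    close $bash;

    return defined $value ? $value : '';
}
```

        A call would look like my $depend = bash_var('/path/to/script', 'DEPEND'); it is best to pass a path containing a slash, because source searches PATH for bare file names. No "ambient environment" subtraction is needed, since only the requested variable is printed.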