jhs3 has asked for the wisdom of the Perl Monks concerning the following question:

I am such a newly hatched user of Perl, I still have eggshells on my pants. This is my first question for perlmonks, and I hope I can find an easy answer.
I've been writing shell scripts and can get around there pretty well. I am trying to rewrite some of the shell scripts into Perl so I can use them both on Unix and Windows.
Working with multiple environments with multiple databases, I have a set of files that tells me stuff about the environment and the databases within the environment. An example: Environment is FOO. FOO has a number of databases. I have a text file called FOO.dbs to tell me about the databases in FOO. Each line of the FOO.dbs file is a pipe-delimited string that tells me the dbname, the host it runs on, the directory the db is on, the directory to do dump&load to, etc. An example of a line in the FOO.dbs would be:
bar|development|/usr/bar/db|/dbdump
Inside a script (shell script or perl) I read each line and pass the line to another script that parses the line and sets up variables for me to use. The subscript (parse-dbinfo) in Unix uses the 'cut' command to return the fields like this:
DB=`echo $1 | cut -f 1 -d "|"`
HOST=`echo $1 | cut -f 2 -d "|"`
DBDIR=`echo $1 | cut -f 3 -d "|"`
DUMPDIR=`echo $1 | cut -f 4 -d "|"`
With this information, my shell script can work with the variables to get work done. I can refer to a specific database by '$DBDIR/$DB' to get '/usr/bar/db/bar'.
How can I duplicate the effect of 'cut' to load the variables in Perl?

Replies are listed 'Best First'.
Re: Parse a pipe-delimited list
by ptum (Priest) on Sep 20, 2006 at 16:41 UTC

    To parse a line of data like this, one way to do it is to use split. You would create an array from the output of split as applied to your line of data with the appropriate delimiter. The array would contain each of your successive tokens.

    Update: (added code example)

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $config = 'bar|development|/usr/bar/db|/dbdump';
    my @tokens = split /\|/, $config;
    foreach (@tokens) {
        print "Token: $_\n";
    }

    No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: Parse a pipe-delimited list
by EvanCarroll (Chaplain) on Sep 20, 2006 at 16:43 UTC
    I prefer to use DBD::CSV and set "csv_sep_char=\|;"


    Evan Carroll
    www.EvanCarroll.com
      Unless you can't compile XS, I would recommend Text::CSV_XS or Text::CSV::Simple (which uses Text::CSV_XS). You may not need the performance now, but I use it often enough to expect that you'll run into situations where you do.

      Also Text::xSV looks good

      These modules get around the problems with using split. The big problem is escaped separators in the data. Unless you are dead certain that the data has been cleaned of |'s, avoid split.

      Even if you can get by with split and don't really need Text::CSV_XS now, you'll be better off learning the module. This type of thing comes up a lot, and split can really bite you.
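To make the concern about embedded separators concrete, here is a minimal sketch. The sample line is hypothetical (a host field containing a quoted pipe), and the helper names naive_fields and csv_fields are invented for illustration. The Text::CSV_XS part only runs if that module is installed; it assumes the default double-quote quoting convention.

```perl
use strict;
use warnings;

# Naive parser: treats every | as a separator, even inside quotes.
sub naive_fields {
    my ($line) = @_;
    return split /\|/, $line, -1;
}

# Quote-aware parser; returns an empty list if Text::CSV_XS is not
# installed or the line cannot be parsed.
sub csv_fields {
    my ($line) = @_;
    return unless eval { require Text::CSV_XS; 1 };
    my $csv = Text::CSV_XS->new({ sep_char => '|', binary => 1 });
    return unless $csv->parse($line);
    return $csv->fields;
}

# Hypothetical line whose second field contains a quoted pipe.
my $line = 'bar|"dev|test"|/usr/bar/db|/dbdump';

my @naive = naive_fields($line);
print scalar(@naive), " fields from split\n";

my @parsed = csv_fields($line);
print scalar(@parsed), " fields from Text::CSV_XS\n" if @parsed;
```

With split, the quoted pipe is treated as one more separator, so the line yields five pieces instead of four. If the data never contains quoted or escaped pipes, the two approaches return the same fields.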



      grep
      Mynd you, mønk bites Kan be pretti nasti...
Re: Parse a pipe-delimited list
by rodion (Chaplain) on Sep 20, 2006 at 17:51 UTC
    If you're just translating shell scripts, you shouldn't have any need to escape the pipes in the data. In that case ptum's use of split should work fine, and it will keep the code local and minimize dependencies, making it easier to wrap your mind around while you're learning Perl.

    The other question is where you want to put the data that you parse. You can put the values in individual variables, in a %config hash, or in the %ENV hash, which is a copy of the shell's environment. Below are examples of each. I included the use of a hash slice. It's beyond what you should already understand as a newbie, but it's not that far of a stretch if you're in the mood for it, and it has a certain conciseness.

    use strict;
    use warnings;

    my $str = 'bar|development|/usr/bar/db|/dbdump';

    # put the values into individual variables
    my ($db, $host, $dbdir, $dumpdir);
    ($db, $host, $dbdir, $dumpdir) = split '\|', $str;

    # put the values into a hash
    my %conf;
    ( $conf{'DB'},
      $conf{'HOST'},
      $conf{'DBDIR'},
      $conf{'DUMPDIR'}
    ) = split '\|', $str;

    # print out the hash
    for (keys %conf) {
        print "$_ => $conf{$_}\n";
    }

    # Use a hash slice. The %conf hash is given
    # a list of keys. The "=" assigns to each of them in turn
    @conf{('DB','HOST','DBDIR','DUMPDIR')} = split '\|', $str;

    # put the values into the %ENV hash, where the other
    # environment variables are
    @ENV{qw(DB HOST DBDIR DUMPDIR)} = split '\|', $str;
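For the original use case of reading every line of a .dbs file, the hash-slice approach can be wrapped in a small subroutine. This is only a sketch under a few assumptions: the name parse_dbinfo is borrowed from the shell subscript in the question, the second sample line is made up, and an in-memory filehandle stands in for FOO.dbs so the example runs without any files on disk.

```perl
use strict;
use warnings;

# Field names follow the FOO.dbs layout described in the question.
my @FIELDS = qw(DB HOST DBDIR DUMPDIR);

# Illustrative helper: turn one pipe-delimited line into a hash reference.
sub parse_dbinfo {
    my ($line) = @_;
    chomp $line;
    my %info;
    # A limit of -1 keeps trailing empty fields, which split drops by default.
    @info{@FIELDS} = split /\|/, $line, -1;
    return \%info;
}

# Stand-in for the contents of FOO.dbs; the second line is invented.
# A real script would open the file by name instead.
my $sample = "bar|development|/usr/bar/db|/dbdump\n"
           . "baz|test|/usr/baz/db|/dbdump\n";
open my $fh, '<', \$sample or die "Cannot open sample data: $!";

while (my $line = <$fh>) {
    next if $line =~ /^\s*$/;    # skip blank lines
    my $db = parse_dbinfo($line);
    print "$db->{DBDIR}/$db->{DB} on $db->{HOST}\n";
}
close $fh;
```

Returning a hash reference keeps each database's settings together, so "$db->{DBDIR}/$db->{DB}" plays the role of '$DBDIR/$DB' in the shell version. Because this uses only core Perl, it should behave the same on Unix and Windows.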