rendler has asked for the wisdom of the Perl Monks concerning the following question:

Over the past couple of days I've been working on a script for news boxes of other sites. I've put the script on my scratchpad. Now I'm trying to improve the code and make it less spaghetti-like. First of all, I'd like to improve the config file parsing. I've been trying to redo it, and so far I have:
    my $sites_conf = "$ENV{'HOME'}/.newsboxes/sites.conf";
    my %sites_config = parse_sites_config();

    sub parse_sites_config {
        open SITES_CFG, "< $sites_conf"
            or die "Couldn't open $sites_conf: $!\n";
        my %options;
        while (<SITES_CFG>) {
            chomp;
            s/\/\/\s+.*//g;    # Get rid of comments;
            next unless length;
            if (my ($key, $operator, $value) = /^\s*?(\S+)\s*?(\=\>?)\s*?(\S+)\s*$/) {
                if ($operator eq "=>") {
                    ++$options{'total_sites'};
                    $options{$options{'total_sites'}}{'site_name'} = $key;
                    $options{$options{'total_sites'}}{'site_url'}  = $value;
                }
                elsif ($operator eq "=") {
                    $options{$options{'total_sites'}}{$key} = $value;
                }
                else {
                    die "Ooops you might want to check $sites_conf for errors for site: $options{$options{'total_sites'}}{'site_name'}\n";
                }
            }
        }
        return %options;
    }
Also, a site's entry in the config file usually looks like this:
    Kuro5hin => http://www.kuro5hin.org
    key = k5
    story_tag = item
    title_tag = title
    link_tag = link
    site_xml = http://www.kuro5hin.org/backend.rdf
    refresh = 1800
    colour_1 = #302BA2
    colour_2 = #FFFFFF
    colour_3 = #E0E0E0
    colour_4 = #302BA2
But sometimes, as with "Linux Today", the site name contains a space, and because I'm using \S+ to match the name, the parser fails for sites with spaces in their names. What should I be using instead? Thanks.

Re: A little config parsing.
by perrin (Chancellor) on Jan 11, 2002 at 08:59 UTC
    Why not just make the config in Perl code and require it?
    my %conf = (
        Kuro5hin => {
            url       => 'http://www.kuro5hin.org',
            key       => 'k5',
            story_tag => 'item',
        }
    );
    Or use Data::Denter? Or one of the millions of config parsing modules on CPAN?
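
    To make the "config as Perl code" idea concrete, here is a minimal sketch. It assumes the config file ends with a reference to the hash so that do can hand it back to the caller; load_perl_config is a hypothetical helper name, and the temporary file only stands in for a real config file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Load a config file that is itself Perl code returning a hash reference.
# load_perl_config is a hypothetical helper name, not part of any module.
sub load_perl_config {
    my ($path) = @_;
    my $conf = do $path;
    die "Couldn't parse $path: $@\n" if $@;
    die "Couldn't read $path: $!\n"  unless defined $conf;
    die "$path did not return a hash reference\n" unless ref $conf eq 'HASH';
    return $conf;
}

# Demo only: write a small config to a temporary file, then load it back.
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} q{
    my %conf = (
        'Kuro5hin' => {
            url       => 'http://www.kuro5hin.org',
            key       => 'k5',
            story_tag => 'item',
        },
        'Linux Today' => {
            url       => 'http://linuxtoday.org',
            story_tag => 'item',
        },
    );
    \%conf;    # the last expression is what "do" returns
};
close $fh or die "Couldn't close $path: $!\n";

my $conf = load_perl_config($path);
print "$_ => $conf->{$_}{url}\n" for sort keys %$conf;
```

    Quoted hash keys handle site names with spaces without any parsing code. The trade-off is that the config file is executed as Perl, so it should only come from a trusted source.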

      That's a good idea, never thought of it ;)

      As for using a module, I just wanted something simple that doesn't take many lines to use and doesn't depend on anything external.

      I second the suggestion of using a prewritten parser. I like Config::IniFiles which parses INI format files like this:

      [Linux Today]
      url = http://linuxtoday.org
      story_tag = item
      title_tag = title
      link_tag = link
      refresh = 1800
      If you really can't use an external module, this format is very simple to parse yourself. It's also easy to understand, and probably familiar to the person following you...

      Have fun,
      Carl Forde
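
      For the "parse it yourself" route, a rough sketch of a hand-rolled reader for the INI layout above might look like this. parse_ini is a hypothetical helper, and it assumes one key = value pair per line and that lines starting with ; or # are comments; Config::IniFiles covers far more cases.

```perl
use strict;
use warnings;

# Parse INI-style lines into { section => { key => value } }.
# parse_ini is a hypothetical helper name used only for this sketch.
sub parse_ini {
    my (@lines) = @_;
    my (%config, $section);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;                    # trim both ends
        next if $line eq '' or $line =~ /^[;#]/;    # skip blank and comment lines
        if ($line =~ /^\[(.+)\]$/) {                # [Section Name], spaces allowed
            $section = $1;
            $config{$section} ||= {};
        }
        elsif ($line =~ /^([^=]+?)\s*=\s*(.*)$/) {  # key = value, split at the first "="
            die "Key '$1' appears before any [section]\n" unless defined $section;
            $config{$section}{$1} = $2;
        }
        else {
            die "Can't parse line: $line\n";
        }
    }
    return \%config;
}

my $ini = parse_ini(
    '[Linux Today]',
    'url = http://linuxtoday.org',
    'story_tag = item',
    'refresh = 1800',
);
print "$_: $ini->{'Linux Today'}{$_}\n" for sort keys %{ $ini->{'Linux Today'} };
```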

Re: A little config parsing.
by Chmrr (Vicar) on Jan 11, 2002 at 06:37 UTC

    I think split is your friend in this case. Perhaps something along the lines of my ($key,$operator,$value) = split /\s*(=>?)\s*/,$_,2; will set you in the right direction. The rest is just trimming leading and trailing whitespace.

    Update: In the interests of TIMTOWTDI, I do believe that one can modify your above regex to the following: /^\s*(\S.*?)\s*(\=\>?)\s*(.+?)\s*$/ (the anchors matter here: without the trailing $, the lazy (.+?) would capture only a single character). However, due to the amount of backtracking involved in the regex, along with the general ugliness of it, I'd personally use split in this case. In my mind, it also better captures the essence of what you're trying to do -- you've got these two things, and they're separated by something. You want to split them apart, so you split.
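
    Here is a small sketch of the split approach with the trimming included. split_config_line is a hypothetical helper name; the comment stripping copies the // rule from the original parser, and the sample lines are taken from the config shown in the question.

```perl
use strict;
use warnings;

# Split one config line into (key, operator, value).
# split_config_line is a hypothetical helper name used only for this sketch.
sub split_config_line {
    my ($line) = @_;
    $line =~ s{//\s+.*}{};        # drop "// comment" text, as the original parser does
    $line =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
    return unless length $line;   # nothing left on this line

    # The capturing group makes split return the operator as well;
    # the limit of 2 stops splitting after the first operator.
    my ($key, $operator, $value) = split /\s*(=>?)\s*/, $line, 2;
    return unless defined $operator and defined $value and length $key;
    return ($key, $operator, $value);
}

for my $line ('Linux Today => http://linuxtoday.org',
              '  site_xml = http://www.kuro5hin.org/backend.rdf  ') {
    my ($key, $operator, $value) = split_config_line($line)
        or die "Can't parse line: $line\n";
    print "[$key] [$operator] [$value]\n";
}
```

    Because the split pattern swallows the whitespace around the operator, only the outer ends of the line need trimming, and spaces inside the site name survive untouched.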

    perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'

      Thanks, that worked great, but then I went to test the results with:
      for my $i (1 .. $sites_config{'total_sites'}) { print "$sites_config{$i}{'site_name'}\n"; }
      What I don't get is how $i is getting auto incremented.

        $i is the loop variable for your for loop. Perl sets it to every value in the list that the loop is iterating over -- in this case, the list of numbers from 1 to $sites_config{'total_sites'}.

        As an aside, any particular reason you're using a hash and not an array? It seems like you're mixing metaphors a bit to have $sites_config{'some_string'} be at the same level as $sites_config{$some_site_number}.
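
        One way the array suggestion could look, as a sketch only: each site becomes a hash reference pushed onto an array, so no total_sites counter is needed. build_sites is a hypothetical helper, and it assumes the parser hands over [key, operator, value] triples in file order.

```perl
use strict;
use warnings;

# Build an array of site records instead of a hash keyed by site number.
# build_sites is a hypothetical helper taking [key, operator, value] triples.
sub build_sites {
    my (@triples) = @_;
    my @sites;
    for my $triple (@triples) {
        my ($key, $operator, $value) = @$triple;
        if ($operator eq '=>') {
            # "name => url" starts a new site record
            push @sites, { site_name => $key, site_url => $value };
        }
        elsif ($operator eq '=') {
            # "key = value" belongs to the most recently started site
            die "Option '$key' appears before any site\n" unless @sites;
            $sites[-1]{$key} = $value;
        }
        else {
            die "Unknown operator '$operator'\n";
        }
    }
    return @sites;
}

my @sites = build_sites(
    [ 'Kuro5hin',    '=>', 'http://www.kuro5hin.org' ],
    [ 'key',         '=',  'k5' ],
    [ 'Linux Today', '=>', 'http://linuxtoday.org' ],
);

# The array keeps its own order and length, so the loop needs no index.
print "$_->{site_name}\n" for @sites;
```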


Re: A little config parsing.
by blokhead (Monsignor) on Jan 11, 2002 at 08:17 UTC
    (my($key, $operator, $value) = /^\s*?(\S+)\s*?(\=\>?)\s*?(\S+)\s*$/)


    As long as the site names don't have an equals sign in them, I don't see why this wouldn't work:

    (my($key, $operator, $value) = /^\s*?([^=]+?)\s*?(\=\>?)\s*?(\S+)\s*$/)


    I changed the first \S+ to [^=]+?