comment on

Hello,

I am trying to parse a text file, which has a certain structure, i would say its recursive; I tried to parse it using an iterative approach, going over each line and trying to figure out what action to take based on the current line and some state variables that I maintain/update as my program crunches over the lines. The problem I have is that I cannot figure out the correct logic to parse the input to the output format I desire. Could someone help me figure out the logic ( an algorithm, perhaps ) for parsing this structured text file ? I am not necessarily asking for code, just maybe a text description or ideas how to do it. I will attach my code ( which is not producing a correct output ), and try to explain it a few steps how it operates.

A sample input:

max_reformation_duration=300
cfs_online_timeout_changes=200
max_configured_packages=100
node:nodeA|name=nodeA
node:nodeA|status=up
node:nodeA|interface:lan900|name=lan900
node:nodeA|interface:lan900|status=up
node:nodeA|interface:lan900|ip_address:IP_A|name=IP_A
node:nodeA|interface:lan900|ip_address:IP_A|netmask=255.255.255.0
node:nodeA|interface:lan901|name=lan901
node:nodeA|interface:lan901|status=up
node:nodeA|interface:lan901|ip_address:IP_C|name=IP_C
node:nodeA|interface:lan901|ip_address:IP_C|netmask=255.255.255.0
node:nodeA|subnet:SUBNET_A|name=SUBNET_A
node:nodeA|subnet:SUBNET_A|status=up
node:nodeA|subnet:SUBNET_B|name=SUBNET_B
node:nodeA|subnet:SUBNET_B|status=up
node:nodeB|name=nodeB
node:nodeB|status=up
node:nodeB|interface:lan900|name=lan900
node:nodeB|interface:lan900|status=up
node:nodeB|interface:lan900|ip_address:IP_B|name=IP_B
node:nodeB|interface:lan900|ip_address:IP_B|netmask=255.255.255.0
node:nodeB|interface:lan901|name=lan901
node:nodeB|interface:lan901|status=up
node:nodeB|interface:lan901|ip_address:IP_D|name=IP_D
node:nodeB|interface:lan901|ip_address:IP_D|netmask=255.255.255.0
node:nodeB|subnet:SUBNET_A|name=SUBNET_A
node:nodeB|subnet:SUBNET_A|status=up
node:nodeB|subnet:SUBNET_B|name=SUBNET_B
node:nodeB|subnet:SUBNET_B|status=up
[download]

Parsed, it should look like this:

%config = {
    max_reformation_duration => 300,
    cfs_online_timeout_changes => 200,
    max_configured_packages => 100,
    node => {
        nodeA => {
            name => nodeA,
            status => up,
            interface => {
                lan900 => {
                    name => lan900,
                    status => up,
                    ip_adress => {
                        IP_A => {
                            name => IP_A,
                            netmask => 255.255.255.0,
                        },
                    },
                },
                lan901 => {
                    name => lan901,
                    status => up,
                    ip_adress => {
                        IP_C => {
                            name => IP_C,
                            netmask => 255.255.255.0,
                        },
                    },
                },
            },
            subnet => {
                SUBNET_A => {
                    name => SUBNET_A,
                    status => up,
                },
                SUBNET_B => {
                    name => SUBNET_B,
                    status => up,
                },
            },
        },
        nodeB => {
            name     => nodeB,
            status     => up,
            interface => {
                lan900 => {
                    name => lan900,
                    status => up,
                    ip_adress => {
                        IP_A => {
                            name => IP_B,
                            netmask => 255.255.255.0,
                        },
                    },
                },
                lan901 => {
                    name => lan901,
                    status => up,
                    ip_adress => {
                        IP_C => {
                            name => IP_D,
                            netmask => 255.255.255.0,
                        },
                    },
                },
            },
            subnet => {
                SUBNET_A => {
                    name => SUBNET_A,
                    status => up,
                },
                SUBNET_B => {
                    name => SUBNET_B,
                    status => up,
                },
            },
        },
    },
}
[download]

My code:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

$Data::Dumper::Indent = 1;
$Data::Dumper::Sortkeys = 1;

sub read_input {
    open(my $io, '<', 'input.txt') or die $!;
    return <$io>;
}

my @lines = read_input();

my %config;
my $hashref = \%config;

# stack of hash references
my @ref;

# count of | characters in a respective line
my $level = 0;

# count of | characters in the previous line
my $prev_level = 0;

my $key;
my $fkey;

for (@lines) {
    chomp;
    my @f = split(/\|/);
    $level = @f - 1;
    
    if ($level == 0) {
        my @f = split(/=/);
        $hashref->{ $f[0] } = $f[1];
    } else {
        my @q = split(/:/, $f[$level - 1]);
        if ($key && $level <= $prev_level) {
            if (/\Q$key\E/) {
                my @f = split(/=/, $f[$level]);
                $hashref->{ $f[0] } = $f[1];
                $prev_level = $level;
            } elsif (/\Q$fkey\E/) {
                $hashref = $ref[-1];
                
                $hashref->{$q[1]} = {};
                $hashref = $hashref->{$q[1]};
                
                $key = join(':', @q);
                $prev_level = $level;
                redo;
            } else {
                pop @ref;
                if ($level < $prev_level) {
                    my $diff = $prev_level - $level;
                    while ($diff--) { pop @ref; pop @ref; }
                }
                $hashref = pop @ref;
                $key = '';
                $prev_level = $level;
                redo;
            }
        } else {
            if (exists $hashref->{$q[0]}) {
                push @ref, $hashref;
                $hashref->{$q[0]}->{$q[1]} = {};
                $hashref = $hashref->{$q[0]}->{$q[1]};
            } else {
                push @ref, $hashref;
                $hashref->{$q[0]} = {};
                $hashref = $hashref->{$q[0]};
            
                push @ref, $hashref;
                $hashref->{$q[1]} = {};
                $hashref = $hashref->{$q[1]};
            }
            $key = join(':', @q);
            $fkey = $q[0];
            
            $prev_level = $level;
            redo;
        }
    
    }
}

print Dumper( \%config );
[download]

Output from the above code:

$VAR1 = {
  'cfs_online_timeout_changes' => '200',
  'max_configured_packages' => '100',
  'max_reformation_duration' => '300',
  'node' => {
    'nodeA' => {
      'interface' => {
        'lan900' => {
          'ip_address' => {
            'IP_A' => {
              'name' => 'IP_A',
              'netmask' => '255.255.255.0'
            }
          },
          'name' => 'lan900',
          'status' => 'up'
        },
        'lan901' => {
          'ip_address' => {
            'IP_C' => {
              'name' => 'IP_C',
              'netmask' => '255.255.255.0'
            }
          },
          'name' => 'lan901',
          'status' => 'up'
        }
      },
      'name' => 'nodeA',
      'status' => 'up'
    },
    'subnet' => {
      'SUBNET_A' => {
        'name' => 'SUBNET_A',
        'status' => 'up'
      },
      'SUBNET_B' => {
        'name' => 'SUBNET_B',
        'status' => 'up'
      }
    }
  }
};
[download]

Explanation:

the core of the algorithm/approach are $hashref, and @ref; $hashref being a reference to the hash where the key-value pairs will be inserted if found, @ref being a stack used to push and pop the current value of $hashref, for tracking the previous values, in case more elements will need to be inserted
if a line having the format 'key=value' is found it gets inserted in $hashref
if the line contains the | (pipe) character, it is split on this character, and the code tries to figure out whether to create a new subhash, or just insert the 'key=value' found at the end of this line

So, to sum it up, i am not looking for comments of my code - the reason for this is that even after several rewrites, it grows to a complexity level which is just too hard to understand, and debug ( i tried, i failed ); I am rather looking for a different approach to achieve my goal: transfer the input ( first code block ) to the desired output format ( second code block ). Thanks in advance to whoever takes the time to read through all of this.

In reply to Parsing structured text, problem with parsing logic by hperange

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.