
Thanks. There is also the issue of values that contain two or more consecutive spaces: 'Name', for instance, could be 'Some VLAN with    spaces'.

Re^3: Parsing issue
by vitoco (Hermit) on Sep 10, 2009 at 19:30 UTC

    Then you can forget this trick and try some of the other ideas I gave, such as parsing one row at a time and cutting each record at fixed columns, or using some other regexp to get the field values.

    #!perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = ();
    my $string = "";

    while (<DATA>) {
        if (/^(VLAN)\s+:\s(.*?)\s+(Status)\s+:\s(.*?)\s*$/) {
            ($hash{$1}, $hash{$3}) = ($2, $4);
        } elsif (/^(FID)\s+:\s(.*?)\s+(Name)\s+:\s(.*?)\s*$/) {
            ($hash{$1}, $hash{$3}) = ($2, $4);
        } elsif (/^(VLAN Type):\s(.*?)\s+(Last change):\s(.*?)\s*$/) {
            ($hash{$1}, $hash{$3}) = ($2, $4);
        } elsif (/^\s((Forbidden )?(Egress|Untagged) Ports):\s*$/) {
            $string = $1;
        } elsif ($string) {
            /^\s*(.*?)\s*$/;
            ($hash{$string}, $string) = ($1, "");
        }
    }

    print Dumper( \%hash );

    __DATA__
    VLAN     : 1            Status     : Enabled
    FID      : 1            Name       : Some VLAN with spaces
    VLAN Type: Permanent    Last change: 2009-08-31 16:48:45
     Egress Ports:
       host.0.1
     Forbidden Egress Ports:
       ge.3.39
     Untagged Ports:
       host.0.1

    $VAR1 = {
              'Last change' => '2009-08-31 16:48:45',
              'Status' => 'Enabled',
              'Forbidden Egress Ports' => 'ge.3.39',
              'FID' => '1',
              'VLAN' => '1',
              'Untagged Ports' => 'host.0.1',
              'Egress Ports' => 'host.0.1',
              'Name' => 'Some VLAN with spaces',
              'VLAN Type' => 'Permanent'
            };

    Here I used "(.*?)\s*" to capture a trimmed value for every type of field, but you should replace each of them with a specific pattern for dates, integers, and so on.
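    As a rough illustration of that last point, two of the rows could be matched with field-specific patterns instead of the generic "(.*?)". This is only a sketch: the helper names parse_vlan_line and parse_type_line are made up for the example, and it assumes that the status and the VLAN type are single words and that the timestamp always looks like "YYYY-MM-DD HH:MM:SS".

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: "VLAN : <integer>   Status : <word>".
# ASSUMPTION: the VLAN id is all digits and the status is a single word.
sub parse_vlan_line {
    my ($line) = @_;
    if ($line =~ /^VLAN\s+:\s(\d+)\s+Status\s+:\s(\w+)\s*$/) {
        return (VLAN => $1, Status => $2);
    }
    return;    # empty list: the line did not have the expected shape
}

# Hypothetical helper: "VLAN Type: <word>   Last change: <timestamp>".
# ASSUMPTION: the timestamp is always "YYYY-MM-DD HH:MM:SS".
sub parse_type_line {
    my ($line) = @_;
    if ($line =~ /^VLAN Type:\s(\w+)\s+Last change:\s(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\s*$/) {
        return ('VLAN Type' => $1, 'Last change' => $2);
    }
    return;
}

my %hash = (
    parse_vlan_line("VLAN     : 1            Status     : Enabled\n"),
    parse_type_line("VLAN Type: Permanent    Last change: 2009-08-31 16:48:45\n"),
);
print "$_ => $hash{$_}\n" for sort keys %hash;
```

    The benefit of the stricter patterns is that a malformed row fails to match (the helper returns an empty list) instead of silently storing garbage in the hash.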

    Update: Forgot one field...

Re^3: Parsing issue
by vitoco (Hermit) on Sep 11, 2009 at 14:43 UTC

    I forgot to mention in my previous post that the if (pat1) {} elsif (pat2) {} elsif ... chain inside a while loop is useful when the data records are not always in the same order.

    In this case, where your "output" format is fixed, it's better (faster) to parse line by line, without a while loop:

    #!perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = ();

    # Row 1: VLAN and Status
    $_ = <DATA>;
    /:\s(.*?)\s+\w+\s+:\s(.*?)\s*$/;
    $hash{'VLAN'} = $1;
    $hash{'STAT'} = $2;

    # Row 2: FID and Name
    $_ = <DATA>;
    /:\s(.*?)\s+\w+\s+:\s(.*?)\s*$/;
    $hash{'FID'} = $1;
    $hash{'NAME'} = $2;

    # Row 3: VLAN Type and Last change (two-word label, no space before the colon)
    $_ = <DATA>;
    /:\s(.*?)\s+\w+\s\w+:\s(.*?)\s*$/;
    $hash{'VTYPE'} = $1;
    $hash{'LASTM'} = $2;

    # Each port list: skip the header line, then trim the value line
    $_ = <DATA>;
    $_ = <DATA>;
    /^\s*(.*?)\s*$/;
    $hash{'EP'} = $1;

    $_ = <DATA>;
    $_ = <DATA>;
    /^\s*(.*?)\s*$/;
    $hash{'FEP'} = $1;

    $_ = <DATA>;
    $_ = <DATA>;
    /^\s*(.*?)\s*$/;
    $hash{'UP'} = $1;

    print Dumper( \%hash );

    __DATA__
    VLAN     : 1            Status     : Enabled
    FID      : 1            Name       : Some VLAN with spaces
    VLAN Type: Permanent    Last change: 2009-08-31 16:48:45
     Egress Ports:
       host.0.1
     Forbidden Egress Ports:
       ge.3.39
     Untagged Ports:
       host.0.1

    This way, you can control other things like key names:

    $VAR1 = {
              'NAME' => 'Some VLAN with spaces',
              'LASTM' => '2009-08-31 16:48:45',
              'VTYPE' => 'Permanent',
              'FID' => '1',
              'VLAN' => '1',
              'STAT' => 'Enabled',
              'UP' => 'host.0.1',
              'EP' => 'host.0.1',
              'FEP' => 'ge.3.39'
            };
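    Since the format is fixed, the "cut each record at fixed columns" idea can be sketched with unpack as well. The column widths below are assumptions made up for the example (9-character label, ": ", 13-character value, 11-character label, ": ", value to end of line), not measured from the real device output, and parse_fixed_row is a hypothetical helper name. Spaces inside a value are left untouched, because nothing is split on whitespace.

```perl
#!perl
use strict;
use warnings;

# ASSUMED layout (not taken from the real device output):
#   9-char label, ": ", 13-char value, 11-char label, ": ", value to end of line
my $TEMPLATE = 'A9 A2 A13 A11 A2 A*';

# Hypothetical helper: split one two-field row at fixed columns.
sub parse_fixed_row {
    my ($line) = @_;
    chomp $line;
    # When unpacking, "A" returns the field with trailing whitespace stripped;
    # the two ": " separator fields are discarded.
    my ($key1, undef, $val1, $key2, undef, $val2) = unpack $TEMPLATE, $line;
    return ($key1 => $val1, $key2 => $val2);
}

# Build a sample row with the same assumed widths, so the columns line up.
my $row = sprintf("%-9s: %-13s%-11s: %s\n",
                  'FID', 1, 'Name', 'Some VLAN with    spaces');
my %hash = parse_fixed_row($row);
print "$_ => '$hash{$_}'\n" for sort keys %hash;
```

    If the real widths differ, only the template string needs to change.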

    Thinking a bit more about my first script, I realized that the string can also be modified in the following way: add a new delimiter just before whatever we detect as a field name (words separated by exactly one space, followed by a colon and a space), then split:

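    #!perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $string = "";

    # Join all lines into one string. Each newline becomes TWO spaces, so the
    # last word of one line can never look like part of a field name that
    # starts the next line (field-name words are exactly one space apart).
    while (<DATA>) {
        $_ =~ s/\n/  /g;
        $string .= $_;
    }

    # Wrap every field name in "%" delimiters and drop the colon
    $string =~ s/((\w+\s)*\w+)\s*:\s+/\%$1%/g;
    # Move the leading delimiter to the end of the string
    $string =~ s/^\%(.*)/$1\%/;
    # Trim the extra spaces in front of every delimiter
    $string =~ s/\s+\%/\%/g;

    my %hash = split (/\%/, $string);

    print Dumper( \%hash );

    __DATA__
    VLAN     : 1            Status     : Enabled
    FID      : 1            Name       : Some VLAN with spaces
    VLAN Type: Permanent    Last change: 2009-08-31 16:48:45
     Egress Ports:
       host.0.1
     Forbidden Egress Ports:
       ge.3.39
     Untagged Ports:
       host.0.1

    $VAR1 = {
              'Last change' => '2009-08-31 16:48:45',
              'Status' => 'Enabled',
              'Forbidden Egress Ports' => 'ge.3.39',
              'FID' => '1',
              'VLAN' => '1',
              'Untagged Ports' => 'host.0.1',
              'Egress Ports' => 'host.0.1',
              'Name' => 'Some VLAN with spaces',
              'VLAN Type' => 'Permanent'
            };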

    I also changed the delimiter to a character that is otherwise unused ("%"), to distinguish it from the colons inside the time value when trimming the extra spaces.

    BTW, this was fun!