Arengin has asked for the wisdom of the Perl Monks concerning the following question:

Hi.

I have the following code:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

#read input data:
my @rows;

#set record separator to 3 line feeds.
local $/ = "\n\n\n";
while ( <> ) {
    next unless m/Dumpdata example/;
    #map key-values out of this 'chunk'.
    my %row = m/\s*(\w+)\S*\s+(\S.*)/g;
    push @rows, \%row;
}

#print whole data structure for debugging:
print Dumper \@rows;

#define columns and ordering for output:
my @output_cols = qw /Info Detail Warning Spec/;

#iterate rows
foreach my $row ( @rows ) {
    #print fields selected from output_cols.
    #use a 'hash slice' - look it up in perl docs.
    print join ";", @{$row}{@output_cols},"\n";
}

It works just fine, except that each value ends at the end of its line.
If, for example, Info spans 2 lines, I only get the first part in the output.

Dumpdata example
-----------------
Warning bad news here
Detail: Some really nice infos
        these are
Info: This is a problem
      but there is a solution
Spec: 2nd of 4

<Update>
The expected output for this should be:
"bad news here"; "Some really nice infos these are"; "This is a problem but there is a solution"; "2nd of 4"

Thank you, haukex, for reminding me to post that too.
</Update>

My code above returns "This is a problem", but it should return "This is a problem but there is a solution".

Any ideas on how to get this done?

Thank you so much

Arengin

Re: Joining multiple lines together while parsing
by Corion (Patriarch) on Mar 24, 2017 at 10:24 UTC

    I think your regular expression here:

    my %row = m/\s*(\w+)\S*\s+(\S.*)/g;

    ... doesn't extract everything because it's in "dot-does-not-match-newline" mode.

    The /s switch makes "." match newlines ("treat variable as 's'ingle line") and thus could fix your problem:

    $_ = <<DATA;
    Detail: Some really nice infos
            these are
    Info: This is a problem
          but there is a solution
    DATA
    my %row = m/\s*(\w+):\s+(\S[^:]+)/gs;

    But then the next key, Info, gets gobbled up into the preceding description, because [^:]+ runs on until the next colon.
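To make the difference concrete, here is a minimal sketch comparing a pattern without /s, which stops at the first newline, with the /s variant above, which runs on into the next key. The variable names and the exact line breaks of the sample text are assumptions for illustration, not taken from the thread.

```perl
use strict;
use warnings;

# Assumed sample: two keys, each with one continuation line.
my $text = "Detail: Some really nice infos\n"
         . "        these are\n"
         . "Info: This is a problem\n"
         . "      but there is a solution\n";

# Without /s, "." stops at the newline, so only the first line is captured.
my ($first_line) = $text =~ /Info:\s+(\S.*)/;

# With /s and [^:]+, the value runs up to the next colon,
# so the following key name ends up inside the Detail value.
my %row = $text =~ /\s*(\w+):\s+(\S[^:]+)/gs;

print "without /s: [$first_line]\n";
print "keys with /s: [", join(",", sort keys %row), "]\n";
print "Detail value ends with 'Info': ",
      ($row{Detail} =~ /Info\z/ ? "yes" : "no"), "\n";
```

In this sketch the /s variant yields only a Detail key, since the text after "Info" no longer contains a word followed by a colon.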

    Personally, I would do manual line-by-line parsing instead of using one regular expression to capture everything:

    $_ = <<DATA;
    Detail: Some really nice infos
            these are
    Info: This is a problem
          but there is a solution
    DATA
    my %row;
    my $curr;
    for (split /\n/) {
        if( /^\s*(\w+):\s+(.*)/ ) {
            $curr = $1;
            $row{ $curr } = $2;
        } elsif( $curr and /^\s*(.*)/ ) {
            $row{ $curr } .= ' ' . $1;
        } else {
            die "Unknown input data [$_]";
        };
    };
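The line-by-line idea can be stretched to cover the whole sample record, including the Warning line that has no colon. The sketch below rests on one assumption that the thread does not confirm: the field names are known in advance (taken here from the @output_cols list in the question), and any other non-blank line continues the current field. The sub name parse_record and the sample line breaks are hypothetical.

```perl
use strict;
use warnings;

# Assumption: these are the only field names (from @output_cols in the
# question, reordered to match the expected output).
my @fields   = qw(Warning Detail Info Spec);
my $field_re = join '|', @fields;

# Parse one record (a multi-line string) into a hash of field => value.
sub parse_record {
    my ($record) = @_;
    my (%row, $curr);
    for my $line (split /\n/, $record) {
        if ($line =~ /^\s*($field_re)\b:?\s+(.*)/) {
            # A known field name (colon optional) starts a new value.
            $curr = $1;
            $row{$curr} = $2;
        }
        elsif (defined $curr and $line =~ /^\s*(\S.*)/) {
            # Any other non-blank line continues the current value.
            $row{$curr} .= ' ' . $1;
        }
        # Lines before the first field (header, underline) are skipped.
    }
    s/\s+\z// for values %row;    # trim trailing whitespace in place
    return \%row;
}

# Assumed layout of the sample record.
my $record = "Dumpdata example\n"
           . "-----------------\n"
           . "Warning bad news here\n"
           . "Detail: Some really nice infos\n"
           . "        these are\n"
           . "Info: This is a problem\n"
           . "      but there is a solution\n"
           . "Spec: 2nd of 4\n";

my $row  = parse_record($record);
my $line = join '; ', map qq("$_"), @{$row}{@fields};
print "$line\n";
```

A continuation line that happens to begin with one of the field names would be misread as a new field, so this only works if the data never does that.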
Re: Joining multiple lines together while parsing
by haukex (Archbishop) on Mar 24, 2017 at 10:22 UTC

    If you could show some more sample input and especially the expected output (Update: root node has been edited to include that), that would be helpful. For example, I'm not sure if you want "Warning bad news here" to appear in the output.

    Based on what you've provided, here's one way, using a negative lookahead to prevent the "keywords" from being interpreted as continuations.

    use strict;
    use warnings;
    use Data::Dumper;

    local $/ = "\n\n\n";
    while (<DATA>) {
        next unless m/Dumpdata example/;
        my %row = m/ ^ \s* (\w+:) \s+ ( (?: (?!^\s*\w+:) . )+ ) /xmsg;
        print Dumper(\%row);
    }
    __DATA__
    Dumpdata example
    -----------------
    Warning bad news here
    Detail: Some really nice infos
            these are
    Info: This is a problem
          but there is a solution

    Output:

    $VAR1 = {
              'Detail:' => 'Some really nice infos these are ',
              'Info:' => 'This is a problem but there is a solution '
            };
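The keys above keep their trailing colon, and the values keep whatever whitespace sat between the lines. A possible post-processing step is sketched below, with the colon moved outside the key capture and each run of whitespace collapsed. Both changes are adjustments for illustration, not part of the reply's code, and the sample line breaks are assumed.

```perl
use strict;
use warnings;

# Assumed sample chunk with one continuation line per key.
my $chunk = "Detail: Some really nice infos\n"
          . "        these are\n"
          . "Info: This is a problem\n"
          . "      but there is a solution\n";

# Same negative-lookahead idea, but the colon sits outside the key capture.
my %row = $chunk =~ m/ ^ \s* (\w+) : \s+ ( (?: (?!^\s*\w+:) . )+ ) /xmsg;

# Collapse each run of whitespace (newlines included) and trim the end.
for my $value (values %row) {
    $value =~ s/\s+/ /g;
    $value =~ s/\s+\z//;
}

print "$_ => [$row{$_}]\n" for sort keys %row;
```

The Warning line would still need separate handling, since it has no colon for the pattern to anchor on.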

      The output to the example should be:

      "bad news here"; "Some really nice infos these are"; "This is a problem but there is a solution"; "2nd of 4"

      Sorry for not posting that; I completely forgot about it.

        In that case, how are continuation lines identified? In other words, how should the program act in the case of the following input?

        Detail: Some really nice infos
                these are
        Warning bad news here
        Detail: Some really nice infos
                these are
        Info: This is a problem
              but there is a solution
        Warning bad news here
                is this a continuation or not?
Re: Joining multiple lines together while parsing
by tybalt89 (Monsignor) on Mar 24, 2017 at 22:44 UTC
    #!/usr/bin/perl -l

    # http://perlmonks.org/?node_id=1185742

    use strict;
    use warnings;

    $/ = "Dumpdata example\n-----------------\n";
    while(<DATA>) {
        chomp;
        my @answer = map s/\s+/ /gr =~ s/\s+\z//r =~ s/.*/"$&"/sr,
            /^Warning (.*)/gm,
            /^\s*\w+:\s*(\S.*(?:\n(?!\s*\w+:)\s*\S.*)*)/gm;
        @answer and print join '; ', @answer;
    }

    __DATA__
    Dumpdata example
    -----------------
    Warning bad news here
    Detail: Some really nice infos
            these are
    Info: This is a problem
          but there is a solution
    Spec: 2nd of 4
    Dumpdata example
    -----------------
    Warning test
    Detail: foo
            bar
    Info: quz
          baz
    Spec: blah
Re: Joining multiple lines together while parsing
by Anonymous Monk on Mar 24, 2017 at 16:46 UTC
    Just another way to do things, but this one is most likely not as robust as your production data will demand:
    my $data = do{ local $/; <DATA> };
    my @parts = split( /(\w+:)/, $data );
    shift @parts unless $parts[0] =~ /\w+:/;
    for (@parts) {
        s/^\s+//;
        s/\s+$//;
        s/\s+/ /g;
    }
    use Data::Dumper;
    print Dumper {@parts};
    __DATA__
    Warning bad news here
    Detail: Some really nice infos
            these are
    Info: This is a problem
          but there is a solution
    Spec: 2nd of 4
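One concrete way the split approach can fall short: any word followed by a colon inside a value is treated as a new key. The sketch below uses a made-up Note field to show this (the field and its text are hypothetical, not from the sample data). The colon-less Warning line in the sample is also dropped by the shift, since it comes before the first key.

```perl
use strict;
use warnings;

# Hypothetical input: the value of "Note" itself contains "see:".
my $data = "Info: This is a problem\n"
         . "      but there is a solution\n"
         . "Note: see: the manual\n";

# Same steps as above: split on "word:", keeping the separators.
my @parts = split /(\w+:)/, $data;
shift @parts unless $parts[0] =~ /\w+:/;

for (@parts) {
    s/^\s+//;
    s/\s+$//;
    s/\s+/ /g;
}

my %row = @parts;
print "$_ => [$row{$_}]\n" for sort keys %row;
```

Here "Note:" ends up with an empty value and "see:" appears as a key of its own, so data containing times, URLs, or similar colon-bearing text would need a stricter pattern.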
Re: Joining multiple lines together while parsing
by Anonymous Monk on Mar 24, 2017 at 10:15 UTC
    what?
      I need the line feed to be skipped until the next real statement line starts.