in reply to Re: Joining multiple lines together while parsing
in thread Joining multiple lines together while parsing

The output for the example should be:

"bad news here"; "Some really nice infos these are"; "This is a problem but there is a solution"; "2nd of 4"
Sorry for not posting that, I completely forgot about it.

Re^3: Joining multiple lines together while parsing
by haukex (Archbishop) on Mar 24, 2017 at 10:31 UTC

    In that case, how are continuation lines identified? In other words, how should the program act in the case of the following input?

    Detail:
    Some really nice infos these are
    Warning
    bad news here
    Detail:
    Some really nice infos these are
    Info:
    This is a problem but there is a solution
    Warning
    bad news here
    is this a continuation or not?
      The input is always like:
      Dumpdata example
      -----------------
      Warning
      Detail:
      Info:
      Spec:
      Dumpdata example
      -----------------
      Warning
      Detail:
      Info:
      Dumpdata example
      -----------------
      Warning
      Detail:
      Spec:
      Dumpdata example
      -----------------
      Warning
      Detail:
      Info:
      Spec:

      The "Dumpdata example" always starts a new section even if not all elements were present.

        Here's one possible regex approach using named captures (perlre) and %+. However, it's starting to get a little complex, so the line-based state machine type approach from Corion's post is starting to look a little better in this case (I gave some more examples of the state machine approach in this thread).

        use warnings;
        use strict;
        use Data::Dump;

        local $/ = "\n\n\n";
        while (<DATA>) {
            # clobber the header
            s/ ^ \s* Dumpdata\s+example \s* \n \s* -+ \s* \n //msx or next;
            my %row;
            $row{$+{key}} = $+{val} while m{
                (?:
                    # the very first key doesn't need colon
                    \A \s* (?<key> \w+ )
                    # but the other keys need colons
                    | ^ \s* (?<key> \w+ : )
                )
                \s+
                # values should end at the next key
                (?<val> (?: (?!^\s*\w+:) . )+ )
            }xmsg;
            s/\s+/ /g for values %row;
            dd \%row;
        }
        __DATA__
        Dumpdata example
        -----------------
        Warning
        bad news here
        Detail:
        Some really nice infos these are
        Info:
        This is a problem but there is a solution


        Dumpdata example
        -----------------
        Warning
        test
        Detail:
        foo bar
        Info:
        quz baz
        Spec:
        blah

        Output:

        {
          "Detail:" => "Some really nice infos these are ",
          "Info:" => "This is a problem but there is a solution ",
          "Warning" => "bad news here ",
        }
        {
          "Detail:" => "foo bar ",
          "Info:" => "quz baz ",
          "Spec:" => "blah ",
          "Warning" => "test ",
        }
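        For comparison, here is a minimal sketch of what a line-based state machine for this format might look like. It is not the code from Corion's post; the sub name parse_dump, the state names, and the sample input are made up for illustration. It assumes every key sits on a line of its own, that only the first key of a section may lack a colon, and it stores keys without the trailing colon (unlike the regex version above).

```perl
use warnings;
use strict;

# Hypothetical helper, not from the thread: returns one hashref per section.
sub parse_dump {
    my ($text) = @_;
    my (@sections, $key);
    my $state = 'start';    # start -> header -> first -> body
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;        # skip blank lines
        $line =~ s/^\s+|\s+$//g;          # trim both ends
        if ($line =~ /^Dumpdata\s+example$/) {
            # a header always starts a new section
            push @sections, {};
            undef $key;
            $state = 'header';
        }
        elsif ($state eq 'start') {
            next;                         # ignore text before the first header
        }
        elsif ($state eq 'header' and $line =~ /^-+$/) {
            $state = 'first';             # underline seen, first key may follow
        }
        elsif ($line =~ /^(\w+):$/
            or ($state eq 'first' and $line =~ /^(\w+)$/)) {
            # a key line; only the first key of a section may lack the colon
            $key = $1;
            $sections[-1]{$key} = '';
            $state = 'body';
        }
        elsif (defined $key) {
            # anything else continues the current value
            my $val = \$sections[-1]{$key};
            $$val .= length $$val ? " $line" : $line;
        }
    }
    return @sections;
}

# Made-up sample input in the format described above.
my $sample = <<'END';
Dumpdata example
-----------------
Warning
bad news here
Detail:
Some really nice infos
these are

Dumpdata example
-----------------
Detail:
foo bar
END

for my $section (parse_dump($sample)) {
    print join('; ', map { qq{$_="$section->{$_}"} } sort keys %$section), "\n";
}
```

        Because the state is explicit, the rule for an ambiguous line (such as a bare word in the middle of a section) is a single elsif branch that can be changed without touching a larger regex.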

        Update: Using the "branch reset" pattern (?|...) allows for a little bit of simplification:

        my %row = m{ (?| \A \s* ( \w+ ) | ^ \s* ( \w+ : ) ) \s+ ( (?: (?!^\s*\w+:) . )+ ) }xmsg;
      Hi
      That last one works great, but is there a way to add " to the start and end of every field, so that the values are like "value" so I can use it for csv?
        so I can use it for csv?

        Use Text::CSV:

        use Text::CSV;
        my $csv = Text::CSV->new({binary=>1, always_quote=>1,
            blank_is_undef=>1, eol=>$/, auto_diag=>2});
        $csv->print(select, ['foo', 'bar']);
        __END__
        "foo","bar"

        Replace the call to select with a $filehandle (open) if you're writing to a file.
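        As a rough sketch of how the parsed rows could be fed through Text::CSV with a filehandle: the helper name write_rows, the column order, and the sample rows below are assumptions for illustration, not something specified in this thread. Missing keys are written as empty fields, and the trailing whitespace left by the regex approach is trimmed first.

```perl
use warnings;
use strict;
use Text::CSV;

# Hypothetical helper: writes one CSV line per row hash to $fh,
# taking the fields in the order given by @$columns.
sub write_rows {
    my ($fh, $columns, $rows) = @_;
    my $csv = Text::CSV->new({ binary => 1, always_quote => 1,
        eol => "\n", auto_diag => 2 });
    for my $row (@$rows) {
        # a key missing from a section becomes an empty field
        my @fields = map { defined $row->{$_} ? $row->{$_} : '' } @$columns;
        s/^\s+|\s+$//g for @fields;    # trim leading/trailing whitespace
        $csv->print($fh, \@fields);
    }
}

# Assumed column order and sample data.
my @columns = ('Warning', 'Detail:', 'Info:', 'Spec:');
my @rows = (
    { 'Warning' => 'bad news here ',
      'Detail:' => 'Some really nice infos these are ',
      'Info:'   => 'This is a problem but there is a solution ' },
    { 'Warning' => 'test ', 'Detail:' => 'foo bar ',
      'Info:'   => 'quz baz ', 'Spec:' => 'blah ' },
);

# An in-memory filehandle keeps the sketch self-contained; to write a real
# file, open a path instead, e.g. open my $fh, '>', 'out.csv' or die $!;
open my $fh, '>', \my $out or die "Cannot open in-memory handle: $!";
write_rows($fh, \@columns, \@rows);
close $fh;
print $out;
```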

        Just use a proper CSV module, which will do this automatically for you, e.g. Text::CSV.

      Works great. Thank you very much.