in reply to problems splitting ugly input data

That one almost works, except that I end up with an extra (empty) value in $stuff2[0],

That's because, used that way, split will always produce an implied empty field preceding the first tag. You could just shift it off the array before building your hash.
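
A minimal sketch of that shift fix. The split pattern and the sample text here are assumptions, since the original poster's code isn't reproduced in this reply:

```perl
use strict;
use warnings;

# Hypothetical input, modelled loosely on the sample data in this thread.
my $text = "TAG1= data\nTAG2= more data\n";

# Splitting on a captured delimiter returns the delimiters as well, but the
# (empty) text in front of the first tag comes back as the first field.
my @fields = split /(TAG\d=)\s*/, $text;
# @fields is now ( '', 'TAG1=', "data\n", 'TAG2=', "more data\n" )

# Drop the leading empty field so the list pairs up as key, value, ...
shift @fields if @fields and $fields[0] eq '';

my %hash = @fields;
print "$_ => $hash{$_}" for sort keys %hash;
```

Without the shift, the odd-length list would pair '' with 'TAG1=' and throw every key/value pair off by one.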

Personally, I think I'd use m/// for this:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    $Data::Dump::WIDTH = 50;

    my %hash = do{ local $/; <DATA> } =~ m[(TAG\d=)\s+(.+?)(?=TAG|\Z)]gsm;

    pp \%hash;

    __DATA__
    TAG1= data
    TAG2= more data
    TAG3= even more data that sometimes has = and
    runs on to more
    than one line
    TAG4= still more

Produces:

    c:\test>junk15
    {
      "TAG1=" => "data\n",
      "TAG2=" => "more data\n",
      "TAG3=" => "even more data that sometimes has = and\nruns on to more\nthan one line\n",
      "TAG4=" => "still more",
    }

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: problems splitting ugly input data
by Anonymous Monk on Dec 23, 2010 at 00:54 UTC
    Nice. Thank you.
    That's because using split there will always be an implied empty field preceding the first tag.
    Yes, I have re-learned that many times.

    Would it be OK if I up the ante a bit? The tags are really not so well structured. They are things like HOSTNAME, CONTACT, .... And my mojo ain't workin' quite well enough to see how to adapt your solution to 10-15 tags of that ilk (at least I don't see how to do it in a nice, tidy way).

      Would it be OK if I up the ante a bit? The tags are really not so well structured. They are things like HOSTNAME, CONTACT, ....

      Sure:

          #! perl -slw
          use strict;
          use Data::Dump qw[ pp ];
          $Data::Dump::WIDTH = 50;

          my $reTags = join '|', map quotemeta, qw[
              HOSTNAME CONTACT TAG1 TAG2 TAG3 TAG4
          ];
          $reTags = qr[$reTags];

          my %hash = do{ local $/; <DATA> } =~ m[($reTags)=\s+(.+?)(?=$reTags|\Z)]gsm;

          pp \%hash;

          __DATA__
          TAG1= data
          TAG2= more data
          HOSTNAME= fred
          TAG3= even more data that sometimes has = and
          runs on to more
          than one line
          CONTACT= Wiley Coyote
          Hiesenberg Road
          The Desert
          TAG4= still more

      Produces:

          c:\test>junk15
          {
            CONTACT  => "Wiley Coyote\nHiesenberg Road\nThe Desert\n",
            HOSTNAME => "fred\n",
            TAG1     => "data\n",
            TAG2     => "more data\n",
            TAG3     => "even more data that sometimes has = and\nruns on to more\nthan one line\n",
            TAG4     => "still more",
          }
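
One thing to watch with that lookahead: it stops at a tag name anywhere in the data, so a value that merely mentions the word CONTACT would be cut short there. A possible stricter variant is sketched below. It assumes (the thread doesn't say this) that every real tag starts a line and is immediately followed by `=`. The sample text is made up for illustration.

```perl
use strict;
use warnings;

# Same tag list as in the reply above.
my $reTags = join '|', map quotemeta, qw[ HOSTNAME CONTACT TAG1 TAG2 TAG3 TAG4 ];
$reTags = qr[$reTags];

# Hypothetical input in which tag names also appear inside the values.
my $text = "HOSTNAME= fred\n"
         . "TAG1= please CONTACT fred\nabout TAG2\n"
         . "CONTACT= Wiley Coyote\n";

# ^ (with /m) pins both the tag and the lookahead to the start of a line,
# and the lookahead also requires the trailing '=' before it ends a value.
my %hash = $text =~ m[^($reTags)=\s+(.+?)(?=^(?:$reTags)=|\Z)]gsm;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

Here TAG1 keeps both of its lines, including the embedded CONTACT and TAG2, because neither sits at the start of a line followed by `=`.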

        Thank you, again. Those are the kinds of things I was trying to come up with on my own. So I learned (several) something(s) in this.

        Cheers.