in reply to Process large text data in array

Turning a line of the format id_1=value|id_2=value|id_3=value|..... into a hash can be vastly simplified:

my $line = "id_1=value|id_2=value|id_3=value";
my %hash = split /[|=]/, $line;
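To make the result concrete, here is a minimal sketch of what that one-liner produces. It assumes every column holds exactly one = and no value contains |; the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Sample line; the field values are invented for illustration.
my $line = "id_1=apple|id_2=banana|id_3=cherry";

# Splitting on either delimiter yields key, value, key, value, ...
# and that flat list assigns directly to a hash.
my %hash = split /[|=]/, $line;

# Print the pairs in sorted key order (hash order itself is not stable).
print "$_ => $hash{$_}\n" for sort keys %hash;
```

The whole trick is that a hash assignment consumes a flat list pairwise, so no explicit loop is needed.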

Re^2: Process large text data in array
by Corion (Patriarch) on Mar 10, 2015 at 14:51 UTC

    This will result in weird behaviour if the string contains more than one equal sign (=) per column, because the extra piece shifts every following key/value pair:

    foo=bar=baz|bar=bambam

      That is correct! If this case can happen and one insists on splitting on =, then the third parameter of split might be useful:

      @parts = split /=/, $line, 2;

      will return at most two parts, split on the first (if any) equal sign.
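A rough sketch of how the limit could be applied per column, using Corion's example string. The two-step approach (split on | first, then on = with a limit of 2) and the variable names are my assumptions, not code from the thread.

```perl
use strict;
use warnings;

# Corion's example: the first column contains two equal signs.
my $line = "foo=bar=baz|bar=bambam";

my %hash;
for my $column (split /\|/, $line) {
    # Limit 2: split only on the first '=', the rest stays in the value.
    # A column without any '=' would leave $value undefined.
    my ($key, $value) = split /=/, $column, 2;
    $hash{$key} = $value;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

This is slower than the single split on /[|=]/, so it probably only pays off when values really can contain an equal sign.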

      Just to share with you all: before I store any values in my formatted data line, I run HTML::Entities::encode_numeric on them to make sure unsafe characters are encoded.

      id_1=[encoded value]|.....
Re^2: Process large text data in array
by hankcoder (Scribe) on Mar 10, 2015 at 14:58 UTC

    hdb, your code is excellent!! The run time dropped to only 21 sec. My previous sub was rather old, and the previous data format could contain more than one delimiter character. But all data in my current format gets "safe characters" encoding before it is stored, so I guess it is safe to use your code for my purpose.

    If it is not too much trouble, could you help me improve the reverse of line2rec? Or is this already the simplest and fastest it can be?

    #---------------------------------------------------#
    # REC2LINE
    #---------------------------------------------------#
    sub rec2line {
        my (%trec) = @_;
        my ($newline) = "";
        my ($line);

        foreach $line (keys %trec) {
            if ($newline ne "") { $newline .= "|"; }
            $newline .= "$line=$trec{$line}";
        } # end foreach

        return ("$newline");
    } # end sub

    Thanks again.

      That is what join is for:

      $newline = join "|", map { "$_=$trec{$_}" } keys %trec;
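Dropped into the original sub, that could look roughly like the sketch below. The sort is my addition and is optional; it only makes the column order reproducible, since keys returns the hash keys in no particular order. The sample record is made up.

```perl
use strict;
use warnings;

# Same interface as the original rec2line: takes a hash, returns one line.
sub rec2line {
    my (%trec) = @_;
    # sort is an optional extra so the output order is deterministic.
    return join "|", map { "$_=$trec{$_}" } sort keys %trec;
}

my $newline = rec2line(id_1 => "apple", id_2 => "banana");
print "$newline\n";

# Round trip back into a hash with the split one-liner from above.
my %back = split /[|=]/, $newline;
```

Leaving out the sort should be slightly faster on large records if the column order does not matter.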