in reply to perl parsing

G'day cbtshare,

Here's the technique I might have used for this task:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use constant {
    IN_FILE => 'pm_1200636_text.txt',
    HEADER => 0,
    KEY => 1,
    VALUE => 2,
};

my %parsed;

{
    open my $fh, '<', IN_FILE;
    my $name;

    while (<$fh>) {
        my @fields = split;

        if ($fields[HEADER] eq 'name') {
            $name = $fields[KEY];
            next;
        }

        if ($fields[HEADER] eq 'device') {
            push @{$parsed{$name}{$fields[KEY]}}, $fields[VALUE];
            next;
        }
    }
}

# For testing only
use Data::Dump;
dd \%parsed;

This only reads a record at a time, so there should be no memory issues that might occur when slurping entire files. The only data that persists after the anonymous block is %parsed: process that as necessary. Also note that as $fh goes out of scope at the end of the anonymous block, Perl automatically closes this for you (there's no need for a close statement in this instance).

I used the same data as you posted (see the spoiler).

Output from a sample run:

{
  Andrew => { ipad => [2009] },
  Brian  => { ipad => [2001, 2001, 2001] },
  ryan   => { cell => [2009], ipad => [2005] },
}
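For anyone who wants to try this without the input file, here is a minimal sketch that reads from an in-memory filehandle instead. The record layout in $data is an assumption reconstructed from the code and output above (the posted data is not reproduced here), and the order of the records is a guess. It uses an explicit "or die" rather than autodie, and skips blank lines, purely to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Assumed input layout: "name NAME" lines followed by "device TYPE VALUE" lines.
my $data = <<'END';
name Brian
device ipad 2001
device ipad 2001
device ipad 2001
name ryan
device ipad 2005
device cell 2009
name Andrew
device ipad 2009
END

my %parsed;

{
    # Open a filehandle on the string so that no external file is needed.
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my $name;

    while (my $line = <$fh>) {
        my ($header, $key, $value) = split ' ', $line;
        next unless defined $header;    # skip blank lines

        if ($header eq 'name') {
            $name = $key;
            next;
        }

        if ($header eq 'device' and defined $name) {
            push @{ $parsed{$name}{$key} }, $value;
        }
    }
}   # $fh goes out of scope here, so Perl closes it

printf "%s has %d ipad record(s)\n", $_, scalar @{ $parsed{$_}{ipad} }
    for sort keys %parsed;
```

Only one record is held in memory at a time; %parsed is the only thing left after the block ends.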

See also: "perldsc - Perl Data Structures Cookbook"; autodie; open; and, Data::Dump. Everything else is very straightforward and basic Perl, but feel free to ask if anything is unclear.

— Ken

Re^2: perl parsing
by cbtshare (Monk) on Oct 05, 2017 at 19:27 UTC

    Thank you very much. Your solution is quite similar to that of poj. I will attempt to explain what is being done, and then how I used what I understood to try to arrive at the solution I need.

    while (<IN>){
        #remove spaces from the beginning or the end of the file
        s/^\s+|\s+$//g;
        # splits the files based on columns based on space and limit the amount split by 4
        my ($col1,$col2,$col3) = split /\s+/,$_,4;
        #checks to see if the word name is matched to get the variable next over which would be the actual name , then put it in variable $name
        if ($col1 eq 'name'){
            $name = $col2;
        #checks to see if the word device is matched to get the variable next over which would be the actual type, then next over is another attribute(not on the example)
        } elsif ($col1 eq 'device') {
            ##Here the push name, device type and other variable into a hash
            push @{$hash{$name}},$col2, $col3;
        } else {
            # skip line
        }
    }
    close IN;
    #prints everything
    print Dumper \%hash

    My issue now comes when I need to print out the content in a structure way, or into a file, as: name device $col3 device $col3. I can sort through the hash and get the name only, not all the other attributes. But why? I put them all into the hash, right?

    foreach my $line(keys %hash) { print $line }

    I believe you are doing somewhat similar

    ##defining the fields you want including the file, HEADER would be the first field and if name or device then KEY is the next value over and VALUE the next
    use constant {
        IN_FILE => 'pm_1200636_text.txt',
        HEADER => 0,
        KEY => 1,
        VALUE => 2,
    };
    my %parsed;
    {
        open my $fh, '<', IN_FILE;
        my $name;
        while (<$fh>) {
            my @fields = split;
            if ($fields[HEADER] eq 'name') {
                $name = $fields[KEY];
                next;
            }

    This is the part that gives me issues since I need to print the values in a specified format, so data dumper wouldn't work. Any help, please?

            if ($fields[HEADER] eq 'device') {
                push @{$parsed{$name}{$fields[KEY]}}, $fields[VALUE];
                next;
            }
        }
    }

      Your analysis of what the code is doing is mostly correct. In places, you indicate that operations are being performed on "files"; both solutions are reading the files line-by-line, and those operations are being performed on "records". Consider these corrections:

      #remove spaces from both the beginning and the end of the record
      # splits the records based on ...

      You also appear to have misunderstood the LIMIT argument of split: you've used a value of 4 in two places, which doesn't make much sense as the maximum number of fields of any record is 3. Further reading of that documentation will explain why "@fields = split;" needs no arguments nor any preprocessing to trim whitespace.
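To make the split point concrete, here is a small sketch comparing the three forms. The record in $record is a made-up example with deliberate leading and trailing whitespace; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical record with leading and trailing whitespace.
my $record = "   device   ipad   2001  \n";

# With no arguments, split operates on $_, splits on runs of whitespace,
# skips leading whitespace and drops trailing empty fields.
my @bare;
for ($record) { @bare = split }

# An explicit /\s+/ pattern produces an empty leading field instead,
# which is why trimming looked necessary beforehand.
my @explicit = split /\s+/, $record;

# LIMIT caps the number of fields returned; the final field keeps
# whatever is left of the string, unsplit.
my @limited = split ' ', $record, 2;

print scalar(@bare),     " fields from bare split: @bare\n";
print scalar(@explicit), " fields from /\\s+/ (the first one is empty)\n";
print scalar(@limited),  " fields with a LIMIT of 2\n";
```

With at most three fields per record, a LIMIT of 4 never comes into play, so it can simply be left out.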

      The data structures produced by the two solutions are different: an HoA and an HoHoA. We both provided a link to perldsc: perhaps you need to read, reread or study in more detail.
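As a rough illustration of the difference, this sketch builds both structures from the same records. The contents of @records are hypothetical and the variable names are mine; only the two push statements mirror the code in this thread.

```perl
use strict;
use warnings;

# Hypothetical records: [ name, device, value ].
my @records = (
    [ 'Brian', 'ipad', 2001 ],
    [ 'Brian', 'ipad', 2001 ],
    [ 'ryan',  'ipad', 2005 ],
    [ 'ryan',  'cell', 2009 ],
);

my (%hoa, %hohoa);

for my $rec (@records) {
    my ($name, $device, $value) = @$rec;

    # HoA: one flat array per name, with device and value interleaved.
    push @{ $hoa{$name} }, $device, $value;

    # HoHoA: one array of values per device, under each name.
    push @{ $hohoa{$name}{$device} }, $value;
}

print "HoA   ryan:      @{ $hoa{ryan} }\n";
print "HoHoA ryan/ipad: @{ $hohoa{ryan}{ipad} }\n";
```

In the HoA the device names and values have to be picked apart again by position; in the HoHoA the device name is a key, so the values for one device can be looked up directly.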

      The part that seems to elude you, in both cases, is how to translate the information in the data structures to whatever output format you need. You wrote (at the end of each of those analyses, respectively):

      "My issue now comes when I need to print out the content in a structure way, ..."
      "This is the part that gives me issues since I need to print the values in a specified format, ..."

      Without any knowledge of the required output format, there's no way we can help. Again, the perldsc documentation has several sections on accessing the data in complex structures: the answer probably lies therein.
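In the meantime, here is a sketch of the general access pattern for an HoHoA. The "name ..." and "device ..." line layout is purely an assumption for illustration, since the required format has not been stated; %parsed is filled in by hand with the structure from the sample run.

```perl
use strict;
use warnings;

# The structure from the sample run, written out by hand.
my %parsed = (
    Andrew => { ipad => [2009] },
    Brian  => { ipad => [2001, 2001, 2001] },
    ryan   => { cell => [2009], ipad => [2005] },
);

# Assumed layout: one "name" line, then one "device" line per value.
my @lines;
for my $name (sort keys %parsed) {
    push @lines, "name $name";

    for my $device (sort keys %{ $parsed{$name} }) {
        for my $value (@{ $parsed{$name}{$device} }) {
            push @lines, "device $device $value";
        }
    }
}

print "$_\n" for @lines;
```

Each level of the structure gets its own loop: keys of the outer hash, keys of the inner hash, then elements of the array. Swapping the print for a write to a filehandle would follow the same pattern.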

      There are a few other areas where it looks like you really don't understand certain fundamentals. For instance, using the name $line for the variable that holds a key in:

      foreach my $line(keys %hash) { print $line }

      would seem to indicate that you don't know what keys does.

      I would recommend that you bookmark perlintro and refer to it often. Make sure you understand the very basic information it presents, then follow links to related functions, in-depth documentation, tutorials, advanced topics, and so on, as necessary. For instance, the section on Hashes has links to keys and values (I half suspect that, in the code previously mentioned, "values %hash" was probably closer to what you wanted, instead of "keys %hash"); you'll also find many others such as perldata (fuller details), perlreftut (tutorial), and even perldsc (advanced topic already mentioned). Do note that's just some of the links in one of many sections: the entire document is like that and I think you'll find it a most useful resource.
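A short sketch of the keys/values distinction on an HoA. The single entry in %hash is a made-up example, shaped like what the "push @{$hash{$name}},$col2, $col3" line would build.

```perl
use strict;
use warnings;

# Hypothetical HoA with one name.
my %hash = (
    Brian => [ 'ipad', 2001, 'ipad', 2001 ],
);

# keys returns only the hash keys: the names, not the stored data.
my @names = keys %hash;

# values returns the stored array references; each one must be
# dereferenced to get at the elements.
my @contents = map { "@$_" } values %hash;

print "keys:   @names\n";
print "values: @contents\n";
```

Printing a key on its own will therefore never show the attributes; they are reached through $hash{$key} or through values.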

      — Ken

        Thank you for your guidance. I will be sure to go through all the material provided. Sorry I didn't specify the format I want; I typed it in after the edit and hit submit, but I guess it wasn't accepted. The format I want is below, and I need to place that format in multiple files:
        Brian Ipad Ipad other file Andrew iphone ipad