bh_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi

Does anybody have any ideas or suggestions on how to read these files (sample file contents below)?

(Filename: test.txt)

    Name : >Mick< [Mick]
    IC : '91919191929' [9191919129]
    PC : 123 [123]
    Number : >1960132400000< [1960132400000]
    Location : >000e 0036< [000e 0036]
    extension : ><
    Capability : >"CARES< ["CARES]
    Info : >6005494c523142< [6005494c523142]

    Name : >Nick< [Nick]
    IC : '1235467000' [1235467000]
    PC : 124 [124]
    Number : >1960192500000< [1960192500000]
    Location : >000f 0034< [000f 0034]
    extension : ><
    Capability : ><
    Info : ><

    Name : >Nick< [Nick]
    IC : '1235467000' [1235467000]
    PC : 124 [124]
    Number : >< [00]
    Location : >000f 0034< [000f 0034]
    extension : ><
    Capability : >.<
    Info : ><

My original approach was:
0. Read the file's contents.
1. Delete all characters like ('|"|<|>).
2. Split by line (split(/\n/)).
3. Get $1 and $2 and insert them into an associative array.
Ex: $1 ==> Capability: $2 ==> 124
4. Then, if Capability or Info is "" or null, change it to TEST and print.
5. Print back all the items without ('|"|<|>), ignoring everything in [.......]. But I can't get this to work.

Then I tried another method, with the same result. My second method was:
0. Read the file's contents.
1. Change all the items (><|>\.<) to >TEST<.
2. Delete all characters like ('|"|<|>).
3. Split by line (split(/\n/)).
4. Get $1 and $2 and insert them into an associative array.
Ex: $1 ==> Capability: $2 ==> 124
5. Then, if Capability or Info =~ /TEST/, print.
6. Print back all the items without ('|"|<|>), ignoring everything in [.......].
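
For reference, the steps listed above could be sketched roughly as follows. This is only a minimal illustration, not the original script: the sample text is an abbreviated, hypothetical copy of test.txt held in a string, the names $text and @out are made up for the example, and it assumes that a lone "." counts as an empty value for Capability and Info.

```perl
use strict;
use warnings;

# Two abbreviated records in the layout of test.txt (hypothetical sample).
my $text = join "\n",
    "Name : >Mick< [Mick]",
    "IC : '91919191929' [9191919129]",
    "Capability : >\"CARES< [\"CARES]",
    "Info : >6005494c523142< [6005494c523142]",
    "",
    "Name : >Nick< [Nick]",
    "IC : '1235467000' [1235467000]",
    "Capability : >.<",
    "Info : ><",
    "";

my @out;
for my $line ( split /\n/, $text ) {
    if ( $line !~ /\S/ ) { push @out, ''; next; }    # keep record separators
    my ( $label, $value ) = $line =~ /^\s*(\w+)\s*:\s*(.*)$/ or next;
    $value =~ s/\s*\[[^\]]*\]\s*$//;                 # ignore the trailing [...] copy
    $value =~ tr/'"<>//d;                            # delete ' " < >
    # Assumption: an empty value or a lone "." in Capability/Info becomes TEST.
    $value = 'TEST'
        if $label =~ /^(?:Capability|Info)$/ && $value =~ /^\.?$/;
    push @out, "$label : $value";
}
print "$_\n" for @out;
```

Stripping the bracketed copy before deleting the quote characters keeps the two steps from interfering with each other; the TEST substitution is done last, on the cleaned value.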

Hopefully, somebody can help me....

My original script can be reviewed at [id://272622] (How to read/replace null value/parameters).

Thank you,
bh_perl

Replies are listed 'Best First'.
Re: Can't read files content
by BrowserUk (Patriarch) on Jul 11, 2003 at 04:56 UTC

    I agree with artist that it's not very clear from your post what you are trying to achieve, or what you are having problems with. The following code shows how I would read the file.

    It uses 'paragraph mode' (see perlvar: $INPUT_RECORD_SEPARATOR) to read each record in one chunk.
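
    A minimal sketch of paragraph mode on its own, assuming records are separated by one or more blank lines. The in-memory sample string and the names $sample and @chunks are made up for the illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical two-record sample, held in a string so the sketch is self-contained.
    my $sample = "Name : >Mick<\nPC : 123\n\n\nName : >Nick<\nPC : 124\n";

    open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

    my @chunks;
    {
        local $/ = '';             # paragraph mode: blank line(s) end a record
        while ( my $chunk = <$fh> ) {
            chomp $chunk;          # in paragraph mode, chomp strips all trailing newlines
            push @chunks, $chunk;
        }
    }
    close $fh;

    printf "%d records\n", scalar @chunks;
    print "first record:\n$chunks[0]\n";
    ```

    Localising $/ inside a block restores normal line-at-a-time reading for the rest of the program.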

    It then uses a large, complicated-looking, but quite simple to derive regex to parse the record into $1 .. $23. (Note the use of the /m option; see perlre for details.)

    When the regex matches, it builds a hash with the first field on each line as the key name and the second and third fields in an anonymous array as the value. The exception is the 'extension' line, which doesn't appear to have a third field, though this is easily catered for if one can be present. If the record matches and the hash is populated, the hash is pushed onto the array of records; otherwise a diagnostic message is emitted. Finally, I just dumped @records, which should make it clear how to access the structure in order to perform the rest of your processing.

    #! perl -slw
    use strict;
    use Data::Dumper;

    local $/ = '';    # Paragraph mode

    my $re = qr/
        ^(Name)       \s+ : \s > ( [^<]* ) < \s+ \[ ( [^\x5D]* ) \] \s+          #  1,  2,  3
        ^(IC)         \s+ : \s ' ( [^']* ) ' \s+ \[ ( [^\x5D]* ) \] \s+          #  4,  5,  6
        ^(PC)         \s+ : \s   ( \d*   )   \s+ \[ ( [^\x5D]* ) \] \s+          #  7,  8,  9
        ^(Number)     \s+ : \s > ( [^<]* ) < \s+ \[ ( [^\x5D]* ) \] \s+          # 10, 11, 12
        ^(Location)   \s+ : \s > ( [^<]* ) < \s+ \[ ( [^\x5D]* ) \] \s+          # 13, 14, 15
        ^(extension)  \s+ : \s > ( [^<]* ) < \s*                                 # 16, 17
        ^(Capability) \s+ : \s > ( [^<]* ) < (?: \s+ \[ ( [^\x5D]* ) \] )? \s*   # 18, 19, 20
        ^(Info)       \s+ : \s > ( [^<]* ) < (?: \s+ \[ ( [^\x5D]* ) \] )?       # 21, 22, 23
    /mx;

    my @records;

    while( <DATA> ) {
        my %hash;
        @hash{ $1, $4, $7, $10, $13, $16, $18, $21 } = (
            [ $2, $3 ], [ $5, $6 ], [ $8, $9 ], [ $11, $12 ], [ $14, $15 ],
            $17, [ $19, $20 ], [ $22, $23 ],
        ) if m[$re];

        if( %hash ) {
            push @records, \%hash;
        }
        else {
            warn "Record $.\n'$_'\nfailed to match.";
        }
    }

    print Dumper \@records;

    __DATA__
    Name : >Mick< [Mick]
    IC : '91919191929' [9191919129]
    PC : 123 [123]
    Number : >1960132400000< [1960132400000]
    Location : >000e 0036< [000e 0036]
    extension : ><
    Capability : >"CARES< ["CARES]
    Info : >6005494c523142< [6005494c523142]

    Name : >Nick< [Nick]
    IC : '1235467000' [1235467000]
    PC : 124 [124]
    Number : >1960192500000< [1960192500000]
    Location : >000f 0034< [000f 0034]
    extension : ><
    Capability : ><
    Info : ><

    Name : >Nick< [Nick]
    IC : '1235467000' [1235467000]
    PC : 124 [124]
    Number : >< [00]
    Location : >000f 0034< [000f 0034]
    extension : ><
    Capability : >.<
    Info : ><

    Name : >Nick< [Nick}
    IC : '1235467000' [1235467000]
    PC : 124 [124]
    Number : >< [00]
    Location : >000f 0034< [000f 0034]
    extension : ><
    Capability : >.<
    Info : ><

    Output


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Can't read files content
by JamesNC (Chaplain) on Jul 11, 2003 at 05:09 UTC
    Yet another way...
    use strict;
    use Data::Dumper;

    my $i = 0;
    my %records;

    while(<DATA>){
        my($key, $val);
        if (/^$/){    #end of record
            $i++;
            next;
        }
        ($key, $val) = ($1, $2) if /(\w+)\s+:.(\d+) /;
        ($key, $val) = ($1, $2) if /(\w+)\s+:.>(.*)</;
        ($key, $val) = ($1, $2) if /(\w+)\s+:.'(.*)'/;
        $records{$i}{$key}=$val;
    }
    print Data::Dumper->Dump([\%records],[qw/records/]);

    __DATA__
    Name : >Mick< [Mick]
    IC : '91919191929' [9191919129]
    PC : 123 [123]
    Number : >1960132400000< [1960132400000]
    Location : >000e 0036< [000e 0036]
    extension : ><
    Capability : >"CARES< ["CARES]
    Info : >6005494c523142< [6005494c523142]

    Name : >Nick< [Nick]
    IC : '1235467000' [1235467000]
    PC : 124 [124]
    Number : >1960192500000< [1960192500000]
    Location : >000f 0034< [000f 0034]
    extension : ><
    Capability : ><
    Info : ><

    Name : >Nick< [Nick]
    IC : '1235467000' [1235467000]
    PC : 124 [124]
    Number : >< [00]
    Location : >000f 0034< [000f 0034]
    extension : ><
    Capability : >.<
    Info : ><

    OUTPUT:
    
    $records = {
        '1' => {
            'Capability' => '',
            'extension' => '',
            'PC' => '124',
            'Number' => '1960192500000',
            'IC' => '1235467000',
            'Info' => '',
            'Location' => '000f 0034',
            'Name' => 'Nick'
        },
        '0' => {
            'Capability' => '"CARES',
            'extension' => '',
            'PC' => '123',
            'Number' => '1960132400000',
            'IC' => '91919191929',
            'Info' => '6005494c523142',
            'Location' => '000e 0036',
            'Name' => 'Mick'
        },
        '2' => {
            'Capability' => '.',
            'extension' => '',
            'PC' => '124',
            'Number' => '',
            'IC' => '1235467000',
            'Info' => '',
            'Location' => '000f 0034',
            'Name' => 'Nick'
        }
    };

    This is a hash of hashes (HoH); it could just as easily be an array of hashes (AoH)...
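
    A minimal sketch of what the array-of-hashes variant might look like, assuming the same "Label : value" layout. The abbreviated sample lines and the single combined regex are illustrative choices for this sketch, not the code posted above.

    ```perl
    use strict;
    use warnings;

    # Abbreviated sample lines in the same layout as the original data.
    my @lines = (
        "Name : >Mick< [Mick]",
        "PC : 123 [123]",
        "",
        "Name : >Nick< [Nick]",
        "PC : 124 [124]",
    );

    my @records = ( {} );                 # array of hashes; start with one empty record
    for my $line (@lines) {
        if ( $line =~ /^\s*$/ ) {         # blank line: end of record
            push @records, {} if %{ $records[-1] };
            next;
        }
        # The value is either >...<, '...' or a bare number; the [...] copy is ignored.
        if ( $line =~ /^(\w+)\s*:\s*(?:>(.*?)<|'(.*?)'|(\d+))/ ) {
            my ( $key, @alternatives ) = ( $1, $2, $3, $4 );
            my ($value) = grep { defined $_ } @alternatives;
            $records[-1]{$key} = $value;
        }
    }
    pop @records unless %{ $records[-1] };    # drop a trailing empty record

    print "$_->{Name} has PC $_->{PC}\n" for @records;
    ```

    With an array, the records keep their file order, which a hash keyed on a counter only gives you after sorting the keys.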
    JamesNC
      Hi,
      
      Thank you very, very much for all your advice and help. Thank you.
      
      Sorry about the late response.
      
      
Re: Can't read files content
by artist (Parson) on Jul 11, 2003 at 04:24 UTC
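    I am not able to fully understand what you want. Please explain some more if the following is not what you intended.

    my $record = {};
    while(<DATA>){
        if(/^$/){                        # blank line: end of record
            printme($record);
            $record = {};
            next;                        # nothing to store for a blank line
        }
        my ($label,$data) = split /:/;
        if(/\[(.*?)\]/){                 # prefer the [bracketed] copy
            $data = $1;
        }
        elsif(/\>(.*?)\</){              # otherwise take the >value<
            $data = $1;
        }
        $label =~ s/\s+$//;
        if(($label eq 'Capability') || ($label eq 'Info')){
            # an empty pattern (//) reuses the last successful match,
            # so test for an empty value explicitly
            if($data =~ /^\s*$/){
                $data = 'TEST';
            }
        }
        $record->{$label} = $data;
    }
    printme($record) if %{$record};      # the last record may lack a trailing blank line

    sub printme {
        my $record = shift;
        foreach (keys %{$record}){
            print "$_:",$record->{$_},"\n";
        }
        print "=" x 30,"\n";
    }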
    artist
Re: Can't read files content
by mildside (Friar) on Jul 11, 2003 at 05:00 UTC

    An original post. However, before I answer that may I just ask this:- Huh?

    Seriously though, can you answer these questions:

    - What is the desired result? If you want the data printed out in another format, please give an example of that format.
    - Do you have any existing attempts at writing the code? If so, show that code so that it can be corrected etc.