pearllearner315 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have a text file in this format:

(Special)
TG: VIN:1000000
type: a very special type of plane
(Special)
TG: VIN:1000001
type: a very special type of car
(Special)
TG: VIN:1000002
type: a very special type of boat

This repeats many times.

What I would like to do is loop through this text file line by line, extract whatever comes after "TG:" and store it as a hash key, then extract whatever comes after "type:" and store it as the value for the key just stored. This is what I have so far:

#!/usr/bin/perl
use strict;
use warnings;

my $textfile = 'file.txt';
open(TEXT, '<', $textfile) or die $!;
my %hash;
local $/="(Special)";
while(<TEXT>){
    if ($_ =~ /TG:\s(VIN:\d+)\ntype:\s(.+)\n/){
        my $keys = $1;
        my $values = $2;
        $hash{$keys} = $values;
    }
}

I then print the hash, but I only get the first entry, as if the loop stopped after the first iteration. What could be the problem here? Is it the regex? Thank you in advance for your wisdom!

Re: extracting strings from text file
by LanX (Saint) on Sep 08, 2018 at 23:42 UTC
    Works for me:

    #!perl
    use strict;
    use warnings;
    use Data::Dump qw/pp dd/;

    my %hash;
    local $/="(Special)";
    while(<DATA>){
        if ($_ =~ /TG:\s(VIN:\d+)\ntype:\s(.+)\n/){
            my $keys = $1;
            my $values = $2;
            $hash{$keys} = $values;
        }
    }
    pp \%hash;

    __DATA__
    (Special)
    TG: VIN:1000000
    type: a very special type of plane
    (Special)
    TG: VIN:1000001
    type: a very special type of car
    (Special)
    TG: VIN:1000002
    type: a very special type of boat
    this repeats many times..

    {
      "VIN:1000000" => "a very special type of plane",
      "VIN:1000001" => "a very special type of car",
      "VIN:1000002" => "a very special type of boat",
    }

    > What could be the problem here?

    You? ;-)

    Update:

    Seriously, I think your input differs from what you've shown us. Check it byte by byte, especially the line endings.
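
    A minimal sketch of that byte-level check, assuming the suspect is Windows-style CRLF line endings. It makes carriage returns visible and shows that the question's pattern stops matching once a "\r" sits before the "\n", while a variant using \R (any line-break sequence, Perl 5.10+) still matches. The sample string is a hypothetical stand-in for one record of the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: one "(Special)" record as it would look with CRLF line endings.
my $record = "\r\nTG: VIN:1000000\r\ntype: a very special type of plane\r\n";

# Make control characters visible: every byte outside printable ASCII
# is shown as \xNN, so "\r" appears as \x0d and "\n" as \x0a.
(my $visible = $record) =~ s/([^\x20-\x7e])/sprintf '\\x%02x', ord $1/ge;
print "$visible\n";

# The pattern from the question expects a bare "\n" right after the digits,
# so a "\r" in that position makes the whole match fail.
my $strict_ok = $record =~ /TG:\s(VIN:\d+)\ntype:\s(.+)\n/ ? 1 : 0;

# \R matches any line-break sequence, including "\r\n" as one unit;
# [^\r\n]+ keeps a trailing "\r" out of the captured value.
my ($vin, $type) = $record =~ /TG:\s(VIN:\d+)\Rtype:\s([^\r\n]+)\R/;

print "original pattern matched: $strict_ok\n";
print "\\R pattern: $vin => $type\n";
```

    If the dump shows \x0d\x0a, either switch to \R as above or strip the carriage returns (for example with s/\r//g) before matching. Note that with the question's pattern, CRLF input would make every record fail, not just the later ones, so this is one of several byte-level differences worth ruling out.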

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re: extracting strings from text file
by AnomalousMonk (Archbishop) on Sep 09, 2018 at 01:15 UTC

    Another way, similar to the OP's and LanX's solutions:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "use autodie;
    ;;
    my $file =
      qq{(Special)\n} .
      qq{TG: VIN:1000000\n} .
      qq{type: a very special type of plane\n} .
      qq{\n} .
      qq{(Special)\n} .
      qq{TG: VIN:1000001\n} .
      qq{type: a very special type of car\n} .
      qq{\n} .
      qq{ \t \n} .
      qq{(Special)\n} .
      qq{TG: VIN:1000002\n} .
      qq{type: a very special type of boat\n}
      ;
    ;;
    open my $fh, '<', \$file;
    ;;
    my $rx_vin = qr{ \d+ }xms;
    my $rx_text = qr{ \S (?: \s* \S+)* }xms;
    ;;
    my %hash;
    ;;
    local $/ = '';
    while (my $block = <$fh>) {
      my $extracted = my ($vin, $text) = $block =~ m{
        \A \s* \Q(Special)\E \s+ TG: \s+ VIN: \s* ($rx_vin)
        \s+ type: \s+ ($rx_text) \s* \Z
      }xms;
      die qq{bad block '$block'} unless $extracted;
      die qq{duplicate vin '$vin'} if exists $hash{$vin};
      ;;
      $hash{$vin} = $text;
    }
    ;;
    close $fh;
    ;;
    dd \%hash;
    "
    {
      1000000 => "a very special type of plane",
      1000001 => "a very special type of car",
      1000002 => "a very special type of boat",
    }
    Please note that:
    • A VIN (at least for a car) has a definite pattern that could be used to make $rx_vin much more discriminating; as written it's quite naive. (Update: indeed, it's incorrect for an automotive VIN; I don't know about boats or planes.)
    • The $rx_text regex could probably be sharpened, but I don't know exactly what your text looks like.
    • The whole extraction pattern is fairly tolerant of whitespace, but it dies at the first "block" that doesn't look like a block, which you may not want. (It also rejects duplicate VINs.)
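
    A minimal sketch of a tighter $rx_vin, assuming the data holds modern 17-character automotive VINs (ISO 3779), which use digits and capital letters except I, O and Q. It does not verify the check digit, and the numbers shown in the question (e.g. 1000000) would not match it, so it only helps if the real file contains real car VINs. The candidate strings are made-up sample values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: 17-character automotive VINs, built from digits 0-9 and
# capital letters A-Z except I, O and Q. No check-digit validation.
my $rx_vin = qr{ [A-HJ-NPR-Z0-9]{17} }xms;

# Anchored copy for validating a whole string rather than finding a VIN inside one.
my $rx_vin_only = qr{ \A $rx_vin \z }xms;

# Sample values: a well-formed VIN, the short number style from the question,
# and a 17-character string containing a forbidden letter O.
my @candidates = ('1HGCM82633A004352', '1000000', '1HGCM82633A00435O');

for my $candidate (@candidates) {
    my $verdict = $candidate =~ $rx_vin_only ? 'looks like a VIN' : 'rejected';
    printf "%-20s %s\n", $candidate, $verdict;
}
```

    The unanchored $rx_vin can be dropped into the extraction pattern above in place of the \d+ version, since that pattern already supplies its own context around the capture.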


    Give a man a fish:  <%-{-{-{-<

Re: extracting strings from text file
by tybalt89 (Monsignor) on Sep 09, 2018 at 14:43 UTC
    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=1221960

    use strict;
    use warnings;
    use Data::Dumper;

    open my $fh, '<', \<<END; # fake file for testing purposes
    (Special)
    TG: VIN:1000000
    type: a very special type of plane
    (Special)
    TG: VIN:1000001
    type: a very special type of car
    (Special)
    TG: VIN:1000002
    type: a very special type of boat
    this repeats many times..
    END

    my %hash = do{local $/; <$fh>} =~ /^TG:\s(VIN:\d+)\ntype:\s(.*)/gm;

    print Dumper \%hash;

    Outputs:

    $VAR1 = {
              'VIN:1000001' => 'a very special type of car',
              'VIN:1000002' => 'a very special type of boat',
              'VIN:1000000' => 'a very special type of plane'
            };
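
    For readers who find the one-liner dense, here is a sketch of the same idea spelled out step by step: read the whole file into one string, then let a global, multi-line match in list context return alternating VIN/type captures that assign straight into a hash. The in-memory sample and the variable names ($sample, $content) are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# In-memory stand-in for the real file; opening a reference to a scalar
# gives a filehandle that reads the scalar's contents.
my $sample = <<'END';
(Special)
TG: VIN:1000000
type: a very special type of plane
(Special)
TG: VIN:1000001
type: a very special type of car
(Special)
TG: VIN:1000002
type: a very special type of boat
END

open my $fh, '<', \$sample or die "Cannot open sample: $!";

# Slurp: with $/ undefined, a single readline returns the whole file.
my $content = do { local $/; <$fh> };
close $fh;

# /m lets ^ match at the start of every line; /g in list context returns
# all captures from all matches: (vin1, type1, vin2, type2, ...).
# .* stops at the end of the line because . does not match "\n" without /s.
my %hash = $content =~ /^TG:\s(VIN:\d+)\ntype:\s(.*)/gm;

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump
print Dumper \%hash;
```

    Because it only relies on a "TG:" line being directly followed by a "type:" line, this version does not depend on $/ or on the "(Special)" separator at all; like the original, it still assumes "\n" line endings.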

      Golfing the OP's code helps neither the OP nor anyone's perception of Perl and its community.

      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond