in reply to Parsing record into hash

Odd file format. Looks like someone saw XML, but didn't get the point. The following constructs a hash of hashes containing QSO records and the header record.

use strict;
use warnings;
use Data::Dump::Streamer;

my %QSOs;

# Everything up to and including the <eoh> line is the header
$QSOs{header} = '';
while (<DATA>) {
    $QSOs{header} .= $_;
    last if /<eoh>/;
}

my %qso;
my $key = '';

while (defined (my $line = <DATA>)) {
    chomp $line;
    next if ! length $line;
    next if ! ($line=~ /<qso_date:/ or length $key);

    if (! length $key) {
        # A new record: build the key from the date and time fields
        $line=~ s/<qso_date:[^>]*>([^<]*)<time_on:[^>]*>([^<]*)(<?)/$3/;
        $key = "$1:$2";
    }

    my @fields = split '<', $line;
    for (@fields) {
        my ($tag, $text) = /([^>]*)>(.*)/;

        next if ! defined $tag or ! length $tag;
        if ($tag eq 'eor') {
            # End of record: store a copy and reset for the next one
            $QSOs{$key} = {%qso} if length $key;
            $key = '';
            %qso = ();
            last;
        }

        $qso{$tag} = $text || '';
    }
}

Dump (\%QSOs);

__DATA__
Exported by jLog (c)2006 LA3HM, V 3.90.2.7 according to ADIF <adif_ver:1>2
<PROGRAMID:4>jLog
For jLog info:
 mailto:mail@jlog.org
 http://jlog.org/
 Proposed ADIF2 Extensions may be included
<eoh>
<qso_date:8:d>20051029 <time_on:6>213400
<call:4>VC3O <band:3>20M <mode:3>SSB <operator:3>VHF
<rst_sent:2>59 <rst_rcvd:2>59
<dxcc:1>1 <stx:1>4 <srx:1>4 <ituz:1>4 <cqz:1>4 <pfx:3>VC3 <cont:2>NA
<freq:2>14 <qsoComplete:1> <app_jlog_qso_number:4>0001
<app_jlog_eqsl_qsl_sent:1>Y <app_jlog_eqsl_qsl_rcvd:1>Y
<app_jlog_lotw_qsl_sent:1>Y <qsl_sent_via:1>E <eor>
<qso_date:8:d>20060701 <time_on:6>183206
<call:5>VE6GG <band:3>20M <mode:3>SSB <operator:3>MWB
<rst_sent:2>59 <rst_rcvd:2>59
<dxcc:1>1 <stx:2>27 <srx:2>AB <ituz:1>2 <cqz:1>4 <contest_id:3>RAC
<pfx:3>VE6 <cont:2>NA <freq:8>14.16299 <state:2>AB <qsoComplete:1>
<app_jlog_qso_number:4>1257 <app_jlog_eqsl_qsl_sent:1>Y
<app_jlog_eqsl_qslsdate:10>2006-07-01 <app_jlog_lotw_qsl_sent:1>Y
<app_jlog_lotw_qslsdate:10>2006-07-01 <operator:4>N7DQ <eor>

Prints:

$HASH1 = {
    "20051029 :213400" => {
        "app_jlog_eqsl_qsl_rcvd:1" => 'Y',
        "app_jlog_eqsl_qsl_sent:1" => 'Y ',
        "app_jlog_lotw_qsl_sent:1" => 'Y ',
        "app_jlog_qso_number:4"    => '0001',
        "band:3"                   => '20M ',
        "call:4"                   => 'VC3O ',
        "cont:2"                   => 'NA',
        "cqz:1"                    => '4 ',
        "dxcc:1"                   => '1 ',
        "freq:2"                   => '14 ',
        "ituz:1"                   => '4 ',
        "mode:3"                   => 'SSB ',
        "operator:3"               => 'VHF',
        "pfx:3"                    => 'VC3 ',
        "qsl_sent_via:1"           => 'E ',
        "qsoComplete:1"            => ' ',
        "rst_rcvd:2"               => 59,
        "rst_sent:2"               => '59 ',
        "srx:1"                    => '4 ',
        "stx:1"                    => '4 '
    },
    "20060701 :183206" => {
        "app_jlog_eqsl_qsl_sent:1"  => 'Y',
        "app_jlog_eqsl_qslsdate:10" => '2006-07-01 ',
        "app_jlog_lotw_qsl_sent:1"  => 'Y',
        "app_jlog_lotw_qslsdate:10" => '2006-07-01 ',
        "app_jlog_qso_number:4"     => '1257 ',
        "band:3"                    => '20M ',
        "call:5"                    => 'VE6GG ',
        "cont:2"                    => 'NA ',
        "contest_id:3"              => 'RAC',
        "cqz:1"                     => '4 ',
        "dxcc:1"                    => '1 ',
        "freq:8"                    => '14.16299 ',
        "ituz:1"                    => '2 ',
        "mode:3"                    => 'SSB ',
        "operator:3"                => 'MWB',
        "operator:4"                => 'N7DQ ',
        "pfx:3"                     => 'VE6 ',
        "qsoComplete:1"             => '',
        "rst_rcvd:2"                => 59,
        "rst_sent:2"                => '59 ',
        "srx:2"                     => 'AB ',
        "state:2"                   => 'AB ',
        "stx:2"                     => '27 '
    },
    header => "Exported by jLog (c)2006 LA3HM, V 3.90.2.7 according to ADIF <adif_ver:1".
              ">2\n<PROGRAMID:4>jLog\nFor jLog info:\n mailto:mail\@jlog.org\n http://jlog".
              ".org/\n Proposed ADIF2 Extensions may be included\n<eoh>\n"
};
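
The dumped tags still carry their length suffix (for example call:4) and many values keep a trailing space. One possible way to tidy a record before using it is sketched below. The tidy_record helper and the cut-down sample hash are assumptions made for illustration; they are not part of the code above.

```perl
use strict;
use warnings;

# A cut-down copy of the dumped structure (illustrative data only).
my %QSOs = (
    header             => "Exported by jLog\n<eoh>\n",
    '20051029 :213400' => {
        'call:4' => 'VC3O ',
        'band:3' => '20M ',
        'cont:2' => 'NA',
    },
);

# tidy_record is a hypothetical helper: it drops the ":length" (and any
# ":type") suffix from each tag and trims trailing whitespace from values.
sub tidy_record {
    my ($raw) = @_;
    my %clean;
    for my $tag (keys %$raw) {
        (my $name  = $tag)         =~ s/:.*//;
        (my $value = $raw->{$tag}) =~ s/\s+\z//;
        $clean{$name} = $value;
    }
    return \%clean;
}

# Walk every QSO record, skipping the header entry.
for my $key (sort grep { $_ ne 'header' } keys %QSOs) {
    my $qso = tidy_record($QSOs{$key});
    print "$key: $qso->{call} on $qso->{band}\n";
}
```

Note that a record holding the same tag twice with different lengths (operator:3 and operator:4 in the second record) collapses to a single key here, and which value survives depends on hash order.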

DWIM is Perl's answer to Gödel

Re^2: Parsing record into hash
by sxmwb (Pilgrim) on Jul 04, 2006 at 22:06 UTC
    Thank you. The data is in an amateur radio data interchange format (ADIF). I think you are right that they saw XML and did not really know what was going on. Your solution is simple; now I need to understand it. Two answers in two hours, using two different approaches, is great.

    Thanks Mike

      I think you are right that they saw XML and did not really know what was going on.

      Actually, the format makes more sense than you and GrandFather give it credit for. The general field format is:

      '<' identifier ':' length '>' value

      There also appears to be an optional type marker after the length: the 'd' in the qso_date field presumably marks a date.
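
Because every field declares its own length, a parser can take exactly that many characters instead of scanning for the next '<'. The following is a minimal sketch under that assumption; parse_fields is a hypothetical name, and treating the third component as an optional type marker is a guess based on the qso_date field.

```perl
use strict;
use warnings;

# parse_fields is a hypothetical helper: it reads one record and returns a
# hash reference mapping lower-cased identifiers to their values.
sub parse_fields {
    my ($text) = @_;
    my %record;

    # Match '<' identifier [':' length [':' type]] '>'
    while ($text =~ /<([^:>]+)(?::(\d+))?(?::([^>]+))?>/g) {
        my ($name, $length, $type) = (lc($1), $2, $3);

        last if $name eq 'eor';       # end-of-record marker
        next if !defined $length;     # a tag without a length has no value

        # Take exactly $length characters following the '>'
        my $value = substr $text, pos($text), $length;

        # Resume scanning after the value, so a '<' inside it is not
        # mistaken for the start of a tag
        pos($text) = pos($text) + length $value;

        $record{$name} = $value;
    }

    return \%record;
}

my $record = parse_fields(
    '<qso_date:8:d>20051029 <time_on:6>213400 <call:4>VC3O <band:3>20M <eor>'
);
print "$_ = $record->{$_}\n" for sort keys %$record;
```

Using the declared length means the values need no trimming and may themselves contain '<' or '>'. The type marker is captured but not used in this sketch.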

      I'd much rather have to parse this format than something like METAR, where you have to make guesses about the fields you're processing based on their order and format.