Re: How to read a GEDCOM file

"Hello, this is my first ever go at Perl."

As has already been pointed out, there are quite a few problems there, and you've received good advice on dealing with them.

I was putting together a short script to show how to do this without slurping entire files into arrays (which can be problematic when large files chew up lots of memory). Additionally, I included code to create a temporary test file before processing and to delete it afterwards, as well as a routine to check whether that file exists. I ended up with "pm_1203774_file_io_basics.pl", which covers many of the basic aspects of I/O and file handling.

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my %wanted_records = map { $_ => 1 } @ARGV;

my $filename = 'pm_1203774_transient_data';

check_temporary_file($filename);
create_temporary_file($filename);
check_temporary_file($filename);
process_temporary_file($filename, \%wanted_records);
check_temporary_file($filename);
delete_temporary_file($filename);
check_temporary_file($filename);

sub create_temporary_file {
    my ($file_to_create) = @_;

    open my $out_fh, '>', $file_to_create;

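    print $out_fh "$_\n" for 'A' .. 'Z';

    # Close explicitly so that autodie reports any write errors;
    # an implicit close at scope exit would fail silently.
    close $out_fh;

    return;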
}

sub process_temporary_file {
    my ($file_to_process, $wanted_records_ref) = @_;

    my $total_records = 0;

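    {
        # Bare block: $in_fh is closed automatically when it goes out of scope.
        open my $in_fh, '<', $file_to_process;

        while (<$in_fh>) {
            # delete() returns the stored value (1) for a wanted record and
            # removes it from the hash; anything left over was never found.
            next unless delete $wanted_records_ref->{$.};

            print "Record $.: $_";
        }

        # $. still holds the last line number read: the total record count.
        $total_records = $.;
    }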

    print "No. of records: $total_records\n";

    my @problem_records = sort keys %$wanted_records_ref;

    if (@problem_records) {
        warn "Problem records: @problem_records\n";
    }

    return;
}

sub delete_temporary_file {
    my ($file_to_delete) = @_;

    unlink $file_to_delete;

    return;
} 

sub check_temporary_file {
    my ($file_to_check) = @_;

    if (-e $file_to_check) {
        print "'$file_to_check' exists.\n";
    }
    else {
        print "'$file_to_check' not found.\n";
    }

    return;
}
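The script above manages its own fixed filename and removes it with `unlink`. As an alternative technique (not what the script does), the core File::Temp module can pick a unique name and delete the file automatically when the program exits. Here's a minimal sketch under that assumption; `write_test_data` and `count_lines` are hypothetical helpers for illustration only:

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;
use File::Temp qw(tempfile);

# Hypothetical helper: write the letters A..Z, one per line, then close.
sub write_test_data {
    my ($out_fh) = @_;

    print $out_fh "$_\n" for 'A' .. 'Z';
    close $out_fh;

    return;
}

# Hypothetical helper: count lines without slurping; only one line
# is held in memory at a time.
sub count_lines {
    my ($file) = @_;

    open my $in_fh, '<', $file;

    my $count = 0;
    while (my $line = <$in_fh>) {
        $count++;
    }

    return $count;
}

# UNLINK => 1 asks File::Temp to delete the file when the program exits,
# so no explicit unlink is needed.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
write_test_data($tmp_fh);
print "'$tmp_name' has ", count_lines($tmp_name), " lines.\n";
```

Because the name is generated for you, there's no risk of clobbering an existing file that happens to share a hard-coded name; the trade-off is that you lose the predictable filename used by the exists/not-found checks.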

Here are some sample runs. First, with no arguments, only the record count is reported:

$ pm_1203774_file_io_basics.pl
'pm_1203774_transient_data' not found.
'pm_1203774_transient_data' exists.
No. of records: 26
'pm_1203774_transient_data' exists.
'pm_1203774_transient_data' not found.

Arguments specify the record numbers you want to print:

$ pm_1203774_file_io_basics.pl 1 2 3
'pm_1203774_transient_data' not found.
'pm_1203774_transient_data' exists.
Record 1: A
Record 2: B
Record 3: C
No. of records: 26
'pm_1203774_transient_data' exists.
'pm_1203774_transient_data' not found.

The order of arguments is unimportant:

$ pm_1203774_file_io_basics.pl 26 24 25
'pm_1203774_transient_data' not found.
'pm_1203774_transient_data' exists.
Record 24: X
Record 25: Y
Record 26: Z
No. of records: 26
'pm_1203774_transient_data' exists.
'pm_1203774_transient_data' not found.

Out-of-range record numbers and non-numeric arguments are not processed; they are, however, reported on STDERR:

$ pm_1203774_file_io_basics.pl A B C 26 1 27 0 2 garbage
'pm_1203774_transient_data' not found.
'pm_1203774_transient_data' exists.
Record 1: A
Record 2: B
Record 26: Z
No. of records: 26
Problem records: 0 27 A B C garbage
'pm_1203774_transient_data' exists.
'pm_1203774_transient_data' not found.
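In that last run the problem records come out as "0 27 A B C garbage" because `sort` compares as strings by default. If you'd rather reject non-numeric arguments before the file is even opened, something like the following could work. It's a sketch that assumes valid record numbers are positive integers; `partition_args` is a hypothetical name:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: split arguments into candidate record numbers
# (positive integers) and everything else.
sub partition_args {
    my @args = @_;
    my (@numbers, @rejected);

    for my $arg (@args) {
        if ($arg =~ /\A[1-9][0-9]*\z/) {
            push @numbers, $arg;
        }
        else {
            push @rejected, $arg;
        }
    }

    # Numeric sort (<=>) for the numbers; default string sort for the rest.
    return ([ sort { $a <=> $b } @numbers ], [ sort @rejected ]);
}

my ($numbers, $rejected) = partition_args(qw(A B C 26 1 27 0 2 garbage));
print "Record numbers: @$numbers\n";    # Record numbers: 1 2 26 27
print "Rejected: @$rejected\n";         # Rejected: 0 A B C garbage
```

In the real script you'd pass `@ARGV` instead of the literal list. Note that an out-of-range number such as 27 can't be detected until the file has been read, so the existing "Problem records" check would still be needed for those.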

I then, out of curiosity, took a look at the Wikipedia GEDCOM entry. You have a problem with your terminology which is highly likely to translate into problems in your code. You're using the terms "lines" and "records" interchangeably: in many cases they are equivalent; however, the GEDCOM format uses multiline records (i.e. "lines" and "records" are not the same thing).
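Within a record, each GEDCOM line has the general shape "level [@xref@] tag [value]", as the sample output below shows (e.g. "0 @I1@ INDI", "1 NAME John /Smith/"). Here's a minimal sketch of splitting one line into those parts; `parse_gedcom_line` is a hypothetical name, and the regex is a simplification (single-space delimiters, word-character tags) rather than a full implementation of the GEDCOM grammar:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: split one GEDCOM line into its parts.
# Simplified grammar (an assumption, not the full spec):
#   level, space, optional @xref@ + space, tag, optional space + value
sub parse_gedcom_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # strip an LF or CRLF line terminator

    my ($level, $xref, $tag, $value) =
        $line =~ /\A(\d+) (?:(\@[^\@]+\@) )?(\w+)(?: (.*))?\z/
        or return;           # not a recognisable GEDCOM line

    return { level => $level, xref => $xref, tag => $tag, value => $value };
}

for my $line ('0 @I1@ INDI', '1 NAME John /Smith/', '1 FAMS @F1@') {
    my $parsed = parse_gedcom_line($line) or next;
    printf "level=%s xref=%s tag=%s value=%s\n",
        $parsed->{level},
        $parsed->{xref}  // '-',
        $parsed->{tag},
        $parsed->{value} // '-';
}
```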

To demonstrate a technique you could use to read GEDCOM records, I copied "sample.ged" (from that Wikipedia article) to "pm_1203774_sample.ged", and parsed it like so:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $filename = 'pm_1203774_sample.ged';

{
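    # Every GEDCOM record starts with a level-0 line, so use "\n0" as the
    # input record separator: each read then returns one whole record.
    my $start_char = '0';
    local $/ = "\n$start_char";

    open my $fh, '<', $filename;

    while (<$fh>) {
        chomp;                                   # remove the trailing "\n0" separator
        $_ = $start_char . $_ unless $. == 1;    # restore the "0" consumed by the previous read
        $_ .= "\n" unless eof;                   # restore the newline removed by chomp
        print "Record #$.\n";                    # $. now counts records, not lines
        print;
    }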
}
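The `$/` approach is compact, but it needs the chomp/prepend/append bookkeeping to put the separator back. An alternative technique, sketched here, reads ordinary lines and starts a new record whenever a line begins with "0 ". It assumes every record starts with a level-0 line; `read_gedcom_records` is a hypothetical name, and records are handed to a callback so that only one record is held in memory at a time:

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

# Hypothetical helper: call $callback->($record_number, $record_text) for
# each GEDCOM record read from $fh. A record is a level-0 line plus every
# following line up to (not including) the next level-0 line.
sub read_gedcom_records {
    my ($fh, $callback) = @_;
    my ($buffer, $count) = ('', 0);

    while (my $line = <$fh>) {
        # A new level-0 line ends the record collected so far.
        if ($line =~ /\A0 / && length $buffer) {
            $callback->(++$count, $buffer);
            $buffer = '';
        }
        $buffer .= $line;
    }

    # Flush the final record (normally "0 TRLR").
    $callback->(++$count, $buffer) if length $buffer;

    return $count;
}

# Demo on a few lines of the sample data, via an in-memory filehandle.
my $sample = <<'END_SAMPLE';
0 HEAD
1 CHAR ANSEL
0 @I1@ INDI
1 NAME John /Smith/
0 TRLR
END_SAMPLE

open my $fh, '<', \$sample;
read_gedcom_records($fh, sub {
    my ($number, $text) = @_;
    print "Record #$number\n", $text;
});
```

Since lines are read unchanged, nothing ever needs restoring; the cost is one regex test per line.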

Which outputs:

Record #1
0 HEAD
1 SOUR PAF
2 NAME Personal Ancestral File
2 VERS 5.0
1 DATE 30 NOV 2000
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @U1@
Record #2
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 FAMS @F1@
Record #3
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 SEX F
1 FAMS @F1@
Record #4
0 @I3@ INDI
1 NAME James /Smith/
1 SEX M
1 FAMC @F1@
Record #5
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
Record #6
0 @U1@ SUBM
1 NAME Submitter
Record #7
0 TRLR

Adapting that code for use in my first script is left as an exercise for your good self. Of course, if you really get stuck on something, come back and ask another question.

— Ken


Re^2: How to read a GEDCOM file by afoken (Chancellor) on Nov 21, 2017 at 05:05 UTC
> if you really get stuck on something, come back and ask another question.

... preferably in this thread, so that we don't have to start at zero again.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)