JillB has asked for the wisdom of the Perl Monks concerning the following question:

Hello, this is my first ever go at Perl. For starters, I want to open a GEDCOM text file, count the number of lines in the file, and read a line. The script below does not work for me. It shows the count as 1, and no record is shown for the 10th record. Can anyone please assist?

#!/usr/bin/perl
open(MYFILE, "C:/Users/Jill/Documents/Genealogy/birdt.ged") || die;
@MyGed=birdt.ged;
$count=@MyGed;
print "10th record : $MyGed[10]\n";
print "No. of records : $count \n";

Re: How to read a GEDCOM file
by haukex (Archbishop) on Nov 19, 2017 at 12:37 UTC

    I don't know anything about this format, but CPAN is your friend: there is a module, Gedcom, that sounds like it will be able to help you.

    Note that your Perl source contains some invalid syntax*. Since you say you're just starting with Perl, I suggest starting with perlintro (which includes the tip to Use strict and warnings).
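    To illustrate why strict helps here: without it, birdt.ged on the right-hand side is parsed as two barewords joined by the concatenation operator, so @MyGed ends up holding one string, which would explain a count of 1. A minimal sketch that compiles that statement (with my added, so only the bareword problem is reported) with and without strict; the variable names are just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The problem statement from the original script, as a string of Perl code.
my $statement = 'my @MyGed = birdt.ged; scalar @MyGed';

# Without strict, the barewords are silently treated as the strings
# "birdt" and "ged", so the array gets exactly one element: "birdtged".
my $count = do {
    no strict;
    no warnings;
    eval $statement;
};
print "Without strict: count is $count\n";

# With strict, the same code is refused at compile time and the
# error message ends up in $@.
my $error = do {
    no warnings 'reserved';    # silence "Unquoted string ... may clash" noise
    eval "use strict; $statement";
    $@;
};
print "With strict: $error";
```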

    It might be easiest to start with a somewhat simpler task, like reading the 10th line of a simple text file. <update> Once you get to the section "Files and I/O" of perlintro, you should be able to see what to change in your current script for it to read the lines from the file. </update>
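    A minimal sketch of that simpler task, assuming a plain text file with one item per line. The sub name nth_line is hypothetical, and an in-memory string stands in for the file so the example runs anywhere; it reads line by line and stops as soon as the wanted line has been seen:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return line number $wanted (counting from 1) from an open filehandle,
# or undef if the file has fewer lines than that.
sub nth_line {
    my ($fh, $wanted) = @_;
    while (my $line = <$fh>) {
        # $. holds the number of the line just read from $fh
        return $line if $. == $wanted;
    }
    return;
}

# Demo data: twelve lines held in memory. For a real file you would use
#   open my $fh, '<', $path or die "Cannot open $path: $!";
my $text = join '', map { "line $_\n" } 1 .. 12;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $tenth = nth_line($fh, 10);
close $fh;
print "10th line: $tenth";
```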

    Also, the Basic debugging checklist contains some good tips, such as using Data::Dumper or Data::Dump to look at data structures, which will probably help later on when inspecting the data that the module returns.
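    As a small sketch of that tip, here is Data::Dumper (a core module) applied to a few GEDCOM-style lines as they would arrive from a filehandle. Setting $Data::Dumper::Useqq makes invisible characters such as the trailing newline show up as "\n" in the dump, which is often exactly what you need to see when a comparison mysteriously fails:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# A few lines as they would be returned by <$fh>, newlines included.
my @lines = ("0 HEAD\n", "1 SOUR PAF\n", "0 TRLR\n");

# Use double-quoted output so "\n" and other control characters are visible.
$Data::Dumper::Useqq = 1;

my $dump = Dumper(\@lines);
print $dump;
```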

    We'll be happy to help with any questions you may have while learning. For asking questions, it's always best to include the code you are working on, with things irrelevant to the question removed (although the code should at least still compile, see SSCCE), short sample input data, the expected output for that input, and the actual output you're getting, including any error messages (with line numbers intact). See also How do I post a question effectively?

    * Actually, you've since edited your post to fix some of that. Please mark your updates as such, to prevent replies from being confusing - see How do I change/delete my post?

    Update: Typo fix: Data::Dumper, not "Date::Dumper", thanks 1nickt!

      Thanks for this advice. It should help me make better use of your forum.

Re: How to read a GEDCOM file
by Laurent_R (Canon) on Nov 19, 2017 at 13:13 UTC
    There are quite a few problems in your code, but the main one is that you open the file and never actually read it. You could do that with something like this:
    @MyGed = <MYFILE>;
    That should probably solve your problem and make your script work. I would suggest, however, that you rewrite the script in accordance with commonly accepted best practices. For example as follows:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file_in = "C:/Users/Jill/Documents/Genealogy/birdt.ged";
    open my $FILE, "<", $file_in or die "could not open $file_in $!"; # three-argument open syntax
    my @ged = <$FILE>;   # slurp the file contents into @ged
    my $count = $.;      # or, as you had it: my $count = @ged;
    print "10th record : $ged[10]\n";   # note that $ged[10] is really the 11th record (array subscripts start at 0)
    print "No. of records : $count \n";
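    Two details of that script can be checked in isolation: after reading a filehandle in list context, $. equals the number of lines read, and the 10th line sits at index 9. A sketch that uses a 15-line in-memory string in place of the real GEDCOM file (the data is made up for the demonstration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the GEDCOM file: 15 numbered lines held in memory.
my $text = join '', map { "line $_\n" } 1 .. 15;
open my $FILE, '<', \$text or die "could not open in-memory file: $!";

my @ged   = <$FILE>;   # list context: reads every remaining line
my $count = $.;        # number of the last line read from $FILE
close $FILE;

print "No. of records : $count (array holds ", scalar(@ged), ")\n";
print "10th record    : $ged[9]";     # index 9 is the 10th line
print "\$ged[10]       : $ged[10]";   # index 10 is the 11th line
```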

      Thank you for this quick and very helpful reply. It now works perfectly with your code, and I have learnt lessons from your post. Regards, Jill

        It's clear from this node (if the janitors haven't already tidied it away) that you haven't yet quite gotten the hang of editing your posts. :) Please see How do I change/delete my post? for site etiquette and protocol regarding changing your posts. Bottom line: Don't Destroy Context!


        Give a man a fish:  <%-{-{-{-<

Re: How to read a GEDCOM file
by kcott (Archbishop) on Nov 20, 2017 at 22:09 UTC

    G'day JillB,

    "Hello, this is my first ever go at Perl."

    As has already been pointed out, there are quite a few problems there; and you've received good advice on dealing with them.

    I was putting together a short script to show how to do this without slurping entire files into arrays (which can be problematic when large files chew up lots of memory). Additionally, I included code to create a temporary test file before processing and to delete it afterwards, as well as a routine to check for the existence of that file. I've ended up with "pm_1203774_file_io_basics.pl", which covers many of the basic aspects of I/O and file handling.

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my %wanted_records = map { $_ => 1 } @ARGV;
    my $filename = 'pm_1203774_transient_data';

    check_temporary_file($filename);
    create_temporary_file($filename);
    check_temporary_file($filename);
    process_temporary_file($filename, \%wanted_records);
    check_temporary_file($filename);
    delete_temporary_file($filename);
    check_temporary_file($filename);

    sub create_temporary_file {
        my ($file_to_create) = @_;
        open my $out_fh, '>', $file_to_create;
        print $out_fh "$_\n" for 'A' .. 'Z';
        return;
    }

    sub process_temporary_file {
        my ($file_to_process, $wanted_records_ref) = @_;
        my $total_records = 0;

        {
            open my $in_fh, '<', $file_to_process;

            while (<$in_fh>) {
                next unless delete $wanted_records_ref->{$.};
                print "Record $.: $_";
            }

            $total_records = $.;
        }

        print "No. of records: $total_records\n";

        my @problem_records = sort keys %$wanted_records_ref;

        if (@problem_records) {
            warn "Problem records: @problem_records\n";
        }

        return;
    }

    sub delete_temporary_file {
        my ($file_to_delete) = @_;
        unlink $file_to_delete;
        return;
    }

    sub check_temporary_file {
        my ($file_to_check) = @_;

        if (-e $file_to_check) {
            print "'$file_to_check' exists.\n";
        }
        else {
            print "'$file_to_check' not found.\n";
        }

        return;
    }

    Here are some sample runs. First, with no arguments, only the record count is reported:

    $ pm_1203774_file_io_basics.pl
    'pm_1203774_transient_data' not found.
    'pm_1203774_transient_data' exists.
    No. of records: 26
    'pm_1203774_transient_data' exists.
    'pm_1203774_transient_data' not found.

    Arguments specify the record numbers you want to print:

    $ pm_1203774_file_io_basics.pl 1 2 3
    'pm_1203774_transient_data' not found.
    'pm_1203774_transient_data' exists.
    Record 1: A
    Record 2: B
    Record 3: C
    No. of records: 26
    'pm_1203774_transient_data' exists.
    'pm_1203774_transient_data' not found.

    The order of arguments is unimportant:

    $ pm_1203774_file_io_basics.pl 26 24 25
    'pm_1203774_transient_data' not found.
    'pm_1203774_transient_data' exists.
    Record 24: X
    Record 25: Y
    Record 26: Z
    No. of records: 26
    'pm_1203774_transient_data' exists.
    'pm_1203774_transient_data' not found.

    Out-of-range record numbers and non-numeric arguments are not processed; they are, however, reported on STDERR:

    $ pm_1203774_file_io_basics.pl A B C 26 1 27 0 2 garbage
    'pm_1203774_transient_data' not found.
    'pm_1203774_transient_data' exists.
    Record 1: A
    Record 2: B
    Record 26: Z
    No. of records: 26
    Problem records: 0 27 A B C garbage
    'pm_1203774_transient_data' exists.
    'pm_1203774_transient_data' not found.
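    The core of process_temporary_file can be exercised without any temporary file. The condensed sketch below (the sub name pick_records is hypothetical) reads one line at a time, so memory use stays flat however large the input is, and relies on delete returning the value it removes: a requested line number is consumed the first time it is seen, and whatever is left in the hash at the end was never matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read $fh one line at a time and return the lines whose numbers were
# requested, the total line count, and any requests that never matched.
sub pick_records {
    my ($fh, @wanted) = @_;
    my %wanted = map { $_ => 1 } @wanted;
    my @found;

    while (my $line = <$fh>) {
        # delete returns the removed value (1) when $. was requested,
        # and undef otherwise, so each request is consumed at most once
        push @found, $line if delete $wanted{$.};
    }
    my $total = $.;    # number of the last line read from $fh

    # Whatever is left was out of range or not a line number at all
    return (\@found, $total, [ sort keys %wanted ]);
}

# Same data as the temporary file in the script above: A to Z, one per line.
my $text = join '', map { "$_\n" } 'A' .. 'Z';
open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
my ($found, $total, $problems) = pick_records($fh, 26, 1, 27, 'garbage');
close $fh;

print "Record: $_" for @$found;
print "No. of records: $total\n";
print "Problem records: @$problems\n" if @$problems;
```

    Note that the records come back in file order, not argument order, for the same reason as in the sample runs: the loop walks the file once and checks each line number against the hash.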

    I then, out of curiosity, took a look at this Wikipedia GEDCOM entry. You have a problem with your terminology which is highly likely to translate into problems in your code. You're using the terms "lines" and "records" interchangeably: in many cases that equivalence holds; however, the GEDCOM format uses multiline records (i.e. "lines" and "records" are not the same thing).
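    One way to see the difference between lines and records, as a sketch that assumes every record starts with a line whose level number is 0 (as in the Wikipedia sample parsed below), is to group lines into records while reading them. The sub name read_records is hypothetical, and the data is an abridged excerpt of that sample:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group GEDCOM lines into records. A line whose level number is 0
# starts a new record; the lines that follow (levels 1, 2, ...) belong
# to that record until the next level-0 line.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        # Start a new record on a level-0 line (or if the data does not
        # begin with one, so that no line is silently dropped).
        push @records, [] if $line =~ /^0\s/ || !@records;
        push @{ $records[-1] }, $line;
    }
    return @records;
}

# Abridged excerpt of the Wikipedia sample.ged.
my $ged = <<'GED';
0 HEAD
1 SOUR PAF
2 NAME Personal Ancestral File
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
0 TRLR
GED

open my $fh, '<', \$ged or die "Cannot open in-memory data: $!";
my @records = read_records($fh);
close $fh;

for my $i (0 .. $#records) {
    printf "Record #%d has %d line(s), starting: %s",
        $i + 1, scalar @{ $records[$i] }, $records[$i][0];
}
```

    Here seven lines make up three records. Keeping each line as a separate array element makes the level numbers easy to inspect later; the approach below reads each record as a single string instead.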

    To demonstrate a technique you could use to read GEDCOM records, I copied "sample.ged" (from that Wikipedia article) to "pm_1203774_sample.ged", and parsed it like so:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my $filename = 'pm_1203774_sample.ged';

    {
        my $start_char = '0';
        local $/ = "\n$start_char";
        open my $fh, '<', $filename;

        while (<$fh>) {
            chomp;
            $_ = $start_char . $_ unless $. == 1;
            $_ .= "\n" unless eof;
            print "Record #$.\n";
            print;
        }
    }

    Which outputs:

    Record #1
    0 HEAD
    1 SOUR PAF
    2 NAME Personal Ancestral File
    2 VERS 5.0
    1 DATE 30 NOV 2000
    1 GEDC
    2 VERS 5.5
    2 FORM LINEAGE-LINKED
    1 CHAR ANSEL
    1 SUBM @U1@
    Record #2
    0 @I1@ INDI
    1 NAME John /Smith/
    1 SEX M
    1 FAMS @F1@
    Record #3
    0 @I2@ INDI
    1 NAME Elizabeth /Stansfield/
    1 SEX F
    1 FAMS @F1@
    Record #4
    0 @I3@ INDI
    1 NAME James /Smith/
    1 SEX M
    1 FAMC @F1@
    Record #5
    0 @F1@ FAM
    1 HUSB @I1@
    1 WIFE @I2@
    1 MARR
    1 CHIL @I3@
    Record #6
    0 @U1@ SUBM
    1 NAME Submitter
    Record #7
    0 TRLR

    Adapting that code for use in my first script is left as an exercise for your good self. Of course, if you really get stuck on something, come back and ask another question.

    — Ken

      if you really get stuck on something, come back and ask another question.

      ... preferably in this thread, so that we don't have to start at zero again.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)