Re: Converting GEDCOM files

After examining the sample input and output files that Joes emailed to me, as well as some clarified rules, I think that I was actually quite close with my original script. Here's the final final (see below) version:

#!/usr/local/bin/perl -w

use strict;

$/  = "\n0";   # read in one record at a time
$^I = '.bak';  # modify input files in place, save originals with .bak

my %convert = (
  HEAL => 'Medical',
  HIST => 'Biography', 
  EDUC => 'Educated',
  RESI => 'Resided',
  OCCU => 'Occupation',
);

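# Build one alternation that matches any of the tags to be converted.
my $convert_re = join '|', keys %convert;
$convert_re = qr/\b($convert_re)\b/;

while (<>) {
  # Turn each custom tag into a NOTE with a readable label.
  s/$convert_re/NOTE $convert{$1}:/g;

  # Only records whose first line contains SOUR get their NAME lines renamed.
  if (/^.*SOUR/) {
    s/^1 NAME/1 TITL/m;   # first NAME line becomes TITL
    s/^1 NAME/1 TEXT/m;   # next NAME line becomes TEXT
    s/^1 NAME/2 CONT/mg;  # any remaining NAME lines become CONT
  }
} continue {
  print;  # with $^I set, this writes the record back to the file
}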

I made two changes. The input record separator is now "\n0", because each section begins with a line starting with 0. (Although this means each "\n0" is read in as the end of the previous record, that doesn't make a difference for our purposes.) In the substitutions for NAME, I added a beginning-of-line anchor and removed the word 'data' (it turns out that bit was just a placeholder for actual data). I also fixed my typo of TITLE to be TITL (everything is four letters in this format).
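To make the record splitting concrete, here is a minimal sketch that is separate from the script above. The GEDCOM fragment and the helper name read_records are made up for illustration; only the "\n0" separator comes from the script. It reads from an in-memory filehandle so that no files are touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical GEDCOM fragment, invented for this example.
my $sample = join "\n",
    '0 HEAD',
    '1 CHAR IBM DOS',
    '0 @I1@ INDI',
    '1 NAME Joe /SLAVEN/',
    '0 @S1@ SOUR',
    '1 NAME Parish register',
    '0 TRLR',
    '';

# Split a string into records using the same separator as the script.
# read_records is a hypothetical helper, not part of the original code.
sub read_records {
    my ($text) = @_;
    local $/ = "\n0";    # a record ends at a newline followed by a zero
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = read_records($sample);

for my $i (0 .. $#records) {
    (my $shown = $records[$i]) =~ s/\n/\\n/g;    # make newlines visible
    print "record $i: [$shown]\n";
}
```

Every record except the last ends with the "\n0" that really belongs to the next record, and every record except the first is missing its leading 0. Since the records are printed back unchanged in order, concatenating them reproduces the input exactly, which is why the shifted boundary does no harm.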

Update: Make that four changes. The change of NAME to TITL is only supposed to occur for sections that start 0 @Snnn@ SOUR, not for all sections. And I stupidly forgot the /m modifier on those regexes when I added the anchors. Thanks for the notice of the problems, tachyon! (Notwithstanding the suggestion that I'm reading in one line at a time; I'm actually reading in a block of lines ending with a newline followed by a zero, as I intended.)
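The effect of the /m modifier and of the SOUR check can be seen in a small standalone sketch. The two records and the sub name convert_source are hypothetical; the regexes are the ones from the script. The records are written the way the read loop sees them, with the leading 0 already consumed by the previous record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records, invented for this example.
my $source = " \@S1\@ SOUR\n1 NAME Parish register\n1 NAME Baptisms\n1 NAME 1850 to 1860\n0";
my $person = " \@I1\@ INDI\n1 NAME Joe /SLAVEN/\n1 SOUR \@S1\@\n0";

# Without /m, ^ matches only at the very start of the string,
# so a NAME line in the middle of the record is never found.
(my $without_m = $source) =~ s/^1 NAME/1 TITL/;

# With /m, ^ also matches just after each embedded newline.
(my $with_m = $source) =~ s/^1 NAME/1 TITL/m;

# Same logic as the if block in the script, wrapped in a
# hypothetical sub so it can be applied to a copy of a record.
sub convert_source {
    my ($rec) = @_;
    # No /m here, so only the first line of the record is checked for SOUR.
    if ($rec =~ /^.*SOUR/) {
        $rec =~ s/^1 NAME/1 TITL/m;     # first NAME line
        $rec =~ s/^1 NAME/1 TEXT/m;     # next NAME line
        $rec =~ s/^1 NAME/2 CONT/mg;    # all remaining NAME lines
    }
    return $rec;
}

print "without /m: ", ($without_m eq $source ? "unchanged" : "changed"), "\n";
print "with /m:    ", ($with_m    eq $source ? "unchanged" : "changed"), "\n";
print convert_source($source), "\n";
print convert_source($person), "\n";
```

The person record is left alone even though it contains a SOUR line further down, because the check /^.*SOUR/ has no /m and . does not match a newline, so it can only match within the first line of the record.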

Replies are listed 'Best First'.
Re: Re: Converting GEDCOM files by tachyon (Chancellor) on Jul 09, 2001 at 21:34 UTC
Hi chipmunk, this does not work :o(
Update: Seems fixed now though. Really clever solution. I really like the "\n0" now that I understand it ;-) - a very neat way to chop the file into records, much nicer than the method I used. I have learned a very good use for $/ in a practical example - thanks.
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Converting GEDCOM files by Joes (Acolyte) on Jul 10, 2001 at 16:28 UTC
Hi tachyon.
1. In the clever little script you did for me, I find that I should have asked for the GEDCOM line
1 CHAR IBM DOS
to be changed to
1 CHAR ASCII
Can you please show me how to insert an amendment in your script?
2. The standard for the GEDCOM names after each 0 @Inn@ INDI is in the next line:
1 NAME First_name Middle_name/SURNAME/
The first name syntax is as shown, with the first character in upper case. The SURNAME between the forward slashes is in UPPERCASE, as shown. Unfortunately, in merging many GEDCOM files from various genealogists, we end up with sample names like:
JOE/SLAVEN/
Joe McDonald/Slaven/
Joe McDONALD 'Macca'/SLAVEN/
/SLAVEN/ Joe
Joe/SLAV.../ (this last one when the surname is illegible)
Any chance of you please having a go at looking at each 1 NAME Firstname Middlename/SURNAME/ line and converting it to the standard syntax?
3. In selecting the name of the output file, how do I go about naming it as the original input file name filename.ged, but with PAF added, as in filenamePAF.ged?
My best wishes
Joe, Townsville, Australia
Re: Converting GEDCOM files by tachyon (Chancellor) on Jul 10, 2001 at 22:02 UTC
Done. Fun. Gotta run. See Re: Converting GEDCOM files
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Re: Converting GEDCOM files by Anonymous Monk on Jul 10, 2001 at 03:18 UTC
Thanks to chipmunk and tachyon. You guys are great - you make this site a really outstanding and professional Monastery. My appreciation and kind regards to you both. It has been a pleasure seeing two professionals in action.
Joe, Townsville, Australia