Re: Remove line and modify another

Hi all.

Thank you for your assistance. As requested in one of the replies, here is some of the code:

    s/  <datafield tag="/./i;        #begin tag num with "."
    s/" ind1="/. /i;            #end tag num with ". "
    s/" ind2="//i;                #remove text between indicators
    s/">\n//i;                #too imprecise?  No, it's OK.
    s/    <subfield code="/|/i;        #replace subfield start with "|"
    s/">//i;                #too imprecise?  No, it's OK.
    s/<\/subfield>\n//i;            #remove text between fields
    s/<\/record>\n//i;            #remove record end marker
    s/<\/datafield>//i;            #remove end of field marker
    s/http:\/\/wc\.slims\.gov\.za/WCPLIS/i;    #change SITA URL to org
    s/<\/collection>\n//i;            #remove end of document
    if (/\d-\d/) {s/-//g};            #remove hyphens in ISBNs

This part converts the XML into the required flat format. Below is a sample of the XML:

  <datafield ind1="1" ind2="4" tag="245">
    <subfield code="a">The solar system</subfield>
    <subfield code="c">Chris Oxlade [Author]</subfield>
  </datafield>

Below is the resulting line in the flat file:

 .245. 14|aThe solar system|cChris Oxlade [Author]

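One caveat about the last line of the substitution chain: `if (/\d-\d/) {s/-//g}` deletes every hyphen on any line that contains a digit-hyphen-digit sequence, so a title or date range such as "1939-1945" would lose its hyphen as well. A minimal sketch that limits the clean-up to the `|a` subfield of ISBN lines, assuming the flat layout shown above and assuming ISBNs only appear under tags 020 and 024 (`strip_isbn_hyphens` is a hypothetical helper name, and the title line is illustrative data):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove hyphens from the |a subfield of ISBN lines only.
# Assumes the flat layout ".TAG. IND|aVALUE|..." and that ISBNs sit under
# tags 020 or 024; all other lines are returned unchanged.
sub strip_isbn_hyphens {
    my ($line) = @_;
    if ($line =~ /^\.0(?:20|24)\./) {
        # Rewrite only the text between the first "|a" and the next "|".
        $line =~ s{(\|a)([^|]*)}{ my ($p, $v) = ($1, $2); $v =~ tr/-//d; "$p$v" }e;
    }
    return $line;
}

print strip_isbn_hyphens('.024. 3#|a978-0-7502-4709-2|xisbn13'), "\n";
print strip_isbn_hyphens('.245. 10|aWorld War II, 1939-1945'), "\n";
```

The first call prints `.024. 3#|a9780750247092|xisbn13`; the title line comes back unchanged.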
My substitution code is used for all lines, across the various MARC tags (such as .024.). It might be possible that I am putting the part to focus on the ISBN correction in the wrong place in the code. I used the line below:

    if (/^.024. 3#|a/) {s/.{10}(.............)........../.020.  3#|a$1/};

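In the test `/^.024. 3#|a/` the `|` is regex alternation, not a literal bar, so the pattern means "starts with .024. 3#" or "contains an a", and it fires on almost every line; the unescaped dots are also wildcards. A sketch of the same correction with the metacharacters escaped and the ISBN matched by its shape (13 digits, or 9 digits plus a digit or X) rather than by counting dots. `retag_isbn` is a hypothetical name, and the sketch assumes the ISBN directly follows `|a` as in the flat lines in this thread. Unlike the original line, it keeps whatever follows the ISBN (such as `|xisbn13`) and the existing spacing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: retag a flat-file ISBN line from .024. to .020.
# Assumes the layout ".024. 3#|a<ISBN>|x..." used in this thread.
sub retag_isbn {
    my ($line) = @_;
    # "." and "|" are escaped so they match literally; the ISBN is matched
    # by shape instead of by a fixed number of "." wildcards.
    $line =~ s/^\.024\.(\s+3#\|a)(\d{13}|\d{9}[\dX])\b/.020.$1$2/;
    return $line;
}

my $title = '.245. 14|aThe solar system|cChris Oxlade [Author]';

# Unescaped, "|" splits the pattern into /^.024. 3#/ OR /a/, so any line
# containing an "a" matches, including this title line.
print "unescaped pattern matches the title line\n" if $title =~ /^.024. 3#|a/;

print retag_isbn($title), "\n";
print retag_isbn('.024. 3#|a9780750247092|xisbn13'), "\n";
```

Running it reports that the unescaped pattern matches the title, leaves the title unchanged, and rewrites the ISBN line to `.020. 3#|a9780750247092|xisbn13`.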
I will try some of the suggestions and see if I can get it sorted. Thank you once again for assisting.


Re^2: Remove line and modify another
by poj (Abbot) on Sep 06, 2018 at 07:17 UTC

It might be possible that I am putting the part to focus on the ISBN correction in the wrong place in the code

Consider using XML::Twig instead of regexes to process the file. It is easier to apply changes to a data element before the flat file is created than to a complete line afterwards.

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $xml = join '', <DATA>;
my $twig = XML::Twig->new(
  twig_handlers => {'datafield' => \&datafield}
);
$twig->parse( $xml );

sub datafield {
  my( $t, $e ) = @_;

  # collect subfields as code => text (a repeated code keeps only its last value)
  my %subfield = ();
  for my $elem ($e->children('subfield')){
    $subfield{$elem->att('code')} = $elem->text;
  }

  my @f = ();
  $f[0] = $e->att('tag');
  $f[1] = $e->att('ind1').$e->att('ind2');
  my @tmp;
  for (sort keys %subfield){
    push @tmp,$_.$subfield{$_};
  }
  $f[2] = join '|',@tmp;

  # change: ISBN identifiers move from tag 024 to 020
  # (// '' avoids an undef warning when there is no $x subfield)
  if (($subfield{'x'} // '') =~ /^(isbn13|isbn)$/){
    $f[0] =~ s/024/020/;
  }

  # flat format for output
  printf ".%s. %s|%s\n",@f if ($f[2]); # skip blanks
}

# expected output:
#.245. 14|aThe solar system|cChris Oxlade [Author]
#.020. 3#|a9780750247092|xisbn13
#.020. 3#|a0750247096|xisbn
__DATA__
<collection>
<record>
  <datafield ind1="1" ind2="4" tag="245">
    <subfield code="a">The solar system</subfield>
    <subfield code="c">Chris Oxlade [Author]</subfield>
  </datafield>
</record>
<record>
  <datafield ind1="3" ind2="#" tag="024">
    <subfield code="a">9780750247092</subfield>
    <subfield code="x">isbn13</subfield>
  </datafield>
</record>
<record>
  <datafield ind1="3" ind2="#" tag="024">
    <subfield code="a">0750247096</subfield>
    <subfield code="x">isbn</subfield>
  </datafield>
</record>
<record>
  <datafield ind1="3" ind2="#" tag="024">
  </datafield>
</record>
</collection>

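The handler above stores subfields in a hash, so they come out in alphabetical order of subfield code, and a repeated code (MARC allows repeated subfields) keeps only its last value. A sketch of a variant that keeps document order and, following the point about changing a data element before the flat line is built, also strips hyphens from the ISBN `$a` subfield. `datafield_to_flat` is a hypothetical helper name, and the hyphenated ISBN in the sample is illustrative data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Build one flat line from a <datafield> element; returns '' for empty fields.
sub datafield_to_flat {
    my ($e) = @_;
    my $tag = $e->att('tag');
    my $ind = $e->att('ind1') . $e->att('ind2');

    # [code, text] pairs in document order, so repeated codes are all kept
    my @subs = map { [ $_->att('code'), $_->text ] } $e->children('subfield');
    return '' unless @subs;

    # the first $x value, if any, says whether this 024 holds an ISBN
    my ($type) = map { $_->[1] } grep { $_->[0] eq 'x' } @subs;
    if ($tag eq '024' && defined $type && $type =~ /^isbn(?:13)?$/) {
        $tag = '020';
        for my $s (@subs) {
            $s->[1] =~ tr/-//d if $s->[0] eq 'a';    # drop hyphens in $a only
        }
    }

    return ".$tag. $ind|" . join('|', map { $_->[0] . $_->[1] } @subs);
}

my @lines;
my $twig = XML::Twig->new(
    twig_handlers => {
        datafield => sub {
            my ($t, $e) = @_;
            my $line = datafield_to_flat($e);
            push @lines, $line if length $line;
        },
    },
);

$twig->parse(<<'XML');
<collection>
<record>
  <datafield ind1="3" ind2="#" tag="024">
    <subfield code="a">978-0-7502-4709-2</subfield>
    <subfield code="x">isbn13</subfield>
  </datafield>
</record>
</collection>
XML

print "$_\n" for @lines;
```

For the sample record it prints `.020. 3#|a9780750247092|xisbn13`. Collecting lines in `@lines` rather than printing inside the handler is only a choice for the sketch; printing directly works the same way.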