Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. I'm a noob to Perl and am using Perl in the library world. At the moment I am working on a script to change an XML file to a flat file. The current script works great; however, I need some assistance with two lines.

1) Remove line - Not sure if tiredness is catching up with me, but I have this one line that does not want to disappear. The full line is: .024. 3# (with two white spaces at the end). I need to remove the whole line.

2) Replace first 5 characters if line ends with certain characters - These lines are the ISBN fields for the records and look like:

.024. 3#|a9780750247092|xisbn13
.024. 3#|a0750247096|xisbn

If the line ends with |xisbn13 or |xisbn I need to replace the first characters (.024.) with .020.

My current script is very long and consists of general substitutes. I thank you in advance!

Replies are listed 'Best First'.
Re: Remove line and modify another
by hippo (Archbishop) on Sep 05, 2018 at 22:57 UTC

    Here's an SSCCE for your point 1:

    use strict;
    use warnings;
    use Test::More tests => 1;

    my $in   = "foo\n.024. 3#  \nbar";   # the unwanted line ends in two spaces
    my $want = "foo\nbar";
    my $try  = $in;
    $try =~ s/\.024\. 3# {2}\n//;

    is ($try, $want);

    Now it's up to you to do the same for point 2. Can you?

Re: Remove line and modify another
by Marshall (Canon) on Sep 05, 2018 at 20:47 UTC
    Based upon your question, I don't know how to help you.

    Please do the work to post a simple code example that demonstrates your problem.
    I don't need your whole program, just the problematic parts.

    Update: you say "My current script is very long and consists of general substitutes", but you apparently cannot do some relatively simple stuff. Show some code.

Re: Remove line and modify another
by AnomalousMonk (Archbishop) on Sep 05, 2018 at 23:30 UTC

    Based on several guesses, here's another approach to an SSCCE:

    c:\@Work\Perl\monks>perl -wMstrict -e
    "my @lines = (
       qq{.024. 3# keep this one \n},
       qq{.024. 3#  \n},
       qq{.024. 3# keep this two \n},
       qq{.024. 3#|a9780750247092|xisbn13\n},
       qq{.024. 3# keep this three \n},
       qq{.024. 3#|a0750247096|xisbn\n},
       qq{.024. 3# keep this four \n},
       );
     print for @lines;
     print qq{\n};
     ;;
     my $rx_first = qr{ [.] 024 [.] }xms;
     my $replace  = '.020.';
     ;;
     LINE:
     for my $line (@lines) {
         next LINE if $line =~ m{ \A $rx_first \s+ 3[#] [ ]{2} \Z }xms;
         $line =~ s{ \A $rx_first (?= .* [|]xisbn (?: 13)? \Z) } {$replace}xms;
         print $line;
         }
    "
    .024. 3# keep this one
    .024. 3#
    .024. 3# keep this two
    .024. 3#|a9780750247092|xisbn13
    .024. 3# keep this three
    .024. 3#|a0750247096|xisbn
    .024. 3# keep this four

    .024. 3# keep this one
    .024. 3# keep this two
    .020. 3#|a9780750247092|xisbn13
    .024. 3# keep this three
    .020. 3#|a0750247096|xisbn
    .024. 3# keep this four

    Update: But pay attention to the Test::More approach hippo has used here: it's something you should be using generally in your development.
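
    As a rough illustration of that Test::More habit, both points can be wrapped in one small sub and checked against a table of cases. This is only a sketch: fix_line() is a hypothetical name that does not appear in the original script, and it assumes each line arrives with at most one trailing newline.

```perl
use strict;
use warnings;
use Test::More;

# fix_line() is a hypothetical helper, not part of the original script.
# It returns undef for a line that should be dropped, otherwise the
# (possibly retagged) line.
sub fix_line {
    my ($line) = @_;

    # Point 1: drop a bare ".024. 3#" followed by exactly two spaces.
    # An explicit undef is returned so the sub is safe in list context.
    return undef if $line =~ /\A\.024\. 3# {2}\n?\z/;

    # Point 2: retag .024. as .020. when the line ends in |xisbn or |xisbn13.
    $line =~ s/\A\.024\.(?=.*\|xisbn(?:13)?\n?\z)/.020./;

    return $line;
}

my @cases = (
    # [ input, expected result, description ]
    [ ".024. 3#  \n", undef, 'bare 024 line is dropped' ],
    [ ".024. 3#|a9780750247092|xisbn13\n",
      ".020. 3#|a9780750247092|xisbn13\n", 'isbn13 line is retagged' ],
    [ ".024. 3#|a0750247096|xisbn\n",
      ".020. 3#|a0750247096|xisbn\n", 'isbn line is retagged' ],
    [ ".024. 3# keep this one \n",
      ".024. 3# keep this one \n", 'other 024 lines are untouched' ],
    [ ".245. 14|aThe solar system|cChris Oxlade [Author]\n",
      ".245. 14|aThe solar system|cChris Oxlade [Author]\n",
      'other tags are untouched' ],
);

for my $case (@cases) {
    my ($in, $want, $desc) = @$case;
    is( fix_line($in), $want, $desc );
}

done_testing();
```

    New edge cases found in real data can then be added as one more row in @cases before the substitution is changed.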


    Give a man a fish:  <%-{-{-{-<

Re: Remove line and modify another
by IceJ (Initiate) on Sep 06, 2018 at 05:48 UTC

    Hi all.

    Thank you for your assistance. As requested in one of the replies, here is some of the code:

    s/ <datafield tag="/./i;                #begin tag num with "."
    s/" ind1="/. /i;                        #end tag num with ". "
    s/" ind2="//i;                          #remove text between indic
    s/">\n//i;                              #too imprecise? No, it's OK.
    s/ <subfield code="/|/i;                #remove text between fields
    s/">//i;                                #too imprecise? No, it's OK.
    s/<\/subfield>\n//i;                    #remove text between fields
    s/<\/record>\n//i;                      #remove record end marker
    s/<\/datafield>//i;                     #remove end of field marker
    s/http:\/\/wc.slims.gov.za/WCPLIS/i;    #change SITA url to org
    s/<\/collection>\n//i;                  #remove end of document
    if (/\d-\d/) {s/-//g};                  #Remove hyphens in ISBN's

    This part takes the xml code and puts it into the flat format as required. Below is a sample of the xml code

    <datafield ind1="1" ind2="4" tag="245">
      <subfield code="a">The solar system</subfield>
      <subfield code="c">Chris Oxlade [Author]</subfield>
    </datafield>

    Below is the output of one line in the flat file.

    .245. 14|aThe solar system|cChris Oxlade [Author]

    This Perl code is used for all lines with the various MARC tags (e.g. .024.). It might be possible that I am putting the part that focuses on the ISBN correction in the wrong place in the code. I used the line below:

    if (/^.024. 3#|a/) {s/.{10}(.............)........../.020. 3#|a$1/};

    I will try some of the suggestions and see if I can get it sorted. Thank you once again for assisting.
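
    One thing that may be worth checking in the ISBN line above, independent of where it sits in the script: in a Perl regex an unescaped "|" means alternation and an unescaped "." matches any character, so /^.024. 3#|a/ reads as "starts with something like .024. 3#, OR contains an a anywhere". The sketch below compares that pattern with an escaped one on two sample lines taken from this thread. The variable names are illustrative only, and the final substitution is just one possible way to retag the line without counting characters.

```perl
use strict;
use warnings;

my $title = '.245. 14|aThe solar system|cChris Oxlade [Author]';
my $isbn  = '.024. 3#|a9780750247092|xisbn13';

# As posted: "|" splits the pattern into two alternatives and "." is a wildcard.
my $as_posted = qr/^.024. 3#|a/;

# Escaped: every "." and "|" is matched literally.
my $escaped = qr/^\.024\. 3#\|a/;

for my $line ($title, $isbn) {
    printf "%-50s as_posted=%-5s escaped=%s\n",
        $line,
        ( $line =~ $as_posted ? 'match' : 'no' ),
        ( $line =~ $escaped   ? 'match' : 'no' );
}

# Retag only lines that start with .024. 3#|a and end in |xisbn or |xisbn13.
# The lookahead checks the rest of the line without consuming it.
( my $fixed = $isbn ) =~ s/^\.024\.(?= 3#\|a.*\|xisbn(?:13)?\z)/.020./;
print "$fixed\n";
```

    With the pattern as posted, the 245 title line also matches (it contains an "a"), so the substitution that follows would be attempted on lines that are not ISBN fields at all.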

      It might be possible that I am putting the part to focus on the ISBN correction in the wrong place in the code

      Consider using XML::Twig instead of regexes to process the file. It is easier to apply changes to a data element before creating the flat file than to a complete line afterwards.

      #!/usr/bin/perl
      use strict;
      use XML::Twig;

      my $xml = join '',<DATA>;
      my $twig = XML::Twig->new(
        twig_handlers => {'datafield' => \&datafield}
      );
      $twig->parse( $xml );

      # called once for each complete <datafield> element
      sub datafield {
        my( $t, $e ) = @_;

        # collect subfields keyed by their code attribute
        my %subfield = ();
        for my $elem ($e->children('subfield')){
          $subfield{$elem->att('code')} = $elem->text;
        }

        my @f = ();
        $f[0] = $e->att('tag');
        $f[1] = $e->att('ind1').$e->att('ind2');

        my @tmp;
        for (sort keys %subfield){
          push @tmp,$_.$subfield{$_};
        }
        $f[2] = join '|',@tmp;

        # change
        if ($subfield{'x'} =~ /^(isbn13|isbn)$/){
          $f[0] =~ s/024/020/;
        }

        # flat format for output
        printf ".%s. %s|%s\n",@f if ($f[2]); # skip blanks
      }

      #.245. 14|aThe solar system|cChris Oxlade [Author]
      #.024. 3#|a9780750247092|xisbn13
      #.024. 3#|a0750247096|xisbn
      __DATA__
      <collection>
      <record>
        <datafield ind1="1" ind2="4" tag="245">
          <subfield code="a">The solar system</subfield>
          <subfield code="c">Chris Oxlade [Author]</subfield>
        </datafield>
      </record>
      <record>
        <datafield ind1="3" ind2="#" tag="024">
          <subfield code="a">9780750247092</subfield>
          <subfield code="x">isbn13</subfield>
        </datafield>
      </record>
      <record>
        <datafield ind1="3" ind2="#" tag="024">
          <subfield code="a">0750247096</subfield>
          <subfield code="x">isbn</subfield>
        </datafield>
      </record>
      <record>
        <datafield ind1="3" ind2="#" tag="024">
        </datafield>
      </record>
      </collection>
      poj