in reply to Array Exact Position Filter

I had a hard time trying to explain this. I will look into your examples and try to implement them in my code. Everything after Campo_Identificacao is fixed; only the values inside ###VALUE### change, and the description of the process comes before Campo_Identificacao.

<HR>Campo_Identificacao(###NAME OF THE PROCESS### nº ###NUMBER OF THE PROCESS###, ###THE PLACE THAT JUDGED IT###/AC, Rel. ###NAME OF THE JUDGE###. j. ###JUDGEMENT DAY### , Publ. ###PUBLICATION DAY### .<FD:AnoJulg>2018</FD:AnoJulg>).<HR><PS:Identificacao>

The problem I had was that I wanted to use the NUMBER OF THE PROCESS to filter and look for duplicates, because process descriptions can repeat, but the process number itself can never repeat. The biggest problem was that the number could also appear inside the description, so I would need to pinpoint its exact position BETWEEN "NAME OF THE PROCESS nº" and ", THE PLACE THAT JUDGED IT" so I could verify it and delete the entire line if the number was found again afterwards. Many thanks guys, have a great day!!!

Re^2: Array Exact Position Filter
by AnomalousMonk (Archbishop) on Aug 03, 2018 at 00:59 UTC

    I still don't really understand your requirement, but here's a different approach. In my previous solution, the absolute offset of the start of the number field in the line always had to be the same if subsequent lines were to be rejected once a valid number line was detected. In this solution, the offset between the end of the $toss_tag substring and the start of the number field must remain the same, wherever the toss tag happens to be in the string. So in the following string:

         +----------------------------------- any number of any characters
         |                                    before toss-tag field
         |
         |      +---------------------------- toss-tag field (defined
         |      |                             string, constant width)
         |      |
         |      |      +--------------------- any number of any non-digits
         |      |      |                      between end of toss-tag
         |      |      |                      and start of number field.
         |      |      |                      characters can vary after
         |      |      |                      first field seen, but width
         |      |      |                      cannot vary once set.
         |      |      |
         |      |      |         +----------- number field. number cannot
         |      |      |         |            vary once set.
         |      |      |         |
         |      |      |         |      +---- any number of any characters
         |      |      |         |      |     after number field
         |      |      |         |      |
     /---+----\/+\/----+----\/---+---\/-+-----------------------\
    'line 2 xx Foo NineChars 1234-12.3 other 123 first valid keep'
    I hope that makes sense. (I also slightly simplified some of the field capture logic, but that shouldn't matter.) Tested under Perl version 5.8.9, so it should work under any later version. I hope this will be of some help.
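
    For readers who want something concrete, here is a minimal sketch of one possible reading of the description above. It is not the code referred to in this reply. The helper name filter_after_tag and the assumption that a number consists of digits, dots and hyphens are hypothetical. The first line containing the toss tag followed by a number fixes both the gap width and the number; later lines are dropped only when the same tag is followed by a non-digit gap of that exact width and then the same number.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original reply): keep every line until
# a line with the tag and a number is seen, then drop later lines that
# repeat that number at the same distance from the tag.
sub filter_after_tag {
    my ($toss_tag, @lines) = @_;
    my ($gap_width, $number);
    my @keep;

    for my $line (@lines) {
        if (!defined $number) {
            # The first valid line fixes the gap width and the number.
            # Assumption: a number is digits, dots and hyphens.
            if ($line =~ /\Q$toss_tag\E(\D*)(\d[\d.-]*)/) {
                ($gap_width, $number) = (length($1), $2);
            }
            push @keep, $line;
            next;
        }

        # Later lines: same tag, same gap width, same number => toss.
        next if $line =~ /\Q$toss_tag\E\D{$gap_width}\Q$number\E(?![\d.-])/;
        push @keep, $line;
    }
    return @keep;
}
```

    Called as filter_after_tag('Foo', @lines), this keeps the example line above and would drop a later line such as 'line 3 yyyyy Foo AlsoNine. 1234-12.3 toss', because the gap after the tag is again eleven non-digit characters and the number is unchanged.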


    Give a man a fish:  <%-{-{-{-<

Re^2: Array Exact Position Filter
by hippo (Archbishop) on Aug 03, 2018 at 08:26 UTC

    Now that we know that the exact position of the key field does in fact change, that restriction is removed and all you need to do is find a suitably unique regex to capture it in the first place. Looks to me from

    <DI:"def.def"><RD:Item><PS:Item>Campo_Ementa ###HERE GOES THE TEXT THAT I DONT WANT TO VERIFY[a-zA-Zetcetc]*###<HR>Campo_Identificacao(Recurso Inominado nº <b>0702995-42.2017.8.01.0002</b>, 1ª Turma Recursal dos Juizados Especiais/AC, Rel. Maria Rosinete dos Reis Silva. j. 11.07.2018 , Publ. 20.07.<FD:AnoJulg>2018</FD:AnoJulg>).<HR><PS:Identificacao>

    that it exists between "nº " and ", " and if that's the same for all of them then it is a simple switch:

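    use strict;
    use warnings;
    use utf8;
    use Test::More tests => 1;

    # Sample input: three lines share the number 1234.
    my @in = split (/\n/, <<EOT);
    aaa nº 1234, xyz
    bbb nº 1234, yzx
    ccc nº 1234, yxz
    ddd nº 2345, zyx
    EOT

    # Expected output: only the first line for each number survives.
    my @want = split (/\n/, <<EOT);
    aaa nº 1234, xyz
    ddd nº 2345, zyx
    EOT

    my $number;
    my %seen = ( '' => 1 );   # pre-seeded so lines without a number are dropped
    my @out;

    for my $line (@in) {
        # Capture the digits between "nº " and the following comma.
        if ($line =~ /nº ([\d]*),/) {
            $number = $1;
        } else {
            $number = '';
        }
        push @out, $line unless $seen{$number}++;
    }

    is_deeply (\@out, \@want);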

    If that logic doesn't match your requirements, simply change the regex until it does. Obviously here, I'm just matching integers for the numbers and yours is more complex but the principle should still fit.
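
    As a rough illustration of changing the regex, here is a sketch adapted to the sample record quoted above. It assumes the number is made of digits, dots and hyphens, may be wrapped in <b> tags, and is always followed by a comma. Anchoring on the literal Campo_Identificacao( marker means a "nº ..." that happens to appear in the free-text description is ignored. The helper names process_number and drop_duplicates are made up for this example, and, unlike the test above, lines without a recognisable number are kept rather than dropped, which is also an assumption.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: return the process number of one record, or
# nothing when the record has no recognisable Campo_Identificacao field.
sub process_number {
    my ($line) = @_;
    return $1 if $line =~ m{
        Campo_Identificacao\(   # fixed marker that follows the description
        .*?                     # name of the process (non-greedy)
        nº \s*                  # literal "nº", then optional whitespace
        (?:<b>)?                # optional bold tag around the number
        ( [\d.-]+ )             # assumed number format: digits, dots, hyphens
        (?:</b>)?
        ,                       # the comma that ends the number
    }x;
    return;
}

# Hypothetical helper: keep the first record for each process number and
# drop later records that carry the same number.
sub drop_duplicates {
    my (@lines) = @_;
    my %seen;
    my @out;
    for my $line (@lines) {
        my $number = process_number($line);
        # Assumption: records without a number are kept untouched.
        next if defined $number && $seen{$number}++;
        push @out, $line;
    }
    return @out;
}
```

    With the records read into @lines (decoded as UTF-8 so that "nº" matches), my @unique = drop_duplicates(@lines); gives the filtered list in the original order.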