bulgin24 has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I'm pretty sure this is a snap for Perl. I have multiple files containing text data like the sample below, and I need to iterate over the data and expand each apartment range into individual lines in a new file. Address lines without parentheses, and their associated city/state/zip lines, should be ignored. For example, turn this:

432 10TH ST APT (Range 2A - 2B)

BROOKLYN NY 10598-6601

432 10TH ST APT (Range 3A - 3B)

BROOKLYN NY 10598-6601

432 10TH ST APT (Range 4A - 4B)

BROOKLYN NY 10598-6605

432 10TH ST APT (Range 5A - 5D)

BROOKLYN NY 10598-6605

432 10TH ST APT 6A

BROOKLYN NY 10598-6605

into this:
432 10TH ST, APT 2A, BROOKLYN NY 10598-6601
432 10TH ST, APT 2B, BROOKLYN NY 10598-6601
432 10TH ST, APT 3A, BROOKLYN NY 10598-6601
432 10TH ST, APT 3B, BROOKLYN NY 10598-6601
432 10TH ST, APT 4A, BROOKLYN NY 10598-6605
432 10TH ST, APT 4B, BROOKLYN NY 10598-6605
432 10TH ST, APT 5A, BROOKLYN NY 10598-6605
432 10TH ST, APT 5B, BROOKLYN NY 10598-6605
432 10TH ST, APT 5C, BROOKLYN NY 10598-6605
432 10TH ST, APT 5D, BROOKLYN NY 10598-6605

Replies are listed 'Best First'.
Re: Extract data from txt file
by davido (Cardinal) on Nov 30, 2019 at 05:13 UTC

    Well, it's been seven hours since the OP was last seen here, so at this point I can take a crack at it for fun without feeling like I'm doing free work.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    # Address lines start with a house number; city/state/zip lines end in a digit.
    sub is_street {return shift =~ m/^\d+/;}
    sub is_postal {return shift =~ m/^\w+.+\d$/;}

    sub street_components {
        my $address = shift;
        if ($address =~ m/^
                (.+)
                \s+APT\s+                       # APT anchor
                (?:\(Range\s+)?                 # Range syntax
                ([\w\d]+(?:\s+-\s+[\w\d]+)?)    # Apartment number
                \)?                             # Closing range syntax
            $/x
        ) {
            return {street => $1, apartment => $2}
        }
        else {
            die "Street address match failure: <<$address>>\n";
        }
    }

    sub postal_component {return shift}

    sub apartment_expand {
        my $apartment_range = shift;
        my ($low, $high) = split /\s*-\s*/, $apartment_range;
        return [$low] if !length($high);    # Single apartment, nothing to expand.

        my ($low_num,  $low_alpha ) = $low  =~ m/^(\d+)(\w+)$/;
        my ($high_num, $high_alpha) = $high =~ m/^(\d+)(\w+)$/;

        my @return;
        foreach my $num ($low_num .. $high_num) {               # Numeric increment.
            foreach my $letter ($low_alpha .. $high_alpha) {    # Alpha increment.
                push @return, "${num}${letter}";
            }
        }
        return \@return;
    }

    my %record;

    while (my $line = <DATA>) {
        chomp $line;
        next unless length $line;
        $record{'addr'}   = street_components($line) if is_street($line);
        $record{'postal'} = postal_component($line)  if is_postal($line);
        if (exists $record{'addr'} && exists $record{'postal'}) {
            my $apartments = apartment_expand($record{'addr'}->{'apartment'});
            foreach my $apartment (@$apartments) {
                printf "%s, APT %s, %s\n" =>
                    $record{'addr'}->{'street'}, $apartment, $record{'postal'};
            }
            undef %record;
        }
    }

    __DATA__
    432 10TH ST APT (Range 2A - 2B)

    BROOKLYN NY 10598-6601

    432 10TH ST APT (Range 3A - 3B)

    BROOKLYN NY 10598-6601

    432 10TH ST APT (Range 4A - 4B)

    BROOKLYN NY 10598-6605

    432 10TH ST APT (Range 5A - 5D)

    BROOKLYN NY 10598-6605

    432 10TH ST APT 6A

    BROOKLYN NY 10598-6605

    This produces the following output:

    432 10TH ST, APT 2A, BROOKLYN NY 10598-6601
    432 10TH ST, APT 2B, BROOKLYN NY 10598-6601
    432 10TH ST, APT 3A, BROOKLYN NY 10598-6601
    432 10TH ST, APT 3B, BROOKLYN NY 10598-6601
    432 10TH ST, APT 4A, BROOKLYN NY 10598-6605
    432 10TH ST, APT 4B, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5A, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5B, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5C, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5D, BROOKLYN NY 10598-6605
    432 10TH ST, APT 6A, BROOKLYN NY 10598-6605

    It's unfortunate that the data lacks a record separator; that means you have to keep track of what state you are in. If it mattered, I'd do more detection of getting out of sync by verifying we didn't get a city before getting an address.

    If one were to use this for anything more than amusement they would quickly discover how fragile the address detection is, and that would lead to a realization of how unfortunate the input data format is.
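The out-of-sync check mentioned above could look roughly like the sketch below. It is not part of the posted script: `pair_records` is a hypothetical helper, and it assumes that every address line starts with a digit and that any other non-blank line is a city/state/zip line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: pair each address line with the city line that
# follows it, and die as soon as the two get out of step.
sub pair_records {
    my @lines = @_;
    my (@pairs, $pending);
    for my $line (grep { /\S/ } @lines) {    # skip blank lines
        if ($line =~ /^\d/) {                # assumption: addresses start with a digit
            die "Two addresses in a row at <<$line>>\n" if defined $pending;
            $pending = $line;
        }
        else {                               # assumption: everything else is city/state/zip
            die "City before address at <<$line>>\n" unless defined $pending;
            push @pairs, [$pending, $line];
            undef $pending;
        }
    }
    die "Address without a city: <<$pending>>\n" if defined $pending;
    return @pairs;
}

my @pairs = pair_records(
    '432 10TH ST APT (Range 2A - 2B)', '', 'BROOKLYN NY 10598-6601',
    '432 10TH ST APT 6A',              '', 'BROOKLYN NY 10598-6605',
);
print "$_->[0] | $_->[1]\n" for @pairs;
```

Failing loudly on the first mismatch keeps one stray line from silently shifting every later pairing.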


    Dave

      $/ = '';
      $\ = "\n";
      $, = ", ";
      while (1) {
          local $_ = <> or last;
          s/\s+$//;
          my $y = <> or last;
          $y =~ s/\s+$//;
          if ( /(.+) (APT \d.*)/ ) {
              print $1, $2, $y;
          }
          elsif ( /(.+) (APT) \(Range (\d+)([A-Z]+) - \3([A-Z]+)\)/ ) {
              print $1, "$2 $3$_", $y for $4 .. $5;
          }
      }
      I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.
      "...seven hours..."

      Nice code. But it is what it is:

      "Do not give what is holy to the dogs, and do not throw your pearls before swine, lest they trample them underfoot and turn and tear you to pieces." (Matthew 7:6, LUT)

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

        Same book. Same page even.
        Judge not, that you be not judged. 2 For with whatever judgment you judge, you will be judged; and with whatever measure you measure, it will be measured to you.
        And
        7 Ask, and it will be given to you; seek, and you will find; knock, and it will be opened to you. 8 For everyone who asks receives, and whoever seeks finds, and to whoever knocks it will be opened.


        holli

        You can lead your users to water, but alas, you cannot drown them.
Re: Extract data from txt file -- oneliner
by Discipulus (Canon) on Nov 30, 2019 at 14:15 UTC
    Hello bulgin24,

    Here is a one-liner version (it skips tuples without a range specification). Pay attention to the Windows-style double quotes; on a Unix shell you would use single quotes around the code instead.

    perl -nle "$x++ if 0==$.%4;$a[$x].=$_}{for(@a){next unless/Range/;s/ST/ST,/;/\s(\d+)(\w) - \d+(\w)\W/;$n=$1;for $l(qq($2)..qq($3)){print s/\(.*\)/$n$l, /r}}" add-data.txt
    432 10TH ST, APT 2A, BROOKLYN NY 10598-6601
    432 10TH ST, APT 2B, BROOKLYN NY 10598-6601
    432 10TH ST, APT 3A, BROOKLYN NY 10598-6601
    432 10TH ST, APT 3B, BROOKLYN NY 10598-6601
    432 10TH ST, APT 4A, BROOKLYN NY 10598-6605
    432 10TH ST, APT 4B, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5A, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5B, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5C, BROOKLYN NY 10598-6605
    432 10TH ST, APT 5D, BROOKLYN NY 10598-6605

    The trick is to read 3 lines at a time (the 2 interesting ones plus the empty one in the middle) to build one longer line. Then, with capturing parentheses, you can grab the range to expand. See it deparsed:

    perl -MO=Deparse -nle "$x++ if 0==$.%4;$a[$x].=$_}{for(@a){next unless/Range/;s/ST/ST,/;/\s(\d+)(\w) - \d+(\w)\W/;$n=$1;for $l(qq($2)..qq($3)){print s/\(.*\)/$n$l, /r}}" add-data.txt
    BEGIN { $/ = "\n"; $\ = "\n"; }
    LINE: while (defined($_ = readline ARGV)) {
        chomp $_;
        ++$x if 0 == $. % 4;
        $a[$x] .= $_;
    }
    {
        foreach $_ (@a) {
            next unless /Range/;
            s/ST/ST,/;
            /\s(\d+)(\w) - \d+(\w)\W/;
            $n = $1;
            foreach $l ("$2" .. "$3") {
                print s/\(.*\)/$n$l, /r;
            }
        }
    }
    -e syntax OK
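For anyone who finds the one-liner hard to follow, here is a longer sketch of the same join-then-capture idea. It is only an illustration: `expand_joined` is a hypothetical helper, the `|` separator is an assumption (it must not occur in the data), and like the one-liner it assumes both ends of a range share the same number and use a single letter.

```perl
use strict;
use warnings;

# Hypothetical helper: takes "address|city" joined into one string and
# returns the expanded lines; pairs without "(Range ...)" return nothing.
sub expand_joined {
    my ($joined) = @_;
    my ($street, $num, $from, $to, $city) =
        $joined =~ /^(.+?)\s+APT\s+\(Range\s+(\d+)([A-Z])\s*-\s*\d+([A-Z])\)\|(.+)$/
        or return;
    return map { "$street, APT $num$_, $city" } $from .. $to;
}

my $text = <<'END';
432 10TH ST APT (Range 5A - 5D)

BROOKLYN NY 10598-6605

432 10TH ST APT 6A

BROOKLYN NY 10598-6605
END

# Drop the blank lines, then consume the remaining lines two at a time.
my @lines = grep { /\S/ } split /\n/, $text;
while (my ($address, $city) = splice @lines, 0, 2) {
    print "$_\n" for expand_joined("$address|$city");
}
```

With the sample above, the 6A pair produces no output, which matches the requirement in the question that lines without parentheses be ignored.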

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Extract data from txt file
by NetWallah (Canon) on Nov 30, 2019 at 04:45 UTC
    You could use String::Range::Expand to accomplish this.

                    "From there to here, from here to there, funny things are everywhere." -- Dr. Seuss

Re: Extract data from txt file
by Marshall (Canon) on Nov 30, 2019 at 02:31 UTC
    Your program spec has some flaws related to ranges: 5A-5D would normally mean 5A, 5B, 5C, 5D.
    IN: 432 10TH ST APT (Range 3A - 3B)
    OUT: 432 10TH ST, APT 3A, BROOKLYN NY 10598-6601
    OUT: 432 10TH ST, APT 3B, BROOKLYN NY 10598-6601
    IN: 432 10TH ST APT (Range 4A - 4B)
    OUT: 432 10TH ST, APT 4A, BROOKLYN NY 10598-6605
    OUT: 432 10TH ST, APT 4B, BROOKLYN NY 10598-6605
    IN: 432 10TH ST APT (Range 5A - 5D) ***???***
    OUT: 432 10TH ST, APT 5A, BROOKLYN NY 10598-6605
    OUT: 432 10TH ST, APT 5B, BROOKLYN NY 10598-6605 ???
    OUT: 432 10TH ST, APT 5C, BROOKLYN NY 10598-6605 ???
    OUT: 432 10TH ST, APT 5D BROOKLYN NY 10598-6605
    IN: 432 10TH ST APT 6A
    OUT: 432 10TH ST APT 6A, BROOKLYN NY 10598-6605
      so ... what's the problem?
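On the question of what a range like 5A - 5D means: the replies in this thread lean on Perl's range operator applied to strings, where 'A' .. 'D' yields A, B, C, D. A minimal sketch with some validation follows; `expand_range` is a hypothetical helper, and it assumes each end is digits followed by one uppercase letter. Reversed ranges are rejected explicitly, because the string form of `..` should not be relied on to return an empty list for them.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a same-number apartment range such as
# ("5A", "5D"). Returns an empty list when either end does not look like
# <digits><LETTER>, the numbers differ, or the letters run backwards.
sub expand_range {
    my ($low, $high) = @_;
    my ($low_num,  $low_letter)  = $low  =~ /^(\d+)([A-Z])$/ or return;
    my ($high_num, $high_letter) = $high =~ /^(\d+)([A-Z])$/ or return;
    return if $low_num != $high_num or $low_letter gt $high_letter;
    return map { "$low_num$_" } $low_letter .. $high_letter;
}

print join(', ', expand_range('5A', '5D')), "\n";    # 5A, 5B, 5C, 5D
```

Whether 5B and 5C really exist in the building is a data question that no expansion code can answer, which seems to be the concern raised here.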
Re: Extract data from txt file
by Anonymous Monk on Nov 29, 2019 at 22:12 UTC
    What did you try? PM is not a code writing service.