Shalmaneser has asked for the wisdom of the Perl Monks concerning the following question:

The data:
https://www.dropbox.com/s/hj6myrr9yqk5uu7/data.txt?dl=0
I need to concatenate records that match on the first field; fields are delimited by a space. The data is sorted, and each key can have 1-5 records to be concatenated. For example:
123 abc
123 32434
123 sdfd
becomes
123 abc 123 32434123 sdfd
I speak UNIX C and Progress RDBMS but am curious about Perl's abilities. Invoking the code can be as simple as type infile | perl proc.pl > outfile or perl proc.pl infile outfile, dealer's choice, but flexibility is valued.

Re: Concatenate records
by toolic (Bishop) on Dec 23, 2014 at 03:17 UTC
    • Read perlintro.
    • Write some code.
    • Post back here with specific questions, if needed.
Re: Concatenate records
by GrandFather (Saint) on Dec 23, 2014 at 06:17 UTC

    Your "specification" is a bit vague and the exercise seems rather light, but here's a sketch to give you the flavour of Perl:

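    my $currPrefix;
    while (<>) {
        my ($prefix, $tail) = /(\S+) (.*)/;
        # Start a new output line whenever the first field changes
        print "\n" if defined $currPrefix && $currPrefix ne $prefix;
        print "$prefix $tail";
        $currPrefix = $prefix;
    }
    print "\n" if defined $currPrefix;    # terminate the last record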

    which works without change for both redirected input and input files named on the command line, with output to stdout in both cases. However, a little more work is required to correctly handle the perl proc.pl infile outfile variant:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Default to the standard streams; use files if two names are given
    my $in  = *STDIN;
    my $out = *STDOUT;

    if (@ARGV == 2) {
        open $in,  '<', $ARGV[0] or die "Can't open '$ARGV[0]': $!\n";
        open $out, '>', $ARGV[1] or die "Can't create '$ARGV[1]': $!\n";
    }

    my $currPrefix;

    while (<$in>) {
        my ($prefix, $tail) = /(\S+) (.*)/;
        print $out "\n" if defined $currPrefix && $currPrefix ne $prefix;
        print $out "$prefix $tail";
        $currPrefix = $prefix;
    }
    print $out "\n" if defined $currPrefix;    # terminate the last record

    which works as expected using redirection or infile/outfile.
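
    Since the specification leaves open what goes between the joined lines, the same logic can also be wrapped in a function and tried on in-memory data. This is only a minimal sketch: join_records and its $sep argument are hypothetical names, not part of the code above, and it assumes (as the question states) that the input is already sorted on the first field.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: join consecutive lines that share a first field,
    # placing $sep between the joined lines. Assumes sorted input.
    sub join_records {
        my ($sep, @lines) = @_;
        my @records;        # finished output records
        my $currPrefix;     # first field of the record being built

        for my $line (@lines) {
            chomp(my $text = $line);
            my ($prefix) = $text =~ /(\S+)/ or next;    # skip blank lines
            if (defined $currPrefix && $currPrefix eq $prefix) {
                $records[-1] .= $sep . $text;   # same key: extend current record
            }
            else {
                push @records, $text;           # new key: start a new record
                $currPrefix = $prefix;
            }
        }
        return @records;
    }

    my @sample = ("123 abc\n", "123 32434\n", "123 sdfd\n", "456 xyz\n");
    print "$_\n" for join_records('',  @sample);   # lines run together
    print "$_\n" for join_records(' ', @sample);   # lines separated by a space
    ```

    Passing '' joins the lines with nothing between them, while ' ' keeps them apart; which one is wanted depends on how the example in the question is read. Testing with defined rather than plain truth means a key of 0 is handled like any other.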

    An interesting variant is:

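    #!/usr/bin/perl
    use strict;
    use warnings;

    my $currPrefix;

    $^I = '.bak';    # edit files in place, keeping '.bak' backups

    while (<>) {
        my ($prefix, $tail) = /(\S+) (.*)/;
        print "\n" if defined $currPrefix && $currPrefix ne $prefix;
        print "$prefix $tail";
        $currPrefix = $prefix;
        if (eof) {    # end of the current file: finish its last record
            print "\n";
            undef $currPrefix;
        }
    }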

    which will edit a list of input files in place, generating backups of the original files with a '.bak' extension.
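
    To try in-place editing without touching real data, something like the following could be used. It is a sketch only: join_in_place and the scratch file data.txt are hypothetical, and the sample contents are invented for the demonstration. It uses eof (without parentheses) to terminate the last record of each file and to reset the state between files.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    # Build a scratch file in a temporary directory (removed at exit)
    my $dir  = tempdir(CLEANUP => 1);
    my $file = "$dir/data.txt";
    open my $fh, '>', $file or die "Can't create '$file': $!\n";
    print $fh "123 abc\n123 32434\n456 xyz\n";
    close $fh or die "Can't close '$file': $!\n";

    # Hypothetical wrapper around the in-place loop
    sub join_in_place {
        local @ARGV = @_;       # files for <> to process
        local $^I   = '.bak';   # in-place editing with '.bak' backups
        my $currPrefix;

        while (<>) {
            my ($prefix, $tail) = /(\S+) (.*)/ or next;
            print "\n" if defined $currPrefix && $currPrefix ne $prefix;
            print "$prefix $tail";
            $currPrefix = $prefix;
        }
        continue {
            if (eof) {          # end of the current file, not of all input
                print "\n" if defined $currPrefix;
                undef $currPrefix;
            }
        }
    }

    join_in_place($file);
    ```

    While $^I is set, print inside the loop writes to the file currently being rewritten rather than to stdout, which is why the loop body itself needs no filehandle.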

    However, if you are dealing with databases then you are heading into territory where Perl shines, so if you like, ask a similar question involving databases.

    Perl is the programming world's equivalent of English