in reply to Splitting file into separate files based on record lengths and identifiers

What have you tried? What didn't work? Demonstrated effort is generally appreciated around the monastery - see How do I post a question effectively?.

You could do this quite compactly using embedded Perl code in a regular expression, but given that you are a "complete novice", a simpler algorithm of wrapping a substitution in a while loop would likely be sufficient. I assume you know how to open files and print output, so I will give you a simple demonstration of how you might implement your algorithm using regular expressions.

#!/usr/bin/perl
use strict;
use warnings;

my $input = '0004$ADAM0002*330004%19770004$BOB 0002*430004%1967';

while ( length $input ) {
    # Strip the header: a run of digits (the length), then one type character.
    unless ($input =~ s/^(\d+)(.)//) {
        die "Input misformatted: $input";
    }
    my $len  = $1;
    my $type = $2;

    # Strip exactly $len characters of record data from the front.
    unless ($input =~ s/^(.{$len})//) {
        die "Input misformatted: $len, $type, $input";
    }
    my $record = $1;

    print "Type:\t<$type>\nRecord:\t<$record>\n\n";
}

outputs:

Type:	<$>
Record:	<ADAM>

Type:	<*>
Record:	<33>

Type:	<%>
Record:	<1977>

Type:	<$>
Record:	<BOB >

Type:	<*>
Record:	<43>

Type:	<%>
Record:	<1967>

If you have questions about how this works, I'd be happy to expound, though you should be able to find any answer in perlre and/or perlretut.

Re^2: Splitting file into separate files based on record lengths and identifiers
by monty77 (Initiate) on Aug 25, 2010 at 22:14 UTC

    Oh and I'd love you to break out the logic of your script, would much rather understand than blindly copy!

    Thanks!

      The most fundamental difference between my code and the code posted in Re: Splitting file into separate files based on record lengths and identifiers is the use of ^ in my regular expressions. ^ requires that the match start at the beginning of the string. Without that anchor, Perl can begin a match anywhere in the string, unlike some regular expression interfaces that implicitly anchor the pattern (Java's String.matches comes to mind). The likely issue with your posted code is that a match can start in the middle of the string, so fields get picked up out of order. Because each of my substitutions is anchored and removes what it matched, the string is always consumed strictly from the front, in order.
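
      To see what the anchor buys you, here is a minimal sketch. The helper first_header and the damaged sample string are made up for illustration and are not part of the script above. It runs the same header pattern with and without ^ against input whose first header is corrupted by a stray leading character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): returns the first
# length/type header the pattern finds, or 'no match'.
sub first_header {
    my ($string, $anchored) = @_;
    my $re = $anchored ? qr/^(\d+)(.)/ : qr/(\d+)(.)/;
    return $string =~ $re ? "$1 $2" : 'no match';
}

# Made-up sample: a stray 'X' sits in front of the first header.
my $damaged = 'X0004$ADAM0002*33';

# Without ^ the regex engine skips the 'X' and matches further in.
print "Unanchored: ", first_header($damaged, 0), "\n";

# With ^ the match must start at position 0, so it fails outright.
print "Anchored:   ", first_header($damaged, 1), "\n";
```

      The unanchored pattern quietly skips the bad character and carries on, while the anchored one refuses to match. That refusal is what lets the die calls in my script report misformatted input instead of silently misreading it.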