Record separator question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Record separator question by Limbic~Region (Chancellor) on Jan 20, 2004 at 23:40 UTC
Chris,

Take a look at perldoc perlvar. Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)

OK, so my suggestion would be to slurp in the entire thing and then split on your regex.

    local $/ = undef;
    my $record = <INPUT>;
    my @chunks = split /regex/, $record;
    for my $chunk ( @chunks ) {
        my @lines = split /\n/, $chunk;
    }

Hope that helps - L~R

Update: If for some reason you need to keep the header information - do something like this: ...
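The code behind the update above is not included here. As one possible way to keep the header information (a sketch, not necessarily what L~R posted): when the pattern given to split contains capturing parentheses, split also returns the captured separators. The header pattern, the helper name `split_records`, and the sample text below are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split slurped text into [header, body] pairs.
# The capturing parentheses make split return the header lines as well.
sub split_records {
    my ($text) = @_;
    my @parts = split /^(====[^=\n]+====)\n/m, $text;
    shift @parts;    # drop whatever precedes the first header (usually empty)
    my @records;
    while (@parts) {
        my ($header, $body) = splice @parts, 0, 2;
        push @records, [ $header, defined $body ? $body : '' ];
    }
    return @records;
}

# Assumed sample input, not the original poster's file.
my $text = "====header info====\nline 1\nline 2\n"
         . "====header info====\nline 3\n";

for my $rec ( split_records($text) ) {
    my ($header, $body) = @$rec;
    my @lines = split /\n/, $body;
    print "$header: ", scalar(@lines), " line(s)\n";
}
```

Each element pairs a header with the text that followed it, so the per-chunk split on newlines works as before.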
Re: Record separator question by Zaxo (Archbishop) on Jan 20, 2004 at 23:50 UTC
If you define

    {
        local $/ = '====';
        open local(*FILE1), '<', shift or die $!;

then you can discard a first (empty) read, and get the header info and data in two further reads,

        local $_ = <FILE1>;
        while (<FILE1>) {
            my @header_data = header_extract($_);
            my $data = <FILE1>;    # second read
            my @data = data_extract($data);
            # ...
        }
        close FILE1 or die $!;
        redo if @ARGV;
    }

After Compline,
Zaxo
Re: Record separator question by Roger (Parson) on Jan 21, 2004 at 00:18 UTC
I would keep my (business) logic as simple as possible:

    use strict;
    use warnings;

    while ($_ = getrecord()) {
        print "----RECORD----\n$_\n";
    }

    my $saved_header;

    sub getrecord {
        my $text = $saved_header;
        $saved_header = '';
        while (<DATA>) {
            if (/^====[^=]+====$/) {
                $saved_header = $_, last if $text;
            }
            $text .= $_;
        }
        return $text;
    }

    __DATA__
    ====header info====
    0 10 to 50 line of text and numbers with ==== irregular ==== formatting
    ====header info====
    10 to 50 lines of text and...
Re: Record separator question by Cody Pendant (Prior) on Jan 20, 2004 at 23:56 UTC
I tend to do stuff like this by thinking of two "modes". If $_ =~ m/^====/ then we're in header_mode. If not, we're in other_mode. In header_mode, we do X; in other_mode, we do Y.

It's kind of clunky, but it sorts out the logic wonderfully in my head.

($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print
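A minimal sketch of that two-mode idea, assuming that "X" means starting a new record and "Y" means appending the line to the current record. The helper name `group_by_mode`, the hash layout, and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into records, one per header line.
sub group_by_mode {
    my (@lines) = @_;
    my @records;
    for my $line (@lines) {
        if ( $line =~ m/^====/ ) {
            # header_mode: start a new record
            push @records, { header => $line, body => [] };
        }
        elsif (@records) {
            # other_mode: attach the line to the current record
            push @{ $records[-1]{body} }, $line;
        }
        # lines before the first header are ignored
    }
    return @records;
}

# Assumed sample input.
my @sample = (
    '====header info====', 'line 1', 'line 2',
    '====header info====', 'line 3',
);

for my $rec ( group_by_mode(@sample) ) {
    printf "%s has %d body line(s)\n",
        $rec->{header}, scalar @{ $rec->{body} };
}
```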
Re: Record separator question by graff (Chancellor) on Jan 21, 2004 at 02:43 UTC
Bear in mind that when $/ is some particular string, perl reads input data up to and including that string (or until EOF, if the string does not occur). The record separator, whatever it is, is retained as the last component of the record just read in. Note also that "chomp" works by removing $/ from the end of a string (if the string happens to contain $/ at the end).

Given that records in your file begin with the line "====header info====\n", you could set this whole string as your record separator, and just accept the fact that the first "record" you read in will contain nothing but this line, and that all subsequent records will have this line at the end of the record string, not the beginning. Something like the following would do what you want, assuming that you're okay with actually removing these record separators and keeping just the stuff in between:

    $/ = "====header info====\n";
    while (<>) {
        chomp;
        next unless ( /\S/ );
        # .... (handle the record in $_)
    }

(I tried this out on your sample text, and it did the right thing, even with the last record, which did not have "====header info====" at the end.)
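To see that behavior in isolation, here is a small sketch that reads from an in-memory string instead of a file. The helper name `read_records` and the sample text are assumptions; the separator string is the one from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from a string using a custom separator.
sub read_records {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $sep;
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;                  # removes $sep from the end, if present
        next unless $rec =~ /\S/;    # skip the empty first "record"
        push @records, $rec;
    }
    close $fh;
    return @records;
}

# Assumed sample input: two records, the last one without a trailing separator.
my $sep  = "====header info====\n";
my $text = $sep . "line 1\nline 2\n" . $sep . "line 3\n";

my @records = read_records( $text, $sep );
print scalar(@records), " records\n";
print "[$_]\n" for @records;
```

The first read returns only the separator, which chomp reduces to an empty string. The last read ends at EOF, so chomp leaves it untouched.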
Re: Record separator question by duff (Parson) on Jan 21, 2004 at 03:58 UTC
I'm not exactly sure what you want because I can interpret your text several ways. Here are a couple of snippets of code, though. The first assumes that you really want something like this:

    ====header info====
    header
    header
    ====header info====
    data
    data
    ====header info====
    header
    header
    ====header info====
    data
    ....

... and so on, while the second assumes that you just want the stuff that's in between the `====header info====` lines while discarding the header lines themselves. The second one is what I believe most people interpret your text to mean, but I thought I'd mention the first one just in case (also, it's a rare chance that I get to use the `...` (yes, that's 3 dots!) flip-flop operator ;-) ).

    #!/usr/bin/perl
    # snippet number 1
    while (<DATA>) {
        if (/^====header/.../^====header/) {
            print "header: $_";
            next;
        }
        print "data: $_";
        next;
    }

    __DATA__
    ====header info====
    10 to 50 line of text and numbers with irregular formatting
    ====header info====
    10 to 50 lines of text and...
    More text more text
    ====header info====
    10 to 50 line of text and numbers with irregular formatting
    ====header info====
    10 to 50 lines of text and...
    More text more text

And the second:

    #!/usr/bin/perl
    # snippet number 2
    my (@records,@tmp);
    while (<DATA>) {
        chomp;
        if (/^====header/) {
            next unless @tmp;
            push @records, [ @tmp ];
            @tmp = ();
            next;
        }
        push @tmp, $_;
    }
    push @records, [ @tmp ] if @tmp;
    print "@$_\n" for @records;

    __DATA__
    ====header info====
    10 to 50 line of text and numbers with irregular formatting
    ====header info====
    10 to 50 lines of text and...
    More text more text
    ====header info====
    10 to 50 line of text and numbers with irregular formatting
    ====header info====
    10 to 50 lines of text and...
    More text more text

duff
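For anyone wondering why snippet 1 needs three dots rather than two, here is a small comparison sketch. With `..`, the right-hand pattern is tested on the same line that switched the range on, so a header line opens and closes the range at once. With `...`, the right-hand test waits until the next line. The helper name `label_lines` and the sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: label each line using the two-dot or the
# three-dot range (flip-flop) operator in scalar context.
sub label_lines {
    my ($three_dots, @lines) = @_;
    my @out;
    for (@lines) {
        my $in_range = $three_dots
            ? scalar( /^====/ ... /^====/ )
            : scalar( /^====/ ..  /^====/ );
        push @out, ( $in_range ? 'header' : 'data' );
    }
    return @out;
}

# Assumed sample input.
my @sample = (
    '====header info====', 'header',
    '====header info====', 'data', 'data',
);

print "three dots: @{[ label_lines(1, @sample) ]}\n";
# prints "three dots: header header header data data"
print "two dots:   @{[ label_lines(0, @sample) ]}\n";
# prints "two dots:   header data header data data"
```

Note that each flip-flop keeps its own state between evaluations, so the sample is chosen to end with the range closed.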