newbie regex question: substituting repeating occurences for different replacements

RuyLopez has asked for the wisdom of the Perl Monks concerning the following question:

Re: newbie regex question: substituting repeating occurences for different replacements by gjb (Vicar) on Jul 08, 2003 at 12:12 UTC
Rather than using regexes for this job, I'd recommend using Text::CSV or Text::CSV_XS to parse the lines of the file into a list and simply add the tags on an element-by-element basis. IMHO this is a cleaner approach from a software engineering point of view, but more importantly you can be sure that CSV will be handled correctly in all cases. Hope this helps, -gjb-
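To illustrate why a real parser is safer than splitting on the separator, here is a minimal sketch. It does not use Text::CSV; it swaps in the core module Text::ParseWords so that it runs without extra installs, and the input line is a made-up example with a quoted field that contains a semicolon.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical input: the first field contains the separator inside quotes.
my $line = '"Smith; John";42;London';

# A naive split cuts the quoted field in two.
my @naive = split /;/, $line;

# parse_line(delimiter, keep_quotes, line) respects the quoting.
my @parsed = parse_line(';', 0, $line);

print scalar(@naive),  " fields from split\n";
print scalar(@parsed), " fields from parse_line\n";
print "first parsed field: $parsed[0]\n";
```

Here split returns four pieces while parse_line returns the intended three fields. Text::CSV_XS covers more of the CSV corner cases (embedded quotes, newlines inside fields) than this stand-in does.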
Minor addition by Purdy (Hermit) on Jul 08, 2003 at 13:39 UTC
Just to make a minor point (plus get my first post in for 2003! ;)), the original poster's data has semicolons as the separating character instead of the default comma, which can easily be specified in Text::CSV_XS's sep_char attribute in the new() method. Hope I'm not doing anyone's homework:

    #!/usr/bin/perl -w
    use strict;
    use Text::CSV_XS;

    my ( $csv, $xml );
    $csv = Text::CSV_XS->new( { 'sep_char' => ';' } );
    $xml = '';

    while( <DATA> ) {
        chomp;
        if ( $csv->parse( $_ ) ) {
            my ( $line, $n, @fields, $field );
            $line = '<row>';
            $n = 1;
            @fields = $csv->fields();
            foreach $field ( @fields ) {
                $line .= "<col$n>$field</col$n>";
                $n++;
            }
            $xml .= $line . "</row>\n";
        } else {
            print "parse() failed on this line: " . $csv->error_input() . "\n";
            # die?
        }
    }

    print $xml;

    __DATA__
    a;b;c;d;e
    f;g;h;i
    j;k
    l
    m;n;o

Output:

    $ ./main.pl
    <row><col1>a</col1><col2>b</col2><col3>c</col3><col4>d</col4><col5>e</col5></row>
    <row><col1>f</col1><col2>g</col2><col3>h</col3><col4>i</col4></row>
    <row><col1>j</col1><col2>k</col2></row>
    <row><col1>l</col1></row>
    <row><col1>m</col1><col2>n</col2><col3>o</col3></row>

There's probably an XML package out there should your required output become more complex or you want to guarantee that you are using a standardized and optimized solution.

Peace, Purdy
(jeffa) 3Re: substituting repeating occurences ... (Text::CSV_XS / CGI.pm version) by jeffa (Bishop) on Jul 09, 2003 at 21:34 UTC
Here is another approach which uses Text::CSV_XS and, gulp, CGI:

    use strict;
    use warnings;
    use Text::CSV_XS;
    use CGI::Pretty qw(-any);

    my $i;
    my $csv = Text::CSV_XS->new({sep_char => ';'});

    while (<DATA>) {
        $i = 0;
        warn and next unless $csv->parse($_);
        print CGI::row(map eval "CGI::col@{[++$i]}('$_')", $csv->fields);
    }

    __DATA__
    colContents;nextColContents;lastColContents
    colContents2;nextColContents2;lastColContents2
    colContents3;nextColContents3;lastColContents3
    a;b;c;d;e
    f;g;h;i
    j;k
    l
    m;n;o

The `-any` pragma in CGI.pm is pretty nifty, you can actually use it to create XML. No guarantees on valid XML, of course. Just be sure to either prefix the calls with the CGI package name or use the OO interface (else Perl will complain that `main` has no such method).

I was able to increment the `<colN>` tags by using an evil `eval` trick. That was the toughest part of this code. Drop that requirement and utilize CGI.pm's distributive shortcuts feature thingy:

    use Text::CSV_XS;
    use CGI::Pretty qw(-any);

    my $csv = Text::CSV_XS->new({sep_char => ';'});
    $csv->parse($_) and print CGI::row(CGI::col([$csv->fields])) while <DATA>;

jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
Re: newbie regex question: substituting repeating occurences for different replacements by dreadpiratepeter (Priest) on Jul 08, 2003 at 12:13 UTC
Why use a regexp?

    use strict;

    my $str;
    while (<DATA>) {
        chomp;
        my @cols = split(/;/);
        for (my $i=1;$i<=@cols;$i++) {
            $cols[$i-1]="<col$i>$cols[$i-1]</col$i>";
        }
        $str .= '<row>' . join('',@cols) . "</row>\n";
    }
    print $str;
    exit;

    __DATA__
    a;b;c;d;e
    f;g;h;i
    j;k
    l
    m;n;o

prints:

    <row><col1>a</col1><col2>b</col2><col3>c</col3><col4>d</col4><col5>e</col5></row>
    <row><col1>f</col1><col2>g</col2><col3>h</col3><col4>i</col4></row>
    <row><col1>j</col1><col2>k</col2></row>
    <row><col1>l</col1></row>
    <row><col1>m</col1><col2>n</col2><col3>o</col3></row>

UPDATE: Doh! of course gjb is right. Text::CSV is a great module. While my code is better than using regexps, Text::CSV is better than using my code.

-pete
"Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
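One thing the split-and-wrap approach does not address is field content that contains XML-special characters. The following is a minimal sketch, not part of the original post, of escaping each field before wrapping it; `xml_escape` is a hypothetical helper name and the input line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Made-up input with characters that would break the markup if left alone.
my @cols = split /;/, 'a<b;R&D;plain';
my $n    = 0;
my $row  = '<row>'
         . join('', map { $n++; "<col$n>" . xml_escape($_) . "</col$n>" } @cols)
         . '</row>';

print "$row\n";
```

This prints `<row><col1>a&lt;b</col1><col2>R&amp;D</col2><col3>plain</col3></row>`. A dedicated XML module would normally take care of this for you.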
Re: newbie regex question: substituting repeating occurences for different replacements by AcidHawk (Vicar) on Jul 08, 2003 at 14:01 UTC
This got me thinking... I often create XML, but always with the first tag being different for each .. well, where the tag `<row>` exists here I normally only need

    <row>a</row>
    <row2>b</row2>

So here is my stab at doing this with XML::Simple. Please comment on how I can refine this... but I do know it outputs what was asked for... ;-D

    #! /usr/bin/perl
    use strict;
    use warnings;
    use XML::Simple;
    use Data::Dumper;

    my (%h);
    my $xs = XML::Simple->new( keeproot => 1, noattr => 1, noescape => 1);

    open (FILE, ">>./Test.xml") or die "Cannot create Test.xml: $!\n";

    while (<DATA>) {
        delete $h{row};    # start each row with an empty hash
        my $count = 0;
        chomp;
        my @line = split(/;/);
        foreach my $var (@line) {
            $count++;
            my $tag = "col" . $count;
            $h{'row'}{$tag} = $var;
        }
        print Dumper(\%h);
        print FILE $xs->XMLout(\%h);
    }
    close(FILE);

    __DATA__
    a;b;c;d;e
    f;g;h
    i;
    j

-----
Of all the things I've lost in my life, it's my mind I miss the most.
(jeffa) 2Re: substituting repeating occurences ... (XML::Generator::DBI / DBD::AnyData version) by jeffa (Bishop) on Jul 09, 2003 at 22:05 UTC
Personally, i think you shouldn't concern yourself with incrementing XML tags (even though i played along and posted a solution myself). The first `<foo>` tag encountered is the first, and the second `<foo>` tag encountered is the second, and so on - as long as that's how you intend to read the data. XML::Simple handles this by exposing an option (`forcearray`). Since order will be preserved, why bother with incrementing the tags? (And besides, wouldn't that id number be better as an attribute instead?)

I posted a few solutions to converting CSV to XML over at CSV to XML (the quick and dirty way). Ultimately, CSV::XML looks the easiest, but i still dig using XML::Generator::DBI. One option i didn't try at the time, however, was DBD::AnyData. Here's one that follows the `<colN>` naming convention and uses the previous two modules ... but, caveats:

- commas are used instead of semi-colons - DBD::AnyData does not handle them, but could be written to do so, if desired. However, note that CSV stands for comma. ;)
- ughh ... i have to specify the maximum number of columns to be expected. This is not something i advocate, but then again, this is all because the `<colN>` tags have to be incremented.

Enough, here's the code:

    use strict;
    use warnings;
    use DBI;
    use XML::Generator::DBI;
    use XML::Handler::YAWriter;

    my $max  = 5;
    # header row (col1..col$max), then the CSV data slurped in one go
    my $data = join(',',map"col$_",1..$max) . "\n" . do {local $/;<DATA>};

    my $dbh = DBI->connect('dbi:AnyData(RaiseError=>1):');
    $dbh->func('test', 'CSV', [$data], 'ad_import');

    my $generator = XML::Generator::DBI->new(
        Handler => XML::Handler::YAWriter->new(AsFile => '-'),
        dbh     => $dbh,
        Indent  => 1,
    );
    $generator->execute('select * from test');

    __DATA__
    colContents,nextColContents,lastColContents
    colContents2,nextColContents2,lastColContents2
    colContents3,nextColContents3,lastColContents3
    a,b,c,d,e
    f,g,h,i
    j,k
    l
    m,n,o

jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
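The suggestion that the number would sit better in an attribute can be sketched in plain Perl. This is only an illustration under assumptions: `row_to_xml` is a hypothetical helper, the attribute name `id` is taken from the wording above, and the fields are split on commas with no CSV quoting or XML escaping.

```perl
use strict;
use warnings;

# Sketch: keep a single <col> tag name and carry the position in an "id" attribute.
sub row_to_xml {
    my ($line) = @_;
    my $id = 0;
    my @cells = map { $id++; qq{<col id="$id">$_</col>} } split /,/, $line;
    return '<row>' . join('', @cells) . '</row>';
}

print row_to_xml($_), "\n" for 'a,b,c', 'j,k';
```

With this shape no maximum column count has to be declared up front, since every cell uses the same tag name.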
Re: newbie regex question: substituting repeating occurences for different replacements by eric256 (Parson) on Jul 08, 2003 at 16:11 UTC
A regex like this works:

    my $row = "hello;this;is;a;test";
    my $i = 1;
    print "row: $row\n";
    # each ';' closes column $i and opens column $i+1
    $row =~ s/;/"<\/col". $i++ ."><col$i>"/eg;
    # wrap the ends: open the first column, close the last one
    $row = "<col1>" . $row . "</col$i>";
    print "row: $row\n";

Not the prettiest thing in the world though. I have to agree, though, that a regex is probably not the best solution for this problem. This one is kinda prettier :-)

    my $row = "hello;this;is;a;test";
    my $i = 1;
    my $output = '';
    print "row: $row\n";
    my @cols = split(/;/, $row);
    foreach my $col (@cols) {
        $output .= "<col$i>$col</col$i>";
        $i++;
    }
    print "row: $output\n";

Notice that these all deal with just a single line, though they are easily expanded to cover multiple lines.

Eric Hodges
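As a rough sketch of that expansion to multiple lines, the substitution can be wrapped in a subroutine and called once per line. The name `tag_row` and the in-memory `@lines` array (standing in for lines read from a file) are assumptions made for this example.

```perl
use strict;
use warnings;

# Wrap one semicolon-separated line in <row> and numbered <colN> tags.
sub tag_row {
    my ($row) = @_;
    my $i = 1;
    # Each ';' closes the current column and opens the next one.
    $row =~ s{;}{ my $close = $i++; "</col$close><col$i>" }eg;
    return "<row><col1>$row</col$i></row>";
}

my @lines = ('hello;this;is;a;test', 'f;g');   # stands in for lines read from a file
print tag_row($_), "\n" for @lines;
```

The counter is reset inside the subroutine, so each row starts again at `<col1>`.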
Re^2: newbie regex question: substituting repeating occurences for different replacements by Aristotle (Chancellor) on Jul 21, 2003 at 19:02 UTC
Or in a pinch,

    while(<DATA>) {
        my $i;
        chomp;
        print join('', map { $i++; "<col$i>$_</col$i>" } split /;/)."\n";
    }

    __DATA__
    hello;this;is;a;test
    this;is;another;simple;test

Makeshifts last the longest.
Re: Re^2: newbie regex question: substituting repeating occurences for different replacements by eric256 (Parson) on Jul 23, 2003 at 17:56 UTC
Cool. I always have a hard time figuring out when/how to use map. The more I see it the more it makes sense though. I think it's partly because I'm not used to magical vars like $_ yet.

Eric Hodges
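For readers in the same position, here is a small side-by-side sketch of an explicit loop and the equivalent map. The `<w>` tag and the sample string are made up for illustration.

```perl
use strict;
use warnings;

my @words = split /;/, 'hello;this;is';

# Explicit loop: name the loop variable and push each result.
my @loop_out;
for my $word (@words) {
    push @loop_out, "<w>$word</w>";
}

# Same thing with map: $_ is each element in turn, and the value of the
# block's last expression becomes the corresponding output element.
my @map_out = map { "<w>$_</w>" } @words;

print "@map_out\n";
```

Both arrays end up with the same three elements; map just removes the temporary array and the named loop variable.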
Re^4: newbie regex question: substituting repeating occurences for different replacements by Aristotle (Chancellor) on Jul 23, 2003 at 19:05 UTC