Here's csved, an old script of mine that I use to quickly munge CSV files.
Essentially, it works similarly to perl -a or awk. Each line is read and parsed into fields, which are put into @F. Then it executes your expression, and afterwards, unless -n is given, it prints @F again in CSV format. You can mangle @F any way you like: changing, reordering or deleting entries, or completely reassigning it. You can also use next to skip printing a particular line.
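The per-line model above can be sketched in a few lines of plain Perl. This is only an illustration with invented data and expression; the real script uses Text::CSV_XS for proper parsing, while this naive split handles no quoting at all:

```perl
use strict;
use warnings;

# Sketch of csved's per-line model: split a line into @F, eval the
# user's expression, then print @F back out -- unless the expression
# used "next", which skips printing that line.
my $expr = '$F[1] = uc $F[1]; next if $F[0] eq "skip"';
my @out;
for my $line ("a,b,c", "skip,x,y", "d,e,f") {
    my @F = split /,/, $line;    # naive: no quoting support
    no warnings "exiting";       # allow "next" inside the eval'd expr
    eval $expr;
    die $@ if $@;
    push @out, join ",", @F;
}
print "$_\n" for @out;           # prints: a,B,c  and  d,E,f
```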
I recently thought it would be useful to be able to address columns by name if the CSV file has those in its first line, so I added the -h option which, via an egregious abuse of tying that I'm very proud of, also lets you address the fields through the hash %F, with field names taken from the first line of the CSV file. Any access through %F actually affects @F directly, going through a fieldname-to-index mapping. You can delete entries by assigning undef to them, and delete $F{fieldname} also works: it doesn't delete anything from the tied hash itself, but immediately deletes the corresponding entry from the underlying array @F representing the current line/row.
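Stripped of the tie magic, the idea behind -h is just a fieldname-to-index mapping. A plain (untied) sketch with invented data, showing what an access through %F resolves to:

```perl
use strict;
use warnings;

# What -h does under the hood: header names map to fixed column
# indices, and a %F-style access resolves a name to a position in
# the current row's @F.
my @header = ("id", "name", "comment");
my %idx;
@idx{@header} = 0 .. $#header;          # fieldname -> column index

my @F = ("7", "alice", "hello");
$F[$idx{name}] = uc $F[$idx{name}];     # like: $F{name} = uc $F{name}
delete $F[$idx{comment}];               # like: delete $F{comment}
my $row = join ",", grep { defined } @F;
print "$row\n";                         # prints: 7,ALICE
```

Note that the mapping is fixed once the header is read, which is why (as the usage text warns) rearranging @F makes %F accesses point at the wrong columns.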
I also added -b and -e options, which work like BEGIN and END blocks, but I haven't really used them yet. Options to read and write CSV with different separator, quote and escape characters are also available.
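The way -b and -e slot into the main loop can be sketched as three eval'd strings (the expressions here are invented examples): the -b code runs before any input, the -e code after all of it, and all three share package variables because they are eval'd with strictness turned off:

```perl
use strict;
use warnings;

# Sketch of the -b / -e hooks: eval one string before the loop,
# one per row, and one after the loop, all sharing package vars.
my ($begin, $expr, $end) = ('$sum = 0', '$sum += $F[1]', 'print "$sum\n"');

no strict 'vars';                 # like csved, eval snippets unstrictured
eval $begin; die $@ if $@;        # -b: runs before any input
for my $row (["a", 2], ["b", 3]) {
    my @F = @$row;
    eval $expr; die $@ if $@;     # expr: runs per record
}
eval $end; die $@ if $@;          # -e: runs after all input; prints 5
```

This mirrors a typical use such as summing a column with -n -b '$sum = 0' -e 'print "$sum\n"' '$sum += $F[1]'.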
This also called for a (still incomplete) test suite; you can look at the results here: https://github.com/gpvos/csved
I hope you will find it useful.
```perl
#!/usr/bin/perl -w
#
# csved - apply a Perl expression to all records of a CSV file
#
# Copyright (c) 2003-2022 Gerben Vos. All rights reserved.
# This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl 5.8.0.
#
# Run without arguments for usage information.
#
# Requires the Text::CSV_XS module by Jochen Wiedmann and Alan Citterman,
# available from http://www.cpan.org/ .

use v5.10;
use strict 'vars', 'refs';

# This is a weird usage of tying because the hash is tied only once,
# but the underlying data changes for every processed line.
package Tie::FieldNames;

sub TIEHASH {
    my $self = shift;
    my $i = 0;
    my $tied = {
        HEADERS => { map { ($_, $i++) } @_ },
    };
    die "Duplicate field names: " . join(", ", map { "'$_'" } _dups(@_))
        unless scalar keys %{$tied->{HEADERS}} == scalar @_;
    return bless $tied, $self;
}

sub _dups {
    my %count;
    ++$count{$_} for @_;
    return grep { $count{$_} > 1 } keys %count;
}

sub setcurrent {
    $_[0]->{CURRENT} = $_[1];
}

sub FETCH {
    my ($self, $key) = @_;
    die "$key: No such field name" unless exists $self->{HEADERS}->{$key};
    return $self->{CURRENT}->[$self->{HEADERS}->{$key}];
}

sub STORE {
    my ($self, $key, $value) = @_;
    die "$key: No such field name" unless exists $self->{HEADERS}->{$key};
    $self->{CURRENT}->[$self->{HEADERS}->{$key}] = $value;
}

sub DELETE {
    my ($self, $key) = @_;
    die "$key: No such field name" unless exists $self->{HEADERS}->{$key};
    delete $self->{CURRENT}->[$self->{HEADERS}->{$key}];
}

sub EXISTS {
    my ($self, $key) = @_;
    die "$key: No such field name" unless exists $self->{HEADERS}->{$key};
    return exists $self->{CURRENT}->[$self->{HEADERS}->{$key}];
}

sub FIRSTKEY {
    my ($self) = @_;
    my $dummy = keys %{$self->{HEADERS}};
    each %{$self->{HEADERS}};
}

sub NEXTKEY {
    return each %{$_[0]->{HEADERS}};
}

sub SCALAR {
    return scalar grep { defined } @{$_[0]->{CURRENT}};
}

package main;

use Text::CSV_XS;
use IO::Handle;

my $_silent = 0;
my $_use_headers = 0;
my $_begin = "";
my $_end = "";
my $_sep = ',';
my $_quote = '"';
my $_escape = '"';
my $_progname;
($_progname = $0) =~ s#.*/##;
my $_usage = <<USAGE;
Usage: $_progname [option...] expr [file...]
Apply a Perl expression to all records of a CSV file.
-n        suppress printing
-h        treat first line as field names and allow access through \%F
-b expr   execute expr before processing any input
-e expr   execute expr after processing all input
-F char   set the character that separates fields (default: comma)
-Q char   set the character for quoting fields that contain a separator
          character or end-of-line (default: double quote)
-E char   set the character for escaping a quote character (default:
          double quote)
The input is processed line by line.
expr should be a Perl expression modifying the \@F array; indexes are 0-based.
With -h, you can also modify the \%F hash with the same effect. Deleting an
array or hash entry, or setting it to undef, will cause it not to be output.
The value of \@F after executing expr is sent to the standard output.
With -h, field names are read from the first line just before executing expr
on it, so if e.g. field 8 is named "comment", you can use "delete \$F{comment}"
and that would be equivalent to "delete \$F[8]". Duplicate field names in the
input or a field name in expr that does not exist in the input will cause a
fatal error. A field name for \%F refers to a fixed index in \@F, so if you
rearrange the data in \@F, accesses through \%F may refer to incorrect fields.
Works with binary files (for example, with embedded newlines).
You can use next (to suppress printing), last and redo in expr.
\$. works in expr, so you can e.g. use "if (\$. > 1) { ... }" to not apply
some code to the first line.
Variable names starting with _ are reserved, don't use them in expr.
Uses the Text::CSV_XS module by Jochen Wiedmann and Alan Citterman, with
thanks.
USAGE

while (@ARGV > 0 and $ARGV[0] =~ m{^-}) {
    my $opt = shift;
    $_silent = 1, next if $opt eq '-n';
    $_use_headers = 1, next if $opt eq '-h';
    $_begin = shift, next if $opt eq '-b';
    $_end = shift, next if $opt eq '-e';
    $_sep = shift, next if $opt eq '-F';
    $_quote = shift, next if $opt eq '-Q';
    $_escape = shift, next if $opt eq '-E';
    die $_usage;
}
die $_usage unless @ARGV > 0;
my $_expr = shift;
my $_csv = Text::CSV_XS->new({ sep_char => $_sep, quote_char => $_quote,
    escape_char => $_escape, binary => 1, eol => "\n" });
unshift(@ARGV, '-') unless @ARGV;
{ no strict; eval $_begin; die $@ if $@; }
my %F;
my $_tied;
while ($ARGV = shift) {
    open(_IN, $ARGV);
    my $_fields;
    while ($_fields = $_csv->getline(\*main::_IN) and @$_fields) {
        my(@F) = @$_fields;
        if ($_use_headers) {
            if ($. == 1) {
                $_tied = tie %F, 'Tie::FieldNames', @F;
            }
            $_tied->setcurrent(\@F);
        }
        # This means you can use next, last, and redo in expr
        # without excessive noise. Also turn off strictness.
        no warnings "exiting";
        no strict;
        eval $_expr;
        die $@ if $@;
        $_csv->print(STDOUT, [ grep { defined } @F ]) unless $_silent;
    }
}
if ($_use_headers) {
    undef $_tied;
    untie %F;
}
{ no strict; eval $_end; die $@ if $@; }
```
Re: Command-line utility to apply perl expressions to CSV files
  by Tux (Canon) on Nov 21, 2022 at 16:10 UTC
  by gpvos (Acolyte) on Nov 21, 2022 at 21:10 UTC
  by Fletch (Bishop) on Nov 22, 2022 at 15:36 UTC