PerlMonks  

using an RE to consolidate repeated words

by jaa (Friar)
on Aug 16, 2005 at 11:11 UTC ( [id://484119]=perlquestion )

jaa has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am trying to use an RE to consolidate repeated words.

A simplified example of my before / after:

my $before  = "apples/apples_green";
my $after   = $before;
my $desired = "apples_green";

# I tried this, but no go!
$after =~ s#^(\w+)/$1_#$1_#;

print "before [$before] after [$after] desired [$desired]\n";

Any suggestions much appreciated!

Regards,

Jeff

Replies are listed 'Best First'.
Re: using an RE to consolidate repeated words
by GrandFather (Saint) on Aug 16, 2005 at 11:19 UTC

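    s#^(\w+)/$1_#$1_# should be s#^(\w+)/\1_#$1_#. Inside the match pattern, $1 is interpolated like any other variable before matching starts, so it holds whatever an earlier successful match captured (or nothing at all), not the group being captured now. Use \1 in the match pattern to refer back to the current group, and $1 in the replacement string.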
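
    A minimal sketch of the corrected substitution, wrapped in a helper so it can be tried on more than one input. The name collapse_prefix and the second test string are made up for illustration; only the pattern itself comes from the thread.

```perl
use strict;
use warnings;

# Collapse "word/word_rest" into "word_rest".
# \1 refers to whatever group 1 captured during the current match attempt.
# collapse_prefix is a hypothetical helper name, not part of the original post.
sub collapse_prefix {
    my ($path) = @_;
    (my $out = $path) =~ s#^(\w+)/\1_#${1}_#;
    return $out;
}

print collapse_prefix("apples/apples_green"), "\n";  # apples_green
print collapse_prefix("apples/pears_green"),  "\n";  # no repeat, so unchanged: apples/pears_green
```

    On Perl 5.10 and later, \g{1} is an equivalent spelling of \1 that cannot be confused with an octal escape.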


    Perl is Huffman encoded by design.
      Fab - thanks and kudos, just the pointer I was looking for!

      Regards,

      Jeff

Re: using an RE to consolidate repeated words
by inman (Curate) on Aug 16, 2005 at 11:23 UTC
    You need to use a back reference.
    $after =~ s#^(\w+)/\1#$1#;

    Tidying up the syntax and changing the test for the bit in between the repeats gives:

    $after =~ s/(\w+)\W+\1/$1/;
    I assumed that there is no need to anchor the pattern to the start of the string, so I removed the ^.
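
    Dropping the anchor does change what can match, so it is worth checking against real data. The sketch below compares the two patterns from this reply; the helper names and the second input string are assumptions added for illustration.

```perl
use strict;
use warnings;

# Unanchored form: the repeated text may start in the middle of a word.
# dedupe_loose and dedupe_anchored are hypothetical helper names.
sub dedupe_loose {
    my ($s) = @_;
    $s =~ s/(\w+)\W+\1/$1/;
    return $s;
}

# Anchored form: the repeated word must start the string.
sub dedupe_anchored {
    my ($s) = @_;
    $s =~ s#^(\w+)/\1#$1#;
    return $s;
}

for my $in ("apples/apples_green", "pineapple/apple_green") {
    printf "%-24s loose [%s] anchored [%s]\n",
        $in, dedupe_loose($in), dedupe_anchored($in);
}
```

    Both forms turn apples/apples_green into apples_green. For pineapple/apple_green the unanchored form matches "apple/apple" inside the string and yields pineapple_green, while the anchored form leaves the string alone.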
Re: using an RE to consolidate repeated words
by GrandFather (Saint) on Aug 16, 2005 at 11:16 UTC

    Can you provide a more realistic set of data? The single example you have given looks rather artificial. Alternatively, what generates data that looks like that?


    Perl is Huffman encoded by design.

      The data is being generated as part of a backup process, whose naming is outside my control. I have to collate stats on the various backup folders, by a derived group name.

      If I were to hand code it, I would do something like:

      use File::Basename qw( dirname basename );

      for my $folder (
          '/var/vavoom/cherry/cherry_etc',        # cherry_etc
          '/var/varoom/cherry/cherry_var_data',   # cherry_var_data
          '/var/vavoom/peach/peach_etc',          # peach_etc
          '/var/varoom/mysql/peach_mysql_chant',  # mysql_peach_chant
          '/var/vavoom/upload/var_finite',        # upload_var_finite
          '/var/vavoom/upload/var_open',          # upload_var_open
      ) {
          my $group  = basename($folder);
          my $parent = basename(dirname($folder));
          $group =~ s/$parent\_//g;
          $group = $parent . '_' . $group;
          print sprintf("%-40s %s\n", $folder, $group );
      }

      /var/vavoom/cherry/cherry_etc            cherry_etc
      /var/varoom/cherry/cherry_var_data       cherry_var_data
      /var/vavoom/peach/peach_etc              peach_etc
      /var/varoom/mysql/peach_mysql_chant      mysql_peach_chant
      /var/vavoom/upload/var_finite            upload_var_finite
      /var/vavoom/upload/var_open              upload_var_open

      I was hoping for pointers to an RE technique that would enable me to consolidate repeating words.

      Regards,

      Jeff
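
      One possible regex-only take on the hand-coded version above, offered as a sketch rather than a definitive answer. It captures the last two path components with a single match instead of File::Basename, then strips the parent name from the folder name. The helper name group_name is hypothetical, and \Q...\E is added on the assumption that parent names could someday contain regex metacharacters.

```perl
use strict;
use warnings;

# Derive a group name from the last two path components.
# group_name is a hypothetical helper, not from the original post.
sub group_name {
    my ($folder) = @_;

    # Capture parent directory and folder name; give up if the path is too short.
    my ($parent, $base) = $folder =~ m{([^/]+)/([^/]+)/?\z} or return;

    # Remove every "parent_" from the folder name, quoting metacharacters.
    $base =~ s/\Q$parent\E_//g;

    return "${parent}_$base";
}

for my $folder (
    '/var/vavoom/cherry/cherry_etc',
    '/var/varoom/mysql/peach_mysql_chant',
    '/var/vavoom/upload/var_open',
) {
    printf "%-40s %s\n", $folder, group_name($folder);
}
```

      For these three paths the derived names are cherry_etc, mysql_peach_chant and upload_var_open, matching the listing above.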
