Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have the data shown below, and I want to remove the "#digit" at the end of each line. Can I use the chop function to remove it, or is there another built-in Perl function or module? Please advise.

INPUT:-
//depot/programfiles/scripts/files/data/src/script_iface4_hdlr.c#4
//depot/programfiles/scripts/files/data/src/script_pkt_mgr.c#7
//depot/programfiles/scripts/files/data/src/script_pkt_mgr.h#3

OUTPUT:-
//depot/programfiles/scripts/files/data/src/script_iface4_hdlr.c
//depot/programfiles/scripts/files/data/src/script_pkt_mgr.c
//depot/programfiles/scripts/files/data/src/script_pkt_mgr.h

Replies are listed 'Best First'.
Re: How do remove trailing data?
by ahmad (Hermit) on Dec 24, 2010 at 01:42 UTC

    You can strip it with a regex substitution:

    s/#\d+$//;
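
For context, here is a minimal sketch of that substitution applied to the sample paths from the question. The helper name strip_revision is made up for illustration. On the chop question: chop only removes the last character, so it would leave the "#" behind and cannot cope with multi-digit suffixes such as "#12".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a copy of the path
# with a trailing "#<digits>" marker removed.
sub strip_revision {
    my ($path) = @_;
    $path =~ s/#\d+$//;    # matches only "#" plus digits at the end of the string
    return $path;
}

# Sample data copied from the question.
my @input = (
    '//depot/programfiles/scripts/files/data/src/script_iface4_hdlr.c#4',
    '//depot/programfiles/scripts/files/data/src/script_pkt_mgr.c#7',
    '//depot/programfiles/scripts/files/data/src/script_pkt_mgr.h#3',
);

print strip_revision($_), "\n" for @input;
```

Strings without a trailing "#digits" marker pass through unchanged, because the substitution simply does not match.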
Re: How do remove trailing data?
by TomDLux (Vicar) on Dec 24, 2010 at 03:56 UTC

    In my opinion, a regex is overkill for such situations. It may be that the developers considered all cases and made the regex engine handle this one efficiently, but it just seems wrong.

    Efficient and simple: split on the # and take the first component:

    my $url = ( split '#', $urlplusplus )[0];

    or use index() to find where the # is, and use substr to fetch the prefix.

    my $url = substr $urlplusplus, 0, index( $urlplusplus, '#' );

    Of course, a benchmark might provide some surprises.

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

      The split option fails if there's more than one '#' in the string - or if the '#' isn't followed by digits. Note that the regexp engine is optimized for patterns like /#[0-9]+$/:
      perl -Mre=debug -e '$_ = "foobar#2"; s/#[0-9]+$//'
      Compiling REx "#[0-9]+$"
      Final program:
         1: EXACT <#> (3)
         3: PLUS (15)
         4:   ANYOF[0-9][] (0)
        15: EOL (16)
        16: END (0)
      anchored "#" at 0 floating ""$ at 2..2147483647 (checking anchored) minlen 2
      Guessing start of match in sv for REx "#[0-9]+$" against "foobar#2"
      Found anchored substr "#" at offset 6...
      Found floating substr ""$ at offset 8...
      Starting position does not contradict /^/m...
      Guessed: match at offset 6
      Matching REx "#[0-9]+$" against "#2"
         6 <foobar> <#2>     |  1:EXACT <#>(3)
         7 <foobar#> <2>     |  3:PLUS(15)
                                  ANYOF[0-9][] can match 1 times out of 2147483647...
         8 <foobar#2> <>     | 15:  EOL(16)
         8 <foobar#2> <>     | 16:  END(0)
      Match successful!
      Freeing REx: "#[0-9]+$"
      So, I'd go with the s/// solution.
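
To make the edge cases mentioned above concrete, here is a small comparison sketch. The three sub names and the sample strings are invented for illustration; the bodies mirror the split, index/substr and s/// suggestions from this thread.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the three approaches discussed in the thread.
sub via_split {
    my ($s) = @_;
    my $head = ( split /#/, $s )[0];    # everything before the FIRST '#'
    return $head;
}

sub via_index {
    my ($s) = @_;
    # index() returns -1 when there is no '#', and a negative length makes
    # substr drop characters from the end of the string.
    return substr $s, 0, index( $s, '#' );
}

sub via_regex {
    my ($s) = @_;
    $s =~ s/#[0-9]+$//;                 # only a trailing "#digits" is removed
    return $s;
}

# Made-up inputs: two '#' characters, a non-digit suffix, and no '#' at all.
for my $s ( 'dir#1/file.c#7', 'file.c#draft', 'file.c' ) {
    printf "%-16s split=%-8s index=%-8s regex=%s\n",
        $s, via_split($s), via_index($s), via_regex($s);
}
```

With these inputs, split and index both cut 'dir#1/file.c#7' down to 'dir', and the unguarded index version turns 'file.c' into 'file.' because index returned -1. The substitution leaves every string alone unless it ends in "#" followed by digits.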
      Preferring split to a regex makes little sense to me (obviously, YMDV) unless benchmarking supports that choice (and in OP's case, any difference in execution time would presumably be unnoticeable).

      Using split still requires the basic familiarity with a simple regex such as ahmad offered; is longer; and may require more than 'just a glance' to understand at some future date.
          ...but I agree that index, despite its length and comparative complexity, may be a valuable alternative in the case stated by OP.

Re: How do remove trailing data?
by Anonymous Monk on Dec 24, 2010 at 01:48 UTC
    $ perl -le" $_ = shift; print substr $_,0, rindex $_, q!#!" before#after#last
    before#after
    $ perl -le" $_ = shift; print substr $_,0, index $_, q!#!" before#after#last
    before
    In honor of some guy with lots of experience
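
A guarded variant of the rindex one-liner might look like the sketch below. The sub name is hypothetical. Unlike the regex, it does not check that digits follow the '#', so treat it as an illustration of the rindex/substr idea and not a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: cut the string at the LAST '#', if there is one.
sub strip_after_last_hash {
    my ($s) = @_;
    my $pos = rindex $s, '#';
    # rindex returns -1 when '#' is absent; return the string untouched then.
    return $pos < 0 ? $s : substr( $s, 0, $pos );
}

print strip_after_last_hash('before#after#last'), "\n";
print strip_after_last_hash('no_hash_here'), "\n";
```

The explicit check on the rindex result is the design point: without it, a string that has no '#' would silently lose its last character.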
      The logic in each of the for loops is the same. Is there a better way to implement this? Can I combine the for loops into one? Please advise.
      if ( @changed_files || @newlyinsertedfiles || @branchedfiles ) {
          foreach my $file (@changed_files) {
              $file =~ s/#\d+$//;
              #print "FILE:$file\n";
              push @changed_paths, "$file\n";
          }
          foreach my $file (@newlyinsertedfiles) {
              $file =~ s/#\d+$//;
              #print "FILE:$file\n";
              push @newlyinserted_paths, "$file\n";
          }
          foreach my $file (@branchedfiles) {
              $file =~ s/#\d+$//;
              #print "FILE:$file\n";
              push @branched_paths, "$file\n";
          }
      }

        If you can overwrite the existing array values, you can replace the complete if block with the following line. You do not need the if at all, because the loop body only runs when the arrays are not empty.

        s/#\d+$// for (@changed_files, @newlyinsertedfiles, @branchedfiles);
        --
        Regards
        - Samar
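
If the separate *_paths arrays with trailing newlines are still needed and the originals should stay untouched, one possible sketch is a small helper built on map. The name cleaned_paths is made up, and the one-element sample arrays only reuse paths from the question for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns cleaned copies, each with a trailing newline,
# without modifying the caller's array elements.
sub cleaned_paths {
    return map { ( my $p = $_ ) =~ s/#\d+$//; "$p\n" } @_;
}

# Sample data reusing paths from the question.
my @changed_files      = ('//depot/programfiles/scripts/files/data/src/script_iface4_hdlr.c#4');
my @newlyinsertedfiles = ('//depot/programfiles/scripts/files/data/src/script_pkt_mgr.c#7');
my @branchedfiles      = ('//depot/programfiles/scripts/files/data/src/script_pkt_mgr.h#3');

my @changed_paths       = cleaned_paths(@changed_files);
my @newlyinserted_paths = cleaned_paths(@newlyinsertedfiles);
my @branched_paths      = cleaned_paths(@branchedfiles);

print @changed_paths, @newlyinserted_paths, @branched_paths;
```

Copying $_ into $p before substituting matters here: inside map, $_ is an alias to the original element, so substituting on it directly would change the source arrays as well.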