kevyt has asked for the wisdom of the Perl Monks concerning the following question:

If I have a string that looks something like this:
$str = '703555121245874|45874 Smith St|Your Town|New Hampshire';
How can I remove and store the beginning digits without using a split and a loop? There must be a way with "tr".
I am trying to get the beginning number as an index to a hash. Like this:
$data_hash{$index} = $str;
I can do it with splits and loops but it is getting messy.

Re: Removing digits until you see | in a string
by quester (Vicar) on Jan 08, 2007 at 05:07 UTC
    I can't think of a way to do it with tr, but s will do it:
    $index = $str;
    $index =~ s/\|.*//;
    $data_hash{$index} = $str;
      The same, but one line shorter :-)
      ($index = $str) =~ s/\|.*//;
      $data_hash{$index} = $str;
      It worked. Thanks

      That solution is wrong.

      What if the string contains a newline? Or multiple newlines? Anything that follows after the newline will not be removed.

      I could suggest using s/...//s but I'm not going to do that. This code will be slower than it has to be, and it doesn't give you the information you want. It just removes text from the first | up to the next newline.

      My suggestion is the same as ysth's: Re: Removing digits until you see | in a string
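The newline objection above can be checked with a small sketch. The record with an embedded newline is made up for illustration, and the variable names ($no_s, $with_s, $digits) are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical record with an embedded newline in the address field.
my $str = "703555121245874|45874 Smith St\nApt 2|Your Town|New Hampshire";

# Without /s, "." stops at the newline, so the text after it survives.
(my $no_s = $str) =~ s/\|.*//;

# With /s, "." also matches newlines, so everything after the first pipe goes.
(my $with_s = $str) =~ s/\|.*//s;

# Capturing the leading digits directly sidesteps the question.
my ($digits) = $str =~ /^(\d+)/;

print "without /s: [$no_s]\n";
print "with /s:    [$with_s]\n";
print "captured:   [$digits]\n";
```

Only the first form leaves "Apt 2|Your Town|New Hampshire" behind in the would-be key.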

Re: Removing digits until you see | in a string
by ysth (Canon) on Jan 08, 2007 at 05:23 UTC
    Yet another way:
    $str =~ s/^(\d+)// or die "missing digits in front of $str\n";
    $data_hash{$1} = $str;
Re: Removing digits until you see | in a string
by ikegami (Patriarch) on Jan 08, 2007 at 05:13 UTC

    To get whatever's before the |:

    ($data_hash{$index}) = $str =~ /([^|]*)/;

    ( I originally posted ($data_hash{$index}) = $str =~ /(\d+)/;. )

      When I do it this way, it seems to overwrite the previous index in the hash. I store about 40 indexes with strings but I only had one to print. I am not sure what I did wrong.
        Sorry, I misread the problem. Use quester's.

      See, I would have used something like ($data_hash{$index}) = $str =~ m/^(.+?)(?=\|)/. I wondered if it was faster to stop on the pipe with [^|] or use a lookahead:

      use strict;
      use Benchmark;

      my $index;
      my $str = "lol238923892382938|lol282812|asdfasdf|asdfasdfasdf";

      timethese(5000000, {
          'stopper'   => sub { ($index) = $str =~ m/([^|]+)/ },
          'lookahead' => sub { ($index) = $str =~ m/^(.+?)(?=\|)/ },
          'splitter'  => sub { ($index) = split /\|/, $str },
      });

      The lookahead is technically faster on my machine, but not by enough to count as a victory. I'd be curious about others' results. Sadly, the splitter wins over the regexes by a similar (i.e. smallish) amount.

      -Paul

Re: Removing digits until you see | in a string
by friedo (Prior) on Jan 08, 2007 at 05:40 UTC

    You don't have to use a loop, either, if you're not averse to split. Just use a list assignment and throw away the other pieces.

    my ( $index ) = split /\|/, $str;
    $data_hash{$index} = $str;

    Update: Or to remove the digits from the string, you can do this:

    my ( $index, @rest ) = split /\|/, $str;
    $data_hash{$index} = join '|', @rest;
      In your update, you can limit the split to the first pipe character:
      my ($index, $rest) = split /\|/, $str, 2;
      $data_hash{$index} = $rest;

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
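A minimal, self-contained sketch of the limited split suggested above, using the string from the original question. The printed line is only there to show what ends up in the hash.

```perl
use strict;
use warnings;

my $str = '703555121245874|45874 Smith St|Your Town|New Hampshire';

# A limit of 2 makes split stop after the first pipe,
# so the remainder keeps its own pipes and needs no join.
my ($index, $rest) = split /\|/, $str, 2;

my %data_hash;
$data_hash{$index} = $rest;

print "$index => $data_hash{$index}\n";
```

If the string contains no pipe at all, $rest is undefined, so real data may want a check before the assignment.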
Re: Removing digits until you see | in a string
by johngg (Canon) on Jan 08, 2007 at 10:05 UTC
    Still using split but no loop and no mess.

    $str = '703555121245874|45874 Smith St|Your Town|New Hampshire';
    %data_hash = map { split m{\|}, $_, 2 } $str;

    If you are perhaps reading a lot of these strings from a file you could populate the hash in one fell swoop.

    my %data_hash = map { split m{\|}, $_, 2 } map {chomp; $_} <$fileHandle>;

    I hope this is of use.

    Cheers,

    JohnGG

      That is a bad idea.

      If he is reading a lot of these strings from a file then it implies that there are a lot of records in the file.

      What your code does is read the entire file into memory first, and only then start processing it.

      Also: you can combine both maps just fine.
      That is: map { chomp; split m{\|}, $_, 2 }

        What's bad about reading the file into memory? With modern computer systems it is quite a common idiom to read the whole of a file into memory before processing it. Only if the data file was very large would this become a bad idea.

        Combining the maps is good, I should have thought of it myself.

        Cheers,

        JohnGG
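For comparison with the map version discussed above, here is a sketch of the line-at-a-time alternative, which holds only one record in memory besides the hash itself. The in-memory filehandle and the two records are made up so the example runs on its own; a real script would open an actual file instead.

```perl
use strict;
use warnings;

# Stand-in for a real file: an in-memory filehandle over made-up records.
my $data = "1001|1 Main St|Your Town|New Hampshire\n"
         . "1002|2 Elm St|Other Town|Vermont\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my %data_hash;
while ( my $line = <$fh> ) {
    chomp $line;
    my ($index, $rest) = split /\|/, $line, 2;
    next unless defined $rest;    # skip lines that contain no pipe
    $data_hash{$index} = $rest;
}
close $fh;

print "$_ => $data_hash{$_}\n" for sort keys %data_hash;
```

For small files the map one-liner and this loop build the same hash; the loop only matters once the file is large enough that slurping it hurts.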

Re: Removing digits until you see | in a string
by Mandrake (Chaplain) on Jan 08, 2007 at 07:01 UTC
    $hash{$1} = $2 if ($str =~ /([0-9]+)(.+)/) ;
    will give you the beginning digits as the hash key and the rest of the string (including the leading |) as the value.
    $hash{$1} = $str if ($str =~ /([0-9]+)(.+)/) ;
    gives the beginning digits as the key and the whole string as the value.
Re: Removing digits until you see | in a string
by inman (Curate) on Jan 08, 2007 at 09:31 UTC
    A couple of suggestions. The following uses a substitution and removes the index from the rest of the data (i.e. the original data is changed).
    $data_hash{$1}= $str if $str =~ s/(\d+)\|//;
    This matches the index and the remaining data without altering the original.
    $data_hash{$1}= $2 if $str =~ /(\d+)\|(.*)$/;
    I have used a conditional to assess the validity of the assignment beforehand. This is useful for data processing where the quality of the data is variable.
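The conditional approach above can be sketched over several records at once. Note one change that is an assumption on my part, not part of the post: the pattern here is anchored with ^ so that digits appearing later in a line are not accepted as the index. The two bad records and the @rejected list are made up for illustration.

```perl
use strict;
use warnings;

my @records = (
    '703555121245874|45874 Smith St|Your Town|New Hampshire',
    'no digits here|1 Main St',    # made-up bad record: no leading digits
    '42',                          # made-up bad record: no pipe
);

my (%data_hash, @rejected);
for my $record (@records) {
    # Anchored variant of the second pattern: digits, a pipe, then the rest.
    if ( $record =~ /^(\d+)\|(.*)$/s ) {
        $data_hash{$1} = $2;
    }
    else {
        push @rejected, $record;
    }
}

printf "%d stored, %d rejected\n", scalar(keys %data_hash), scalar(@rejected);
```

Keeping the rejects in a list makes it easy to report or log the lines that did not fit the expected layout.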
Re: Removing digits until you see | in a string
by johngg (Canon) on Jan 09, 2007 at 19:19 UTC
    Yet another way, with chop, substr and index. I'm not suggesting this is a sensible way to do it but in the spirit of TIMTOWTDI.

    $ perl -le '
    > $str = q{703555121245874|45874 Smith St|Your Town|New Hampshire};
    > chop( $index = substr( $str, 0, index( $str, q{|} ) + 1, q{} ) );
    > print qq{$index\n$str};'
    703555121245874
    45874 Smith St|Your Town|New Hampshire
    $

    The index( $str, q{|} ) + 1 finds the position one past the first pipe symbol. The four-argument substr then replaces everything from the start of the string up to that point with an empty string (the 4th argument) and returns what it has just replaced, which is assigned to $index. That value still has the pipe symbol at the end, so chop is used to remove the last character from the LHS.

    I think frodo72's modification of friedo's update is the cleanest and easiest to understand of the solutions proposed.

    Cheers,

    JohnGG
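The chop/substr/index one-liner above can also be unrolled into separate steps, which may make the explanation easier to follow. The intermediate variable $pipe_at and the guard against a missing pipe are additions for illustration, not part of the original post.

```perl
use strict;
use warnings;

my $str = '703555121245874|45874 Smith St|Your Town|New Hampshire';

# Step 1: position of the first pipe (index returns -1 if there is none).
my $pipe_at = index( $str, '|' );
die "no pipe in '$str'\n" if $pipe_at < 0;

# Step 2: four-argument substr removes the prefix (pipe included)
# from $str in place and returns the removed text.
my $index = substr( $str, 0, $pipe_at + 1, '' );

# Step 3: drop the trailing pipe from the removed prefix.
chop $index;

print "$index\n$str\n";
```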