Removing digits until you see | in a string

kevyt has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Removing digits until you see | in a string by quester (Vicar) on Jan 08, 2007 at 05:07 UTC
I can't think of a way to do it with tr, but s will do it: `$index = $str; $index =~ s/\|.*//; $data_hash{$index} = $str;`
Re^2: Removing digits until you see | in a string by MaxKlokan (Monk) on Jan 08, 2007 at 09:29 UTC
The same, but one line shorter :-) `($index = $str) =~ s/\|.*//; $data_hash{$index} = $str;`
Re^2: Removing digits until you see | in a string by kevyt (Scribe) on Jan 08, 2007 at 05:18 UTC
It worked. Thanks.
Re^2: Removing digits until you see | in a string by Animator (Hermit) on Jan 08, 2007 at 12:04 UTC
That solution is wrong. What if the string contains a newline? Or multiple newlines? Anything that follows the newline will not be removed, because `.` does not match a newline by default. I could suggest using s/...//s but I'm not going to do that. This code will be slower than it has to be, and it doesn't give the information you want. It just removes text from a | until a newline. My suggestion is the same as ysth's: Re: Removing digits until you see | in a string
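To make the newline point concrete, here is a minimal sketch. The record with an embedded newline is invented for illustration and does not come from the original question.

```perl
use strict;
use warnings;

# Hypothetical record with an embedded newline (not from the original post).
my $str = "12345|first line\nsecond line";

# Without /s, '.' does not match "\n", so the substitution stops at the newline.
(my $no_s = $str) =~ s/\|.*//;

# With /s, '.' also matches "\n", so everything after the first pipe is removed.
(my $with_s = $str) =~ s/\|.*//s;

# Anchoring on the leading digits avoids scanning the rest of the string at all.
my ($digits) = $str =~ /^(\d+)/;

print "without /s: [$no_s]\n";
print "with /s:    [$with_s]\n";
print "digits:     [$digits]\n";
```

Only the first variant leaves `second line` attached to the key; the other two both yield `12345`.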
Re: Removing digits until you see | in a string by ysth (Canon) on Jan 08, 2007 at 05:23 UTC
Yet another way: `$str =~ s/^(\d+)// or die "missing digits in front of $str\n"; $data_hash{$1} = $str;`
Re: Removing digits until you see | in a string by ikegami (Patriarch) on Jan 08, 2007 at 05:13 UTC
To get whatever's before the `|`: `($data_hash{$index}) = $str =~ /([^|]*)/;` (I originally posted `($data_hash{$index}) = $str =~ /(\d+)/;`.)
Re^2: Removing digits until you see | in a string by kevyt (Scribe) on Jan 08, 2007 at 05:29 UTC
When I do it this way, it seems to overwrite the previous index in the hash. I store about 40 indexes with strings but I only had one to print. I am not sure what I did wrong.
Re^3: Removing digits until you see | in a string by ikegami (Patriarch) on Jan 08, 2007 at 16:04 UTC
Sorry, I misread the problem. Use quester's.
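The overwriting kevyt describes is consistent with `$index` never changing between records, since that snippet stores the captured prefix as the value rather than using it as the key. kevyt's actual loop is not shown in the thread, so the following is only a sketch under that assumption; the second record and the `fixed` key are made up.

```perl
use strict;
use warnings;

# Hypothetical sample records; only the first one appears in the thread.
my @records = (
    '703555121245874|45874 Smith St|Your Town|New Hampshire',
    '703555121245875|12 Elm St|Other Town|Vermont',
);

my ( %overwritten, %keyed );
my $index = 'fixed';    # assumption: $index is never updated inside the loop

for my $str (@records) {
    # Stores the captured prefix as the VALUE under the same key each time.
    ( $overwritten{$index} ) = $str =~ /([^|]*)/;

    # Captures the prefix first, then uses it as the KEY.
    my ($key) = $str =~ /^([^|]*)/;
    $keyed{$key} = $str;
}

printf "overwritten: %d entry, keyed: %d entries\n",
    scalar( keys %overwritten ), scalar( keys %keyed );
```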
Re^2: Removing digits until you see | in a string by jettero (Monsignor) on Jan 08, 2007 at 14:23 UTC
See, I would have used something like `($data_hash{$index}) = $str =~ m/^(.+?)(?=\|)/`. I wondered if it was faster to stop on the pipe with `[^|]` or use a lookahead: `use strict; use Benchmark; my $index; my $str = "lol238923892382938|lol282812|asdfasdf|asdfasdfasdf"; timethese(5000000, { 'stopper' => sub { ($index) = $str =~ m/([^|]+)/ }, 'lookahead' => sub { ($index) = $str =~ m/^(.+?)(?=\|)/ }, 'splitter' => sub { ($index) = split /\|/, $str }, });` The lookahead is technically faster on my machine, but not by enough to count as a victory. I'd be curious about others' results. Sadly, the splitter wins over the regulars by a similar (i.e. smallish) amount. -Paul
Re: Removing digits until you see | in a string by friedo (Prior) on Jan 08, 2007 at 05:40 UTC
You don't have to use a loop, either, if you're still not averse to `split`. Just use a parallel assignment and throw away the other pieces. `my ( $index ) = split /\|/, $str; $data_hash{$index} = $str;` Update: Or to remove the digits from the string, you can do this: `my ( $index, @rest ) = split /\|/, $str; $data_hash{$index} = join '|', @rest;`
Re^2: Removing digits until you see | in a string by polettix (Vicar) on Jan 08, 2007 at 15:00 UTC
In your update, you can limit the split to the first pipe character: `my ($index, $rest) = split /\|/, $str, 2; $data_hash{$index} = $rest;` Flavio perl -ple'$_=reverse' <<<ti.xittelop@oivalf Don't fool yourself.
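A short sketch of what the third argument to `split` changes, using the sample record that appears elsewhere in this thread:

```perl
use strict;
use warnings;

my $str = '703555121245874|45874 Smith St|Your Town|New Hampshire';

# No limit: every pipe is a split point, so the record falls into four fields.
my @all = split /\|/, $str;

# Limit of 2: split once at the first pipe and leave the remainder untouched.
my ( $index, $rest ) = split /\|/, $str, 2;

print scalar(@all), " fields without a limit\n";
print "index: $index\n";
print "rest:  $rest\n";
```

With the limit there is nothing to `join` back together, because the later pipes are never split on.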
Re: Removing digits until you see | in a string by johngg (Canon) on Jan 08, 2007 at 10:05 UTC
Still using `split` but no loop and no mess. `$str = '703555121245874|45874 Smith St|Your Town|New Hampshire'; %data_hash = map { split m{\|}, $_, 2 } $str;` If you are perhaps reading a lot of these strings from a file you could populate the hash in one fell swoop. `my %data_hash = map { split m{\|}, $_, 2 } map {chomp; $_} <$fileHandle>;` I hope this is of use. Cheers, JohnGG
Re^2: Removing digits until you see | in a string by Animator (Hermit) on Jan 08, 2007 at 12:10 UTC
That is a bad idea. If he is reading a lot of these strings from a file then it implies that there are a lot of records in the file. Your code first reads the entire file into memory and only then starts to process it. Also: you can combine both maps just fine. That is: `map { chomp; split m{\|}, $_, 2 }`
Re^3: Removing digits until you see | in a string by johngg (Canon) on Jan 08, 2007 at 13:53 UTC
What's bad about reading the file into memory? With modern computer systems it is quite a common idiom to read the whole of a file into memory before processing it. Only if the data file were very large would this become a bad idea. Combining the `map`s is good, I should have thought of it myself. Cheers, JohnGG
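For comparison with the `map` over `<$fileHandle>`, here is a minimal sketch of the line-at-a-time alternative. It assumes an in-memory filehandle with invented records standing in for the real data file, and it stores the text after the first pipe as the value.

```perl
use strict;
use warnings;

# Stand-in for a real data file: an in-memory filehandle over made-up records.
my $data = "111|Alpha St|Town A\n222|Beta St|Town B\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my %data_hash;
while ( my $line = <$fh> ) {
    chomp $line;

    # Split once at the first pipe; skip lines that have no pipe at all.
    my ( $index, $rest ) = split /\|/, $line, 2;
    next unless defined $rest;
    $data_hash{$index} = $rest;
}
close $fh;

print "$_ => $data_hash{$_}\n" for sort keys %data_hash;
```

Only one line is held at a time, and a line without a pipe is skipped. In the `map` version such a line would contribute a single element to the list and shift every later key/value pair.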
Re^4: Removing digits until you see | in a string by Animator (Hermit) on Jan 08, 2007 at 14:11 UTC
Re^5: Removing digits until you see | in a string by johngg (Canon) on Jan 08, 2007 at 16:55 UTC
Some notes below your chosen depth have not been shown here
Re: Removing digits until you see | in a string by Mandrake (Chaplain) on Jan 08, 2007 at 07:01 UTC
`$hash{$1} = $2 if ($str =~ /([0-9]+)(.+)/) ;` will give you the beginning digits as the index of the hash and the rest of the string as the value. `$hash{$1} = $str if ($str =~ /([0-9]+)(.+)/) ;` gives you the beginning digits as the index and the whole string as the value.
Re: Removing digits until you see | in a string by inman (Curate) on Jan 08, 2007 at 09:31 UTC
A couple of suggestions. The following uses a substitution and removes the index from the rest of the data (i.e. the original data is changed). `$data_hash{$1}= $str if $str =~ s/(\d+)\|//;` This one matches the index and the remaining data without altering the original. `$data_hash{$1}= $2 if $str =~ /(\d+)\|(.*)$/;` I have used a conditional to assess the validity of the assignment beforehand. This is useful for data processing where the quality of the data is variable.
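A sketch of the conditional approach applied to input of mixed quality. The malformed lines are invented for illustration, and the `^` anchor and `/s` modifier are additions that are not in the pattern as posted.

```perl
use strict;
use warnings;

# Made-up input of mixed quality; only well-formed lines should reach the hash.
my @lines = (
    '703555121245874|45874 Smith St|Your Town|New Hampshire',
    'no digits here|1 Main St',
    '',
);

my ( %data_hash, @rejected );
for my $str (@lines) {
    # ^ anchors the digits to the start; /s lets (.*) span embedded newlines.
    if ( $str =~ /^(\d+)\|(.*)$/s ) {
        $data_hash{$1} = $2;
    }
    else {
        push @rejected, $str;
    }
}

printf "%d stored, %d rejected\n", scalar( keys %data_hash ), scalar @rejected;
```

Collecting the rejected lines instead of silently skipping them makes it easier to see how much of the input failed the check.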
Re: Removing digits until you see | in a string by johngg (Canon) on Jan 09, 2007 at 19:19 UTC
Yet another way, with `chop`, `substr` and `index`. I'm not suggesting this is a sensible way to do it but in the spirit of TIMTOWTDI `$ perl -le ' > $str = q{703555121245874|45874 Smith St|Your Town|New Hampshire}; > chop( $index = substr( $str, 0, index( $str, q{|} ) + 1, q{} ) ); > print qq{$index\n$str};' 703555121245874 45874 Smith St|Your Town|New Hampshire $` The `index( $str, q{|} ) + 1` finds the position one past the first pipe symbol. The four-argument `substr` then replaces everything from the start of the string to that point with an empty string and returns what it has just replaced, which is assigned to `$index`. That value still has the pipe symbol at the end, so `chop` removes its last character. I think frodo72's modification of friedo's update is the cleanest and easiest to understand of the solutions proposed. Cheers, JohnGG
