extra spaces between the characters in string

harshashende has asked for the wisdom of the Perl Monks concerning the following question:

Re: extra spaces between the characters in string by davido (Cardinal) on Jun 08, 2011 at 07:27 UTC
You said you want to remove the first word, but in your example of input and desired output it seems you're removing the leading numbers. Is that what you really meant to say?

I don't really think that `substr` is the right tool for the job, at least not when Perl provides such powerful regexp tools:

`use strict; use warnings; s/^\d+// && print while <>;`

Invoke it like this:

`myscript infile.txt > outfile.txt`

Or as a Perl one-liner:

`perl -p -e 's/^\d+//' infile.txt > outfile.txt`

Of course, it's always advisable to skip the redirection on the first run through the script so that you can see on screen whether your output is going to be what you want.

Dave
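A minimal sketch of the same substitution applied to an in-memory list instead of a file, so the behavior can be checked without `<>`. The helper name `strip_leading_digits` and the sample lines are hypothetical. One difference worth noting: `s/^\d+// && print` prints only the lines where a leading number was actually removed, whereas the `perl -p` one-liner prints every line; this sketch keeps every line, like `-p`.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a run of leading digits from each line,
# keeping lines that have none (same effect as `perl -p -e 's/^\d+//'`).
sub strip_leading_digits {
    my @lines = @_;        # copy, so the caller's strings are not modified
    s/^\d+// for @lines;   # $_ aliases each element of @lines in turn
    return @lines;
}

my @in  = ("123abc\n", "45 def\n", "no digits here\n");
my @out = strip_leading_digits(@in);
print @out;
```

Note that only the digits are removed; any space that followed them (as in `"45 def"`) stays, so use `s/^\d+\s*//` if that space should go too.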
Re: extra spaces between the characters in string by moritz (Cardinal) on Jun 08, 2011 at 07:30 UTC
What's wrong with the answers in "Perl Regex with Extra Spaces"? And what code are you running that produces the wrong output?

It could also be that the program's output is encoded in UTF-16, and the "spaces" you see are actually null bytes. In that case `Encode::decode` can help. See also: Character encodings and Unicode in Perl.

Perl 6 - second systems done right
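A small sketch of moritz's point: ASCII text encoded as UTF-16LE has a null byte after every character, which many terminals and editors display like a space. `Encode::decode` turns those bytes back into an ordinary Perl character string. The sample text is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulate what a UTF-16LE file looks like when it is read as raw bytes.
my $bytes = encode('UTF-16LE', "12 hello");

# Each ASCII character is followed by a \0 byte; show those as spaces.
(my $visible = $bytes) =~ s/\0/ /g;
print "raw:     $visible\n";

# Decoding interprets the byte pairs as UTF-16LE code units.
my $text = decode('UTF-16LE', $bytes);
print "decoded: $text\n";
```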
Re: extra spaces between the characters in string by ikegami (Patriarch) on Jun 08, 2011 at 07:42 UTC
Sounds like UTF-16 to me. Use `open(my $fh, '<:encoding(UTF-16le)', $qfn)` (where `$qfn` holds the file's name) to open your file.
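A self-contained sketch that writes a small UTF-16LE file to a temporary location and reads it back through an `:encoding(UTF-16LE)` layer, then applies the leading-digit substitution from earlier in the thread. Two assumptions about the OP's data: the file starts with a byte order mark (U+FEFF), as Windows tools commonly write for UTF-16, and lines end in plain `\n`. The `:raw` layer is added so no CRLF translation happens underneath the encoding layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a sample UTF-16LE file that begins with a byte order mark.
my ($tmp_fh, $qfn) = tempfile(UNLINK => 1);
close $tmp_fh;
open(my $out, '>:raw:encoding(UTF-16LE)', $qfn) or die "Can't write $qfn: $!";
print $out "\x{FEFF}123abc\n456def\n";
close $out or die "Can't close $qfn: $!";

# Read it back: the encoding layer turns byte pairs into characters,
# so no null bytes ("extra spaces") show up in the lines.
open(my $fh, '<:raw:encoding(UTF-16LE)', $qfn) or die "Can't read $qfn: $!";
my @lines;
while (my $line = <$fh>) {
    $line =~ s/^\x{FEFF}// if $. == 1;   # UTF-16LE leaves a BOM in the data
    $line =~ s/^\d+//;
    push @lines, $line;
}
close $fh;
print @lines;
```

If the file always has a BOM, `:encoding(UTF-16)` (no endianness suffix) uses the BOM to pick the byte order and consumes it, but it dies on files without one, which is why this sketch uses the explicit `UTF-16LE` form.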
Re^2: extra spaces between the characters in string by harshashende (Initiate) on Jun 08, 2011 at 08:45 UTC
Thanks, moritz and ikegami. I don't know about UTF-16 encoding; I will have a look into it.
Re: extra spaces between the characters in string by locked_user sundialsvc4 (Abbot) on Jun 08, 2011 at 12:46 UTC
Believe it or not, even in a problem that looks like it could have a solution using `substr()`, regular expressions are usually a superior solution. So much time has been poured into that code, both in terms of ability and of overall efficiency, that it wins race after race after race. (I know, it’s counter-intuitive. It had to be proven to me, too.)
Re^2: extra spaces between the characters in string by BrowserUk (Patriarch) on Jun 08, 2011 at 13:05 UTC
Believe it or not: not. Show us one example where a regex comes even close to matching, much less beating, `substr`.

```
use Benchmark qw(cmpthese);

$s = '1234567890' x 10;
cmpthese -1, {
    a => q[ my $x = substr $s, $_, 10 for 0 .. 99 ],
    b => q[ my($x) = m[^.{$_}(.{10})] for 0 .. 99 ],
};

     Rate     b     a
b  1113/s    --  -97%
a 42708/s 3738%    --
```

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
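One detail of the benchmark above: `m[^.{$_}(.{10})]` has no `=~`, so it matches against `$_`, which is the loop counter (`0` to `99`), rather than `$s`, and the two entries do not do the same work. A sketch of a like-for-like comparison follows, binding the regex to `$s` and using offsets `0 .. 90` so every window is a full 10 characters. No timings are claimed here; note that interpolating `$_` into the pattern still forces a recompile for each offset, which probably accounts for much of the regex cost.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

our $s = '1234567890' x 10;

# Every 10-character window starting at offsets 0..90, extracted with substr.
sub by_substr { return map { substr($s, $_, 10) } 0 .. 90 }

# The same windows extracted with a regex bound explicitly to $s.
sub by_regex { return map { $s =~ m[^.{$_}(.{10})] ? $1 : undef } 0 .. 90 }

# Sanity check: both approaches must return identical lists before timing them.
die "results differ\n" unless join(',', by_substr()) eq join(',', by_regex());

# Run each candidate for at least 1 CPU second and print a comparison chart.
cmpthese(-1, { substr => \&by_substr, regex => \&by_regex });
```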
Re^3: extra spaces between the characters in string by SuicideJunkie (Vicar) on Jun 08, 2011 at 16:15 UTC
Problems that could be done with `substr` but are not as trivial as your example.

Also, "superior" ne "purely faster". If it requires loops and ifs around the `substr`, a regex is surely faster to write and debug. Execution speed would depend on how much you need to do and how fancy you need to be. The OP's problem, for example, is nearly trivial with a regex, but you'd have to pay me money to write it as a set of `substr()`s.

"Usually" is certainly debatable, and I expect that it depends strongly on your environment. Personally, my code has a substr:regex ratio of maybe 1:99. Filtering to what could plausibly be done with a set of `substr` calls, I'd guess it could be brought up to about 50/50, but the code would be horrendously brittle and scary. And I'm pretty sure my implementation of the search and matching would not be as fast as the regex engine.

If you are always dealing with fixed-width field data, `substr` becomes more useful and common.
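To illustrate the fixed-width case, here is a sketch that splits one record both ways. The record layout (a 4-character id, a 10-character name, and a 7-character price) is invented for the example. With fixed offsets, each `substr` call is just an offset and a length; the regex expresses the same layout as consecutive fixed-length captures.

```perl
use strict;
use warnings;

# Hypothetical fixed-width layout: id (4 chars), name (10 chars), price (7 chars).
my $record = '0042Widget    0019.99';

# substr: each field is just (offset, length).
my %by_substr = (
    id    => substr($record, 0, 4),
    name  => substr($record, 4, 10),
    price => substr($record, 14, 7),
);

# Regex: the same layout as three consecutive fixed-length captures.
my %by_regex;
@by_regex{qw(id name price)} = $record =~ /^(.{4})(.{10})(.{7})/
    or die "record does not match layout\n";

# Trim the trailing space padding from the name field in both results.
s/\s+$// for $by_substr{name}, $by_regex{name};

print "$_: $by_substr{$_}\n" for qw(id name price);
```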