generate character string based on byte count !!

barathbr has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: generate character string based on byte count !! by Joost (Canon) on Dec 08, 2004 at 08:59 UTC
I don't know what you mean. "Characters" don't have a length. The number of bytes a character takes in a string depends on the coded character set (Unicode, Latin-1, ASCII...) and the encoding (for Unicode, these include UTF-8, UTF-16, UCS-2 and UCS-4). Under UTF-8, the first 128 characters (code points 0 through 127) take 1 byte, and higher-numbered characters take a variable number of bytes (I'm not sure about the exact encoding, but IIRC it can take up to 4 bytes under the current Unicode set). Under ASCII and Latin-1 all characters are encoded using 1 byte (8 bits). Under UCS-2 all characters take 2 bytes, and under UCS-4 all characters take 4 bytes.

"What should it profit a man, if he should win a flame war, yet lose his cool?"
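A minimal sketch of the point above: the same character has a different byte count under each encoding. It assumes the core Encode module and uses UTF-16BE and UTF-32BE as stand-ins for UCS-2 and UCS-4; the helper name `encoded_length` is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Number of bytes one code point occupies under the named encoding.
# encode() returns a byte string, so length() counts bytes here.
sub encoded_length {
    my ($encoding, $codepoint) = @_;
    return length encode($encoding, chr $codepoint);
}

# 'A', e-acute, hiragana 'a', and an emoji outside the BMP
for my $cp (0x41, 0xE9, 0x3042, 0x1F600) {
    printf "U+%04X  utf-8: %d  utf-16be: %d  utf-32be: %d\n",
        $cp,
        map { encoded_length($_, $cp) } qw(UTF-8 UTF-16BE UTF-32BE);
}
```

The UTF-8 column comes out as 1, 2, 3 and 4 bytes for the four sample characters. UTF-16BE needs 2 bytes for the first three and 4 for the last one (a surrogate pair), and UTF-32BE always needs 4.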
Re^2: generate character string based on byte count !! by hv (Prior) on Dec 08, 2004 at 11:01 UTC
More info: UTF-8 as implemented in perl (on ASCII platforms) requires a second byte for codepoints 0x80 and higher, a third byte at 0x800, a fourth at 0x10000, a fifth at 0x200000, a sixth at 0x4000000 and a seventh at 0x80000000. Note that this extends beyond the defined Unicode range, since we may store things other than Unicode characters in our strings: perl supports any integer that fits in a UV (32-bit or 64-bit unsigned integer, depending on your perl build) as a codepoint.

If I understand the code correctly (Perl_uvuni_to_utf8_flags() in utf8.c), higher codepoints (available only where perl is compiled with 64-bit integer support) use 7 bytes up to 0x1000000000, and a fixed 13 bytes for the rest.

Hugo
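The thresholds listed above can be turned into a small lookup. This is only a sketch: the function names are invented here, the table stops at the 7-byte band (it does not model the 13-byte case), and the cross-check against perl's own encoder is limited to codepoints inside the Unicode range.

```perl
use strict;
use warnings;

# First codepoint of each byte-length band, taken from the list above:
# index 0 starts the 1-byte band, index 1 the 2-byte band, and so on.
my @band_start = (0x0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000);

# Byte count predicted by the table.
sub utf8_length_for {
    my ($codepoint) = @_;
    my $bytes = 0;
    for my $start (@band_start) {
        last if $codepoint < $start;
        $bytes++;
    }
    return $bytes;
}

# Byte count perl actually produces for the same codepoint.
sub perl_utf8_length {
    my ($codepoint) = @_;
    my $char = chr $codepoint;
    utf8::encode($char);    # in place: characters become UTF-8 bytes
    return length $char;
}

for my $cp (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF) {
    printf "U+%06X  table: %d  perl: %d\n",
        $cp, utf8_length_for($cp), perl_utf8_length($cp);
}
```

The two columns should agree on every line, with the count stepping up exactly at 0x80, 0x800 and 0x10000.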
Re^2: generate character string based on byte count !! by barathbr (Scribe) on Dec 09, 2004 at 10:27 UTC
I guess I was not very clear in what I had written. In simple terms, what you are saying is correct, and as a matter of fact that's exactly what I want. Let's say I want to generate some random Japanese characters of 2 bytes each. Please note that I still don't know whether you can encode a Japanese character in UTF-8 or UTF-16 or whatever the character set may be. Bottom line is, I don't really care what language the characters get generated in.

I shouldn't have used the term 'length'. What I meant was that I want to generate a character string composed of characters of 2 bytes each, 4 bytes each, etc. Hope that clarifies things a bit.

BrowserUk, tall_man: thanks for the responses, but they don't quite solve my problem. I hope this post adds a little more clarity to what I seek.

Thanks everyone
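One way to read this request is "random characters that each take N bytes once encoded as UTF-8". The sketch below takes that reading as an assumption, since the post does not fix an encoding; `random_string` and `%range_for` are names made up for the example. For what it's worth, kana and common kanji take 3 bytes in UTF-8 and 2 bytes in UTF-16.

```perl
use strict;
use warnings;

# Codepoint ranges whose UTF-8 encoding takes 1, 2, 3 or 4 bytes.
my %range_for = (
    1 => [ 0x20,    0x7E     ],    # printable ASCII only
    2 => [ 0x80,    0x7FF    ],
    3 => [ 0x800,   0xFFFF   ],
    4 => [ 0x10000, 0x10FFFF ],
);

# Build a string of $count random characters, each of which
# occupies $bytes_per_char bytes when encoded as UTF-8.
sub random_string {
    my ($bytes_per_char, $count) = @_;
    my $range = $range_for{$bytes_per_char}
        or die "unsupported byte count: $bytes_per_char\n";
    my ($lo, $hi) = @$range;
    my $str = '';
    while (length($str) < $count) {
        my $cp = $lo + int rand($hi - $lo + 1);
        next if $cp >= 0xD800 && $cp <= 0xDFFF;      # surrogates
        next if $cp >= 0xFDD0 && $cp <= 0xFDEF;      # noncharacters
        next if ($cp & 0xFFFE) == 0xFFFE;            # U+xxFFFE / U+xxFFFF
        $str .= chr $cp;
    }
    return $str;
}

my $str   = random_string(3, 5);
my $bytes = $str;
utf8::encode($bytes);    # copy encoded in place to UTF-8 bytes
printf "%d characters, %d bytes\n", length $str, length $bytes;
```

Many of the codepoints picked this way are unassigned, so the output is useful for exercising byte-length handling rather than for producing readable text. Narrowing a range (for example 0x3040 to 0x309F for hiragana) would give real characters of a known size.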
Re: generate character string based on byte count !! by BrowserUk (Patriarch) on Dec 08, 2004 at 12:37 UTC
Are you looking for something like this?

```
#! perl -slw
use strict;

## Adjust to suit your requirements
my %types = (
    lower  => [ 'a'..'z' ],
    upper  => [ 'A'..'Z' ],
    number => [ '0'..'9' ],
    char   => [ 'a'..'z', 'A'..'Z', '0'..'9' ],
);

print join ' ', map{
    my( $type, $n ) = $_ =~ m[(\w+) length = (\d+)];
    join'', map{ $types{ $type }[ rand @{ $types{ $type } } ] } 1 .. $n;
} @ARGV;

__END__
P:\test>413131 "char length = 4" "number length = 3" "lower length = 6"
tAwY 828 xkppno

[12:35:08.56] P:\test>413131 "char length = 1" "number length = 10" "upper length = 2"
j 3625181636 OB
```

Examine what is said, not who speaks.
"But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
"Think for yourself!" - Abigail
"Time is a poor substitute for thought"--theorbtwo
"Efficiency is intelligent laziness." -David Dunham
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: generate character string based on byte count !! by Anonymous Monk on Dec 08, 2004 at 11:21 UTC
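You get the byte length of characters when you use the bytes pragma. How about this (untested):

```
print "char#\tlength1\tlength2\n";
for (32..500) {
    # Build the character outside the scope of "use bytes";
    # inside it, chr would truncate values above 255 to one byte.
    my $char = chr;
    print "$_\t", length($char), "\t";   # length in characters (always 1)
    {
        use bytes;
        print length($char);             # bytes in perl's internal representation
    }
    print "\n";
}
```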
Re: generate character string based on byte count !! by tall_man (Parson) on Dec 08, 2004 at 16:40 UTC
Are we making this too complicated? If all you want is to create a character string of a given length, perhaps what you need is just the "x" (repetition) operator applied to a simple one-byte ASCII character.

```
#!/usr/bin/perl -w
use strict;

my $len = 0;
if (@ARGV >= 1) {
    $len = $ARGV[0];
}
$len > 0 || die "Usage: genlen.pl number\n";
print "a" x $len;
```