in reply to how to split a file.txt in multiple text files

You can use $/ :)

$ ls -l zzz
-rw-rw-rw- 1 tux users 60892 Feb 12 15:54 zzz
$ perl -CS -Mautodie -wE'$/=\3000;my$i="0000";while(<>){open my $fh, ">:encoding(utf-8)", "zz".$i++;print $fh $_}' < zzz
$ ls -l zz0*
-rw-rw-rw- 1 tux users 3624 Feb 12 15:58 zz0000
-rw-rw-rw- 1 tux users 3681 Feb 12 15:58 zz0001
-rw-rw-rw- 1 tux users 3661 Feb 12 15:58 zz0002
-rw-rw-rw- 1 tux users 3655 Feb 12 15:58 zz0003
-rw-rw-rw- 1 tux users 3652 Feb 12 15:58 zz0004
-rw-rw-rw- 1 tux users 3634 Feb 12 15:58 zz0005
-rw-rw-rw- 1 tux users 3640 Feb 12 15:58 zz0006
-rw-rw-rw- 1 tux users 3646 Feb 12 15:58 zz0007
-rw-rw-rw- 1 tux users 3631 Feb 12 15:58 zz0008
-rw-rw-rw- 1 tux users 3631 Feb 12 15:58 zz0009
-rw-rw-rw- 1 tux users 3692 Feb 12 15:58 zz0010
-rw-rw-rw- 1 tux users 3659 Feb 12 15:58 zz0011
-rw-rw-rw- 1 tux users 3647 Feb 12 15:58 zz0012
-rw-rw-rw- 1 tux users 3648 Feb 12 15:58 zz0013
-rw-rw-rw- 1 tux users 3634 Feb 12 15:58 zz0014
-rw-rw-rw- 1 tux users 3643 Feb 12 15:58 zz0015
-rw-rw-rw- 1 tux users 2514 Feb 12 15:58 zz0016
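To make the `$/` trick a bit more tangible, here is a small sketch (not part of the original post) that feeds ten Greek letters through an in-memory filehandle with a UTF-8 decoding layer, reads them with `$/` set to a reference to an integer, and records the length of each record. The sample data and the record size of 4 are made up for illustration. Assumption: a reasonably recent perl; as far as I understand the perl 5.18 perldelta, fixed-size record reads on a decoding handle count characters from that release on, which would explain why the 3000-record chunks above come out at roughly 3600 bytes each.

```perl
use strict;
use warnings;
use Encode qw(encode);

binmode STDOUT, ":encoding(UTF-8)";

# Hypothetical sample data: ten Greek letters (alpha .. kappa), 2 bytes each in UTF-8
my $text  = join "", map { chr } 0x3B1 .. 0x3BA;
my $bytes = encode("UTF-8", $text);    # 20 bytes

# In-memory filehandle that decodes the bytes as UTF-8, much as -CS does for STDIN
open my $in, "<:encoding(UTF-8)", \$bytes or die "open: $!";

my @lengths;
{
    local $/ = \4;    # fixed-size records of 4 instead of lines
    while (my $rec = <$in>) {
        push @lengths, length $rec;
        print length($rec), " chars: $rec\n";
    }
}
close $in;
```

On a recent perl this should report records of 4, 4 and 2 characters; the last record is simply whatever is left before end of file.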

Enjoy, Have FUN! H.Merijn

Re^2: how to split a file.txt in multiple text files
by saulnier (Initiate) on Feb 14, 2019 at 14:51 UTC
    Thank you, Tux. Your script works well, but I also get a series of warnings such as:

    utf8 "\xCE" does not map to Unicode at split2.pl line 9, <> chunk 3.

    utf8 "\x94" does not map to Unicode at split2.pl line 9, <> chunk 4.

    Wide character in print at split2.pl line 9, <> chunk 2.

    ...

    Above all, many of the files created are filled with unintelligible characters instead of fragments of my Greek text. Any idea?

      • What is your OS?
      • What is your perl version? (perl -v)
      • Did you invoke the script with the required -CS command-line option?
        $ perl -CS split2.pl < inputfile
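      A minimal diagnostic sketch (not from the thread) that answers the version and -CS questions in one go: it prints the running perl version and the I/O layers currently on STDIN, using the built-in PerlIO::get_layers. Assumption: when -CS is in effect, the layer list should include a utf8 layer; without it, STDIN delivers raw bytes. The script name check.pl is hypothetical.

```perl
use strict;
use warnings;

# The running perl version, e.g. "5.14.2"
my $version = sprintf "%vd", $^V;

# I/O layers currently on STDIN; -CS should add a "utf8" layer
my @layers  = PerlIO::get_layers(*STDIN);
my $decodes = grep { /^utf8$|^encoding/ } @layers;

print "perl version: $version\n";
print "STDIN layers: @layers\n";
print "STDIN decoded as UTF-8: ", ($decodes ? "yes" : "no"), "\n";
```

      Run it the same way as the split script, for example perl -CS check.pl < inputfile, and compare the output with and without -CS.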

      I ran my example on UTF-8 encoded files that contained quite a few characters outside the ISO-8859-1 range, so I would have seen the same warnings if my example were seriously flawed.

      Is your data secret, or can it be shared? If it can, some of us might want to download it (in a zip) to check.

      As you converted my command-line example to a script, it would be a good idea to show what the script looks like. You might have missed a crucial detail. It might look a bit like this:

      use strict;
      use warnings;
      use autodie;

      local $/ = \3000;    # fixed-size records instead of lines
      my $i = "0000";      # counter for the output file names
      while (<>) {
          my $fn = "zz" . $i++;
          open my $fh, ">:encoding(utf-8)", $fn or die "$fn: $!";
          print $fh $_;
          close $fh;
      }

      Enjoy, Have FUN! H.Merijn
        OS: Windows 10 Home
        perl 5, version 14, subversion 2 (v5.14.2) built for MSWin32-x86-multi-thread

        This is my script split2.pl
        use strict;
        use warnings;
        use autodie;
        $/ = \3000;
        my $i = "000";
        while (<>) {
            open my $fh, ">:encoding(utf-8)", "input" . $i++ . ".txt";
            print $fh $_;
            close $fh;
        }
        If I invoke the script like this:
          perl -CS split2.pl < input.txt
        I get these messages:
        utf8 "\xE1" does not map to Unicode at split2.pl line 11, <> chunk 2.
        Close with partial character at (eval 21) line 67, <> chunk 2.
        and only the first fragment, "input000.txt", is created.

        If I run the script without -CS, there are no warnings and all the files are created, but they contain unintelligible characters instead of my Greek text split into pieces.
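        One possible explanation, not verified on 5.14.2: if I read the perl 5.18 perldelta correctly, older perls measure a fixed-size record read ($/ = \3000) in bytes rather than characters, so a record can end in the middle of a two- or three-byte Greek character, which would fit the "Close with partial character" message. A version-independent sketch, assuming the input is UTF-8 and small enough to slurp (346 kB is): decode the whole file first, then cut the decoded string into 3000-character pieces with substr, so no character can be split. The sub name split_file and the script name split3.pl are hypothetical.

```perl
use strict;
use warnings;
use autodie;

# Split a UTF-8 text file into pieces of $size characters each.
# Writes "${prefix}000.txt", "${prefix}001.txt", ... and returns the file names.
sub split_file {
    my ($infile, $size, $prefix) = @_;

    # Slurp the whole file, decoding the UTF-8 bytes into characters
    open my $in, "<:encoding(UTF-8)", $infile;
    my $text = do { local $/; <$in> } // "";
    close $in;

    my $i = "000";    # magic string increment: 000, 001, 002, ...
    my @names;

    # substr counts characters here, so a multi-byte Greek letter is never cut
    for (my $pos = 0; $pos < length $text; $pos += $size) {
        my $fn = $prefix . $i++ . ".txt";
        open my $out, ">:encoding(UTF-8)", $fn;
        print $out substr($text, $pos, $size);
        close $out;
        push @names, $fn;
    }
    return @names;
}

# Hypothetical usage: perl split3.pl input.txt
split_file($ARGV[0], 3000, "input") if @ARGV;
```

        Because the decoding happens once, on the whole file, this does not depend on how a given perl version implements record reads; the trade-off is that the entire text is held in memory, which is harmless at this size.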

        I can share my Greek text (346 kB), but I do not know exactly how to do that from here.