UPDATE
The suggested use of incremental parsing (see the INCREMENTAL PARSING section of the JSON::XS documentation) looks like a much better solution for this kind of parsing problem. graff++
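For illustration, a minimal sketch of what incremental parsing could look like. It assumes the objects follow each other separated only by whitespace; the sample data and variable names are made up for this example and are not taken from the original post.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical sample input: two objects back to back, the second one
# spread over two lines.
my $stream = <<'END';
{"created_at":"Sat Mar 02 18:45:26 +0000 2013","id":1}
{"created_at":"Sat Mar 02 18:45:27 +0000 2013",
 "id":2}
END

my $json = JSON::XS->new;
my @objects;

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$stream or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    # In list context incr_parse returns every object it could complete
    # and keeps unfinished text buffered for the next call.
    push @objects, $json->incr_parse($line);
}
close $fh;

printf "%d: %s\n", $_->{id}, $_->{created_at} for @objects;
```

With this approach there is no need to guess a record separator, because the parser itself decides where one object ends and the next begins.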
> If there is a good separator like blank line(s), you can set the input record separator $/ accordingly.
You didn't show us more than one object and your link is broken, so I had to guess that they all start with '{"created_at":' at the beginning of a line.
I also shortened your objects to keep the demonstration readable.
use v5.12.0;
use warnings;
use Data::Dump;
use JSON::XS;
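my $ident = '{"created_at":';  # guessed start of every object
local $/ = "\n$ident";         # a record ends where the next object begins
my $prefix = "";               # the first record still has its own $ident
while (<DATA>) {
    chomp;                     # removes $/ from the end
    my $obj = "$prefix$_";     # put back the $ident swallowed by $/
    ddx $obj;
    say "-" x 30;
    dd decode_json($obj);
    #say "-" x 72;
    $prefix = $ident;          # all later records lost their leading $ident
}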
__DATA__
{"created_at":"Sat Mar 02 18:45:26 +0000 2013","id":307924626426695681,"id_str":"307924626426695681","etc":"***YADDA YADDA ***","id_str":"2621098741943851970"}
{"created_at":"Sat Mar 02 18:45:26 +0000 2013","id":307924626426695681,"id_str":"307924626426695681","etc":"***YADDA YADDA ***","id_str":"2621098741943851970"}
{"created_at":"Sat Mar 02 18:45:26 +0000 2013","id":307924626426695681,"id_str":"307924626426695681","etc":"***YADDA YADDA ***","id_str":"2621098741943851970"}
# raw_json.pl:16: "{\"created_at\":\"Sat Mar 02 18:45:26 +0000 2013\",\"id\":307924626426695681,\"id_str\":\"307924626426695681\",\"etc\":\"***YADDA YADDA ***\",\"id_str\":\"2621098741943851970\"}"
------------------------------
{
  created_at => "Sat Mar 02 18:45:26 +0000 2013",
  etc => "***YADDA YADDA ***",
  id => 307924626426695681,
  id_str => 2621098741943851970,
}
# raw_json.pl:16: "{\"created_at\":\"Sat Mar 02 18:45:26 +0000 2013\",\"id\":307924626426695681,\"id_str\":\"307924626426695681\",\"etc\":\"***YADDA YADDA ***\",\"id_str\":\"2621098741943851970\"}"
------------------------------
{
  created_at => "Sat Mar 02 18:45:26 +0000 2013",
  etc => "***YADDA YADDA ***",
  id => 307924626426695681,
  id_str => 2621098741943851970,
}
# raw_json.pl:16: "{\"created_at\":\"Sat Mar 02 18:45:26 +0000 2013\",\"id\":307924626426695681,\"id_str\":\"307924626426695681\",\"etc\":\"***YADDA YADDA ***\",\"id_str\":\"2621098741943851970\"}\n"
------------------------------
{
  created_at => "Sat Mar 02 18:45:26 +0000 2013",
  etc => "***YADDA YADDA ***",
  id => 307924626426695681,
  id_str => 2621098741943851970,
}
> those JSON strings never include "\n"
From what I understand, literal line-break characters (CR, LF and the like) are not allowed inside JSON strings. They must be written as the escape sequence \n (or \\n, depending on the escaping rules of the source language).
BUT line breaks are allowed outside strings, between the elements of your object. They only serve as formatting and are ignored by the parser (hence you don't need to chomp them either).
So whether you can assume that each object sits on exactly one line depends on the source.
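A small sketch to check both claims with JSON::XS in its default (strict) configuration. The two sample documents are invented for this demonstration.

```perl
use strict;
use warnings;
use JSON::XS;

# Line breaks between elements are plain whitespace and get ignored.
# The string value contains the two-character escape sequence \n.
my $pretty = qq({\n  "id": 1,\n  "text": "one\\ntwo"\n}\n);
my $obj    = decode_json($pretty);
print "parsed, text has ", length( $obj->{text} ), " characters\n";

# A raw line feed inside a string is not valid JSON, so decoding dies.
my $broken = qq({"text": "one\ntwo"});
my $ok     = eval { decode_json($broken); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

The first document decodes even though it spans several lines, while the second one is refused, which is why a purely line-based reader is only safe if the producer never pretty-prints its objects.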