How can I read a huge json file (0.5GB)
I have a huge json file. I tried using the "fileread" command but I get an "Out of Memory" message. I have 4 GB of RAM on my computer. Can anyone suggest a solution?
Thanks,
Meir
0 Comments
Answers (2)
Guillaume
on 28 Apr 2019
The problem is that your computer simply doesn't have enough memory. What does the memory command report right after you've started MATLAB?
Have you tried jsondecode? Despite its flaws (*) it's certainly more suited than fileread but I suspect it would still run out of memory since you don't appear to have enough memory to hold the whole file in memory, let alone its decoded content.
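For reference, the direct approach looks like this ('data.json' is a placeholder file name; this still needs enough free RAM to hold both the raw text and the decoded result at the same time):

```matlab
% Direct decode - whole file and its decoded form must fit in RAM:
txt  = fileread('data.json');   % placeholder file name
data = jsondecode(txt);         % builds structs/arrays from the text
clear txt                       % release the 2-bytes-per-char raw text
```

Clearing the char vector as soon as jsondecode returns at least halves the peak footprint, but it does not help if the decode itself runs out of memory.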
*: I got MathWorks to fix several issues with their JSON decoding when it was first implemented, but one remaining flaw is that it irreversibly mangles object property names that are not valid variable names or are longer than 64 characters.
1 Comment
Jan
on 29 Apr 2019
While fileread requires a contiguous block of 1 GB (two bytes per character in the file), parsing the JSON string splits the data into several chunks, which need not be stored as one contiguous block. But maybe the JSON file contains one big matrix of numerical data, stored as e.g. 3 characters plus a separator per value. Then the parsing creates a matrix with 8 bytes per element (if double is used), such that this needs much more RAM than the file size suggests.
The only reliable way to import a large JSON file is to offer enough RAM. Therefore: +1
Jan
on 25 Apr 2019
Edited: Jan
on 25 Apr 2019
If you use fileread, the 0.5 GB of bytes is converted to a char vector, which occupies 1 GB of RAM, because Matlab uses 2 bytes per char. You do not have 1 GB of free RAM in a contiguous block. You can import the file into a cell string, but this needs more RAM due to an overhead of about 100 bytes for each line of text. On the other hand, the memory for a cell string does not need to be free in one contiguous block. The pre-allocation is not trivial, but you can grow the cell in blocks of e.g. 1000 lines:
function C = readtextfile(FileName)
% Read a text file line by line into a cell string.
[fid, msg] = fopen(FileName, 'r');
if fid == -1
   error(msg);
end
nC = 1000;         % Current capacity of the cell
iC = 0;            % Number of lines read so far
C  = cell(1, nC);  % Pre-allocate in blocks of 1000 lines
while ~feof(fid)
   s = fgetl(fid);
   if ischar(s)    % fgetl returns -1 (a double) at end of file
      iC = iC + 1;
      if iC > nC
         nC = nC + 1000;
         C{nC} = [];  % Expand the cell by another block
      end
      C{iC} = s;
   end
end
fclose(fid);
C = C(1:iC);  % Crop unused cells
end
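A usage sketch for the function above ('huge.json' is a placeholder name). Once the lines are in the cell, you can inspect or process parts of the file without ever holding one giant char vector:

```matlab
C = readtextfile('huge.json');   % one cell element per line
numel(C)                         % number of lines read
C{1}                             % look at the first line only
```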
This will take a while. The result needs more than 1 GB of RAM, but not in a contiguous block, so the import might work. But as soon as you want to process the data, you will need even more memory. So the only reliable solution is to install more RAM, or to import only a specific section of the data.
7 Comments
Jan
on 29 Apr 2019
... 16 bytes for the data plus about 100 bytes for the header of the variable. A 0.5 GB file can contain a lot of variables, such that the overhead might matter.
Walter Roberson
on 29 Apr 2019
True. Success would depend upon whether there are big data chunks or a number of small variables -- though perhaps in context it would turn out to make sense to bundle the small variables into arrays.
More RAM wouldn't hurt...