Best way to read large text files (over 2 million rows) into MATLAB?

I need to read in a .csv file with 4 columns and over 2 million rows. Each column consists of a 3-row header followed by 50,000 numerical values; this pattern of header followed by 50,000 numbers repeats hundreds of times within the same columns until I have over 2 million rows of data.
What is the fastest and most efficient way to read these columns into MATLAB? It isn't a big deal if the cells that contain strings get read in as NaN; I can always fix that after the file has been read in.
The code that I am currently using to read in the data (shown below) takes over 3 hours and completely freezes my computer while it runs.
filename = 'input.csv';
delimiter = ',';
startRow = 1;
%% Read columns of data as strings:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the
% code from the Import Tool.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, ...
    'HeaderLines', startRow-1, 'ReturnOnError', false);
%% Close the text file.
fclose(fileID);

Answers (1)

per isakson on 8 Aug 2014
Edited: per isakson on 9 Aug 2014
The file consists of many blocks of header lines followed by numerical data(?). There is no high-level function in MATLAB that reads such a file directly.
  • "50,000 numbers" translates to 12,500 rows?
  • the entire file as one string variable in Matlab will be approx. 0.2GB
  • the numerical data converted to double will be less than 0.1GB
That should fit comfortably in memory.
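The estimate for the numerical data can be checked with a quick calculation. This is only a rough sketch: it assumes about 2 million rows of 4 values each, all stored as 8-byte doubles, and the variable names are made up for illustration.

```matlab
% Rough memory estimate for the numeric data (assumed sizes, see above).
nRows          = 2e6;   % "over 2 million rows" from the question
nCols          = 4;     % four columns
bytesPerDouble = 8;     % a MATLAB double occupies 8 bytes

numericGB = nRows * nCols * bytesPerDouble / 1e9;
fprintf('Numeric data as double: about %.3f GB\n', numericGB);
```

With these assumptions the array comes to roughly 0.064 GB, which is consistent with "less than 0.1GB".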
I think the "fastest and most efficient way" is to
  1. read the entire file into one string variable
  2. split the string into sub-strings, each containing header lines followed by numerical data
  3. parse the sub-strings with textscan
Filling in the details requires more information on the format of the file.
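Without knowing the exact format, the three steps can only be sketched under assumptions. The example below assumes every block has exactly 3 header lines followed by a fixed number of comma-separated numeric rows, and that the file contains only whole blocks. The demo file name, the header text, and the variable names (headerLines, rowsPerBlock and so on) are hypothetical; rowsPerBlock is set to 5 so the demo stays small, and would have to match the real file.

```matlab
% --- Build a small demo file (hypothetical layout: 2 blocks, 3 header
% --- lines each, followed by 5 numeric rows of 4 columns).
filename = fullfile(tempdir, 'demo_blocks.csv');
fid = fopen(filename, 'w');
for b = 1:2
    fprintf(fid, 'Block,%d,,\nTime,X,Y,Z\ns,m,m,m\n', b);
    % Each column of the 4-by-5 matrix is written as one row of the file.
    fprintf(fid, '%g,%g,%g,%g\n', reshape(1:20, 4, 5) + 100*b);
end
fclose(fid);

% --- Assumed file layout.
headerLines  = 3;    % header lines per block (from the question)
rowsPerBlock = 5;    % numeric rows per block (assumed fixed; demo value)
nCols        = 4;    % number of columns

% --- Step 1: read the entire file into one string.
str = fileread(filename);
if str(end) ~= char(10)
    str(end+1) = char(10);           % make sure the last line is terminated
end

% --- Step 2: locate the lines, then the numeric part of each block.
nl            = find(str == char(10));    % index of every line end
lineStart     = [1, nl(1:end-1) + 1];     % index of every line start
linesPerBlock = headerLines + rowsPerBlock;
nBlocks       = numel(nl) / linesPerBlock;
assert(nBlocks == floor(nBlocks), 'File does not contain whole blocks.');

% --- Step 3: parse each numeric sub-string with textscan.
formatSpec = repmat('%f', 1, nCols);
data = zeros(nBlocks * rowsPerBlock, nCols);   % preallocate the result
for k = 1:nBlocks
    firstLine = (k-1)*linesPerBlock + headerLines + 1;
    lastLine  = k*linesPerBlock;
    sub = str(lineStart(firstLine):nl(lastLine));
    c = textscan(sub, formatSpec, 'Delimiter', ',', 'CollectOutput', true);
    data((k-1)*rowsPerBlock + (1:rowsPerBlock), :) = c{1};
end

disp(size(data));
```

Parsing with '%f' instead of '%s' avoids creating millions of small strings, which is likely a large part of why the original code is so slow. If the blocks are not all the same length, step 2 would need a different way of finding the header lines, for example searching the string for text that only occurs in a header.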
