Read specific hex data in CSV file

Adam Kaas on 4 Jun 2014
Commented: Cedric on 5 Jun 2014
I've looked through the posts on Stack Overflow and MATLAB Answers and can't seem to find the answer I am looking for. I have a large CSV file (450 MB) with hex data that looks like this:
63C000CF,6000002F,603000AF,6000C06F,617300EF,6C7C001F,6000009F,0%,63C000CF...
That is a heavily truncated example, but basically I have approximately 78 different hex values separated by commas, then a '0%', then 78 more hex values, and this pattern repeats for the rest of the file. I've been using textscan like this:
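% Read one record: everything up to the next '%' delimiter
data = textscan(fid, '%s', 1, 'delimiter', '%');
% Split that record into its comma-separated hex values
data = textscan(data{1}{1}, '%s', 'delimiter', ',');
data = data{1};
outstring = ['%', sprintf('\n')];
for idx = 1:numel(data)
    token = data{idx};   % renamed from "string", which shadows a built-in function in newer releases
    % Skip one-character fragments such as the '0' that precedes each '%'
    if numel(token) > 1
        outstring = [outstring, token, sprintf('\n')]; %#ok<AGROW>
    end
end
fprintf(output_fid, '%s', outstring);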
This allowed me to reformat the CSV file so that I could use fgetl() to check whether I was looking at the data I needed. Because the data repeats itself, I can use fseek() to jump to the next occurrence before calling fgetl() again.
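Part of the cost of the loop above is that outstring grows one value at a time. A minimal sketch of the same per-record formatting without the loop, assuming a record already read as a char row (the variable record and its contents are hypothetical sample data):

```matlab
% Hypothetical record: one '%'-delimited chunk of the file, as returned by
% the first textscan call above
record = '63C000CF,6000002F,603000AF,0';

% Split the record at the commas in one call
values = strsplit(record, ',');

% Keep only entries longer than one character, as the original loop does
% (this drops the '0' that precedes each '%')
values = values(cellfun(@numel, values) > 1);

% Build the output in a single sprintf call instead of growing it in a loop;
% sprintf reuses the '%s\n' format for every element of values
outstring = ['%', sprintf('\n'), sprintf('%s\n', values{:})];
```

This only removes the string growth; each record is still read separately, so on its own it may not fix the overall speed problem.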
What I need is a way to skip to the ending. I want to be able to use something like fgetl() but have it return only the first hex value it encounters. I will know how many bytes to skip forward in the file. Then I need to make sure I can read other hex values. Is what I'm asking possible? My textscan code above takes far too long on a 90 MB CSV file, let alone a 450 MB one.
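The fseek()-based idea described here can be sketched with fread(), which reads a fixed number of characters instead of a whole line. This assumes every value is exactly 8 characters and that the distance between consecutive occurrences of a label (recordBytes below) is known and constant; the file, its contents, and the offsets are hypothetical. The sketch writes a tiny sample file first so that it is self-contained:

```matlab
% Hypothetical sample file: three records of three values, separated by "0%,"
sample = ['63C000CF,6000002F,603000AF,0%,', ...
          '64C000CF,6100002F,613000AF,0%,', ...
          '65C000CF,6200002F,623000AF'];
sampleFile = [tempname, '.csv'];
fid = fopen(sampleFile, 'w');
fwrite(fid, sample, 'char');
fclose(fid);

firstOffset = 0;          % 0-based byte offset of the first wanted value
recordBytes = 3*9 + 3;    % assumed stride: 3 values of "XXXXXXXX," plus "0%,"
valueChars  = 8;          % assumed length of one hex value

fid = fopen(sampleFile, 'r');
values = {};
offset = firstOffset;
% fseek returns 0 on success, -1 on failure
while fseek(fid, offset, 'bof') == 0
    chunk = fread(fid, [1, valueChars], '*char');   % read exactly one value
    if numel(chunk) < valueChars
        break;            % reached the end of the file
    end
    values{end+1} = chunk; %#ok<AGROW>
    offset = offset + recordBytes;
end
fclose(fid);
delete(sampleFile);
```

Each iteration touches only 8 bytes, so the cost depends on the number of occurrences rather than on the file size. This only works if the stride is truly constant throughout the file.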
dpb on 4 Jun 2014
I know you know what you're after, but we can only go by what is revealed here. Don't be so terse; over-explain rather than under-...
...Each set of hex values represents a label
What's a "set" in this context? A single value or all the values of a given offset relative to the beginning/the flag value? Or is it the entire group between the flag values?
Is "one label" above a single 16-bit hex value or again all of the same offset or the group at the location of the indicated flag value? Have to have a precise definition of what it really is you're after.
How is/are the one(s) wanted identified?
What is the function of the indicator?
Adam Kaas on 4 Jun 2014
Edited: Adam Kaas on 4 Jun 2014
I apologize for not being thorough in my explanation.
A set is a single 8-character hex value (the values are separated by commas), e.g. 63C000CF. One label is one set. We identify labels by the last two characters of the hex value, e.g. 63C000CF is label CF.
The labels are chosen by the user from a list of all available labels, which is populated into a GUI by a separate function. Using the values from my example, that list would be labels CF, 2F, AF, 6F, EF, 1F, and 9F. As an example, the user could select labels CF and AF. I would then need to go through the CSV file, find the first CF label and store the data it contains, then move to the next CF (which will be a set number of bytes away in the file) and record that data, and so on until the end of the file is reached. Then I would repeat the process for the AF label.
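The label definition above (last two characters of each 8-character value) is enough to build the list of available labels and to collect all values for a selection. A minimal sketch on an in-memory string, assuming every value is exactly 8 characters; the sample data and the selection in selected are hypothetical:

```matlab
% Hypothetical sample data, taken from the example line in the question
content = '63C000CF,6000002F,603000AF,6000C06F,617300EF,6C7C001F,6000009F,0%,63C000CF';

values = strsplit(content, ',');
% Drop the '0%' markers by keeping only 8-character entries (assumed value length)
values = values(cellfun(@numel, values) == 8);

% The label is the last two characters of each value
allLabels = cellfun(@(v) v(end-1:end), values, 'UniformOutput', false);

% Sorted list of distinct labels, e.g. for populating the GUI
availableLabels = unique(allLabels);

% Hypothetical user selection: collect every value carrying each label
selected = {'CF', 'AF'};
collected = cell(size(selected));
for k = 1:numel(selected)
    collected{k} = values(strcmp(allLabels, selected{k}));
end
```

For a 450 MB file, splitting everything into a cell array may use a lot of memory; the accepted answer below avoids that by indexing directly into the character data.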
If it is relevant, we do have names associated with the labels and don't actually refer to them as label CF. The label number is calculated in an unusual way due to how the data is transmitted, but essentially label CF would be label 363 (convert CF to binary, reverse the bit order, and read the result as an octal number). The user will know what kind of data is represented by that label.
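The conversion described above can be checked step by step. A minimal sketch, assuming the two hex characters are treated as an 8-bit value; toOctalLabel is a hypothetical helper name:

```matlab
labelHex = 'CF';                                 % label as read from the file
bits = dec2bin(hex2dec(labelHex), 8);            % '11001111'
reversedBits = fliplr(bits);                     % '11110011'
labelOctal = dec2base(bin2dec(reversedBits), 8); % '363'

% The same steps as a one-line helper for any two-character label
toOctalLabel = @(h) dec2base(bin2dec(fliplr(dec2bin(hex2dec(h), 8))), 8);
```

The width argument 8 in dec2bin matters: without it, leading zeros are dropped and the reversal gives a different value for labels below 80 hex.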


Accepted Answer

Cedric on 4 Jun 2014
Edited: Cedric on 4 Jun 2014
NEW solution
Here is a more efficient solution. I tested it on a 122 MB file, so you have an idea of the timing:
% One line for reading the whole file. To perform once only.
tic ;
content = fileread( 'adam_1.txt' ) ;
fprintf( 'Time for reading the file : %.2fs\n', toc ) ;
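% One line for defining an extraction function. To perform once only.
% strfind locates every "<label>," occurrence; subtracting 6 gives the index
% of the first of the 6 data characters that precede the label, and bsxfun
% expands each start index into 6 consecutive indices (one row per match).
extract = @(label) content(bsxfun( @plus, ...
    strfind( content, [label,','] ).' - 6, ...
    0 : 5 )) ;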
% Then it is one call per label to extract data.
tic ;
data = extract( 'CF' ) ;
fprintf( 'Time for extracting one label: %.2fs\n', toc ) ;
Running this, I obtain
Time for reading the file : 0.52s
Time for extracting one label: 0.62s
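One detail worth checking with this approach: because the search pattern is "<label>,", a value at the very end of the file (with no trailing comma) is not found. A minimal sketch on a hypothetical in-memory string that appends a comma once, then converts the extracted characters to numbers with hex2dec:

```matlab
% Hypothetical stand-in for fileread; the last value has no trailing comma
content = '63C000CF,6000002F,603000AF,0%,64C000CF,6100002F,613000AF,0%,65C000CF';

% Append a comma so that the final value also matches "<label>,".
% This must happen BEFORE defining extract, because the anonymous function
% captures the value of content at the time it is created.
content = [content, ','];

extract = @(label) content(bsxfun( @plus, ...
    strfind( content, [label, ','] ).' - 6, ...
    0 : 5 )) ;

dataCF = extract( 'CF' ) ;       % char matrix, one 6-character row per match
valuesCF = hex2dec( dataCF ) ;   % numeric values, one per row
```

Without the appended comma, the same call returns only the first two rows.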
FORMER solution
Would the following work for you?
% Read file content. To do once only.
content = fileread( 'myFile.txt' ) ;
% Define regexp-based extraction function. To do once only.
getByLabel = @(label) regexp( content, sprintf( '\\w{6}(?=%s)', label ), ...
'match' ) ;
% Get all entries for e.g. label 'CF'.
entries_CF = getByLabel( 'CF' ) ;
% Get all entries for e.g. label '6F'.
entries_6F = getByLabel( '6F' ) ;
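To see what the pattern matches, here it is on a hypothetical in-memory string. The lookahead (?=CF) requires the label to follow but leaves it out of the match, and since \w does not match ',' or '%', a run of 6 + 2 word characters can only fit inside one 8-character value. Unlike the strfind version, no trailing comma is needed for the last value:

```matlab
% Hypothetical stand-in for fileread
content = '63C000CF,6000002F,603000AF,0%,64C000CF,6100002F,613000AF,0%,65C000CF';

% Same extraction function as above
getByLabel = @(label) regexp( content, sprintf( '\\w{6}(?=%s)', label ), ...
    'match' ) ;

entries_CF = getByLabel( 'CF' ) ;   % cell array of 6-character strings
```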
I am not completely clear on what you ultimately need to achieve. If I had to design a GUI where users choose a label and get the corresponding data, I would process the data much further during the initialization phase, e.g. by grouping it by label in a cell array. Regexp is probably not the most efficient approach in this case, but the principle would be:
labels = {'CF', '6F', 'AF'} ;   % ... plus the remaining labels
nLabels = numel( labels ) ;
data = cell( 1, nLabels ) ;
for lId = 1 : nLabels
    data{lId} = getByLabel( labels{lId} ) ;
end
and then, when a user selects 'CF':
% label holds the user's selection, e.g. label = 'CF'
lId = strcmpi( label, labels ) ;
dataForThisLabel = data{lId} ;
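A variation on the same idea, swapping the strcmpi lookup for a containers.Map keyed by label, so that the selected label is used directly as the key. This is a sketch with hypothetical sample data and a hypothetical user selection; upper() makes the lookup case-insensitive, as strcmpi does above:

```matlab
% Hypothetical stand-in for fileread
content = '63C000CF,6000002F,603000AF,0%,64C000CF,6100002F,613000AF';
getByLabel = @(label) regexp( content, sprintf( '\\w{6}(?=%s)', label ), ...
    'match' ) ;

% Group the data by label once, during initialization
labels = {'CF', '2F', 'AF'} ;
byLabel = containers.Map() ;
for k = 1 : numel( labels )
    byLabel(labels{k}) = getByLabel( labels{k} ) ;
end

% Later, when the user selects a label in the GUI
label = 'cf' ;                               % hypothetical selection
dataForThisLabel = byLabel(upper( label )) ;
```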
Adam Kaas on 5 Jun 2014
Thanks Cedric! I've been playing with the regexp and it has proven to be faster. I'll work on implementing your new solution. I appreciate your help!
