How can I read from a file into a char array?

I have a large text file, and I need to calculate the number of times each individual letter occurs in the file. The easiest way I can think of to do that would be to have an array where each entry is a single char from the file, then run an array function on the whole thing and sum the number of times each letter is found. However I am having trouble getting the text from the file into a char array. I have tried using fileread, which reads the entire file to a single entry in a string array, and I have tried using textscan, which reads the file into a cell array split up by words. Does anyone know if I can just get the file straight into a char array?
John on 28 Sep 2014
Edited: John on 28 Sep 2014
When you use fileread to read the text in a file you actually get a char array.
Let's say testfile.txt contains the text:
this is a test file
If you use fileread like this:
fileContents = fileread('testfile.txt')
fileContents will be a char array with the individual characters. Check that that is so with:
class(fileContents) %Should echo 'char'
isvector(fileContents) %Checks if fileContents is a vector, should return 1/true
The overall problem seems like a college homework assignment :-) so I will refrain from providing a solution. There are a couple of ways to keep a count of each character in the char array. One way would be to iterate through the char array and keep the counts in a Map container (containers.Map), where the keys are the individual characters and the values are the number of times each unique character occurs in the char array.
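For reference, here is a minimal sketch of the Map-based approach described above. The short sample string is a hypothetical stand-in for the output of fileread, and the variable names are illustrative only:

```matlab
% Sketch: count characters with a containers.Map.
% Assumption: fileContents is a char row vector, e.g. from fileread;
% this sample string is hypothetical.
fileContents = 'this is a test file';

counts = containers.Map('KeyType', 'char', 'ValueType', 'double');
for ch = fileContents
    % Each iteration receives one character (one column of the row vector)
    if isKey(counts, ch)
        counts(ch) = counts(ch) + 1;
    else
        counts(ch) = 1;
    end
end

% Print every character found and how often it occurred
keyList = keys(counts);
for k = 1:numel(keyList)
    fprintf('''%s'': %d\n', keyList{k}, counts(keyList{k}));
end
```

The loop is easy to follow but visits each character one at a time, so on a large file the vectorized histc approach in the answers below is likely to be faster.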
Also, the unique function provides a pathway to another solution.
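A possible sketch of the unique route, assuming the same kind of hypothetical sample string: the third output of unique maps every character to its position in the list of distinct characters, and accumarray then adds up one count per occurrence.

```matlab
% Sketch: count characters with unique + accumarray.
% Assumption: fileContents is a char row vector (sample text is hypothetical).
fileContents = 'this is a test file';

% u holds each distinct character once (sorted by character code);
% idx gives, for every character of fileContents, its position in u
[u, ~, idx] = unique(fileContents);

% Add 1 for every occurrence, grouped by idx (idx(:) forces a column)
counts = accumarray(idx(:), 1);

for k = 1:numel(u)
    fprintf('''%s'': %d\n', u(k), counts(k));
end
```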
Zachary on 29 Sep 2014
I rechecked my code, and I was completely mistaken: fileread does in fact give me an array of chars. I had tried vectorizing my code, which still seems a bit like magic to me, and I guess I was accessing my data incorrectly. Thanks!


Accepted Answer

per isakson on 28 Sep 2014
Edited: per isakson on 29 Sep 2014
Try
str = fileread( filespec );
num = double( str );
nch = histc( num, [1:255] ); % nch(k) = number of characters with code k; edges [32:255] would skip control characters
A little test - added later
>> char( find( histc( double('abcd1234'), [1:255] ) ) )
ans =
1234abcd
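Since the question is about letters, and bin k of nch counts the character with code k, the per-letter counts can be read straight out of nch. A minimal sketch, assuming a short hypothetical string in place of the file contents and folding upper and lower case together (letterCounts and the sample text are illustrative names, not part of the answer):

```matlab
% Sketch: case-insensitive letter counts from the histc result.
% Assumption: str stands in for fileread(filespec); sample text is hypothetical.
str = 'Hello, World';
nch = histc(double(str), 1:255);   % nch(k) = number of characters with code k

lowerCounts  = nch(double('a'):double('z'));
upperCounts  = nch(double('A'):double('Z'));
letterCounts = lowerCounts + upperCounts;   % one count per letter a..z
letters = 'a':'z';

% Print only the letters that actually occur
present = find(letterCounts);
for k = present(:)'
    fprintf('%c: %d\n', letters(k), letterCounts(k));
end
```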

More Answers (1)

Geoff Hayes on 28 Sep 2014
Zachary - I think that you are on the right path using fileread. If I follow the fileread example,
io_contents = ...
fullfile(matlabroot,'toolbox','matlab','iofun','Contents.m');
filetext = fileread(io_contents);
Note that filetext is a 1x4244 array of char elements. So you can either loop over each element and update your "counting" array, or try something else. Remember that each character has an ASCII code, so we can use that to our advantage. If we convert the character array into a numeric array, we can then use a histogram function (for example, histc) to determine the count for each character:
charBinCounts = histc(double(uint8(filetext)),0:1:127);
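So we take the 1x4244 character array filetext, convert it to 8-bit unsigned integers, and then convert to double so that histc gets numeric input (double(filetext) on its own would also do). Then pass this numeric array to the histc function with the bins given by 0,1,2,...,126,127, which cover the standard ASCII codes 0 through 127. Note that uint8 itself holds values from 0 through 255, so any character with a code above 127 falls outside these bins and is not counted.
charBinCounts(k) contains the count for the character whose code is k-1 (the offset is there because the first bin is code 0).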
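A small sketch of reading results back out of charBinCounts, assuming a hypothetical sample string instead of the Contents.m text. The helper name countOf is illustrative only; it just adds 1 to a character's code to account for the bins starting at 0.

```matlab
% Sketch: look up counts in charBinCounts (sample text is hypothetical).
filetext = 'abracadabra';
charBinCounts = histc(double(uint8(filetext)), 0:1:127);

% Bin k holds character code k-1, so index with code + 1
countOf = @(c) charBinCounts(double(c) + 1);
countOf('a')   % 5 for this sample

% Characters that occur at least once: nonzero bins, shifted back by 1
present = char(find(charBinCounts(:)') - 1);   % 'abcdr' for this sample
```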
Zachary on 29 Sep 2014
Thanks! I appreciate you taking the time to help me out with this.
