Trying to sort a text document by alphabetical order and count how many times a word appears
Hello,
I am trying to import a .txt file, count how many times each word appears, and then sort all the words in alphabetical order.
I want to take the .txt file, make it into a string, chop the string up into individual words, and then put the words into a matrix along with their word counts.
Take the following sentence as an example: "The cat is a cat and would like to have a cat."
The output would look like the following:
word:a count:2
word:and count:1
word:cat count:3
word:have count:1
word:is count:1
word:like count:1
word:the count:1
word:to count:1
word:would count:1
Here is what I have right now.
fid = fopen('Theodore_Roosevelt_The_Duties_Of_American_Citizenship.txt');
Line = fgetl(fid);
textfile = strings(1,1);
k = 1;
while ischar(Line)
textfile(k,1) = Line;
Line = fgetl(fid);
k = k+1;
end
fclose(fid);
%removing all nonAlpha characters from the text file
punc = [".",";",",","(",")","--","-",];
textfile = replace(textfile,punc," ");
textfile=lower(textfile);
%for loop is used to split the string array 'textfile' into individual words
words = strings(0);
for i = 1:length(textfile)
words = [words;split(textfile(i))];
end
k=convertStringsToChars(words);
h=1;
F=2;
G=1;
sorted=zeros(1);
for j = 1:length(k)
T=strncmp(k(h),k(F),1);
if T==1
%if true, put h word in sorted before word F
sorted=sorted(k(h),G);
G=G+1;
sorted=sorted(k(F),G);
end
h=h+2;
F=F+2;
end
disp(sorted)
This is the error I get when executing the code:
Error using sort
Input argument must be a cell array of character vectors.
Error in sorted (line 28)
[ignored,index] = sort([meshsites(:).' sites(:).']);
Error in fgetltest (line 35)
sorted=sorted(k(h),G);
This is for a homework question, but I am lost on how to put the words and their counts into a matrix.
Accepted Answer
Adam Danz on 3 Nov 2019 (edited by Adam Danz on 3 Nov 2019)
Here's a different approach. See inline comments for details.
% Read text
C = fileread('Theodore_Roosevelt_The_Duties_Of_American_Citizenship.txt');
% Split into words by spaces
words = strsplit(strtrim(C));
% Keep only the letters in each word.
% Be careful: this removes every non-letter character, so cat's turns into cats.
% Without this step, "cats" in quotes or cats! would not be recognized as cats.
% If that's a problem you'll need to use a regular expression approach.
words = cellfun(@(x)x(isletter(x)), words, 'UniformOutput', false);
% Drop tokens left empty by the previous step (e.g. a stand-alone "--")
words = words(~cellfun(@isempty, words));
% make all letters lower case
words = lower(words);
% sort them into alphabetical order
words = sort(words);
% Count frequency of each word
wordList = unique(words);
wordCount = histcounts(categorical(words), categorical(wordList));
% Output table
T = table(wordList(:), wordCount(:), 'VariableNames', {'Word', 'Count'});
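For the regular expression approach mentioned in the comments above, here is a minimal sketch rather than a definitive solution. It assumes the text contains only ASCII letters and that a word is a run of letters with optional internal apostrophes. The variable txt holds the example sentence from the question in place of the file contents, and unique() with accumarray() is swapped in for the histcounts() counting step.

```matlab
% Example sentence from the question, standing in for the file contents
txt = 'The cat is a cat and would like to have a cat.';

% Match runs of letters, allowing internal apostrophes (so cat's stays cat's).
% Assumption: ASCII letters only; accented letters would split a word.
words = regexp(lower(txt), '[a-z]+(''[a-z]+)*', 'match');

% unique returns the words sorted alphabetically; idx maps each word
% in 'words' to its position in wordList
[wordList, ~, idx] = unique(words);

% Tally how many times each position occurs
wordCount = accumarray(idx(:), 1);

% Print in the layout requested in the question
for n = 1:numel(wordList)
    fprintf('word:%s count:%d\n', wordList{n}, wordCount(n));
end
```

Because punctuation is never part of a match, no separate cleaning step is needed, and no empty words can appear in the list.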
Adam Danz
on 3 Nov 2019
Glad I could help!
The 2 mini lessons here are the use of fileread() and histcounts().
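To see both functions at work on something small enough to check by hand, here is a rough sketch. The temporary file and its two-line contents are made up for illustration (the question's example sentence split over two lines). The category list is passed to histcounts() as a cell array of character vectors.

```matlab
% Write the example sentence to a temporary file (hypothetical stand-in
% for the speech text file)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'The cat is a cat\nand would like to have a cat.\n');
fclose(fid);

% fileread returns the whole file as one character vector, newlines included,
% so no fgetl loop is needed
C = fileread(fname);
delete(fname);

% Split on whitespace (spaces and newlines), keep letters only, lower-case
words = strsplit(strtrim(C));
words = lower(cellfun(@(x) x(isletter(x)), words, 'UniformOutput', false));

% unique gives the sorted word list; histcounts counts each listed category
wordList = unique(words);
wordCount = histcounts(categorical(words), wordList);

disp(table(wordList(:), wordCount(:), 'VariableNames', {'Word', 'Count'}))
```

The counts come back in the same order as wordList, which is why the two can be placed side by side in a table.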