Appending to the field of a structure array

Hello, I am trying to create a word index. I start off with an empty struct array with 3 fields: Word, Documents, and Locations. For now, ignore the latter two. I have a cell array of words:
Doc1 = {'Matlab','is','awesome'};
To be clear, other documents may contain the same words. I start from my Index, which I create with this function:
function Index = InitializeIndex()
c10 = cell(1,0);
Index = struct('Word', c10, 'Documents', c10, 'Locations', c10);
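For reference, here is a small check of what InitializeIndex produces, written as a script that repeats its two lines inline. It is only an illustration: the point is that passing cell(1,0) as the field values gives a 1x0 struct array (zero elements, so isempty is true), and that indexing one past the end grows it, with unassigned fields such as Documents and Locations filled with [].

```matlab
% Same construction as InitializeIndex, run inline
c10 = cell(1,0);
Index = struct('Word', c10, 'Documents', c10, 'Locations', c10);
initialSize = size(Index);        % [1 0]: a struct array with no elements
initiallyEmpty = isempty(Index);  % true

% Growing the array by indexing one past the end;
% Documents and Locations are not assigned, so they become []
Index(end+1).Word = 'Matlab';
```

This also explains why displaying E7(1) later in the thread shows empty Documents and Locations.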
I want to add only the unique words to Index. Here is my function:
function Index = InsertDoc(Index, newDoc, DocNum)
% This function returns a struct array where each element corresponds to a
% unique word in a group of documents. In each element of the struct array,
% the word is stored in the Word field, the numbers of the documents that
% contain the word are stored in the Documents field, and the locations of
% the word in each document are stored in the Locations field.
Index = {Index.Word};
for i = 1:numel(newDoc)
% IndexWord is either empty or the word is not present in IndexWord
if isempty(Index) || strcmpi(Index{i},newDoc(i))
Index.Word{end+1} = newDoc(i);
end
end
My problem is twofold. First, I am having difficulty with the condition that checks whether a word is already in Index: how do I append a word only if it does not already exist in Index? Second, how do I actually append the word to the Word field of Index?

Answers (2)

Alfonso Nieto-Castanon on 5 Jul 2014
Edited: Alfonso Nieto-Castanon on 5 Jul 2014
Assuming that you want Index to be a single struct with fields Word/Documents/Locations (each field being a cell array), you could do something along these lines:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,Index.Word); % words already in Index
idx = numel(Index.Word)+(1:nnz(~in)); % new Index entries
Index.Word(idx) = UniqueWordsInDoc(~in); % adds new words
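To see the first approach in action, here is a minimal sketch that runs those four lines over two documents in a script. Doc2 is a hypothetical second document added only to show that repeated words are skipped. Note that unique sorts the words (uppercase before lowercase), so the words are stored in sorted order within each document, not in their original order.

```matlab
% Case 1: Index is a single struct whose Word field is a cell array
Index = struct('Word', {{}});          % scalar struct, Index.Word is {}
Doc1 = {'Matlab','is','awesome'};
Doc2 = {'Matlab','is','fast'};         % hypothetical second document
docs = {Doc1, Doc2};

for k = 1:numel(docs)
    newDoc = docs{k};
    UniqueWordsInDoc = unique(newDoc);             % sorted unique words
    in = ismember(UniqueWordsInDoc, Index.Word);   % true where already indexed
    idx = numel(Index.Word) + (1:nnz(~in));        % positions for the new words
    Index.Word(idx) = UniqueWordsInDoc(~in);       % append only the new words
end

disp(Index.Word)
```

After Doc1 the list holds 'Matlab', 'awesome', 'is'; Doc2 only contributes 'fast'.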
If, on the other hand, you want Index to be a struct array with fields Word/Documents/Locations (each field being a string or vector), you could do something along these lines:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,{Index.Word}); % words already in Index
idx = numel(Index)+(1:nnz(~in)); % new Index entries
[Index(idx).Word] = deal(UniqueWordsInDoc{~in}); % adds new words
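The same experiment for the struct-array approach, again a sketch with a hypothetical Doc2. Here each new word becomes a new element of Index, {Index.Word} gathers the existing words into a cell array for ismember, and deal distributes the new words across the new elements.

```matlab
% Case 2: Index is a struct array, one element per word
Index = struct('Word', {});            % 0x0 struct array with field Word
Doc1 = {'Matlab','is','awesome'};
Doc2 = {'Matlab','is','fast'};         % hypothetical second document
docs = {Doc1, Doc2};

for k = 1:numel(docs)
    newDoc = docs{k};
    UniqueWordsInDoc = unique(newDoc);                 % sorted unique words
    in = ismember(UniqueWordsInDoc, {Index.Word});     % true where already indexed
    idx = numel(Index) + (1:nnz(~in));                 % indices of the new elements
    [Index(idx).Word] = deal(UniqueWordsInDoc{~in});   % one new element per new word
end

disp({Index.Word})
```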
In the former case you initialize using:
Index = struct('Word',{{}});
while in the latter you would initialize Index using:
Index = struct('Word',{});
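The difference between the two initializations is easy to miss, so here is a short check. When struct is given a cell array as a field value, it creates one struct element per cell, so {} yields zero elements, while {{}} yields one element whose field holds the empty cell {}.

```matlab
IndexA = struct('Word', {{}});   % case 1: one struct, IndexA.Word is {}
IndexB = struct('Word', {});     % case 2: empty (0x0) struct array with field Word

sizeA = size(IndexA);            % [1 1]
sizeB = size(IndexB);            % [0 0]
```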
I hope this clarifies the "indexing" issues; this can be kind of tricky...
EDIT1: added correction by Cedric
EDIT2: "concatenating" versions, something along these lines:
in case 1:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,Index.Word); % words already in Index
Index.Word = [Index.Word UniqueWordsInDoc(~in)]; % adds new words
in case 2:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,{Index.Word}); % words already in Index
Index = [Index cell2struct(UniqueWordsInDoc(~in),'Word')']; % adds new words
Rick on 5 Jul 2014
Here is my code right now:
function Index = InsertDoc(Index, newDoc, DocNum)
% This function returns a struct array where each element corresponds to a
% unique word in a group of documents. In each element of the struct array,
% the word is stored in the Word field, the numbers of the documents that
% contain the word are stored in the Documents field, and the locations of
% the word in each document are stored in the Locations field.
for i = 1:numel(newDoc)
% IndexWord is either empty or the word is not present in IndexWord
if isempty(Index)|| strcmpi({Index.Word},newDoc{i})
Index(end + 1).Word = newDoc{i};
end
end
Here is my input:
Doc1 = {'Matlab','is','awesome'};
E7 = InitializeIndex;
E7 = InsertDoc(E7,Doc1,1);
My output was not what I expected: I expected E7(2).Word to be 'is'.
EDU>> E7(1)
ans =
Word: 'Matlab'
Documents: []
Locations: []
EDU>> E7(2)
Index exceeds matrix dimensions.
Alfonso Nieto-Castanon on 5 Jul 2014
Edited: Alfonso Nieto-Castanon on 5 Jul 2014
change
strcmpi({Index.Word},newDoc{i})
to
~any(strcmpi({Index.Word},newDoc{i}))
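Why the change matters: strcmpi({Index.Word},newDoc{i}) returns one logical value per existing entry, and it is true only when the word is already there, so the original test appended 'Matlab' (because Index was empty) and then skipped every word that did not match. Wrapping it in ~any(...) turns it into "no existing entry matches", and also reduces it to the single scalar that || requires (with two or more entries, the unreduced vector would make || error). Below is a sketch of the loop with that change, run as a script over Doc1 plus a hypothetical Doc2. Since strcmpi ignores case, 'Matlab' and 'matlab' would count as the same word.

```matlab
c10 = cell(1,0);
Index = struct('Word', c10, 'Documents', c10, 'Locations', c10);  % as in InitializeIndex
docs = {{'Matlab','is','awesome'}, {'Matlab','is','fast'}};       % Doc1 and a hypothetical Doc2

for k = 1:numel(docs)
    newDoc = docs{k};
    for i = 1:numel(newDoc)
        % any() collapses the per-entry comparison to one true/false value.
        % On an empty Index, {Index.Word} is {} and any([]) is false,
        % so the isempty check is a shortcut rather than a necessity.
        if isempty(Index) || ~any(strcmpi({Index.Word}, newDoc{i}))
            Index(end + 1).Word = newDoc{i};   % Documents/Locations default to []
        end
    end
end
```

Unlike the unique-based answer, this keeps the words in the order they first appear.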



the cyclist on 5 Jul 2014
For the first part, use the ismember() function. For the second part, you can append to a cell array using
new_list = [old_list, {new_word}];
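A small illustration of both parts with a plain cell array of words (the variable names are illustrative only). Note that wrapping both in braces, as in {old_list,new_word}, nests the old list inside a new two-element cell instead of extending it; square-bracket concatenation or assigning to end+1 extends the list.

```matlab
old_list = {'Matlab','is'};
new_word = 'awesome';

nested = {old_list, new_word};    % 1x2 cell: first element is the old 1x2 cell
flat   = [old_list, {new_word}];  % 1x3 cell: 'Matlab', 'is', 'awesome'

% Append a candidate only if it is not already in the list
words = old_list;
candidates = {'is', 'awesome'};
for k = 1:numel(candidates)
    if ~ismember(candidates{k}, words)
        words{end+1} = candidates{k};
    end
end
```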
the cyclist on 5 Jul 2014
Actually, I think I misunderstood what you meant. If you already had
Index(1).Word = 'cat';
then you can append with
Index(end+1).Word = 'dog';
Rick on 5 Jul 2014
Edited: Rick on 5 Jul 2014
So do you mean this? I got rid of Index = {Index.Word} because it was overwriting the struct array created by my function InitializeIndex.
for i = 1:numel(newDoc)
% IndexWord is either empty or the word is not present in IndexWord
if isempty(Index)|| strcmpi({Index.Word},newDoc{i})
Index(end + 1).Word = newDoc{i};
end
end
I get the following problem: when I type Index(2).Word, I get 'Index exceeds matrix dimensions.'
