sequence probability in a vector

simon on 23 May 2014
Answered: the cyclist on 24 May 2014
If I have a 200-point vector that contains the numbers 1 to 4, like [1 2 4 1 3 2 1 4 2 1 3 ...], how can I calculate a matrix that gives the probability of each 2-digit sequence? For example, a 1 has a 34% chance of being followed by a 2. The matrix has to show all combinations: 1 2, 1 3, 1 4, 2 1, 2 3, ...

Accepted Answer

Star Strider on 23 May 2014
Here’s part of the solution:
V = randi(4, 1, 200); % Create data
T = zeros(4); % Preallocate ‘Tally’ matrix
for k1 = 1:length(V)-1
    T(V(k1),V(k1+1)) = T(V(k1),V(k1+1)) + 1;
end
The idea is to tally each occurrence, so the number of times a 1 is followed by a 1 is T(1,1), the number of times a 3 is followed by a 4 is T(3,4), etc. The loop moves sequentially through the vector, so in your example it would consider [1 2], [2 4], [4 1], ...
I will leave you to calculate the probabilities.
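For the remaining step, one possible sketch (not part of the original answer) is to divide each row of the tally matrix by its row total, which turns the counts for "i followed by j" into an estimate of the chance that i is followed by j. The names rowTotals and P are introduced here for illustration only.

```matlab
V = randi(4, 1, 200);                 % same kind of random test data as above
T = zeros(4);                         % tally matrix
for k1 = 1:numel(V)-1
    T(V(k1), V(k1+1)) = T(V(k1), V(k1+1)) + 1;
end

rowTotals = sum(T, 2);                % how many times each value has a successor
P = bsxfun(@rdivide, T, rowTotals);   % P(i,j): estimated chance that i is followed by j
disp(P)
```

Each row of P sums to 1. If a value never appears before another element, its row total is 0 and that row comes out as NaN, so it would need separate handling.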

More Answers (2)

Geoff Hayes on 23 May 2014
Since your 200-point vector contains four distinct numbers (1-4), I would create a 4x4 matrix in which element (i,j) is the number of times that j follows i in the vector (I'm assuming that you are computing your probabilities from that vector). So:
seqCount = zeros(4,4); % the count of all 2-digit sequences initialized to zero
Then, if the first digit in a sequence is a 1 and the second digit is a 2:
seqCount(1,2) = seqCount(1,2) + 1; % increment this 2-digit sequence by one
Of course, you will want to make this more general and iterate over each element in the 200-point vector, keeping track of the previous element, i, and the current element, j:
seqCount(i,j) = seqCount(i,j) + 1;
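Finally, to compute the probability of each 2-digit sequence, divide the seqCount matrix by the total number of 2-digit sequences in your vector (type help sum to see how you could get this count from seqCount). Note that this gives the fraction of all pairs that are (i,j). To get the chance that a given i is followed by j, as in your 34% example, divide each row of seqCount by that row's sum instead.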
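Put together, the steps described in this answer might look like the following sketch. It assumes the short example vector from the question rather than the full 200-point data, and the names v, k, nPairs, and pairProb are illustrative, not from the answer.

```matlab
v = [1 2 4 1 3 2 1 4 2 1 3];          % example vector from the question
seqCount = zeros(4,4);                % count of each 2-digit sequence

i = v(1);                             % previous element
for k = 2:numel(v)
    j = v(k);                         % current element
    seqCount(i,j) = seqCount(i,j) + 1;
    i = j;                            % current element becomes the previous one
end

nPairs = sum(seqCount(:));            % total number of 2-digit sequences
pairProb = seqCount / nPairs;         % fraction of all pairs that are [i j]
disp(pairProb)
```

For this vector, nPairs is 10 and pairProb(1,3) is 0.2, because two of the ten pairs are [1 3].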

the cyclist on 24 May 2014
Another possibility:
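x = [1 2 4 1 3 2 1 4 2 1 3];
p_rel = zeros(4,4);
for i = 1:4
    for j = 1:4
        % strfind returns the start index of every occurrence of [i j] in x;
        % counting them and dividing by the number of pairs gives the pair frequency
        p_rel(i,j) = sum(strfind(x,[i j])~=0)/(numel(x)-1);
    end
end
% Divide each row by its sum so that every row adds up to 1
p_abs = bsxfun(@rdivide,p_rel,sum(p_rel,2))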
The (i,j) element of p_abs gives the probability that i is followed by j.
I assumed that you did not want to include the final 3, which cannot be followed by anything.
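A loop-free variant of the same idea can be sketched with accumarray, which is a different technique from the strfind approach above. The names pairs, counts, rowTotals, and P are introduced here for illustration.

```matlab
x = [1 2 4 1 3 2 1 4 2 1 3];              % example vector from the question
pairs = [x(1:end-1).' x(2:end).'];        % each row is one [current next] pair
counts = accumarray(pairs, 1, [4 4]);     % counts(i,j): times i is followed by j
rowTotals = sum(counts, 2);               % times each value has a successor
P = bsxfun(@rdivide, counts, rowTotals);  % row-normalized, so each row sums to 1
disp(P)
```

For this x, counts(1,3) is 2 and P(1,3) is 0.5, which matches p_abs(1,3). As with the loop version, a value that never has a successor would give a row of NaN.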
