Find length of contiguous numbers in a vector that sum to a given value

5 views (last 30 days)
I have a column vector, and I need to look within this vector and find a run of contiguous numbers which sum to a specific value. I want the code to simply return the lengths of the contiguous values it needed to add up to reach that target value.
The code below (with help from Image Analyst) does the job but for the vectors I'm dealing with (length = ~1 x 10^6) it is very slow, and I can't help feeling there must be a quicker way. The way the code works is that it sets the length first, then searches the vector for contiguous numbers that add up to the desired length. This needs to be done for all possible lengths, hence the for loop. I suspect a better way of doing this is to do all the sums first, then work out the lengths, but I'm not sure how to implement this.
There are a couple of complicating issues. The first is that the vectors are random, non-integer numbers, so the value I search for is actually a range (+/- 0.1). There are no negative values. The second issue is that there are zeros in the vector, and I want to ignore any contiguous series that begin or end with a zero.
As I say, the code below does exactly what I want, but the whole thing needs speeding up and I'm not sure how this can be done.
% P = my vector
scale=1:1:1000000;
ni = length(scale);
for K = 1 : ni
i = scale(K);
sums = conv(P, ones(1, i), 'valid');
%ignore contiguous runs that begin or end with a 0
P0=P==0;
ignore = or(P0(1:end-i+1), P0(i:end));
sums(ignore) = [];
%calculate number of contiguous runs of length K that sum to 20 (+/- 0.1)
countofsums(K) = nnz(sums>19.9 & sums<20.1);
end
clearvars -except countofsums scale P
  1 Comment
Swisslog
Swisslog on 25 Feb 2014
I guess another approach could be to somehow vectorise the whole thing, can anyone see whether this could be done?

Sign in to comment.

Answers (2)

Roger Stafford
Roger Stafford on 25 Feb 2014
Swisslog, it seems to me you have overlooked the importance of the following very significant aspect of your problem: " There are no negative values. " This means that if you have summed a certain span of contiguous values, when you include another value at its upper end, the sum is guaranteed not to decrease, and when you drop the value at the beginning the sum is guaranteed not to increase. This is a very important consideration in reducing the total number of possible sums that need to be examined. It is a great waste of time having to calculate the sum of every possible span of numbers in your vector with that convolution operation.
I don't have the time to work this out for you, but why don't you see if you can exploit this non-negative property of your vector in the following way. Suppose you maintain two pointers to your vector values, one at the beginning of a span of values and one at its end. Also you continually maintain the sum of the values spanned. If the sum is less than 19.9, at the next step you advance the end pointer by one and add the next value to your sum. If the sum is greater than 20.1, at the next step you advance the beginning pointer by one and reduce the sum by that first value. If the sum lies between these two limiting values, you record this particular case in 'countofsums(K)' where K would be the difference in the pointer numbers. That means it takes only a single pass through the entire vector. The exception to that statement is that if you have found a sum within your limits, you may have to bring the end pointer back to an earlier value after advancing the beginning pointer so as to not miss sum span combinations that also fall within the limits.
Presumably this can all be accomplished with a fairly smart while-loop which repeatedly advances either the beginning pointer or the end pointer while all the time maintaining the current sum included between the pointers. It continually tests the sums to catch the ones that need to be recorded, carefully avoiding those that begin or end with a zero.
Have fun!
  1 Comment
Swisslog
Swisslog on 26 Feb 2014
Edited: Swisslog on 26 Feb 2014
Thanks Roger for this very insightful input. You are of course correct that this approach would be eminently more efficient than what I have so far. In reality, I don't think my coding skills are up to it, and the newest code I posted does work at a speed sufficient for my needs (plus I understand it).
With that in mind, I think all I really want now is to add to this existing code a way to ignore any contiguous runs that begin or end with 0

Sign in to comment.


Jos (10584)
Jos (10584) on 25 Feb 2014
You can get rid of the slow CONV statement by using the sum of K consecutive elements in creating the sum of K+1 consecutive elements. This is likely to speed things up.
Secondly, once there are no more sums that sum up to a value for K consecutive elements (but did for K-1 elements) we can discard larger values of K.
Here is some code that implement these ideas. Note that K runs from 1 to the number of elements without gaps
% data
V = [1 5 2 3 7 4 1 1 2 5]
CompareValue = 6 ;
Tolerance = 2 ;
maxLength = 5 ; % value between 1 and numel(V)
numV = numel(V)
sumVk = V ; % sum over k consecutive elements, start with k = 1 till maxLength
count = zeros(1,maxLength)
% engine
% start with length of 1
D = abs(sumVk-CompareValue) < Tolerance
count(1) = nnz(D)
% look at the subsequent lengths
for k = 2:maxLength
ix = k:numV ;
sumVk(ix) = sumVk(ix) + V(ix-k+1)
% sumVk(k:numV) is the part that holds the sum of k consecutive values of V
% elements between 1 and k can be ignored
D(ix) = abs(sumVk(ix)-CompareValue) < Tolerance % compare to value
count(k) = nnz(D(ix)) % count
if count(k-1) && ~count(k)
break ; % we can skip larger values of k
end
end
  2 Comments
Swisslog
Swisslog on 25 Feb 2014
Thanks Jos, that's great. I think it gives the same answer to this code that I've since put together based on an earlier related question.
V = [1 5 2 3 7 4 1 1 2 5];
scale = 1:1:10;
ni = length(scale);
countofsums=zeros(ni,1);
tic
c_data = cumsum(V);
for K = 1 : ni
sums = c_data(scale(K):end) - [0 c_data(1:end-scale(K))];
countofsums(K) = sum(sums>4 & sums<8);
end
toc
clearvars -except countofsums scale data
Both seem quite speedy, but a key bit that's missing is the condition that ignores contiguous runs of numbers that begin or end with zero. Can you or anyone else see how this could be implemented?
Equally, I don't quite follow the break bit of the code, can this be implemented in my own code above?
Swisslog
Swisslog on 25 Feb 2014
Can anyone assist further with getting this code sorted with the conditions outlined above?

Sign in to comment.

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!