Standard deviation takes forever
I have a double-precision numeric 3D matrix M (converted by fread from uint8) of size 30000 x 500 x 500, and I would like to get the standard deviation along dimension 2. Running tic, std(M,0,2); toc has taken more than 12 hours and is still running, whereas mean(M,2) took only 80 seconds.
In a bit more detail: std(M(:,:,1),0,2) takes 0.3 seconds and std(M(:,:,1:100),0,2) takes 34 seconds, but std(M(:,:,1:500),0,2) reports out of memory.
Similarly, mean(M(:,:,1),2) takes 0.1 seconds, but mean(M(:,:,1:500),2) fails with an 'out of memory' message, even though mean(M,2) takes about 80 seconds. This is all very confusing! Thanks.
NAVNEET NAYAN
on 12 Sep 2023
That might be occurring due to the size of the 3D matrix and your system memory.
gujax
on 12 Sep 2023
Dyuman Joshi
on 12 Sep 2023
"std(M(:,:,1),0,2) takes 0.3 seconds and std(M(:,:,100),0,2) takes 34 seconds
But std(M(:,:,500),0,2) says out of memory"
This is peculiar. How are you running these timing tests?
30000*500*500*8/(1024*1024*1024)
is an array of about 56 GB; how much real memory does your machine have? The high run times are likely owing to disk paging when referring to locations in the original array. Working along the second dimension also means memory is not accessed in sequence but in steps of the size of the first dimension for each successive element.
You could try
M = permute(M,[2 1 3]);   % .' is not defined for N-D arrays
std(M(:,:,100),0,1).'
Speed should be better if you did
std(squeeze(M(:,:,100)),0,2)
etc., ... but that may force a memory copy and cause memory issues, I don't know.
I don't have anything close to enough memory to even try...
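One way to sidestep both the out-of-memory error and the strided access described above might be to loop over the third dimension, so that only one page is worked on at a time. The following is a minimal sketch, not something tried on the original 56 GB array: the small random M is a stand-in, and the names S, nRows and nPages are made up for illustration.

```matlab
% Small stand-in array; the real M in the thread is 30000 x 500 x 500.
M = rand(300, 50, 20);

[nRows, ~, nPages] = size(M);
S = zeros(nRows, 1, nPages);      % same shape that std(M,0,2) would return

for k = 1:nPages
    % In column-major storage each page M(:,:,k) is one contiguous block,
    % so only a page-sized temporary exists at any one time.
    S(:, :, k) = std(M(:, :, k), 0, 2);
end
```

Whether this helps on the real array depends on how much of M is already being paged to disk, which this thread has not established.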
dpb
on 12 Sep 2023
Your original posting says "I have a double precision numeric 3D matrix M of size 30000 x 500 x 500..."
That's what I calculated above; at 8 bytes/double it takes up about 56 GB of storage.
I don't follow what " an accumulation of (500 x 100x 5) files each 31 KB in size." means?
Think you're going to have to show us specifically what your array is and how it was constructed.
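For reference, the size arithmetic can be written out as below. This is only a sketch using the dimensions quoted in the question; the variable names are illustrative, and the real array may differ if it was not built as described.

```matlab
dims = [30000 500 500];           % size quoted in the question
bytesPerElement = 8;              % double precision

nBytes  = prod(dims) * bytesPerElement;
sizeGiB = nBytes / 1024^3;        % binary units, matching the expression above
sizeGB  = nBytes / 1e9;           % decimal units

fprintf('%.1f GiB (%.1f GB)\n', sizeGiB, sizeGB);
```

The two figures differ only in whether a gigabyte is taken as 1024^3 or 10^9 bytes.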
Accepted Answer
More Answers (1)
Can you confirm you're using the std function included in MATLAB? What does this command show?
which -all std
gujax
on 12 Sep 2023
Steven Lord
on 12 Sep 2023
You don't see any other std.m files in the list?
dpb
on 12 Sep 2023
Presuming the answer to @Steven Lord's follow-up question is "No", then you've not accidentally aliased std, which is what he was checking for (I wouldn't expect that to be likely with the same input footprint as the original, but it's always worth checking). For a double as input, the base datafun version is the one that will be called; the others are overloaded versions for the specific data types/classes noted in the comments.
Dyuman Joshi
on 12 Sep 2023
If your data is uint8, why not just use uint8? It will help with the memory: uint8 requires only 1 byte of storage per element, compared to 4 bytes for single and 8 bytes for double.
"I thought going from double to single would only affect speed by 2x?"
What makes you think so?
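On the storage point: as far as I know, MATLAB's std only accepts floating-point input, so the data cannot simply stay uint8 all the way through. A possible compromise, sketched below under that assumption, is to keep the full array as uint8 and convert one page at a time to single. The array U and the other names are hypothetical stand-ins, not the poster's actual data.

```matlab
% Stand-in for the uint8 data read by fread (1 byte per element).
U = uint8(randi([0 255], 300, 50, 20));

[nRows, ~, nPages] = size(U);
S = zeros(nRows, 1, nPages, 'single');

for k = 1:nPages
    page = single(U(:, :, k));    % only this page is held as floating point
    S(:, :, k) = std(page, 0, 2);
end
```

Single precision carries roughly 7 significant digits, which may or may not be acceptable for the results needed here.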
Dyuman Joshi
on 12 Sep 2023
Edited: Dyuman Joshi
on 12 Sep 2023
Ah yes, I don't know how I overlooked the obvious. My bad.
dpb
on 13 Sep 2023
The issue you're having must be in disk swapping owing to limited real memory...I'm still not positive about just how big your array is. How about
whos M
to tell us precisely what you're processing, and
memory
for the available memory your machine has?
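If it is easier to paste a single line of output, the same information that whos prints can also be captured in a variable. This is just a sketch with a small stand-in array; note that, to my knowledge, the memory command is only available on Windows.

```matlab
M = zeros(300, 50, 20);           % stand-in; substitute the real array

info = whos('M');                 % struct with fields name, size, bytes, class
fprintf('%s: %s, class %s, %.2f MB\n', ...
        info.name, mat2str(info.size), info.class, info.bytes / 1e6);
```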
It depends on how TMW builds the executable and what processor instructions they assume; unfortunately, it's likely they code to a "lowest common denominator" of what is out there, because they know that not all customers are going to have the latest CPU technology with enhanced vector-processing instructions that make use of the built-in vector pipeline in current processors.
I've never tried it out myself, but if you have a high-memory graphics card, you could possibly try the GPU stuff...