This example shows how to find, clean, and delete table rows with missing data.
Load sample data from a comma-separated text file, messy.csv. The file contains many different missing data indicators:
Empty character vector ('')
period (.)
NA
NaN
-99
To specify the character vectors to treat as empty values, use the 'TreatAsEmpty' name-value pair argument with the readtable function. (Use the disp function to display all 21 rows, even when running this example as a live script.)
T = readtable('messy.csv','TreatAsEmpty',{'.','NA'}); disp(T)
A B C D E
________ ____ __________ ____ ____
{'afe1'} 3 {'yes' } 3 3
{'egh3'} NaN {'no' } 7 7
{'wth4'} 3 {'yes' } 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes' } 5 5
{'jre3'} 34.6 {'yes' } 34.6 34.6
{'wen9'} 234 {'yes' } 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes' } 5 5
{'wnk3'} 245 {'yes' } 245 245
{'abk6'} 563 {0x0 char} 563 563
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes' } 23 23
{'wba3'} NaN {'yes' } NaN 14
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'poj2'} -99 {'yes' } -99 -99
{'bas8'} 23 {'no' } 23 23
{'gry5'} NaN {'yes' } NaN 21
T is a table with 21 rows and five variables. 'TreatAsEmpty' only applies to numeric columns in the file and cannot handle numeric values specified as text, such as '-99'.
View the data type, description, units, and other descriptive statistics for each variable by creating a table summary using the summary function.
summary(T)
Variables:
A: 21x1 cell array of character vectors
B: 21x1 double
Values:
Min -99
Median 14
Max 563
NumMissing 3
C: 21x1 cell array of character vectors
D: 21x1 double
Values:
Min -99
Median 7
Max 563
NumMissing 2
E: 21x1 double
Values:
Min -99
Median 14
Max 563
When you import data from a file, the default is for readtable to read any variables with nonnumeric elements as a cell array of character vectors.
Display the subset of rows from the table, T, that have at least one missing value.
TF = ismissing(T,{'' '.' 'NA' NaN -99});
rowsWithMissing = T(any(TF,2),:);
disp(rowsWithMissing) A B C D E
________ ___ __________ ___ ___
{'egh3'} NaN {'no' } 7 7
{'abk6'} 563 {0x0 char} 563 563
{'wba3'} NaN {'yes' } NaN 14
{'poj2'} -99 {'yes' } -99 -99
{'gry5'} NaN {'yes' } NaN 21
readtable replaced '.' and 'NA' with NaN in the numeric variables, B, D, and E.
Clean the data so that the missing values indicated by code -99 have the standard MATLAB® numeric missing value indicator, NaN.
T = standardizeMissing(T,-99); disp(T)
A B C D E
________ ____ __________ ____ ____
{'afe1'} 3 {'yes' } 3 3
{'egh3'} NaN {'no' } 7 7
{'wth4'} 3 {'yes' } 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes' } 5 5
{'jre3'} 34.6 {'yes' } 34.6 34.6
{'wen9'} 234 {'yes' } 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes' } 5 5
{'wnk3'} 245 {'yes' } 245 245
{'abk6'} 563 {0x0 char} 563 563
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes' } 23 23
{'wba3'} NaN {'yes' } NaN 14
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'poj2'} NaN {'yes' } NaN NaN
{'bas8'} 23 {'no' } 23 23
{'gry5'} NaN {'yes' } NaN 21
standardizeMissing replaces three instances of -99 with NaN.
Create a new table, T2, and replace missing values with values from previous rows of the table. fillmissing provides a number of ways to fill in missing values.
T2 = fillmissing(T,'previous');
disp(T2) A B C D E
________ ____ _______ ____ ____
{'afe1'} 3 {'yes'} 3 3
{'egh3'} 3 {'no' } 7 7
{'wth4'} 3 {'yes'} 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes'} 5 5
{'jre3'} 34.6 {'yes'} 34.6 34.6
{'wen9'} 234 {'yes'} 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes'} 5 5
{'wnk3'} 245 {'yes'} 245 245
{'abk6'} 563 {'yes'} 563 563
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes'} 23 23
{'wba3'} 23 {'yes'} 23 14
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'poj2'} 22 {'yes'} 22 22
{'bas8'} 23 {'no' } 23 23
{'gry5'} 23 {'yes'} 23 21
Create a new table, T3, that contains only the rows from T without missing values. T3 has only 16 rows.
T3 = rmmissing(T); disp(T3)
A B C D E
________ ____ _______ ____ ____
{'afe1'} 3 {'yes'} 3 3
{'wth4'} 3 {'yes'} 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes'} 5 5
{'jre3'} 34.6 {'yes'} 34.6 34.6
{'wen9'} 234 {'yes'} 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes'} 5 5
{'wnk3'} 245 {'yes'} 245 245
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes'} 23 23
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'bas8'} 23 {'no' } 23 23
T3 contains 16 rows and five variables.
Sort the rows of T3 in descending order by C, and then sort in ascending order by A.
T3 = sortrows(T2,{'C','A'},{'descend','ascend'});
disp(T3) A B C D E
________ ____ _______ ____ ____
{'abk6'} 563 {'yes'} 563 563
{'afe1'} 3 {'yes'} 3 3
{'arg1'} 5 {'yes'} 5 5
{'gry5'} 23 {'yes'} 23 21
{'jre3'} 34.6 {'yes'} 34.6 34.6
{'oii4'} 5 {'yes'} 5 5
{'oks9'} 23 {'yes'} 23 23
{'poj2'} 22 {'yes'} 22 22
{'wba3'} 23 {'yes'} 23 14
{'wen9'} 234 {'yes'} 234 234
{'wnk3'} 245 {'yes'} 245 245
{'wth4'} 3 {'yes'} 3 3
{'adw3'} 22 {'no' } 22 22
{'atn2'} 23 {'no' } 23 23
{'bas8'} 23 {'no' } 23 23
{'dbo8'} 5 {'no' } 5 5
{'egh3'} 3 {'no' } 7 7
{'pkn4'} 2 {'no' } 2 2
{'ple2'} 2 {'no' } 2 2
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
In C, the rows are grouped first by 'yes', followed by 'no'. Then in A, the rows are listed alphabetically.
Reorder the table so that A and C are next to each other.
T3 = T3(:,{'A','C','B','D','E'});
disp(T3) A C B D E
________ _______ ____ ____ ____
{'abk6'} {'yes'} 563 563 563
{'afe1'} {'yes'} 3 3 3
{'arg1'} {'yes'} 5 5 5
{'gry5'} {'yes'} 23 23 21
{'jre3'} {'yes'} 34.6 34.6 34.6
{'oii4'} {'yes'} 5 5 5
{'oks9'} {'yes'} 23 23 23
{'poj2'} {'yes'} 22 22 22
{'wba3'} {'yes'} 23 23 14
{'wen9'} {'yes'} 234 234 234
{'wnk3'} {'yes'} 245 245 245
{'wth4'} {'yes'} 3 3 3
{'adw3'} {'no' } 22 22 22
{'atn2'} {'no' } 23 23 23
{'bas8'} {'no' } 23 23 23
{'dbo8'} {'no' } 5 5 5
{'egh3'} {'no' } 3 7 7
{'pkn4'} {'no' } 2 2 2
{'ple2'} {'no' } 2 2 2
{'pnj5'} {'no' } 463 463 463
{'wnn3'} {'no' } 6 6 6
fillmissing | ismissing | readtable | rmmissing | sortrows | standardizeMissing | summary