How can I plot mixed data types from a CSV file?
I'm working on a .csv file whose columns hold mixed data types (text and numbers); the file looks something like:
,"Admit","Gender","Dept","Freq"
1,"Admitted","Male","A",512
2,"Rejected","Male","A",313
How can I read the data from the file and plot it in a reasonable way to determine appropriate features for a classification problem?
I've tried the txt2str, csvread, and textscan functions, as well as some file-conversion functions (xlsread with [num,txt,raw]), and none of them worked with my data!!
Any good suggestions would be appreciated.
1 Comment
dpb
on 17 Apr 2014
If you have the Statistics Toolbox, create a nominal variable array from the non-numeric fields:
doc nominal % and friends
If you don't have the toolbox, convert manually. unique could be of interest here: its third output maps each distinct string in a field to an integer category code.
Once the text fields are numeric codes, you can count, tabulate, or plot them.
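The manual unique approach can be sketched as follows. This is a minimal sketch, not dpb's exact code: the cell arrays admit and gender are hypothetical stand-ins for text columns already read from the file (with the quotes removed), and accumarray is used here to count rows per level.

```matlab
% Minimal sketch without the Statistics Toolbox. The cell arrays are
% hypothetical stand-ins for text columns already read from the file.
admit  = {'Admitted'; 'Rejected'; 'Admitted'};
gender = {'Male'; 'Male'; 'Female'};

% The third output of unique gives each row's index into the sorted levels
[admitLevels, ~, admitCode]   = unique(admit);   % levels: Admitted, Rejected
[genderLevels, ~, genderCode] = unique(gender);  % levels: Female, Male

% Count how many rows fall into each level
admitCounts  = accumarray(admitCode, 1);
genderCounts = accumarray(genderCode, 1);

% Simple bar chart of the row counts per Admit level
bar(admitCounts);
set(gca, 'XTickLabel', admitLevels);
```

Note that this counts rows, not the Freq values; weighting by Freq needs the frequencies passed to accumarray in place of the 1.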
Accepted Answer
Image Analyst
on 20 Apr 2014
You should use the new readtable() function, available in R2013b and later. It can handle columns of different data types. For example:
t = readtable(fullFileName);
If you want to separate the values, you can do that:
Admit = t.Admit;
Gender = t.Gender;
and so on.
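A fuller sketch of the readtable route, under some assumptions: the file name admissions.csv is hypothetical, the script writes the two sample rows to a temporary file so it is self-contained, and it relies on a recent MATLAB release in which readtable removes the double quotes around text fields. It converts the text columns to categorical (part of base MATLAB since R2013b, no toolbox needed) and plots the total Freq for each Admit level.

```matlab
% Hypothetical file: write the sample rows from the question to tempdir
fullFileName = fullfile(tempdir, 'admissions.csv');
fid = fopen(fullFileName, 'w');
fprintf(fid, ',"Admit","Gender","Dept","Freq"\n');
fprintf(fid, '1,"Admitted","Male","A",512\n');
fprintf(fid, '2,"Rejected","Male","A",313\n');
fclose(fid);

% Assumes a release where readtable strips the quotes from text fields;
% the empty first header becomes a default name such as Var1
t = readtable(fullFileName);

% Convert the text columns to categorical so they can be grouped
t.Admit  = categorical(t.Admit);
t.Gender = categorical(t.Gender);
t.Dept   = categorical(t.Dept);

% Sum Freq within each Admit level (a frequency-weighted count)
[admitLevels, ~, idx] = unique(t.Admit);
totals = accumarray(idx, t.Freq);

bar(totals);
set(gca, 'XTickLabel', cellstr(admitLevels));
ylabel('Total Freq');
```

The same unique/accumarray pattern works for Gender or Dept, or for any combination of them, which is a quick way to see which fields separate the Admit classes.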
Comment
Image Analyst
on 20 Apr 2014
I have no problem with the new interface, and I like the new version, R2014a. I guess I'd gotten used to the ribbon toolbar from a year or so of using Microsoft Office first. I think the new version has several things that are very nice and worth upgrading for. It has the new table data type, which is just fantastic. Plus it has a better way of selecting editor windows when you have lots of them open, as I usually do, and there are too many to show along the top. Command-line completion is also improved: if you type a letter and press the up arrow key, a list of prior commands pops up right there in the Command Window, so you can go straight to the one you want instead of pressing the up arrow a dozen times. The main negative is that the report tools (such as the dependency report) are hidden away on the Current Folder panel under a tiny triangle rather than being on the main toolstrip. There are lots of other nice features, but I don't remember which ones differ from older versions. I can't say much about memory use, since I have 8 GB of RAM, soon to be 32 GB plus a 1.5 TB solid-state drive (hopefully within the month).
More Answers (1)
dpb
on 18 Apr 2014
Presuming you do have the Statistics Toolbox...
>> [a,g,d,f]=textread('bara.csv','%*d%s%s%s%d','headerlines',1,'delimiter',',')
a =
'"Admitted"'
'"Rejected"'
g =
'"Male"'
'"Male"'
d =
'"A"'
'"A"'
f =
512
313
>> an= nominal(a)
an =
"Admitted"
"Rejected"
>> cnt=hist(an)
cnt =
1
1
>>
etc., etc., etc., ...
Comment
dpb
on 24 Apr 2014
Well, I've not had time to actually learn much about the dataset methods beyond a (very) quick perusal. Oh, bummer!! I went back and realized you said you don't have the Stat Toolbox, so unless it's also in some other toolbox you do have, it seems you're stuck with the basic MATLAB cell/structure data types.
That means going back to the original idea I showed: make your own nominal/ordinal variables by mapping each string value to a level via unique, then work with those numeric values.
The real PITA in MATLAB is that, unlike SAS, BMDP, R, and the other really "statistics-aware" packages, there is no FREQUENCY (case-weight) variable, not even in the Statistics Toolbox. Hence, you'll have to either
a) expand the summary table by replicating each record Freq times, so that built-in MATLAB functions can operate on the individual records (a real waste of memory, of course), or
b) compute the proportions numerically from the frequencies themselves.
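Option b) can be sketched without any toolbox by summing Freq within groups. This is a minimal sketch using only the two sample rows from the question; the variable names are illustrative, and a full file would simply contribute more rows to the same accumarray calls.

```matlab
% Option b): weight each summary row by its Freq instead of replicating
% records. Only the two sample rows from the question are used here.
admit  = {'Admitted'; 'Rejected'};
gender = {'Male'; 'Male'};
freq   = [512; 313];

[genderLevels, ~, g] = unique(gender);
isAdmitted = strcmp(admit, 'Admitted');

% Total applicants and total admitted per gender, summed over Freq
applicants = accumarray(g, freq);
admitted   = accumarray(g, freq .* isAdmitted);
admitRate  = admitted ./ applicants;

for k = 1:numel(genderLevels)
    fprintf('%s: %d of %d admitted (%.3f)\n', genderLevels{k}, ...
        admitted(k), applicants(k), admitRate(k));
end
```

With the two sample rows this prints "Male: 512 of 825 admitted (0.621)". Grouping by Dept as well (for example, by passing a two-column subscript matrix to accumarray) gives per-department rates without ever expanding the table.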
Again, it just isn't that convenient to do these kinds of analyses in MATLAB unless there are some major tricks that aren't revealed in the Stats Toolbox docs...