How can I plot mixed data types from a CSV file?

I'm working on a .csv file that contains columns of mixed data types. The file looks something like:
,"Admit","Gender","Dept","Freq"
1,"Admitted","Male","A",512
2,"Rejected","Male","A",313
How can I read the data from the file and plot it in a reasonable way to determine appropriate features for a classification problem?
I've tried txt2str, csvread, and textscan, as well as some file-format conversion functions (xlsread with [num,txt,raw]), and none of them worked with my data!
Any good suggestions would be appreciated.
  1 Comment
dpb on 17 Apr 2014
If you have the Statistics Toolbox, create a nominal variable array from the non-numeric fields
doc nominal % and friends
If you don't have the toolbox, convert manually. unique could be of interest, perhaps to create the number of categories for each.
After the above, then you can do something with those values.
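The manual conversion mentioned above can be sketched with unique and accumarray, neither of which needs a toolbox. This is only an illustration: the first two values come from the question, the third is a made-up extra row, and the variable names (admit, levels, codes, counts) are hypothetical.

```matlab
% Two values from the question plus one made-up value (hypothetical)
admit = {'Admitted'; 'Rejected'; 'Admitted'};

% levels: sorted list of distinct strings
% codes:  for each row, the index of its string within levels
[levels, ~, codes] = unique(admit);

% Count how many rows fall into each level
counts = accumarray(codes(:), 1);
```

The numeric codes can then stand in for the strings in any function that expects numbers, and levels maps them back to the original labels.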


Accepted Answer

Image Analyst on 20 Apr 2014
You should use the new readtable() function, available in R2013b and later. It can handle different data types. It goes like this:
t = readtable(fullFileName);
If you want to separate the values, you can do that:
Admit = t.Admit;
Gender = t.Gender;
and so on.
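As a rough end-to-end sketch of that approach, the snippet below writes the two rows from the question to a temporary file, reads them back with readtable, and draws a simple bar chart of Freq. The temporary file name and the label format are assumptions made for illustration only.

```matlab
% Write the sample rows from the question to a temporary CSV file
fullFileName = [tempname '.csv'];
fid = fopen(fullFileName, 'w');
fprintf(fid, ',"Admit","Gender","Dept","Freq"\n');
fprintf(fid, '1,"Admitted","Male","A",512\n');
fprintf(fid, '2,"Rejected","Male","A",313\n');
fclose(fid);

% Read the mixed text/numeric columns into a table
t = readtable(fullFileName);

% Build one label per row from the text columns (hypothetical label format)
labels = strcat(t.Admit, '/', t.Gender, '/', t.Dept);

% Plot the frequency of each Admit/Gender/Dept combination
bar(t.Freq);
set(gca, 'XTick', 1:numel(labels), 'XTickLabel', labels);
ylabel('Freq');

delete(fullFileName);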
  3 Comments
Image Analyst on 20 Apr 2014
I have no problem with the new interface, and I like the new version, R2014a. I guess I'd gotten used to the ribbon toolbar from a year or so of using Microsoft Office first. I think the new version has several things that are very nice and worth upgrading for. It has the new table data type, which is just fantastic. Plus it has a better way of selecting editor windows when you have lots of them open, like I usually do, and there are too many to show along the top. The command-line IntelliSense is also improved: it pops up a list of prior commands right there in the Command Window if you type a letter and hit the up arrow key, so you can go right to the one you want instead of hitting the up arrow a dozen times. The main negative is that the report tools (like the dependency report) are hidden away on the Current Folder panel under a tiny triangle rather than being on the main tool ribbon. There are lots of other nice features, but I don't remember which are different from the old version. I can't say much about memory since I have 8 GB of RAM and will soon have 32 GB of RAM and 1.5 TB of solid-state drive (hopefully within the month).
Bara'a on 21 Apr 2014
I'm going to have to upgrade to the newest version to try what you suggested, but I do believe it will work smoothly.
IA, thank you for your suggestion :)


More Answers (1)

dpb on 18 Apr 2014
Presuming you do have the Statistics Toolbox...
>> [a,g,d,f]=textread('bara.csv','%*d%s%s%s%d','headerlines',1,'delimiter',',')
a =
'"Admitted"'
'"Rejected"'
g =
'"Male"'
'"Male"'
d =
'"A"'
'"A"'
f =
512
313
>> an= nominal(a)
an =
"Admitted"
"Rejected"
>> cnt=hist(an)
cnt =
1
1
>>
etc., etc., etc., ...
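Note that the strings returned by textread above still carry their double quotes. One possible alternative, sketched here on the assumption that the file has exactly the layout shown in the question, is textscan with the %q specifier, which reads a double-quoted field and drops the quotes. The temporary file name is made up for illustration.

```matlab
% Write the sample rows from the question to a temporary CSV file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, ',"Admit","Gender","Dept","Freq"\n');
fprintf(fid, '1,"Admitted","Male","A",512\n');
fprintf(fid, '2,"Rejected","Male","A",313\n');
fclose(fid);

% %*d skips the leading row number, %q reads a quoted string, %d an integer
fid = fopen(fname, 'r');
c = textscan(fid, '%*d %q %q %q %d', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

a = c{1};   % Admit values without the surrounding quotes
g = c{2};   % Gender
d = c{3};   % Dept
f = c{4};   % Freq (returned as int32 because of %d)
```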
  15 Comments
Bara'a on 24 Apr 2014
Thank you for the hint... I will try to figure out the rest of the way by myself! I don't want to be demanding :)
dpb on 24 Apr 2014
Well, I've not had time to actually learn much about the dataset methods other than a (very) quick perusal thereof. Oh, bummer!! I went back and realized you said you don't have the Stat Toolbox, so unless it's also in some other toolboxes you do have, you're stuck with the basic Matlab cell/structure data types, it would seem.
That means going back to the original idea I showed of making your own nominal/ordinal variables by associating each string value with a level via unique, then working with those numeric values.
The real pita in Matlab is that, unlike SAS, BMDP, R, and the other really "statistics-aware" packages, there is no FREQUENCY variable, even in the Statistics Toolbox. Hence, you'll have to either
a) duplicate the data from the summary table form given to replicate each record type the frequency number of times to use builtin Matlab functions over those (a real waste of memory, of course), or
b) compute the proportions numerically from the frequencies themselves.
Again, it just isn't that convenient to do these kinds of analyses in Matlab unless there are some major tricks that aren't being revealed in the Stats Toolbox doc's...
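Options (a) and (b) above might look roughly like this in base Matlab, using only the two rows from the question. It is a sketch under that assumption; the variable names (idx, admitExpanded, p) are hypothetical.

```matlab
% Summary-table form: one row per category with its frequency
admit = {'Admitted'; 'Rejected'};
freq  = [512; 313];

% (a) Replicate each record freq(k) times by building a row-index vector
idx = cell2mat(arrayfun(@(k) repmat(k, freq(k), 1), ...
                        (1:numel(freq))', 'UniformOutput', false));
admitExpanded = admit(idx);   % one entry per original observation

% (b) Compute the proportions directly from the frequencies
p = freq / sum(freq);
```

Option (a) lets builtin functions work on one record per observation at the cost of memory, while option (b) keeps the compact table and works with the proportions directly.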
