I want to avoid dynamically naming my tables after a join.
Show older comments
I certianly have a lot to learn, but I do know I am not supposed to dynamically create variable names- and I assume that applies to table names as well.
My question is twofold:
- How do I get out of the situation I am in- the real tables are large and very complicated with all kinds of issues and my process is actually working.
- I definitelty would also like to know what I can do better next time to approach this so I don't end up where I am now :)
My current situation has me creating new tables for export to another application based on an initial filter step. I am going to do many iterations of the filter, then use the filtered tables to join with other tables and on and on, and I'd like the final tables and exported files to be named somehow that I and other humans can easily recognize them (ie descriptive terms like 'Steve' and 'Jim'.)
My code is something like this:
T = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"; "DDD"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11');...
datetime('2023-03-01 02:00:11')], ...
[9999; 12; 34; 1111; 2222; 3333; 5556], ...
'VariableNames', {'var1', 'var2', 'var3'});
OtherTable = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11')], ...
[9999; 12; 34; 1111; 2222; 3333], ...
'VariableNames', {'var1', 'var2', 'var3'});
%% now I want to group and filter the data
onlyA=T(T.var1=='AAA',:)
BandC=T(T.var1=='BBB' | T.var1=='CCC',:)
CinFeb23=T(T.var1=='CCC' & T.var2=='2023-02-12 03:10:11',:)
and I have lots of these groupings already written out and they don't seem to lend themselves to automation.
Then I go on to a join and export (lots more data wrangling steps of course):
%% This section needs to loop (or something) over all of the above filtered/grouped tables
% (onlyA, BandC, CinFeb23) etc. - I need to findd and replace onlyA
% essentially.
onlyA_joined=innerjoin(onlyA,OtherTable,"LeftKeys",["var1"],"RightKeys",["var1"]);
onlyA_joined.Properties.VariableNames = {'onlyA_name','onlyA_date','onlyA_value',...
'onlyA_date_again','onlyA_value_again'};
onlyA_joined
writetable(onlyA_joined,'final_file.xlsx','sheet','onlyA');
% now I want to do it all over again, except have 'BandC', and then 'CinFeb23' replace all the
% places where 'onlyA' is used as a name.
Lots more steps involved, but I'd like to somehow not have to go in and manually change 'onlyA' every single time as I have a lot of iterations.
My guess is I want to loop and use a cell of strings with the names etc. but not sure how to handle the differewnt cases such as table name, variable name, sheet name etc.
Thansk for any insight!
6 Comments
As per the usual caveat, it would be far simpler for somebody to provide specific code solutions if had some actual data to work with rather than just descriptions; it makes for being able to actually attack the real problem. While the original files may, indeed, be large, the sample datasets don't have to be large at all other than that they are representative of the issues involved.
One comment on the code and remark...
typeA=T(T.var1='somerthing specific',:);
typeB=T(T.var1='somethign different' | T.var2='something else',:);
typeC=T(T.var1='somethign even different' & T.var4='something new again',:);
"I have lots of these already written out and they don't seem to lend themselves to automation."
Well, yes and no...selection of groups by one or a number of criteria can at least be made into generic code where what changes is the data. That can include the variables used in the comparison as well as in the values against which to compare. That alone can make a large difference in code compactness and maintainability--instead of having a zillion specific lines and sequentially-named variables, you've got a function or two to write/debug and then just data sets to create and iterate over.
Then, depending upon where you're headed, you might avoid ever creating the above new tables explicitly but only use the logical addressing vectors from them to select the subsets of interest dynamically and consume them in the output phase before going on to the next.
That kind of code structure is where need more details in order to be able to recommend better solutions.
As for the last about the variable names within a table -- those can be handled dynamically, just not by manual creation but by using some logic and patterns in generating them programmatically. Again, you may need to create some data arrays or you may be able to create/use base names from external input files, but there are clean ways to use dynamic names in addressing variables in tables and/or structs -- what you do NOT want to do is to create base variable names in the workspace-"there be dragons".
Marcus Glover
on 18 Oct 2023
Edited: Marcus Glover
on 18 Oct 2023
dpb
on 18 Oct 2023
". I'm also unfortunately gettng the data from a third party team so my variable names and structures are at the whims of the code they use..."
I would say you then need to start by developing and enforcing a set of standards on your contractors.
Marcus Glover
on 18 Oct 2023
Edited: Marcus Glover
on 18 Oct 2023
dpb
on 18 Oct 2023
If your task is important to the organization, then it should be able to be escalated to the level needed...I guess the risk in that is them deciding it isn't that important, after all! (In a prior life I was known for being the one who raised issues nobody knew existed before :) )
Some comments to your two specific questions:
- Use arrays. MATLAB is designed around arrays, so rather than storing lots of separate variables you use arrays and indexing. When faced with any challenge, always as yourself "how can I do this with an array?". E.g. use cell arrays and structure arrays to store other arrays.
- Understand that meta-data is data, and data belongs in variables (not in variable names). Once you start forcing meta-data into variable names (e.g. "onlyA", "BandC") then you have already started down a dead-end road to a place you really don't want to be at: there be dragons!
Follow these very simple principles and you will find that your code becomes more generalizable and expandable: i.e. rather than painfully copy-and-pasting-and-modifying code (having copies of code is a sign something is wrong) you use e.g. loops... and then (much as Voss showed) processing more than just three filter criteria does not require more copy-and-pasting but simply expanding one single array to add some new criteria (which the rest of your code automatically processes).
Accepted Answer
More Answers (0)
Categories
Find more on Scripts in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!