
Exporting my trained actor, critic NN agent from MATLAB Reinforcement Environment to TensorFlow

I am trying to export the trained actor and critic neural networks of my agent from a MATLAB reinforcement learning environment to TensorFlow:
% Create the environment and default agent-initialization options.
env = Nuc_Maint_Env_Proposal_220211_NPIC_MATLAB2022A;
initOpts = rlAgentInitializationOptions();
% Obtain observation and action specifications.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
% Create a PPO agent from the environment observation and action specifications.
% This agent uses default deep neural networks for its actor and critic.
% Note: rlPPOAgent(obsInfo,actInfo) returns a newly initialized (untrained) agent;
% the networks exported below are only trained if "agent" is the trained agent.
agent = rlPPOAgent(obsInfo,actInfo);
% agent = rlACAgent(actor,critic,agentOpts);
% To modify the deep neural networks within a reinforcement learning agent,
% you must first extract the actor and critic function approximators.
actor = getActor(agent);
critic = getCritic(agent);
% Extract the deep neural networks from both the actor and critic function approximators.
actorNet = getModel(actor);
criticNet = getModel(critic);
% Export each network as a TensorFlow model package.
exportNetworkToTensorFlow(actorNet,"actorNet")
exportNetworkToTensorFlow(criticNet,"criticNet")
The problem is that when I import the models in Python with TensorFlow and step through the environment, my actor consistently puts the maximum probability at the same index. The output values vary from step to step, but the index of the maximum probability stays the same, so the agent always makes the same decision. This happens only in Python, not in MATLAB. Is there anything wrong with the way I am exporting my trained neural networks?
Below is the Python code that collects the action_log:
# Python function to collect the state_log and action_log
import numpy as np
import tensorflow as tf

# Discrete action set: row i is the action for output index i of the actor
ACTION_ELEMENTS = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]])

def evaluate(model_actorNet, num_steps=720):
    action_log = []
    state_log = []
    env = Nuc_Maint_Env_Proposal_220211_NPIC_MATLAB2022A()
    observation = env.reset()  # reset once, before the step loop
    total_reward = 0.0
    for step in range(num_steps):
        # Add a batch dimension: the model expects shape (1, num_observations)
        obs_batch = np.asarray(observation, dtype=np.float32).reshape(1, -1)
        action_probs = model_actorNet(obs_batch)
        # Greedy choice: index of the largest actor output
        action_index = int(tf.argmax(action_probs, axis=-1).numpy().item())
        action = ACTION_ELEMENTS[action_index]
        observation, reward, done, _ = env.step(action)
        total_reward += reward  # accumulate without overwriting the step reward
        action_log.append(action)
        state_log.append(observation)
        if done:
            break
    return np.array(state_log), np.array(action_log), total_reward
Any help would be great.

Answers (1)

Sanjana on 28 Aug 2023
Hi Mahsa,
I understand that you are having trouble using the “actor” and “critic” models exported from MATLAB in Python with TensorFlow.
According to the documentation, the code you provided for exporting the trained “actor” and “critic” models is correct.
The “actor” consistently outputs the same index because of the “tf.argmax” function. “tf.argmax” performs a deterministic, greedy selection, as is typical in classification tasks, which causes the “actor” to always choose the action with the highest probability.
In reinforcement learning, you can instead use the “tf.random.categorical” function, which samples from a categorical distribution. This lets the “actor” explore different actions, including ones that are not the most probable. Note that “tf.random.categorical” expects logits (unnormalized log-probabilities), so if the actor's last layer is a softmax, pass the logarithm of its output.
Please refer to the following link for further information:
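The following sketch contrasts the two selection rules on a single made-up actor output. It assumes the exported actor ends in a softmax layer, so `action_probs` (a hypothetical name) is a (1, 6) row of probabilities for the six discrete actions; the numbers are illustrative only and are not taken from the trained network. Because `tf.random.categorical` expects unnormalized log-probabilities, the probabilities are converted with `tf.math.log` before sampling.

```python
import numpy as np
import tensorflow as tf

# Hypothetical actor output for one observation: shape (batch=1, num_actions=6),
# assumed to be softmax probabilities (the row sums to 1).
action_probs = tf.constant([[0.05, 0.40, 0.25, 0.10, 0.15, 0.05]])

# Greedy selection: always the index of the largest probability.
greedy_index = int(tf.argmax(action_probs, axis=-1)[0])

# Stochastic selection: tf.random.categorical takes logits (unnormalized
# log-probabilities), so convert the probabilities with tf.math.log first.
logits = tf.math.log(action_probs)
sampled = tf.random.categorical(logits, num_samples=10000, seed=0)  # shape (1, 10000)
sampled_indices = sampled[0].numpy()

# Empirical action frequencies should approximate action_probs.
frequencies = np.bincount(sampled_indices, minlength=6) / sampled_indices.size

print("greedy index:", greedy_index)
print("sampled frequencies:", np.round(frequencies, 2))
```

Inside an evaluation loop, a single action could be drawn the same way with `int(tf.random.categorical(tf.math.log(action_probs), 1)[0, 0])` in place of the `tf.argmax` call.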
Hope this helps!
Regards,
Sanjana

Release

R2022b
