🤖 Robotic Arm Control using TD3 & Robosuite

Deep Reinforcement Learning meets Robotic Simulation

Overview

This project uses the Robosuite simulation framework (with an optional CARLA interface) to control a Panda robotic arm using TD3 (Twin Delayed Deep Deterministic Policy Gradient). It demonstrates an end-to-end reinforcement learning pipeline for precise robotic motion in simulation through policy optimization.

Objective

  • Implement an end-to-end robot learning pipeline using TD3.
  • Train a simulated robotic arm in Robosuite’s Lift task.
  • Integrate physics-based control and vision feedback.
  • Visualize motion through the MuJoCo viewer or recorded videos.

Use Case

The framework can be extended to real robotic manipulators for:

  • Industrial pick-and-place automation
  • Human-robot collaboration research
  • Dexterous motion learning from simulation
  • Safe reinforcement learning for physical systems

System Architecture

The system connects simulation, reinforcement learning, and visualization:

  1. Robosuite (MuJoCo) — Physics-based robot simulator
  2. TD3 Agent — Policy learning using actor-critic networks
  3. PyTorch Backend — Handles model training and updates
  4. Gym Wrapper — Standardized observation/action interface
  5. CARLA Interface — Optional environment for combined autonomy experiments

Training Environment Setup

import robosuite as suite
from robosuite.wrappers import GymWrapper
from robosuite.controllers import load_controller_config
from td3_torch import Agent
import numpy as np

# Operational Space Controller: actions command end-effector pose deltas
controller_config = load_controller_config(default_controller="OSC_POSE")
env = suite.make(
    env_name="Lift",
    robots=["Panda"],
    controller_configs=controller_config,
    reward_shaping=True,   # dense shaped rewards speed up early learning
    control_freq=40,
)
env = GymWrapper(env)  # flatten observations into a Gym-style interface

agent = Agent(alpha=5e-4, beta=5e-4, tau=0.005,
              input_dims=env.observation_space.shape,
              env=env, n_actions=env.action_space.shape[0])

# Off-policy training loop: act, store the transition, then update the networks
for episode in range(2000):
    obs, _ = env.reset()
    done, score = False, 0
    while not done:
        action = agent.choose_action(obs)
        obs2, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        agent.remember(obs, action, reward, obs2, done)
        agent.learn()
        obs = obs2
        score += reward
    print(f"Episode {episode+1}: Score = {score:.2f}")
      

TD3 Agent Architecture

import torch

class Agent:
    def __init__(self, alpha, beta, tau, input_dims, env, n_actions, batch_size=256):
        self.actor = ActorNetwork(...)      # deterministic policy network
        self.critic_1 = CriticNetwork(...)  # twin critics reduce Q-value overestimation
        self.critic_2 = CriticNetwork(...)
        self.memory = ReplayBuffer(...)     # off-policy experience replay
        self.tau = tau                      # soft-update rate for the target networks
        self.batch_size = batch_size

    def choose_action(self, obs):
        # Forward pass through the actor; exploration noise can be added during training
        state = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
        action = self.actor(state)
        return action.detach().numpy()[0]

    def learn(self):
        states, actions, rewards, next_states, dones = \
            self.memory.sample_buffer(self.batch_size)
        # TD3 critic and actor update steps...
        self.update_network_parameters()    # Polyak-average the target networks
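
The elided step in learn() is the standard TD3 update: target-policy smoothing with clipped noise, a clipped double-Q target taken as the minimum of the two target critics, and delayed actor and target-network updates. The function below is a self-contained sketch of that recipe; the network and optimizer arguments, the (states, actions) critic signature, and the assumption that actions are normalized to [-1, 1] are illustrative rather than taken from td3_torch.

import torch
import torch.nn.functional as F

def td3_update(batch, actor, critic_1, critic_2,
               target_actor, target_critic_1, target_critic_2,
               actor_opt, critic_opt, step, gamma=0.99, tau=0.005,
               policy_noise=0.2, noise_clip=0.5, policy_delay=2):
    states, actions, rewards, next_states, dones = batch  # torch tensors

    with torch.no_grad():
        # Target-policy smoothing: perturb the target action with clipped noise
        noise = (torch.randn_like(actions) * policy_noise).clamp(-noise_clip, noise_clip)
        next_actions = (target_actor(next_states) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target: take the smaller of the two target critic estimates
        q_next = torch.min(target_critic_1(next_states, next_actions),
                           target_critic_2(next_states, next_actions))
        q_target = rewards.view(-1, 1) + gamma * (1.0 - dones.float().view(-1, 1)) * q_next

    # Both critics regress onto the same target
    critic_loss = (F.mse_loss(critic_1(states, actions), q_target) +
                   F.mse_loss(critic_2(states, actions), q_target))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy update: refresh the actor and targets every policy_delay steps
    if step % policy_delay == 0:
        actor_loss = -critic_1(states, actor(states)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Polyak (soft) update of all three target networks
        for target_net, net in ((target_actor, actor),
                                (target_critic_1, critic_1),
                                (target_critic_2, critic_2)):
            for t_param, param in zip(target_net.parameters(), net.parameters()):
                t_param.data.copy_(tau * param.data + (1.0 - tau) * t_param.data)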
        

Replay Buffer

import numpy as np

class ReplayBuffer:
    def __init__(self, max_size, input_shape, n_actions):
        self.mem_size = max_size
        self.mem_ctr = 0                    # number of transitions stored so far
        self.state_memory = np.zeros((max_size, *input_shape))
        self.action_memory = np.zeros((max_size, n_actions))
        ...

    def sample_buffer(self, batch_size):
        # Sample uniformly from the slots that have actually been filled
        max_mem = min(self.mem_ctr, self.mem_size)
        batch = np.random.choice(max_mem, batch_size, replace=False)
        return (self.state_memory[batch], self.action_memory[batch], ...)
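
Agent.remember() is expected to write each transition into this buffer. A minimal sketch of that write path is shown below; the reward, next-state, and terminal arrays are assumed to be allocated in the elided part of __init__ (their names here are illustrative), and the index wraps around so the oldest experience is overwritten once the buffer is full.

    def store_transition(self, state, action, reward, next_state, done):
        index = self.mem_ctr % self.mem_size   # circular buffer index
        self.state_memory[index] = state
        self.action_memory[index] = action
        self.reward_memory[index] = reward
        self.next_state_memory[index] = next_state
        self.terminal_memory[index] = done
        self.mem_ctr += 1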
        

Results

Future Scope

← Back to Main Page