🤖 Robotic Arm Control using TD3 & Robosuite

Deep Reinforcement Learning meets Robotic Simulation

Overview

This project uses the Robosuite simulation framework (with an optional CARLA interface) to control a Panda robotic arm using TD3 (Twin Delayed Deep Deterministic Policy Gradient). It demonstrates an end-to-end reinforcement learning pipeline for precise robotic motion in simulation through policy optimization.

Objective

  • Implement an end-to-end robot learning pipeline using TD3.
  • Train a simulated robotic arm in Robosuite’s Lift task.
  • Integrate physics-based control and vision feedback.
  • Visualize motion through the MuJoCo viewer or recorded videos.

Use Case

The framework can be extended to real robotic manipulators for:

  • Industrial pick-and-place automation
  • Human-robot collaboration research
  • Dexterous motion learning from simulation
  • Safe reinforcement learning for physical systems

System Architecture

The system connects simulation, reinforcement learning, and visualization:

  1. Robosuite (MuJoCo) — Physics-based robot simulator
  2. TD3 Agent — Policy learning using actor-critic networks
  3. PyTorch Backend — Handles model training and updates
  4. Gym Wrapper — Standardized observation/action interface
  5. CARLA Interface — Optional environment for combined autonomy experiments

Training Environment Setup

import robosuite as suite
from robosuite.wrappers import GymWrapper
from robosuite.controllers import load_controller_config
from td3_torch import Agent
import numpy as np

# Operational Space Controller: actions command end-effector pose deltas
controller_config = load_controller_config(default_controller="OSC_POSE")
env = suite.make(
    env_name="Lift",
    robots=["Panda"],
    controller_configs=controller_config,
    reward_shaping=True,   # dense shaped rewards speed up early learning
    control_freq=40,
)
env = GymWrapper(env)  # flatten observations into a Gym-style interface

agent = Agent(alpha=5e-4, beta=5e-4, tau=0.005,
              input_dims=env.observation_space.shape,
              env=env, n_actions=env.action_space.shape[0])

# Off-policy training loop: act, store the transition, then update the networks
for episode in range(2000):
    obs, _ = env.reset()
    done, score = False, 0
    while not done:
        action = agent.choose_action(obs)
        obs2, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        agent.remember(obs, action, reward, obs2, done)
        agent.learn()
        obs = obs2
        score += reward
    print(f"Episode {episode+1}: Score = {score:.2f}")
      

TD3 Agent Architecture

import torch

class Agent:
    def __init__(self, alpha, beta, tau, input_dims, env, n_actions, batch_size=256):
        self.actor = ActorNetwork(...)      # deterministic policy network
        self.critic_1 = CriticNetwork(...)  # twin critics reduce Q-value overestimation
        self.critic_2 = CriticNetwork(...)
        self.memory = ReplayBuffer(...)     # off-policy experience replay
        self.tau = tau                      # soft-update rate for the target networks
        self.batch_size = batch_size

    def choose_action(self, obs):
        # Forward pass through the actor; exploration noise can be added during training
        state = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
        action = self.actor(state)
        return action.detach().numpy()[0]

    def learn(self):
        states, actions, rewards, next_states, dones = \
            self.memory.sample_buffer(self.batch_size)
        # TD3 critic and actor update steps...
        self.update_network_parameters()    # Polyak-average the target networks
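
The elided step in learn() is the standard TD3 update: target-policy smoothing with clipped noise, a clipped double-Q target taken as the minimum of the two target critics, and delayed actor and target-network updates. The function below is a self-contained sketch of that recipe; the network and optimizer arguments, the (states, actions) critic signature, and the assumption that actions are normalized to [-1, 1] are illustrative rather than taken from td3_torch.

import torch
import torch.nn.functional as F

def td3_update(batch, actor, critic_1, critic_2,
               target_actor, target_critic_1, target_critic_2,
               actor_opt, critic_opt, step, gamma=0.99, tau=0.005,
               policy_noise=0.2, noise_clip=0.5, policy_delay=2):
    states, actions, rewards, next_states, dones = batch  # torch tensors

    with torch.no_grad():
        # Target-policy smoothing: perturb the target action with clipped noise
        noise = (torch.randn_like(actions) * policy_noise).clamp(-noise_clip, noise_clip)
        next_actions = (target_actor(next_states) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target: take the smaller of the two target critic estimates
        q_next = torch.min(target_critic_1(next_states, next_actions),
                           target_critic_2(next_states, next_actions))
        q_target = rewards.view(-1, 1) + gamma * (1.0 - dones.float().view(-1, 1)) * q_next

    # Both critics regress onto the same target
    critic_loss = (F.mse_loss(critic_1(states, actions), q_target) +
                   F.mse_loss(critic_2(states, actions), q_target))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy update: refresh the actor and targets every policy_delay steps
    if step % policy_delay == 0:
        actor_loss = -critic_1(states, actor(states)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Polyak (soft) update of all three target networks
        for target_net, net in ((target_actor, actor),
                                (target_critic_1, critic_1),
                                (target_critic_2, critic_2)):
            for t_param, param in zip(target_net.parameters(), net.parameters()):
                t_param.data.copy_(tau * param.data + (1.0 - tau) * t_param.data)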
        

Replay Buffer

import numpy as np

class ReplayBuffer:
    def __init__(self, max_size, input_shape, n_actions):
        self.mem_size = max_size
        self.mem_ctr = 0                    # number of transitions stored so far
        self.state_memory = np.zeros((max_size, *input_shape))
        self.action_memory = np.zeros((max_size, n_actions))
        ...

    def sample_buffer(self, batch_size):
        # Sample uniformly from the slots that have actually been filled
        max_mem = min(self.mem_ctr, self.mem_size)
        batch = np.random.choice(max_mem, batch_size, replace=False)
        return (self.state_memory[batch], self.action_memory[batch], ...)
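
Agent.remember() is expected to write each transition into this buffer. A minimal sketch of that write path is shown below; the reward, next-state, and terminal arrays are assumed to be allocated in the elided part of __init__ (their names here are illustrative), and the index wraps around so the oldest experience is overwritten once the buffer is full.

    def store_transition(self, state, action, reward, next_state, done):
        index = self.mem_ctr % self.mem_size   # circular buffer index
        self.state_memory[index] = state
        self.action_memory[index] = action
        self.reward_memory[index] = reward
        self.next_state_memory[index] = next_state
        self.terminal_memory[index] = done
        self.mem_ctr += 1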
        

Results

Future Scope

← Back to Main Page