Genesis physics simulator (5) Reinforcement Learning
Clone Genesis from GitHub and move into the Genesis directory.
git clone https://github.com/Genesis-Embodied-AI/Genesis.git
cd Genesis
Install TensorBoard to visualize training logs.
TensorBoard lets you visualize learning curves (reward, loss), parameter changes, scalars, histograms, and computation graphs, so you can follow how training is progressing during reinforcement learning. It is widely regarded as an essential tool for this kind of work.
pip install tensorboard
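As a side note, the scalars that show up in TensorBoard are written with PyTorch's SummaryWriter, and as far as I can tell rsl-rl writes its training metrics the same way under the logs folder. A minimal self-contained example (the tag names and values here are made up):

from torch.utils.tensorboard import SummaryWriter

# Write a few dummy scalars into ./logs/demo; TensorBoard plots them as curves.
writer = SummaryWriter(log_dir="logs/demo")
for step in range(100):
    writer.add_scalar("Train/mean_reward", step * 0.1, step)          # hypothetical tag
    writer.add_scalar("Loss/value_function", 1.0 / (step + 1), step)  # hypothetical tag
writer.close()

Pointing the tensorboard command shown later at the same logs folder displays these curves in the browser.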
Next, install rsl-rl-lib, a reinforcement learning library from ETH Zurich's Robotic Systems Lab (the leggedrobotics GitHub organization). It is used not only with Genesis but also with Isaac Lab and Isaac Gym, and it is lightweight and fast.
It provides a fast implementation of PPO (Proximal Policy Optimization) with actor-critic training and runs on the GPU via CUDA.
pip install rsl-rl-lib==2.2.4
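As a refresher, the core of PPO is a clipped surrogate objective, which is where the clip_param value of 0.2 and the "Surrogate loss" line in the logs below come from. A minimal conceptual sketch, not rsl-rl's actual code:

import torch

# Conceptual sketch of the PPO clipped surrogate loss (policy term only).
# rsl-rl's real update also includes value-function and entropy terms.
def ppo_surrogate_loss(log_prob_new, log_prob_old, advantages, clip_param=0.2):
    ratio = torch.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    # Maximizing the clipped surrogate is the same as minimizing its negative mean
    return -torch.min(unclipped, clipped).mean()

Training alternates between collecting rollouts from the parallel environments and running several epochs of this update over mini-batches (the num_learning_epochs and num_mini_batches settings that appear later).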
Now run a trial training. The -B option sets the number of parallel environments; you can try changing 1024 to 4096 or other values.
cd Genesis
python examples/locomotion/go2_train.py -B 1024 --max_iterations 100
Output like the following repeats as the iterations progress. Although 100 iterations were requested, the last one is labeled 99/100, presumably because iterations are counted from 0. Total time was 44.56s.
################################################################################
Learning iteration 99/100
Computation: 61363 steps/s (collection: 0.299s, learning 0.101s)
Value function loss: 0.0029
Surrogate loss: -0.0049
Mean action noise std: 0.92
Mean total reward: 7.58
Mean episode length: 796.85
Mean episode rew_tracking_lin_vel: 0.5305
Mean episode rew_tracking_ang_vel: 0.1222
Mean episode rew_lin_vel_z: -0.0128
Mean episode rew_base_height: -0.0106
Mean episode rew_action_rate: -0.0869
Mean episode rew_similar_to_default: -0.1561
--------------------------------------------------------------------------------
Total timesteps: 2457600
Iteration time: 0.40s
Total time: 44.56s
ETA: 0.4s
The above was B=1024; below is the same run with B=4096. Total time increased to 76.89s because four times as many environments means four times the timesteps per iteration, but throughput more than doubled (136,453 vs. 61,363 steps/s) and the mean total reward was much higher (16.83 vs. 7.58).
################################################################################
Learning iteration 99/100
Computation: 136453 steps/s (collection: 0.421s, learning 0.300s)
Value function loss: 0.0001
Surrogate loss: -0.0025
Mean action noise std: 0.58
Mean total reward: 16.83
Mean episode length: 1001.00
Mean episode rew_tracking_lin_vel: 0.9318
Mean episode rew_tracking_ang_vel: 0.1741
Mean episode rew_lin_vel_z: -0.0152
Mean episode rew_base_height: -0.0068
Mean episode rew_action_rate: -0.0821
Mean episode rew_similar_to_default: -0.1565
--------------------------------------------------------------------------------
Total timesteps: 9830400
Iteration time: 0.72s
Total time: 76.89s
ETA: 0.8s
Note: I fumbled around trying to start TensorBoard with the command below, but training finished before I got it running.
python -m tensorboard.main --logdir logs --port 6006
# Open http://localhost:6006 to see the learning status (supposedly)
After learning is complete, execute the following.
python examples/locomotion/go2_eval.py
For some reason, I got an error saying Genesis/logs/go2-walking/model_100.pt doesn't exist. When I checked the folder, there was a model_99.pt, so I renamed it to model_100.pt, re-executed, and the quadruped walking was displayed successfully. The off-by-one is presumably because checkpoints are numbered by zero-based iteration, so training with --max_iterations 100 leaves model_99.pt as the last checkpoint, while the eval script's default checkpoint number (100) matches the training script's default of 101 iterations. If the eval script accepts a checkpoint-number argument, passing 99 should avoid the rename.
Contents of the Training Program go2_train.py
I went through the script and added comments in order. First, it checks that the old 'rsl-rl' package is not installed and that 'rsl-rl-lib' is pinned to exactly version 2.2.4:
from importlib import metadata

try:
    try:
        # If the legacy "rsl-rl" package is installed, trigger the error below
        if metadata.version("rsl-rl"):
            raise ImportError
    except metadata.PackageNotFoundError:
        # Otherwise require exactly rsl-rl-lib 2.2.4
        if metadata.version("rsl-rl-lib") != "2.2.4":
            raise ImportError
except (metadata.PackageNotFoundError, ImportError) as e:
    raise ImportError("Please uninstall 'rsl_rl' and install 'rsl-rl-lib==2.2.4'.") from e
Next, import the PPO runner (OnPolicyRunner), the Genesis physics simulator, and Go2Env, the custom environment for the quadruped walking task.
from rsl_rl.runners import OnPolicyRunner
import genesis as gs
from go2_env import Go2Env
Define the function get_train_cfg, which returns the PPO training configuration.
def get_train_cfg(exp_name, max_iterations):
    train_cfg_dict = {
        "algorithm": { ... },
        "policy": { ... },
        "runner": { ... },
        ...
    }
- algorithm (PPO hyperparameters):
  clip_param: ε for PPO clipping (0.2)
  desired_kl: monitors the KL divergence and adapts the learning rate
  gamma: discount factor
  lam: GAE lambda
  learning_rate: 0.001
  num_learning_epochs: 5
  num_mini_batches: 4
- policy (network structure):
  Hidden layers: 512 → 256 → 128
  Activation: ELU
  The actor and critic have the same structure
- runner settings:
  This is where the RSL-RL training manager is configured (experiment name, maximum iterations, etc.). A rough sketch of the whole dictionary follows right after this list.
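Here is that sketch, assembled from the values listed above. The gamma, lam, and desired_kl values and the runner keys are my assumptions for illustration, and the real go2_train.py contains more entries than this.

def get_train_cfg(exp_name, max_iterations):
    # Rough sketch only: values marked "assumed" are not taken from the actual script.
    return {
        "algorithm": {
            "clip_param": 0.2,       # PPO clipping epsilon
            "desired_kl": 0.01,      # assumed target KL for the adaptive learning rate
            "gamma": 0.99,           # assumed discount factor
            "lam": 0.95,             # assumed GAE lambda
            "learning_rate": 0.001,
            "num_learning_epochs": 5,
            "num_mini_batches": 4,
        },
        "policy": {
            "activation": "elu",
            "actor_hidden_dims": [512, 256, 128],
            "critic_hidden_dims": [512, 256, 128],
        },
        "runner": {
            "experiment_name": exp_name,      # assumed key names
            "max_iterations": max_iterations,
        },
    }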
Next, define the function get_cfgs(), which returns the Go2 robot environment configuration.
def get_cfgs():
    env_cfg = {...}
    obs_cfg = {...}
    reward_cfg = {...}
    command_cfg = {...}
- env_cfg (environment physics settings):
  num_actions = 12 → quadruped (3 joints × 4 legs)
  Initial angles (initial posture of the hip, thigh, and calf joints)
  PD gains kp = 20, kd = 0.5
  Fall detection: the episode ends if roll > 10° or pitch > 10°
  Action scale: 0.25
  Simulated action latency: simulate_action_latency = True
- obs_cfg (observation settings) for scaling the velocity and joint data:
  num_obs = 45
  obs_scales = {
      "lin_vel": 2.0,
      "ang_vel": 0.25,
      "dof_pos": 1.0,
      "dof_vel": 0.05,
  }
- reward_cfg (reward settings): tracking-type rewards plus penalties (a small sketch of the tracking reward follows this list)
  tracking_lin_vel: linear velocity tracking
  tracking_ang_vel: angular velocity tracking
  lin_vel_z: vertical velocity penalty
  base_height: large penalty (-50) if the base height is low
  action_rate: penalty on the rate of change of actions
  similar_to_default: keep joint angles close to the default posture
- command_cfg (target commands): external command settings (target velocity, etc.)
  lin_vel_x_range = [0.5, 0.5] → the task is to walk at a constant velocity of 0.5 m/s in the x direction.
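To make the tracking-type rewards more concrete: rewards of this kind are usually an exponential of the squared tracking error, so they approach 1 as the robot matches the commanded velocity. A rough sketch under that assumption (not necessarily Go2Env's exact code; tracking_sigma is an assumed parameter):

import torch

# Conceptual sketch of a linear-velocity tracking reward for num_envs parallel robots.
# base_lin_vel, commands: tensors of shape (num_envs, 3); tracking_sigma is assumed.
def reward_tracking_lin_vel(base_lin_vel, commands, tracking_sigma=0.25):
    # Squared error between commanded and actual planar (x, y) velocity
    lin_vel_error = torch.sum((commands[:, :2] - base_lin_vel[:, :2]) ** 2, dim=1)
    # 1.0 when tracking is perfect, decaying toward 0 as the error grows
    return torch.exp(-lin_vel_error / tracking_sigma)

The penalty terms (lin_vel_z, action_rate, and so on) typically work the other way around: squared quantities multiplied by negative weights, which is why they appear as negative numbers in the training log above.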
The following is the main part.
import argparse
import os
import pickle
import shutil

def main():
    # Parse command-line arguments: experiment name, number of environments, iteration count
    parser = argparse.ArgumentParser()
    parser.add_argument("-e", "--exp_name", type=str, default="go2-walking")
    parser.add_argument("-B", "--num_envs", type=int, default=4096)
    parser.add_argument("--max_iterations", type=int, default=101)
    args = parser.parse_args()

    # Initialize Genesis
    gs.init(logging_level="warning")

    # Prepare the log folder (delete previous logs if they exist, then create a fresh one)
    log_dir = f"logs/{args.exp_name}"
    env_cfg, obs_cfg, reward_cfg, command_cfg = get_cfgs()
    train_cfg = get_train_cfg(args.exp_name, args.max_iterations)
    if os.path.exists(log_dir):
        shutil.rmtree(log_dir)
    os.makedirs(log_dir, exist_ok=True)

    # Save all training settings to cfgs.pkl
    pickle.dump(
        [env_cfg, obs_cfg, reward_cfg, command_cfg, train_cfg],
        open(f"{log_dir}/cfgs.pkl", "wb"),
    )

    # Create the Go2 environment
    env = Go2Env(
        num_envs=args.num_envs, env_cfg=env_cfg, obs_cfg=obs_cfg, reward_cfg=reward_cfg, command_cfg=command_cfg
    )

    # Create the PPO runner (uses the GPU automatically; manages all rollout collection and learning)
    runner = OnPolicyRunner(env, train_cfg, log_dir, device=gs.device)

    # Start training (episodes start at random lengths so the environments don't all reset in sync)
    runner.learn(num_learning_iterations=args.max_iterations, init_at_random_ep_len=True)
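For context on why the checkpoint file name mattered, evaluation roughly has to reload the saved cfgs.pkl, rebuild the environment, and restore the checkpoint into the runner. A hedged sketch of that flow, not the actual go2_eval.py:

import pickle

import genesis as gs
from rsl_rl.runners import OnPolicyRunner
from go2_env import Go2Env

gs.init()
log_dir = "logs/go2-walking"

# Reload the configuration dictionaries saved by go2_train.py
env_cfg, obs_cfg, reward_cfg, command_cfg, train_cfg = pickle.load(open(f"{log_dir}/cfgs.pkl", "rb"))

# Rebuild a single environment and restore the trained policy
env = Go2Env(num_envs=1, env_cfg=env_cfg, obs_cfg=obs_cfg, reward_cfg=reward_cfg, command_cfg=command_cfg)
runner = OnPolicyRunner(env, train_cfg, log_dir, device=gs.device)
runner.load(f"{log_dir}/model_100.pt")  # the file name the eval script was looking for
policy = runner.get_inference_policy(device=gs.device)
# The real script then steps the environment in a loop, feeding observations to policy(obs)
# and showing the result in the viewer.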
Explaining every parameter in detail would get extremely long, so I'll stop here.