Genesis physics simulator (5) Reinforcement Learning

Posted at 2025-12-04


Clone Genesis from GitHub and move into the Genesis directory.

git clone https://github.com/Genesis-Embodied-AI/Genesis.git
cd Genesis

Install TensorBoard to visualize the training logs.
By plotting learning curves (reward, loss), parameter changes, scalars, histograms, and the computation graph, you can follow how training is progressing. It is generally considered an essential tool for reinforcement learning.

pip install tensorboard

Install rsl-rl-lib, the RL (reinforcement learning) library developed by the Robotic Systems Lab (RSL) at ETH Zurich. It is used not only with Genesis but also with Isaac Lab and Isaac Gym, and it is lightweight and fast.
It provides a fast implementation of PPO (Proximal Policy Optimization) with Actor-Critic training, and is reported to run fast with CUDA support.

pip install rsl-rl-lib==2.2.4

Run a trial reinforcement-learning session. The -B flag sets the number of parallel environments (the batch size); you can try changing 1024 to 4096 or other values.

cd Genesis
python examples/locomotion/go2_train.py -B 1024 --max_iterations 100

The following display repeats as the iterations progress. Although 100 iterations were specified, the counter ends at 99, apparently because it is zero-based (the header reads "Learning iteration 99/100"). Total time was 44.56 s.

################################################################################
                       Learning iteration 99/100                        

                       Computation: 61363 steps/s (collection: 0.299s, learning 0.101s)
               Value function loss: 0.0029
                    Surrogate loss: -0.0049
             Mean action noise std: 0.92
                 Mean total reward: 7.58
               Mean episode length: 796.85
 Mean episode rew_tracking_lin_vel: 0.5305
 Mean episode rew_tracking_ang_vel: 0.1222
        Mean episode rew_lin_vel_z: -0.0128
      Mean episode rew_base_height: -0.0106
      Mean episode rew_action_rate: -0.0869
Mean episode rew_similar_to_default: -0.1561
--------------------------------------------------------------------------------
                   Total timesteps: 2457600
                    Iteration time: 0.40s
                        Total time: 44.56s
                               ETA: 0.4s

The above was for B=1024; below is the B=4096 case. Total time increased to 76.89 s.

################################################################################
                       Learning iteration 99/100                        

                       Computation: 136453 steps/s (collection: 0.421s, learning 0.300s)
               Value function loss: 0.0001
                    Surrogate loss: -0.0025
             Mean action noise std: 0.58
                 Mean total reward: 16.83
               Mean episode length: 1001.00
 Mean episode rew_tracking_lin_vel: 0.9318
 Mean episode rew_tracking_ang_vel: 0.1741
        Mean episode rew_lin_vel_z: -0.0152
      Mean episode rew_base_height: -0.0068
      Mean episode rew_action_rate: -0.0821
Mean episode rew_similar_to_default: -0.1565
--------------------------------------------------------------------------------
                   Total timesteps: 9830400
                    Iteration time: 0.72s
                        Total time: 76.89s
                               ETA: 0.8s
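As a sanity check on the numbers: both total-timestep figures are consistent with 24 rollout steps per environment per iteration, since 1024 × 24 × 100 = 2,457,600 and 4096 × 24 × 100 = 9,830,400. This suggests the runner collects 24 steps per environment per iteration (presumably num_steps_per_env = 24 in the runner configuration).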

Note: I was still fumbling around trying to start TensorBoard with the command below when training finished.

python -m tensorboard.main --logdir logs --port 6006
# Open http://localhost:6006 to see the learning status (supposedly)

After learning is complete, execute the following.

python examples/locomotion/go2_eval.py

For some reason, I got an error saying Genesis/logs/go2-walking/model_100.pt doesn't exist. When I checked the folder, there was a model_99.pt, so I renamed it to model_100.pt. Why it was off by one, I don't know. After renaming and re-running, the quadruped walking was displayed successfully.
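For reference, here is a minimal sketch of how the saved configuration and checkpoint could be loaded back for evaluation. This is only an illustration based on the objects used in go2_train.py, not the actual go2_eval.py; the rollout loop and viewer options are omitted.

# Minimal evaluation sketch (illustrative; not the actual go2_eval.py)
import pickle

import genesis as gs
from rsl_rl.runners import OnPolicyRunner

from go2_env import Go2Env

gs.init()
log_dir = "logs/go2-walking"

# Restore the exact configuration that training saved to cfgs.pkl
env_cfg, obs_cfg, reward_cfg, command_cfg, train_cfg = pickle.load(
    open(f"{log_dir}/cfgs.pkl", "rb")
)

# A single environment is enough for visual evaluation
env = Go2Env(
    num_envs=1, env_cfg=env_cfg, obs_cfg=obs_cfg, reward_cfg=reward_cfg, command_cfg=command_cfg
)

# Load the trained weights (the renamed checkpoint from above) and get the inference policy
runner = OnPolicyRunner(env, train_cfg, log_dir, device=gs.device)
runner.load(f"{log_dir}/model_100.pt")
policy = runner.get_inference_policy(device=gs.device)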

Contents of the Training Program go2_train.py

Below, I go through it in order and add comments.

# rsl-rl-lib version check: fail fast if the old 'rsl-rl' package is installed,
# or if 'rsl-rl-lib' is missing or not the expected version
from importlib import metadata

try:
    try:
        if metadata.version("rsl-rl"):
            raise ImportError
    except metadata.PackageNotFoundError:
        if metadata.version("rsl-rl-lib") != "2.2.4":
            raise ImportError
except (metadata.PackageNotFoundError, ImportError) as e:
    raise ImportError("Please uninstall 'rsl_rl' and install 'rsl-rl-lib==2.2.4'.") from e

Import the PPO runner (OnPolicyRunner), the Genesis physics simulator, and Go2Env, the custom environment for the quadruped walking task.

# Various imports (argparse, os, pickle, and shutil are used in main() below)
import argparse, os, pickle, shutil

from rsl_rl.runners import OnPolicyRunner
import genesis as gs
from go2_env import Go2Env

Define the function get_train_cfg, which returns the PPO training configuration. A partial sketch of the dict, using the values below, follows the list.

def get_train_cfg(exp_name, max_iterations):
    train_cfg_dict = {
        "algorithm": { ... },
        "policy": { ... },
        "runner": { ... },
        ...
    }
  1. algorithm (PPO hyperparameters) parameters are as follows:
    clip_param: ε for PPO clipping (0.2)
    desired_kl: Monitor KL divergence and adjust learning rate
    gamma: Discount rate
    lam: GAE-Lambda
    learning_rate: 0.001
    num_learning_epochs: 5
    num_mini_batches: 4
  2. policy (NN structure) settings:
    Hidden layers: 512 → 256 → 128
    Activation: ELU
    Actor and Critic have the same structure
  3. runner settings
    Here, the RSL-RL learning manager is configured.
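To make the structure concrete, here is a partial sketch of the returned dict. Only the values listed above are filled in; the remaining entries, and the exact key names (which follow the usual rsl-rl conventions), should be checked against the actual go2_train.py.

# Partial sketch of get_train_cfg (illustrative; see go2_train.py for the full dict)
def get_train_cfg(exp_name, max_iterations):
    return {
        "algorithm": {
            "clip_param": 0.2,         # PPO clipping epsilon
            "desired_kl": 0.01,        # assumed value; KL target used to adapt the learning rate
            "gamma": 0.99,             # assumed discount rate
            "lam": 0.95,               # assumed GAE-lambda
            "learning_rate": 0.001,
            "num_learning_epochs": 5,
            "num_mini_batches": 4,
            # entropy and value-loss coefficients etc. are also set in the real file
        },
        "policy": {
            "actor_hidden_dims": [512, 256, 128],   # actor and critic share the same structure
            "critic_hidden_dims": [512, 256, 128],
            "activation": "elu",
        },
        "runner": {
            "experiment_name": exp_name,
            "max_iterations": max_iterations,
            # checkpointing / logging options are also configured here
        },
    }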

Next, define the function get_cfgs(), which returns the Go2 robot environment configuration; a partial sketch follows the list below.

def get_cfgs():
    env_cfg = {...}
    obs_cfg = {...}
    reward_cfg = {...}
    command_cfg = {...}
  1. env_cfg (environment physics settings)
    num_actions = 12 → quadruped (3 joints × 4 legs)
    Initial angles (initial posture of hip, thigh, calf)
    PD parameters kp = 20, kd = 0.5
    Fall detection: Episode ends if roll > 10° or pitch > 10°
    Action scale: 0.25
    Simulate action latency: simulate_action_latency = True

  2. obs_cfg (observation) settings for scaling velocity and joint data
    num_obs = 45
    obs_scales = {
        "lin_vel": 2.0,
        "ang_vel": 0.25,
        "dof_pos": 1.0,
        "dof_vel": 0.05,
    }

  3. reward_cfg (reward) Tracking-type reward + penalties
    tracking_lin_vel: Linear velocity tracking
    tracking_ang_vel: Angular velocity tracking
    lin_vel_z: Vertical velocity penalty
    base_height: Large penalty (-50) if base height is low
    action_rate: Action change rate penalty
    similar_to_default: Keep joint angles close to default

  4. command_cfg (target command) External command settings (target velocity, etc.)
    lin_vel_x_range = [0.5, 0.5] → Task to walk at constant velocity (0.5 m/s in x direction).
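Here is a partial sketch of the four dicts returned by get_cfgs(), again using only the values listed above. Key names are assumptions where not quoted from the file, and the marked values are guesses; refer to the actual go2_train.py for the real numbers.

# Partial sketch of get_cfgs (illustrative)
def get_cfgs():
    env_cfg = {
        "num_actions": 12,                         # 3 joints x 4 legs
        "kp": 20.0,                                # PD gains
        "kd": 0.5,
        "termination_if_roll_greater_than": 10,    # degrees (fall detection)
        "termination_if_pitch_greater_than": 10,   # degrees
        "action_scale": 0.25,
        "simulate_action_latency": True,
        # default joint angles for hip / thigh / calf are also defined here
    }
    obs_cfg = {
        "num_obs": 45,
        "obs_scales": {
            "lin_vel": 2.0,
            "ang_vel": 0.25,
            "dof_pos": 1.0,
            "dof_vel": 0.05,
        },
    }
    reward_cfg = {
        "reward_scales": {
            "tracking_lin_vel": 1.0,      # assumed scale
            "tracking_ang_vel": 0.2,      # assumed scale
            "lin_vel_z": -1.0,            # assumed scale
            "base_height": -50.0,         # large penalty when the base sits too low
            "action_rate": -0.005,        # assumed scale
            "similar_to_default": -0.1,   # assumed scale
        },
    }
    command_cfg = {
        "lin_vel_x_range": [0.5, 0.5],    # walk at a constant 0.5 m/s in x
        # y-velocity and yaw-rate ranges are also set here
    }
    return env_cfg, obs_cfg, reward_cfg, command_cfg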

The following is the main part.

def main():
    # Parse command-line arguments and set default values: experiment name, number of envs, iteration count
    parser = argparse.ArgumentParser()
    parser.add_argument("-e", "--exp_name", default="go2-walking")
    parser.add_argument("-B", "--num_envs", type=int, default=4096)
    parser.add_argument("--max_iterations", type=int, default=101)
    args = parser.parse_args()

    # Initialize Genesis
    gs.init(logging_level="warning")

    # Create log folder (delete previous logs if they exist and create new logs)
    log_dir = f"logs/{args.exp_name}"
    env_cfg, obs_cfg, reward_cfg, command_cfg = get_cfgs()
    train_cfg = get_train_cfg(args.exp_name, args.max_iterations)
    if os.path.exists(log_dir):
        shutil.rmtree(log_dir)
    os.makedirs(log_dir, exist_ok=True)

    # Save configuration to cfgs.pkl (save all training settings)
    pickle.dump(
        [env_cfg, obs_cfg, reward_cfg, command_cfg, train_cfg],
        open(f"{log_dir}/cfgs.pkl", "wb"),
    )

    # Create Go2 environment
    env = Go2Env(
        num_envs=args.num_envs, env_cfg=env_cfg, obs_cfg=obs_cfg, reward_cfg=reward_cfg, command_cfg=command_cfg
    )
    
    # Create PPO runner (automatically detects and uses GPU. Manages all rollout & learning)
    runner = OnPolicyRunner(env, train_cfg, log_dir, device=gs.device)

    # Start training (init_at_random_ep_len=True randomizes the initial episode lengths
    # so the parallel environments don't all reset at the same time)
    runner.learn(num_learning_iterations=args.max_iterations, init_at_random_ep_len=True)

Writing out explanations of every parameter would get extremely long, so I'll stop here.
