Overview
I built a full perception pipeline on 128-beam infrastructure-mounted LiDAR data — ground removal, BEV clustering, Random Forest classification, and Kalman filter multi-object tracking. All classical methods, no deep learning for detection.
This post shares the key engineering lessons — what broke, why, and how I fixed it. The full code and detailed reports are on GitHub:
Repository: https://github.com/bonsai89/lidar-perception-pipeline

The Pipeline
Raw LiDAR (184k points, 128 beams)
→ Ground Removal (RANSAC calibration + polar grid)
→ BEV Clustering (connected components)
→ Classification (Random Forest, 23 geometric features)
→ Multi-Object Tracking (Kalman filter + Hungarian assignment)
→ Classified objects with persistent IDs
Key Lessons
1. Ground Removal: 6 iterations to get right
The sensor is infrastructure-mounted (fixed on a pole, tilted down). I went through 6 approaches:
| Iteration | Method | Why it failed |
|---|---|---|
| 1 | Per-frame RANSAC on full scene | Locked onto bus roof, not road. 6-7s per frame |
| 2 | Calibrate once + flat z-threshold | Missed ground at range where road slopes |
| 3 | Cartesian grid + local percentile | Bus underside corrupted ground estimate in cells below bus |
| 4 | Multi-frame accumulation | Only 1-5% cell coverage across frames |
| 5 | Plane equation extrapolation | 0.01 residual tilt → 2m drift at 100m range |
| 6 | Polar grid + adaptive deviation | Works. 30-80ms per frame |
The final solution uses:
- One-time calibration on nearby points (within 10m)
- Rotation via Rodrigues' formula to make ground horizontal
- Polar grid (5m × 5°) matching LiDAR's radial scan pattern
- Distance-adaptive threshold: min(0.5 + r * 0.08, 2.0)
- np.partition (O(N) quickselect) instead of np.percentile (O(N log N))
Key insight: A bus only covers a narrow angular span in polar coordinates. Adjacent polar wedges still see the road beside it. Cartesian grids don't have this property.
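The final recipe condenses to a small amount of NumPy. Below is a minimal sketch, assuming the one-time Rodrigues calibration rotation has already been applied so the ground is horizontal; the function name, the per-cell low-rank statistic, and the loop structure are my illustration, not the repo's exact code:

```python
import numpy as np

def ground_mask_polar(points):
    """Sketch of the polar-grid ground filter. `points` is (N, 3),
    already rotated so the ground plane is horizontal."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x) + np.pi              # shift into [0, 2*pi]

    # Bin into 5 m x 5 deg polar wedges (360 / 5 = 72 angular bins per ring)
    r_bin = (r // 5.0).astype(np.int64)
    a_bin = np.minimum((theta // np.deg2rad(5.0)).astype(np.int64), 71)
    cell = r_bin * 72 + a_bin

    ground = np.zeros(len(points), dtype=bool)
    for c in np.unique(cell):
        idx = np.flatnonzero(cell == c)
        zc = z[idx]
        # Local ground height: low rank statistic via O(N) np.partition
        k = int(0.02 * len(zc))
        z_floor = np.partition(zc, k)[k]
        # Distance-adaptive deviation threshold from the post, capped at 2 m
        thresh = np.minimum(0.5 + r[idx] * 0.08, 2.0)
        ground[idx] = zc < z_floor + thresh
    return ground
```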
2. Clustering: 4-connectivity vs 8-connectivity
BEV grid projection (0.15m cells) + scipy.ndimage.label for connected components.
With 8-connectivity (diagonals count as connected), a car parked next to a wall shared one diagonal cell. They merged into one cluster, got rejected by the size filter, and the car vanished from detection.
Switching to 4-connectivity fixed it. This one parameter had more impact than any algorithm choice.
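The fix is literally one argument to scipy.ndimage.label. A minimal sketch, assuming a boolean occupancy grid has already been built from the points (the helper name is mine):

```python
import numpy as np
from scipy import ndimage

def cluster_bev(occupancy):
    """Label connected components in a boolean BEV occupancy grid (0.15 m cells)."""
    # 4-connectivity: only N/S/E/W neighbors merge. This is scipy's default
    # structure; 8-connectivity would pass structure=np.ones((3, 3)) instead.
    four_conn = ndimage.generate_binary_structure(2, 1)
    labels, num_clusters = ndimage.label(occupancy, structure=four_conn)
    return labels, num_clusters
```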
Other attempts that failed:
- Morphological opening (3×3 erosion erased 2×2 cell pedestrians at range)
- Per-cell height filter (car hatchback edges only had 0.1-0.2m z-spread across 2 scan rings — filter punched holes in car outlines)
3. Classification: Where geometric features hit their limit
Random Forest on 23 handcrafted features. Test macro-F1: 0.82.
Feature ablation results:
| Feature group | Macro-F1 | Key improvement |
|---|---|---|
| Bounding box + height (9) | 0.731 | Baseline |
| + PCA scattering (11) | 0.777 | car→bg: 18.8%→16.4% |
| + Vertical layer fractions (19) | 0.800 | ped→bike: 16.9%→15.0% |
| + Local structure features (23) | 0.815 | nn_dist_std targets car→bg |
| + Redundant derived (35) | 0.822 | Marginal gain |
The most important finding: across all feature sets (19, 23, 35), the confidence gap between correct predictions (0.87) and misclassifications (0.60) was 0.277 ± 0.002, identical regardless of feature count. That saturation is effectively the irreducible (Bayes) error of the geometric representation itself; more features cannot fix it. A fundamentally different representation (PointNet, temporal context) is needed.
PCA yaw-invariance was discovered accidentally: a car at 45° to the sensor axes had its xy_area inflated by 2.4x, because ground alignment only fixes pitch and roll, not yaw. Aligning each cluster to its 2D principal axes on the horizontal plane fixed this.
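A sketch of that per-cluster alignment, assuming the points are already projected onto the horizontal plane (the function name is mine):

```python
import numpy as np

def yaw_aligned_area(points_xy):
    """Yaw-invariant footprint via 2D PCA. `points_xy` is the (N, 2)
    horizontal projection of one cluster."""
    centered = points_xy - points_xy.mean(axis=0)
    # Eigenvectors of the 2x2 covariance are the cluster's principal axes
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    aligned = centered @ eigvecs
    extent = aligned.max(axis=0) - aligned.min(axis=0)
    return extent[0] * extent[1]   # xy_area, now independent of yaw
```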
4. Tracking: Asymmetric lifecycle is critical
Kalman filter (constant velocity, 6-state: 3D position + velocity) + Hungarian assignment with Mahalanobis gating (matches rejected when d² > 7.81, the chi-squared 95% threshold at 3 DOF).
The three design choices that mattered:
Asymmetric track lifecycle:
- Tentative tracks die after 1 miss → kills false alarms immediately
- Confirmed tracks survive 3 misses → bridges temporary occlusion
- Without this asymmetry, you trade ghost tracks for lost real tracks with no good threshold
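In code, the whole asymmetry is one branch. A sketch; the Track fields and function name are my assumptions, the thresholds are from the post:

```python
def should_delete(track) -> bool:
    """Asymmetric deletion rule: strict on tentative tracks, lenient on confirmed."""
    if track.confirmed:
        return track.misses > 3    # confirmed: survive up to 3 consecutive misses
    return track.misses >= 1       # tentative: die on the first miss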
Mahalanobis over Euclidean:
- New tracks (large covariance) accept wider matches
- Established tracks (tight covariance) are strict
- Euclidean + fixed gate can't express this
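A sketch of the gated assignment step, assuming each track exposes a predicted measurement and its innovation covariance (attribute names are mine). The per-track S is exactly what lets new, high-covariance tracks accept wider matches while established tracks stay strict:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

CHI2_GATE = 7.815   # chi-squared 95th percentile at 3 DOF (the post's gate)
INF_COST = 1e6      # sentinel cost for gated-out pairs

def associate(tracks, detections):
    """Gated Hungarian assignment. Each track exposes z_pred (3,) and
    innovation covariance S (3, 3). Returns matched (track_i, det_j) pairs."""
    cost = np.full((len(tracks), len(detections)), INF_COST)
    for i, t in enumerate(tracks):
        S_inv = np.linalg.inv(t.S)      # per-track covariance sets the gate width
        for j, z in enumerate(detections):
            v = z - t.z_pred
            d2 = v @ S_inv @ v          # squared Mahalanobis distance
            if d2 <= CHI2_GATE:
                cost[i, j] = d2
    rows, cols = linear_sum_assignment(cost)
    # Discard pairs the solver was forced to route through gated-out cells
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < INF_COST]
```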
Covariance initialization:
- P_pos=1.0 was too uncertain relative to R=0.3 → high Kalman gain, so the filter overweighted noisy measurements early
- P_vel=5.0 was too confident for unknown velocity at birth
- Changed to P_pos=0.5, P_vel=10.0 → faster convergence, less jitter
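The final initialization, as a sketch using the post's values; the state layout [x, y, z, vx, vy, vz] is my assumption:

```python
import numpy as np

P0 = np.diag([0.5, 0.5, 0.5,        # position: roughly trust the first detection
              10.0, 10.0, 10.0])    # velocity: admit it is unknown at birth
R = np.eye(3) * 0.3                 # measurement noise on detected centroids (from the post)
```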
Performance
| Stage | Latency per frame |
|---|---|
| Ground removal | 30-80ms |
| BEV clustering | 290-770ms |
| Classification (23 features) | 500-1200ms |
| Tracking | < 50ms |
| Total | ~1-2s |
Path to real-time: C++ with Eigen for PCA, nanoflann for KDTree, pre-allocated buffers. The Python prototype validates the approach; production implementation is a straightforward port.
Code
Full implementation, ablation studies, confusion matrices, and detailed failure analysis:
Repository: https://github.com/bonsai89/lidar-perception-pipeline
Background
I'm a perception engineer, previously at Toyota Technological Institute (camera-LiDAR-radar fusion, semantic segmentation, 5 papers at IEEE IV, ICPR, PSIVT) and TierIV (Autoware perception pipeline, ROS2, multi-sensor calibration). This was my first time working with infrastructure-mounted LiDAR, and coming from vehicle-mounted perception the differences were bigger than expected: there is no ego-motion to compensate, but the occlusion patterns are completely different and the ground removal problem changes fundamentally.
Feedback and questions welcome.
Currently open to perception/CV engineering roles in Japan, and consulting opportunities. Feel free to reach out:
GitHub: https://github.com/bonsai89
LinkedIn: https://www.linkedin.com/in/nithilankarunakaran/