Robot-arm inverse-dynamics benchmark and next-run recommendation
A worked technical report showing how FERN is evaluated as a small-data physical decision layer. The example is a robot-arm inverse-dynamics task (predicting joint load from motion); the same report structure applies to a robotic joint, an actuator or a cobot cell.
1. Executive summary
The objective is to test whether a compact FERN model can learn the measured response of a specific machine from a small number of local numerical logs. The focus is not autonomous control but response-surface modelling, review of extrapolation behaviour and support for the next physical decision, such as which run to perform next.
2. Dataset summary
| Field group | Example variables | Purpose |
|---|---|---|
| Target state | target pose, target angle, desired position | Defines intended machine behaviour. |
| Measured state | achieved pose, measured error, correction target | Shows physical mismatch between expected and real behaviour. |
| Operating context | payload, speed, temperature, surface, cycle count | Explains when and why error grows. |
| Telemetry | actuator state, joint current, vibration, retry count | Improves response-surface and risk-zone interpretation. |
3. Baseline comparison
FERN is compared against strong baselines using the same inputs and the same train/test splits. Values are RMSE (root-mean-square error, in N·m; lower is better) on a held-out interpolation test set and on a high-load extrapolation test set. The goal is not to assume that FERN wins everywhere, but to identify where it wins and whether the win is useful in practice.
| Model | Interpolation RMSE | Extrapolation RMSE | What it tells us |
|---|---|---|---|
| FERN | 0.117 | 0.155 | Compact model; lowest extrapolation RMSE, with interpolation close to the Gaussian Process at a fraction of the size. |
| Gaussian Process | 0.113 | 0.160 | Strong small-data baseline; best on interpolation. |
| MLP (64,64) | 0.118 | 0.163 | Basic neural baseline. |
| RSM (degree-2) | 0.119 | 0.171 | Classical response surface; weaker under high load. |
| XGBoost / LightGBM | 0.128 | 0.174 | Strong tabular ML; trails on this small, smooth task. |
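The evaluation protocol can be sketched as follows. This is a minimal illustration, not the report's actual pipeline: it assumes the high-load extrapolation set is every row whose payload exceeds a chosen quantile (0.9 here, a hypothetical value) and that the interpolation test set is a random hold-out from the remaining rows. The helper names `split_interp_extrap` and `rmse` are hypothetical. The last lines rank the models using the RMSE values reported in the table above.

```python
import numpy as np


def split_interp_extrap(payload, extrap_quantile=0.9, test_frac=0.2, seed=0):
    """Return index arrays (train, interp_test, extrap_test).

    Assumption: rows with payload above the `extrap_quantile` quantile form
    the high-load extrapolation test; a random `test_frac` of the remaining
    rows forms the interpolation test. Both values are illustrative.
    """
    payload = np.asarray(payload, dtype=float)
    threshold = np.quantile(payload, extrap_quantile)
    extrap = np.flatnonzero(payload > threshold)
    rest = np.random.default_rng(seed).permutation(np.flatnonzero(payload <= threshold))
    n_test = int(round(test_frac * len(rest)))
    return rest[n_test:], rest[:n_test], extrap


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (N·m for joint load)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


# Reported (interpolation, extrapolation) RMSE in N·m, copied from the table.
reported = {
    "FERN": (0.117, 0.155),
    "Gaussian Process": (0.113, 0.160),
    "MLP (64,64)": (0.118, 0.163),
    "RSM (degree-2)": (0.119, 0.171),
    "XGBoost / LightGBM": (0.128, 0.174),
}
interp_rank = sorted(reported, key=lambda m: reported[m][0])
extrap_rank = sorted(reported, key=lambda m: reported[m][1])
print("FERN interpolation rank:", interp_rank.index("FERN") + 1)  # 2
print("FERN extrapolation rank:", extrap_rank.index("FERN") + 1)  # 1
```

Splitting on an operating variable rather than at random is what makes the second test an extrapolation test: the model never sees the high-load region during training.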
Compute cost and model size on the same task (single CPU; the Gaussian Process was trained on a 2,000-row subsample):
| Model | Train time | Inference (600 rows) | Parameters / size |
|---|---|---|---|
| FERN | 1.7 s | 0.7 ms | 306 trainable parameters |
| Gaussian Process | 74 s | 26 ms | 9 kernel hyperparams + 2,000 retained points |
| MLP (64,64) | 0.2 s | 0.3 ms | 4,801 weights and biases |
| RSM (degree-2) | 0.01 s | 0.3 ms | 46 coefficients |
| XGBoost / LightGBM | 0.3 s | 2.0 ms | 500 trees · ~15,500 leaves |
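FERN is close to the Gaussian Process on accuracy (0.004 N·m worse on interpolation, 0.005 N·m better on extrapolation) while training ~43× faster (1.7 s vs 74 s), predicting ~37× faster (0.7 ms vs 26 ms) and shipping as a 306-parameter model. This is the compact, CPU-first profile the FERN Box targets.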
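The speed-up ratios follow directly from the compute-cost table, as the short check below shows. The parameter-count helper is an added illustration: it counts the weights and biases of a fully connected network, and it reproduces the MLP's 4,801 figure only under the assumption of eight input features, which the report does not state.

```python
# Figures copied from the compute-cost table.
fern = {"train_s": 1.7, "infer_ms": 0.7}
gp = {"train_s": 74.0, "infer_ms": 26.0}

train_speedup = gp["train_s"] / fern["train_s"]    # about 43.5
infer_speedup = gp["infer_ms"] / fern["infer_ms"]  # about 37.1


def dense_param_count(layer_sizes):
    """Count weights plus biases of a fully connected network.

    `layer_sizes` is [n_inputs, hidden_1, ..., n_outputs]; each layer
    contributes (n_in + 1) * n_out parameters (the +1 is the bias).
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumption: 8 input features (not stated in the report).
mlp_params = dense_param_count([8, 64, 64, 1])

print(f"training speed-up:  {train_speedup:.1f}x")  # 43.5x
print(f"inference speed-up: {infer_speedup:.1f}x")  # 37.1x
print(f"MLP (64,64) parameters: {mlp_params}")      # 4801
```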
4. Response surface and risk zones
The report identifies where measured error is stable, where extrapolation becomes weak and which operating regions should be treated as higher-risk until more runs are collected. The output is advisory: it supports the engineer’s decision and does not directly command the machine.
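One way to produce such a risk map is sketched below. The report does not say how FERN's risk zones are computed, so treat this as an assumption-laden illustration: each candidate operating point is labelled "extrapolation" if any feature falls outside the range seen in training, "high-error" if the mean absolute residual of its k nearest training points exceeds a threshold, and "stable" otherwise. The function name `flag_risk_zones`, the k-nearest-neighbour rule and the threshold are all hypothetical choices.

```python
import numpy as np


def flag_risk_zones(X_train, residuals, X_query, err_threshold, k=5):
    """Label each query point 'stable', 'high-error' or 'extrapolation'.

    X_train: (n, d) training operating points (e.g. payload, speed, temperature).
    residuals: (n,) measured minus predicted joint load at X_train.
    X_query: (m, d) candidate operating points to assess.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    abs_res = np.abs(np.asarray(residuals, dtype=float))
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant features
    train_scaled = (X_train - lo) / span    # scale to [0, 1] so units do not dominate distance

    labels = []
    for x in X_query:
        if np.any(x < lo) or np.any(x > hi):
            labels.append("extrapolation")  # outside the training bounding box
            continue
        dist = np.linalg.norm(train_scaled - (x - lo) / span, axis=1)
        nearest = np.argsort(dist)[:k]
        local_err = abs_res[nearest].mean()
        labels.append("high-error" if local_err > err_threshold else "stable")
    return labels
```

A bounding box is a coarse envelope: a point can sit inside it yet far from any training run. A convex-hull or density-based check would be stricter, at the cost of more computation.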
Left: RMSE by model (FERN highlighted), interpolation vs high-load extrapolation. Right: FERN predicted vs measured joint load on the interpolation test set; points on the dashed line are perfect predictions.
5. ROI assumptions
| ROI driver | Customer input needed | Report output |
|---|---|---|
| Engineer time | Hours per tuning cycle | Estimated saving from fewer setup iterations. |
| Failed runs | Cost per failed bench or validation run | Potential saving from better next-run selection. |
| Downtime | Hourly cost of delayed cell launch or machine stop | Deployment acceleration estimate. |
| Data privacy | Constraints on sending logs to the cloud | Recommended deployment mode (on-prem or anonymized data). |
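The monetary drivers above can be combined into a simple additive savings estimate. The sketch below assumes each driver is independent and linear in the customer's inputs; the engineer hourly rate and the "avoided" counts are hypothetical inputs that the table does not list, and data privacy is left out because it shapes the deployment mode rather than the savings figure. `RoiInputs` and `estimate_savings` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RoiInputs:
    """Customer inputs; field names are hypothetical."""
    hours_per_tuning_cycle: float
    engineer_hourly_rate: float    # assumption: not listed in the ROI table
    tuning_cycles_avoided: float   # assumption: estimated from the benchmark
    cost_per_failed_run: float
    failed_runs_avoided: float
    downtime_hourly_cost: float
    downtime_hours_avoided: float


def estimate_savings(r: RoiInputs) -> dict:
    """Additive savings estimate, one term per ROI driver (same currency as the inputs)."""
    engineer = r.hours_per_tuning_cycle * r.engineer_hourly_rate * r.tuning_cycles_avoided
    failed_runs = r.cost_per_failed_run * r.failed_runs_avoided
    downtime = r.downtime_hourly_cost * r.downtime_hours_avoided
    return {
        "engineer_time": engineer,
        "failed_runs": failed_runs,
        "downtime": downtime,
        "total": engineer + failed_runs + downtime,
    }


example = RoiInputs(8, 100, 3, 500, 2, 1000, 4)  # illustrative numbers only
print(estimate_savings(example)["total"])  # 7400
```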
6. Practical recommendation
The final page gives one of three outcomes: Go if FERN is competitive and useful; Go with more data if a signal exists but the dataset is too narrow; or No-Go if standard baselines are sufficient or the data is not a good fit for FERN. In this worked example the outcome is Go: FERN is competitive on interpolation (2nd of five models, 0.004 N·m behind the Gaussian Process), has the lowest extrapolation error of the five, and is compact and CPU-friendly.
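The three-way outcome can be expressed as a small decision rule. This is a sketch under stated assumptions, not the report's actual criteria: "competitive" is taken to mean FERN's RMSE is within a relative tolerance (5% here, a hypothetical value) of the best baseline on both tests, "useful" to mean FERN beats every baseline on at least one test, and whether the dataset is too narrow is left to the engineer's judgement and passed in as a flag. The function name `recommend` is also hypothetical.

```python
def recommend(fern, baselines, dataset_too_narrow=False, tolerance=0.05):
    """Return 'Go', 'Go with more data' or 'No-Go'.

    fern: (interp_rmse, extrap_rmse) for FERN.
    baselines: list of (interp_rmse, extrap_rmse), one per baseline model.
    """
    best = [min(b[i] for b in baselines) for i in (0, 1)]
    competitive = all(fern[i] <= best[i] * (1 + tolerance) for i in (0, 1))
    wins_somewhere = any(fern[i] < best[i] for i in (0, 1))
    if not competitive:
        return "No-Go"              # baselines are clearly better
    if dataset_too_narrow:
        return "Go with more data"  # signal exists, but coverage is too narrow
    if wins_somewhere:
        return "Go"                 # competitive and best on at least one test
    return "No-Go"                  # competitive but never better: baselines suffice


# Baseline RMSE values (N·m) copied from the comparison table.
baselines = [(0.113, 0.160), (0.118, 0.163), (0.119, 0.171), (0.128, 0.174)]
print(recommend((0.117, 0.155), baselines))  # Go
```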