How this directory is researched, sourced, and ranked, and, just as importantly, what we deliberately leave blank. The whole point of rl-list.com is that you can audit every claim.
Procurement teams choosing an RL-environment vendor care about scale, velocity, quality, cost, and data specs. Almost none of that is on the public web; it surfaces only in direct vendor engagement. So we don’t pretend to answer it. Instead, for every company, we research the public proxies for those questions, cite each one, and tag how confident we are.
| What a buyer wants to know | The public proxy we source |
|---|---|
| How fast can they scale production | Headcount, headcount growth, open roles, funding |
| Quality & rigor of the work | Researchers on the team, their backgrounds, published papers/benchmarks |
| Will they survive / are they credible | Capital raised, investors, customers, founding year |
| Can we clear security review | SOC 2 / ISO certifications |
| Footprint & jurisdiction | HQ + office locations |
Every non-trivial field on every vendor page carries a confidence tag, so you never have to guess how solid a number is.
The numbers frontier labs actually request in an RFI (task/sample counts, unique environments, pass@1 and difficulty, capability and complexity splits, data-type breakdowns, harness and data format, and unit/total pricing) are not on the public web and only come from direct engagement. We do not estimate them, and we do not let an AI “reason toward” a plausible figure. Those fields stay blank on purpose. If you see a number on rl-list.com, it has a source.
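To make the "every number has a source" rule concrete, here is a minimal sketch of how a vendor-page field could be modeled so that an unsourced value is impossible to publish. Everything in it is an assumption for illustration: the `SourcedField` class, its attribute names, the `"reported"` tag label, and the sample values are hypothetical and do not describe rl-list.com's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourcedField:
    """One field on a vendor page (hypothetical schema, for illustration only)."""

    name: str
    value: Optional[str] = None        # None means the field is deliberately blank
    source_url: Optional[str] = None   # citation backing the value
    confidence: Optional[str] = None   # confidence tag; label names are placeholders

    def __post_init__(self) -> None:
        # A published value must carry both a citation and a confidence tag.
        if self.value is not None and not (self.source_url and self.confidence):
            raise ValueError(f"{self.name}: value published without source or tag")

    @property
    def is_blank(self) -> bool:
        return self.value is None


# Made-up example data: a public proxy with a citation, and an RFI-only figure.
headcount = SourcedField("headcount", "40", "https://example.com/about", "reported")
pass_at_1 = SourcedField("pass_at_1")  # not on the public web, so it stays blank
```

Under this sketch, trying to create `SourcedField("unit_price", "$5")` raises `ValueError`, which mirrors the policy of leaving a field empty rather than estimating it.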
We rank only the dedicated, pure-play RL-environment vendors. Three groups are deliberately excluded from the ranking and listed separately for reference, because they aren’t like-for-like comparable: data-labeling incumbents moving into environments (Scale AI, Surge AI, Mercor), execution-infrastructure providers (sandbox/compute layers), and open-source projects. Mixing a $1B labeling incumbent into a list of focused environment startups would mislead more than it informs.
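The exclusion rule can be sketched as a simple partition of the directory by category. The `Category` enum, the `split_directory` function, and the two "Example" vendor names are hypothetical; only the three labeling incumbents come from the text above.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Category(Enum):
    PURE_PLAY = "pure-play RL-environment vendor"
    LABELING_INCUMBENT = "data-labeling incumbent"
    EXECUTION_INFRA = "execution-infrastructure provider"
    OPEN_SOURCE = "open-source project"


def split_directory(
    vendors: Dict[str, Category],
) -> Tuple[List[str], Dict[Category, List[str]]]:
    """Return (vendors eligible for ranking, reference-only vendors by category)."""
    ranked: List[str] = []
    reference: Dict[Category, List[str]] = {}
    for name, category in vendors.items():
        if category is Category.PURE_PLAY:
            ranked.append(name)
        else:
            # Listed separately for reference, never ranked.
            reference.setdefault(category, []).append(name)
    return ranked, reference


vendors = {
    "Scale AI": Category.LABELING_INCUMBENT,
    "Surge AI": Category.LABELING_INCUMBENT,
    "Mercor": Category.LABELING_INCUMBENT,
    "ExampleEnv Co": Category.PURE_PLAY,         # hypothetical vendor
    "ExampleSandbox": Category.EXECUTION_INFRA,  # hypothetical vendor
}
ranked, reference = split_directory(vendors)
```

Keeping the category on each record, rather than maintaining separate lists by hand, means a vendor can never appear in both the ranking and a reference group.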
Within the ranked set, order is driven by a transparent formula we call the RL List score: the same calculation is applied to every vendor, so the baseline order stays auditable. A small number of vendors we have reviewed in depth are placed editorially; everyone else falls where the score puts them. The score combines:
The score is not a product-quality or endorsement rating; it reflects scale, signal, and how verifiable a vendor’s public record is. A company lower down is often simply earlier-stage or harder to verify, not a weaker product.
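As a rough illustration of how a score-driven order and a handful of editorial placements could coexist, the sketch below takes precomputed scores and fills every slot that is not editorially pinned in descending score order. This is an assumption, not the site's published method: the `order_vendors` function, the idea of pinning a vendor to a zero-based position, the alphabetical tie-break, and the vendor names and scores are all hypothetical, and the score calculation itself is not reproduced here.

```python
from typing import Dict, List, Optional


def order_vendors(
    scores: Dict[str, float],
    editorial: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Order vendors by score, honoring editorial pins (vendor -> 0-based slot).

    Assumes every pinned vendor appears in `scores` and that pinned
    positions are unique and within range.
    """
    editorial = editorial or {}
    # Everyone without an editorial placement is sorted by descending score,
    # with the name as a deterministic tie-break.
    by_score = sorted(
        (v for v in scores if v not in editorial),
        key=lambda v: (-scores[v], v),
    )
    slots: List[Optional[str]] = [None] * len(scores)
    for vendor, position in editorial.items():
        slots[position] = vendor
    remaining = iter(by_score)
    # Unpinned slots are filled in score order.
    return [s if s is not None else next(remaining) for s in slots]


# Made-up scores for four hypothetical vendors.
scores = {"Vendor A": 3.0, "Vendor B": 9.0, "Vendor C": 5.0, "Vendor D": 1.0}
baseline = order_vendors(scores)
with_pin = order_vendors(scores, editorial={"Vendor D": 0})
```

Separating the pins from the scores keeps the baseline order reproducible: anyone can rerun the sort without the `editorial` argument and see exactly which positions were changed by hand.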
Every vendor page shows a “last updated” date, and the directory is re-verified on a rolling basis. This snapshot was last updated 2026-06-07.