Mechanize is a small San Francisco vendor (founded April 2025 by ex-Epoch AI researchers Matthew Barnett, Tamay Besiroglu, and Ege Erdil) that builds a limited number of robust, high-fidelity RL environments and evals for frontier coding agents and sells them to leading AI labs. Its stated long-term mission is the full automation of valuable economic work via simulated 'digital office' environments.
Backers: Nat Friedman, Daniel Gross, Patrick Collison
Tags: Coding, Computer Use, Private Codebases
AfterQuery is a San Francisco applied-research lab and data platform (YC W25) that supplies frontier AI labs with expert-generated human data (SFT, RL rubrics), agent/RL environments, and computer-use trajectories, drawn from a large network of verified practitioners. It publishes real-task benchmarks such as Terminal-Bench, VADER, FinanceQA, and IDE-Bench, positioning around capturing how domain experts (engineers, financial analysts, lawyers) reason.
Backers: Altos Ventures (lead, Series A), The Raine Group, Y Combinator
Tags: Coding, Computer Use, Enterprise Workflows
Deeptune is a New York-based startup building managed reinforcement-learning environments ('training gyms') for computer-use and code, where AI agents practice and are evaluated on realistic digital knowledge-work tasks (simulating tools like Slack and Salesforce). It sells these pre-built environments primarily to frontier AI labs and raised a $43M Series A led by a16z, announced March 2026.
Backers: Andreessen Horowitz (a16z, lead), 776, Abstract Ventures
Tags: Coding, Computer Use, Private Codebases
Bespoke Labs is an applied AI research lab (Mountain View, CA, founded 2024) focused on data curation and RL-environment curation for training and evaluating agents, known for open datasets and reproducible recipes (OpenThoughts) and open-source tools (Curator, Evalchemy). It pairs a public open-source/open-data presence with commercial custom data and RL-environment delivery.
Pedigree: Co-founder/CEO Maheswaran (Mahesh) Sathiamoorthy, ex-Google DeepMind
Huzzle Labs is the AI division of London-based talent platform Huzzle (founded ~2020 by Ingmar Klein, Parham Rakhshanfar, and Amit Choudhary). It positions itself as a human-intelligence data foundry that builds RL environments (code, tool-use, computer-use, long-horizon enterprise workflows), expert trajectory data, and contextual evaluations for frontier AI labs and regulated European enterprises, leveraging Huzzle's vetted PhD/expert network. It bundles environments, human data, and evals in one stack.
Backers: 10x Founders, Angel Invest, Emerge
Tags: Coding, Computer Use, Private Codebases
Fleet AI builds high-fidelity reinforcement-learning training environments ('gyms') that replicate enterprise software such as Salesforce and Excel, plus browser/desktop workflows, so frontier AI labs and large enterprises can train and evaluate computer-use agents. It ships a Python SDK, a platform API, and the open-source 'Harbor' agent-evaluation/RL-environment tooling, pairing simulated environments with human supervision.
Backers: Sequoia Capital, Menlo Ventures, SV Angel
Tags: Computer Use, Enterprise Workflows
Datacurve is a YC W24 commercial data vendor that supplies expert-curated frontier coding data, RLHF traces, and repository-wide reinforcement learning environments (with unit-test verifiers) to foundation model labs, sourced via its Shipd bounty platform of vetted software engineers. It also publishes DeepSWE, a long-horizon agentic coding benchmark.
Backers: Chemistry (Mark Goldberg, lead Series A), Y Combinator, Balaji Srinivasan (seed)
Proximal is a San Francisco-based (with a Bangalore presence) research lab for coding data, building high-fidelity, long-horizon reinforcement learning environments grounded in real codebases to train and evaluate frontier coding agents. It emphasizes scalable, software-driven data engines over human contractors, and research into reward-hacking detection and 'fuzzy verifiers' that score code quality beyond functional correctness.
Backers: Scribble Ventures (lead), Angels from OpenAI, Anthropic, Thinking Machines, Google DeepMind, xAI, Meta Superintelligence, Cursor and Cognition (per founders' own statements; not independently verified)
Tags: Coding, Long-Horizon, Private Codebases
Gray Swan AI is a Pittsburgh-based AI security company spun out of Carnegie Mellon, offering adversarial red-teaming and runtime protection for AI models and agents via three products: Arena (a crowdsourced adversarial red-teaming network of 15,000+ researchers), Shade (automated red-teaming/pressure-testing), and Cygnal (runtime input/output guardrails). It positions itself as a security/evaluation partner to frontier labs and enterprises rather than a general RL-environment vendor.
Backers: Wing Venture Capital (co-lead), Madrona (co-lead), Obvious Ventures
Veris AI sells a high-fidelity simulation platform plus a production runtime that let enterprises train, evaluate, and govern AI agents against mocked enterprise tools before and during production, with support for reinforcement learning / fine-tuning pipelines. It positions itself as the enterprise 'environment layer' that agent builders lack.
Backers: Decibel Ventures (lead), Acrew Capital (lead), The House Fund
Chakra Labs runs Dojo, an open/collaborative reinforcement-learning environment hub for computer-use agents, offering deterministic, frame-accurate clones of production software plus human computer-use trajectory datasets, with native support for the Harbor, Verifiers and Verl RL frameworks. It positions itself as bringing frontier-lab-grade CUA training infrastructure to the broader research community.
Pedigree: Alexander Fung (co-founder), ex-Palantir, Snap/Snapchat, Fin; Compute
Andon Labs is a Y Combinator-backed (W24) startup, formerly Vectorview, building benchmarks and evaluations for AI agents' long-horizon coherence and safety (Vending-Bench, Butter-Bench, Blueprint-Bench) and operating real-world autonomous AI businesses. It is known for high-profile collaborations placing AI-run vending machines/stores in the offices of frontier labs Anthropic (Project Vend) and xAI (Grokbox).
Backers: Y Combinator (W24)
Tags: Computer Use, Long-Horizon
Sepal AI is a YC-backed (S24) San Francisco data-research company that builds high-quality training data, expert-graded evaluation benchmarks, and reinforcement-learning environments for frontier LLMs, drawing on a network of 20k+ domain experts (PhDs, finance, medical, STEM). It was acquired by Mercor in February 2026.
Backers: Y Combinator, Metaplanet Holdings, SID Venture Partners
Tags: Enterprise Workflows, Long-Horizon, Math
HUD (YC W25, formerly hud.so) is a platform for building reinforcement-learning environments and evaluations for computer-use and browser agents. It lets teams wrap real software/code as agent-callable tools in isolated containers, define tasks and rewards, and run evals/RL at scale via an open-source SDK plus a cloud-hosted gateway. It maintains public benchmarks (OSWorld-Verified contributions, SheetBench-50) and targets frontier AI labs and agent-first startups as customers.
Backers: Y Combinator (W25 batch), Exceptional Capital
Tags: Computer Use, Enterprise Workflows
Vals AI is an independent, third-party benchmarking and evaluation platform that scores LLMs and AI applications (copilots, RAG, agents) on rigorous, domain-specific tasks in regulated fields such as legal, finance, healthcare, tax and coding. It publishes public leaderboards (e.g., the Vals Index, Finance Agent benchmark, Vals Legal AI Report) and sells private evaluation infrastructure to labs and enterprise engineering teams.
Pedigree: Co-founder/CEO Rayan Krishnan - ex-Stanford AI master's
Halluminate (YC S25, founded 2024, San Francisco) builds managed reinforcement-learning sandbox environments, simulated applications, and human/annotation data plus evaluation benchmarks (WebBench, BrowserBench, Westworld) to train and test computer-use and browser AI agents. Its 2026 site positioning has narrowed toward 'RL environments for financial services' (investment banking, private equity, consulting).
Backers: Y Combinator (S25), Orange Collective, Antigravity Capital
Tags: Computer Use, Enterprise Workflows
Matrices builds reinforcement-learning training environments for frontier AI labs to train agents that use computers and browsers like humans, described as a 'gamified replica of the internet' where thousands of agents learn via RL. The company frames its mission as 'towards self-driving computers' and says it helps labs train computer-use agents (Operator-class systems). Note: this entry refers to the browser-native company (matrices.ai / LinkedIn 'matricesapp'), which is distinct from the similarly named 'Matrice.ai' computer-vision company and the 'Matrix AI Network' blockchain project.
Backers: Index Ventures, AI Grant (Nat Friedman & Daniel Gross), Naval Ravikant
BenchFlow is an early-stage, YC-backed open-source 'environment lab' building evaluation infrastructure and a community Benchmark Hub for AI agents, with products including SkillsBench, ClawsBench (mock workplace environments) and a sandboxed agent runtime. It positions environments as 'the new data' for training and evaluating agents across domains like enterprise workflows, coding, computer use and browser tasks.
Backers: Y Combinator, Pear VC, Construct Capital
Tags: Coding, Computer Use, Enterprise Workflows
Collinear AI operates a 'Simulation Lab' (SimLab) that builds sandboxed, stateful RL environments simulating enterprise users, tools (Jira, ServiceNow, Shopify, EMR, airline/hotel systems) and multi-step workflows, producing training-ready trajectories, reward signals and evals for agentic models. It also offers synthetic post-training data and LLM-judge evaluation, positioning itself around 'environment-as-a-service' for enterprise long-horizon agents.
Backers: Engineering Capital, Firestreak Ventures, 112 Capital (11.2 Capital)
Tags: Coding, Computer Use, Enterprise Workflows
Refresh (YC X25) builds simulation engines / RL environments with verifiable rewards for coding and computer use, partnering with frontier labs and enterprises to train AI software-engineering and computer-use 'coworker' capabilities across terminal and GUI.
Backers: Y Combinator
Tags: Coding, Computer Use, Private Codebases
Vmax is a San Francisco reinforcement-learning startup (founded 2025 by three RL/robotics PhDs from UCL and UPenn) that automates the conversion of proprietary data and evals into RL environments for LLM-based agents, targeting long-horizon and coding tasks. Its public research includes unix-ctf, a procedural generator of capture-the-flag tasks for Unix/shell competence.
Backers: Race Capital, South Park Commons
Andromede is an early-stage RL data lab that programmatically generates RL environments, tasks, and verifiers from real-world data for post-training and evaluation of frontier agents, with an emphasis on long-horizon sequential reasoning tasks. As of mid-2026 it is in private beta, working with a small set of partners. It was co-founded by Guillaume Allegre (Founder & President) and Alexandre Sallinen (an EPFL-affiliated researcher who contributed to the Meditron medical-LLM project), and is backed by Unusual Ventures.
Backers: Unusual Ventures
Plato (plato.so, Plato Technologies, Inc.) builds simulated worlds for training and evaluating browser and computer-use agents, recreating real websites/software (e.g. Amazon/Airbnb/Gmail-style replicas) as reinforcement-learning environments with structured APIs for interaction, state tracking and scoring. It also offers a 'Computer Use' capability driving a full Linux desktop, positioning at the intersection of browser interaction and enterprise workflow simulation.
Pedigree: Pranav Putta (Co-founder/CTO), prior MultiOn, Georgia Institute of Te
Tags: Computer Use, Enterprise Workflows
AIChamp builds custom reinforcement-learning environments and 'Virtual Gym' simulations for training and evaluating tool-using AI agents on long-horizon, multi-step enterprise tasks, pairing engineered environments (agents operating in software like Slack, Notion, Linear) with domain experts who design and grade tasks (SFT/RLHF/process supervision). The company, which pivoted from a remote-talent/hiring marketplace, emphasizes industry expertise and expert-sourced data.
Pedigree: Self-claimed 'alumni of OpenAI and xAI team' (vendor site, unverified)
Tags: Enterprise Workflows, Long-Horizon
Habitat Inc is an early-stage commercial vendor (2-10 employees, New York HQ) building reinforcement-learning environments for white-collar / work automation, with stated focus on code and desktop-style (computer use) interaction tasks for training agentic AI models. It appears in third-party listings of RL-environment suppliers serving AI labs. No funding, customer, or certification information is publicly available.
Pedigree: Maxim Enis (co-founder), Williams College '24; prior Ramp association
Tags: Coding, Computer Use, Enterprise Workflows
Scale AI is the data-labeling and AI-data incumbent that has extended into RL environments, offering simulated web apps, macOS/Windows-like desktop VMs, and MCP-tool environments (Slack, HubSpot, Linear) with expert-designed objectives, rubrics, and automated verifiers to train and evaluate agents on long-horizon professional workflows. Following Meta's ~$14.3B June 2025 investment (~49% non-voting stake) and founder Alexandr Wang's departure to Meta, several frontier-lab customers (OpenAI, Google, xAI) reportedly scaled back or paused engagement over conflict-of-interest concerns.
Backers: Meta Platforms, Accel, Amazon
Tags: Coding, Computer Use, Enterprise Workflows
Modal (Modal Labs) is a New York-based, Python-native serverless cloud purpose-built for AI/ML workloads, providing on-demand GPU/CPU compute, fast-booting sandboxed containers, inference, fine-tuning, and code execution. It is execution infrastructure rather than an RL-environment vendor, but is used to run reinforcement-learning training and large fleets of parallel sandboxed environments for AI labs.
Backers: General Catalyst (Series C co-lead), Redpoint Ventures (Series C co-lead; earlier Series A lead), Lux Capital (earlier round lead)
Mercor is a venture-backed expert-marketplace and AI-training-data company that organizes a network of ~30,000+ domain experts (doctors, lawyers, bankers, engineers) to produce RLHF data, evaluations, and reinforcement-learning environments for frontier AI labs and enterprises. Originally an AI-recruiting platform, it pivoted to human-data/RL services and expanded its RL-environment capability via the February 2026 acquisition of Sepal AI.
Backers: Felicis Ventures (led Series C and Series B), Benchmark, General Catalyst
Tags: Coding, Enterprise Workflows, Long-Horizon
Surge AI is a bootstrapped, high-revenue human-data and RLHF labeling leader serving frontier AI labs, which has expanded into agentic RL environments via its EnterpriseBench suite (notably the CoreCraft enterprise customer-support simulation) and accompanying published benchmarks. As of June 2026 a reported ~$1B first external raise at a ~$25B valuation was in talks but not confirmed closed.
Pedigree: Founder/CEO Edwin Chen: ex-Google, ex-Facebook, ex-Twitter ML teams; M
Tags: Enterprise Workflows, Long-Horizon
Prime Intellect operates an open-source RL stack - the Environments Hub (2,500+ community RL environments), the Verifiers library and prime-rl training framework, plus hosted RL post-training (Lab), evals, inference and on-demand GPU compute. It positions itself as the open alternative to closed big-lab RL tooling and also trains open models (INTELLECT series).
Backers: Founders Fund (led $15M round), Menlo Ventures, Distributed Global (co-led seed)
Tags: Coding, Enterprise Workflows, Long-Horizon
Daytona provides secure, elastic, programmatic sandboxes ('computers') that AI agents and developers can spin up in roughly 90ms or less to run untrusted AI-generated code in isolated, stateful runtimes. It offers a managed hosted service plus an open-source self-hostable stack, and is positioned as agent-native execution/runtime infrastructure for code execution, computer use, and RL/eval workloads.
Backers: FirstMark Capital (Series A lead; Matt Turck joined board), Pace Capital, Upfront Ventures (seed lead, Series A participant)
E2B provides open-source, secure cloud sandboxes (built on Firecracker microVMs) for running AI-generated code and AI agents, offered as a hosted API with BYOC/on-prem/self-hosted options. It positions itself as execution infrastructure for enterprise AI agents and claims broad Fortune 100 adoption (self-reported).
Backers: Insight Partners (Series A lead), Decibel (seed lead), Sunflower Capital
Runloop sells cloud-hosted, isolated micro-VM 'devboxes' plus blueprints, snapshots and benchmark/eval tooling that give AI coding agents a secure execution environment for development, evaluation, and reinforcement/supervised fine-tuning (RFT/SFT) loops. It is execution infrastructure for agent builders and model labs rather than an RL-data/environments vendor itself.
Backers: The General Partnership (lead), Blank Ventures, Exponent Founders Capital
General Reasoning is an AI research lab (operating research hub in London; legal entity General Reasoning, Inc. registered in the US) building open RL environments and infrastructure for training and evaluating agents over long horizons. Its OpenReward platform and Open Reward Standard (ORS) provide an open specification for connecting language models to community-built RL environments, with 330+ environments accessible through one API.
Pedigree: Ross Taylor (co-founder/CEO) - ex-Meta AI/FAIR, research lead on Galac
Cua (trycua, YC X25) is open-source MIT-licensed infrastructure for computer-use agents, providing cloud and self-hosted sandboxes across macOS, Windows, Linux, and Android plus an SDK, a virtualization layer (Lume), and a benchmarking/RL-eval suite (Cua-Bench). It positions itself as the 'Docker for computer-use agents,' giving any agent a cloud desktop.
Backers: Y Combinator (X25 batch)
Good Start Labs is a 2025 Every spin-out that builds game-based environments to generate reinforcement-learning data and evaluate frontier models, using both custom games and partnerships with existing games where player behavior helps train and rank AI. It is known for AI Diplomacy / Diplomacy Arena (multi-agent long-horizon strategy) and LOL Arena (humor preference), and publishes openly on GitHub and Hugging Face.
Backers: General Catalyst, Inovia Capital, Tirta Ventures
Morph (Morph Labs) provides snapshot-based VM compute for AI agents via its Infinibranch / Liquid Metal technology, which can snapshot, branch, and restore entire computational environments in roughly 100-250ms to enable massively parallel, reversible ('Git for compute') agent rollouts, evaluations, and reasoning-time branching. It markets the platform (Morph Cloud) as infrastructure for running and scaling agent/RL verification environments rather than as an RL-environment dataset vendor itself.
Backers: Christian Szegedy (reported seed/angel investor; also Chief Scientist), amount undisclosed
Turing is a large AGI-infrastructure and engineering-services company that supplies frontier AI labs with coding data, human expertise, and RL/evaluation data at scale, drawing on a global vetted developer and domain-expert network. Originally a remote-engineering talent marketplace, it now positions itself around 'AGI advancement' through code and reasoning data, including work over private/real codebases.
HQ: Palo Alto, USA
Tags: Coding, Enterprise Workflows, Private Codebases