Unpacking the AI Robot Hype: Part 2 - Lower Marginal-Cost, Mass-Produced Robots, thanks to GenAI.

June 24, 2025
(~15 Minutes to Read)
Venture Fellow at Matter Venture Partners

Today’s robotics deployments suffer from steep integration costs. Simple components like a Sony IMX290 camera module can jump from $17 to $700+ when packaged for robotics applications. With countless markups like this one and challenging custom integration efforts, a $30k robot arm often turns into a $100k-$300k custom workcell by the time it’s deployed.

In Part 2 of this series, we explore how massive investments in Generative AI could reverse this trend, transforming robots from bespoke builds into teachable, mass-produced systems. GenAI enables new hardware & software tradeoffs in precision, reliability, and integration that can significantly reduce total system cost. That said, we’re still far from the levels of reliability needed for wide-scale deployment, especially in critical real-world environments.

Figure 1 - Kuka’s traditional industrial robots being mass produced by Kuka robots in traditional robot workcells in China’s Guangdong Province. While this article isn’t focused on traditional workcells or robots, this factory nonetheless signals the beginning of a new chapter of mass production and cost reduction for robot systems.
Source: Screenshot from YouTube, “Innovative Manufacturing Technologies Illuminate China's Manufacturing Triumph” by Switch TV News.

Introduction

This series unpacks why the audacious goals & promises of AI-powered robots are suddenly so compelling and worth $Bs of investment, despite barely any revenue to show for it (yet). We’re exploring the question in three parts; this article is Part 2.

This new generation of GenAI-powered robots will need to perform a variety of general tasks, so it follows that they will need to look vastly different from the custom-designed, task-specific industrial robot and cobot workcells that exist today. They’ll also have to perform these general tasks at a price point that makes economic sense. This transition is forcing the robotics industry to rethink how robot hardware and software are built and deployed, paving the way for massive reductions in the marginal cost of new robot systems (i.e. the cost to build and deploy robot #1001, after the previous 1000 robots have already been deployed).
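To make the marginal-cost framing concrete, here’s a toy calculation. All dollar figures below are hypothetical and purely for illustration: the point is that a bespoke workcell pays its integration effort again on every unit, while a mass-produced design amortizes one large up-front investment and then pays only the per-unit build cost.

```python
def marginal_cost(total_cost_fn, n):
    """Cost of unit n+1: total cost of n+1 units minus total cost of n units."""
    return total_cost_fn(n + 1) - total_cost_fn(n)

# Hypothetical cost models, for illustration only.
def custom_total(n):
    # Every bespoke workcell pays its own design & integration effort.
    return n * (120_000 + 30_000)    # per-unit integration cost + $30k arm

def mass_produced_total(n):
    # One up-front platform investment, then a fixed unit cost.
    return 50_000_000 + n * 40_000   # shared NRE + per-unit build cost

print(marginal_cost(custom_total, 1000))         # robot #1001 costs as much as robot #1
print(marginal_cost(mass_produced_total, 1000))  # robot #1001 costs only its build cost
```

Under these assumed numbers, robot #1001 costs $150k in the bespoke model but only $40k in the mass-produced model, even though the mass-produced platform required far more up-front capital.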

Spending billions on GenAI can make robots more advanced and more affordable

The recent push in robotics towards Generative AI demonstrates a commonly seen transformational shift in technological development: large, upfront R&D and capital investments into a technology yield exponentially more complex and capable systems. And, when these advancements are combined with new abstractions & interfaces, these complex systems can then be more easily used in new or existing applications at a significantly lower marginal cost than previously possible.

Digital arithmetic has already gone through a similar transformation. The simplest digital calculators were built from combinations of relays, vacuum tubes, or transistors. Operations were hardwired into custom circuits, so each customer and application needed its own design - the circuits that computed artillery tables were not the same as those used for missile guidance or telephone switching.

These were superseded by integrated circuits & microprocessors that could execute assembly code, enabling significantly harder and more complex calculations to be solved 1000s of times faster. More importantly, mathematical operations could now be defined in software rather than in hardware, meaning that the same hardware could run different operations for different customers. The same ICs could perform different math.

And, with smaller & cheaper transistors came more powerful microprocessors, capable of running a variety of applications & operating systems, consisting of millions of lines of code & calculations. Now we’ve moved to datacenters with 10,000s or 100,000s of servers executing billions of lines of code, capable of running globally distributed applications.

Each of these transformational shifts and new product offerings required $ms (and often $bs) in investment to manifest, and there’s no question that today’s computer systems are monumentally more capable than their historical counterparts. But what we often overlook is that existing tasks and operations also became easier (and thus cheaper) during each shift. Wiring up 100+ transistors was harder than writing a few lines of assembly. Writing a custom assembly app was definitely harder than writing an app in a higher-level programming language. Writing a custom app was harder than writing a few formulas in Excel. And, finally, instead of dealing with a spreadsheet, we can now literally ask a website to calculate things for us.

Figure 2 - Two of the most complex & flexible calculators ever built. Regardless of whether or not they use GenAI underneath, both are still simple & easy to use, and enabled by $100bs of investment over decades.
Source:
google.com (Left) and chatgpt.com (Right). Both images cropped for compactness.

Large language models have gone through a similar evolution. The leap from OpenAI’s GPT-2 to GPT-3, GPT-4, and the later ‘o’ models came with the ability to generate much more complex sequences of text, images, and voice. But it also meant that it became exponentially easier to use AI. Creating a sentiment analysis tool used to be the work of skilled PhDs training ML models. Now it’s as simple as asking ChatGPT to summarize a page of Amazon reviews.

Generative AI has the potential to do something similar for robotics:
By investing $Bs to develop large & complex Generative AI robot-centric models, robots can potentially perform more advanced tasks that would have otherwise been impossible to code (i.e. increased capability). Just as importantly, however, by removing much of the hardware development, software development, and model training effort that goes into deploying robots today, these models could also reduce the cost of deploying robots into existing applications (i.e. increased generalizability).

Reducing HW & Integration Costs: From custom workcells to mass-produced workers

Custom 1-Off Development is Expensive

The majority of today’s robot systems undergo a slow and expensive sales, design, assembly, and installation process. This complexity is deep, and significant cost is incurred in every step towards selling & installing a single robot workcell, thus increasing the end customer’s total cost:

  • Systems integrators bid on customer jobs, but they also need to pay for the design work done on lost bids.

  • Hardware integration (& rework) can require unexpected additional engineering time, since custom systems rarely come together correctly on the first try.

  • Subcomponent manufacturers design in additional environmental, mechanical, software, and electronic robustness to reduce failures caused by inexperienced integrators.
    (e.g. A Sony IMX290 camera sensor with module-level packaging costs $17, but that same sensor in a system-integrator-ready format costs $700+). (See Figure 3)

  • Subcomponent vendors pay for onsite technical sales engineers to ensure component integration and workcell deployments go smoothly.

  • General project uncertainty requires additional buffer factors added into quotes.

  • Disassembling & reassembling workcells is error prone and time consuming.

  • Logistics providers need to provide one-off quotes for transporting custom workcells, and in some cases, providing custom boxing/unboxing services.

  • Flying expert integrators to customer facilities takes additional time, etc.

  • Additional rework at a customer site can be exponentially slower than at the systems integrator’s facility.

… and the list goes on. It’s easy for a robot workcell that includes a $30k robot arm to end up costing 3x-10x that price to the customer. Not surprisingly, Kuka, one of the most well known industrial robot manufacturers, makes almost as much revenue from its systems integration group (€981m) as it does from selling robot arms (€1,081m). (Kuka 2021 Annual Report, Pg 2). This massive gap in the market has also spurred new software opportunities to simplify the systems integration process. Tools like Intrinsic’s FlowState aim to simplify custom robot system design and deployment, and parametric design tools like RocketFarm’s MyRobotCloud enable an end user to instantly design, simulate, and validate a custom palletizing workcell in minutes, instead of days or weeks.

Figure 3 - Two cameras built with the Sony IMX290 camera sensor.
Left: $17 module from Guangzhou Sincere Optical (Source: Alibaba)
Right: $700 Industrial machine vision camera from IDS Imaging (Source: Edmund Optics)

Figure 4 - Left: In 2021, Kuka’s revenue from developing and deploying robot systems almost matched their €1B revenue from selling robots. (Kuka 2021 Annual Report, Pg 2 - Cropped for compactness)
Middle: Instantly designed & cloud simulated UR10e-based palletizing workcell, based on customer specifications. (Source: RocketFarm MyRobotCloud).
Right: Intrinsic’s FlowState, simplifying custom-workcell design and deployment (Source: Intrinsic)

Mass Production Reduces Marginal Costs

As we’ve seen in every industry involving complex electro-mechanical systems, mass production has the ability to significantly increase fabrication speed & reliability while also reducing cost (e.g. automotive, consumer electronics, computers, etc). Even lower-volume industries, like aircraft manufacturing, leverage the economies of scale gained from mass production. We’re already seeing the beginnings of mass-produced hardware for robot arm systems:

  • Formic has pre-designed workcells for piece picking, packing, and palletization

  • Bright Machines’ BRCs (Bright Robotic Cells) are modular pre-built workcells that can quickly be deployed to customers.

  • Path Robotics has prebuilt welding workcells for heavy industry applications

These mass-produced systems (or partial flavors of mass production) all benefit from lower BoM costs, faster lead times, lower design risk, and simpler sales & integration processes, making robot systems viable for many more potential customers.

Figure 5 - Three examples of robot systems moving towards mass-producible or rapidly designable workcells.
Left: Formic’s Ready-to-Deploy Palletizing workcell (Source: Formic.co)
Middle: Bright Machines’ Closed BRC (Bright Robotic Cell), designed for small assembly tasks. (Source: Bright Machines)
Right: Path Robotics’s prebuilt welding workcell that automatically determines weld paths based on customer CAD files (Source: Path Robotics)

However, the category of robots that has seen the biggest shift towards mass production is GenAI powered humanoids:

These numbers are staggeringly high, and quite possibly skew more towards marketing buzz than funded operational plans. Nonetheless, they are enormous figures in comparison to leading light-industrial robot players like Universal Robots touting accolades like having “ramped up [2023Q4 Production] to 1200 units”: an annualized rate of just 4,800 units. It’s hard to know whether humanoids will reach the market dominance that the current hype suggests, but there’s no question that these massive economies of scale will drive down costs across the entire supply chain, hopefully also leading to lower-cost, mass-produced non-humanoids.

Reducing SW Costs: From hardcoded robot control, to teachable robots

Up to now, each new generation of robot system (whose history was described in Part 1 of this series) has added additional layers of software & hardware complexity, effectively making every new generation of robot hardware harder to build and deploy than the previous one. This generally still made economic sense, as the market opportunity for each new robot application was large enough to warrant the massive custom development (or maybe not, given today’s massive graveyard of failed robot system startups from the early 2020s). Excitingly, Generative AI has the potential to buck this narrative of increasing complexity, hopefully reducing the amount of custom software development needed to deploy a robot into each new application, and making it much more economical to deploy general-purpose robots.

Future Robot Systems: Generative AI, RFMs, and Teachable Robots

As we’ve seen with large language models (LLMs like OpenAI’s GPTs, Meta’s Llamas, Google’s Gemini, etc), training much larger AI models on more diverse data - and 1000s of times more of it - yields systems that can generalize to a much wider range of tasks. This broader and deeper understanding enables LLM-based AI systems to perform domain-specific tasks (e.g. review a legal contract, create a new recipe, guide a car repair, etc) with just a small set of demonstrations used in a ‘fine-tuning’ model training step.

We’re now seeing these same “large model” approaches being applied to the robotics domain, under names like VLA models (Vision-Language-Action models), Embodied AI, Physical AI, or RFMs (Robot Foundation Models, the term we’ll use for the rest of this article). Some are open and free, like DeepMind’s RT-X, NVidia’s GR00T, OpenVLA, or Physical Intelligence’s π0. Others are closed or proprietary, like Figure AI’s Helix, Physical Intelligence’s π0.5, or Google’s Gemini Robotics family of models. By providing a multi-modal foundation model with varied training data about the world (e.g. videos, robot motion commands, text, robot camera data, force/touch sensor feedback, etc), the model can learn (or, optimistically speaking, ‘reason’) about the spatial relationships & physical properties of real objects in our 3-dimensional world (e.g. gravity, mass, friction, contact forces, acceleration, etc). Training a model on this data from scratch (often called foundation model ‘pre-training’) is not easy, and can require 1000s of GPUs, specialized server network architectures, megawatts of power, petabytes of multimodal training data, and $100ms of capital.

But with this up-front investment into more general reasoning abilities, the model can likely be adapted to a new application without much effort. For instance, the model could potentially be fine-tuned with a few hundred examples to perform a new task on a completely new type of robot (e.g. HPT’s fine-tuning of a robot-specific head and stem, or Octo’s robot-specific readout heads). And, in some optimistic cases, a few user demonstrations or even a single demonstration (i.e. few-shot or one-shot learning), or just a simple textual instruction like “Grab me the milk from the fridge” (i.e. zero-shot learning), is enough to complete a task.
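As a loose intuition for what “fine-tuning with a few demonstrations” means mechanically, here is a deliberately tiny behavior-cloning sketch in plain Python. A real RFM fine-tune operates on transformer policies, image observations, and GPU clusters; everything here (the 1-D observations, the linear policy, the specific numbers) is a stand-in, for illustration only.

```python
# Toy "demonstrations": (observation, expert_action) pairs, 1-D for clarity.
# Real RFM fine-tuning uses image/proprioception observations and joint or
# end-effector actions; this sketch only illustrates the training loop itself.
def expert(obs):
    return 2.0 * obs + 0.5          # the behavior we want the policy to clone

demos = [(k / 10.0, expert(k / 10.0)) for k in range(20)]  # "a few demonstrations"

# A "pre-trained" policy: parameters start near, but not at, the right values,
# standing in for a foundation model that already roughly understands the task.
w, b = 1.5, 0.0

# Behavior cloning: gradient descent on mean-squared error over the demos.
lr = 0.1
for epoch in range(500):
    grad_w = grad_b = 0.0
    for obs, act in demos:
        err = (w * obs + b) - act
        grad_w += 2 * err * obs / len(demos)
        grad_b += 2 * err / len(demos)
    w -= lr * grad_w
    b -= lr * grad_b

loss = sum(((w * obs + b) - act) ** 2 for obs, act in demos) / len(demos)
print(f"fine-tuned policy: action = {w:.2f}*obs + {b:.2f}, loss = {loss:.2e}")
```

The key point survives the simplification: because the policy starts close to correct (thanks to “pre-training”), a handful of demonstrations and a short optimization run are enough to recover the expert behavior, rather than weeks of data collection and training from scratch.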

This new paradigm can transform today’s discriminative AI workflows (mentioned earlier). In the discriminative AI world, teams of engineers & data labelers can spend weeks delivering training data and a discriminative model that blindly tries to pattern-match its way into executing a narrowly defined skill or capability, without any deeper understanding or contextualization (e.g. a self-driving car mistaking a white tractor trailer for the sky). In the Generative AI paradigm, by contrast, the model’s more fundamental understanding of the world can potentially make it possible for a robot operator to retrain or teach a Generative AI robot in hours, if not minutes.

Figure 6 - Multiple examples of GenAI robots performing general tasks
Left: A general purpose robot, “The Everyday Robot”, developed by Alphabet’s X. It relies on the PaLM-SayCan model to translate verbal instructions into robot action. While not robust enough for commercial deployment, it was able to complete a variety of subtasks that are needed to clean & tidy an entire cafe. Source: IEEE Spectrum, 2021
Middle: Physical Intelligence’s π0 RFM generalist policy, folding a towel. Source: Physical Intelligence
Right: Figure AI’s Helix model being used to unpack groceries. Source: Figure AI

But, this all may be more nuanced than it sounds

Robot hardware will always have non-trivial cost

Even if we assume that large GenAI robot models will become exponentially more capable, RFMs exponentially more reliable, and RFM training and inference compute hardware exponentially faster and cheaper, the same is not necessarily true for robot hardware. These computational trends are loosely tied to Moore’s Law, but there’s no equivalent law for motors, gearboxes, batteries, mechanical structures, sensors, and everything else in a robot that isn’t a computer.

If anything, the pace of improvement in these electromechanical areas is diminishing. So, even when RFMs are prevalent, universally capable, and approaching near-zero marginal cost, the robot hardware itself will continue to add economic friction to deployment.

While this is definitely a pessimistic perspective, it’s also worth noting that there is still room for robot hardware to become cheaper from a variety of factors:

  • Economies of Scale (100x Robots) - As in any industry, mass production can generate efficiencies through an entire supply chain.

  • Lower Repeatability Targets - Since AI-powered robots will have cameras or other sensors to guide their actions, they no longer require the repeatability and accuracy that traditional industrial robots demand. This means that cheaper & lighter materials (e.g. plastic) can now be considered for robot structures, instead of being shunned for not being rigid enough. Lower-repeatability transmissions like tendons (e.g. 1X) or low-cost gearboxes also become options.

  • Lower Stiffness Requirements - Not only is high stiffness no longer required, lower stiffness is actually beneficial. High-contact, force-sensitive tasks, like moving boxes without crushing them or insertion tasks (e.g. loading a dishwasher), benefit from squishiness in the robot system (i.e. ‘compliance’ or ‘serial elasticity’ in robot terminology) to avoid accidentally imparting the huge impacts or forces onto the environment that traditional industrial robots regularly do. Low-stiffness transmissions like belt-drives & magnetic gearing suddenly become compelling options too.

  • Lower Reliability Targets - In a traditional industrial setting, a robot hardware failure could shut down a manufacturing line for days, since a new robot would need to be reinstalled, reprovisioned, and recalibrated. As such, industrial robots are designed to run for decades without failure, thus leading to high hardware costs. If a robot could easily be replaced (e.g. AI powered robots that need no provisioning), or there are redundant or collaborative robot systems (e.g. other robots take over the tasks of the failed robot), then it becomes more palatable to install a lower cost system with lower reliability.
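The stiffness point above can be quantified with a simple spring model (all numbers below are assumed, for illustration only). If a robot’s effective moving mass m contacts a surface at speed v through a structure of stiffness k, energy conservation (½mv² = ½kx²) puts the peak contact force at F = v·√(k·m), so softer structures directly mean gentler contact:

```python
import math

def peak_contact_force(mass_kg, speed_m_s, stiffness_n_per_m):
    # ½·m·v² = ½·k·x²  →  x = v·√(m/k); peak force F = k·x = v·√(k·m)
    return speed_m_s * math.sqrt(stiffness_n_per_m * mass_kg)

m, v = 10.0, 0.2        # illustrative: 10 kg effective mass, 0.2 m/s contact speed
stiff = 1_000_000       # assumed stiffness of a rigid industrial arm (N/m)
compliant = 10_000      # assumed stiffness of an elastic/belt-drive arm (N/m)

print(peak_contact_force(m, v, stiff))      # ≈ 632 N
print(peak_contact_force(m, v, compliant))  # ≈ 63 N
```

Because force scales with √k, a 100x reduction in stiffness yields a 10x reduction in peak contact force under this model - the mechanical intuition behind why compliant, lower-cost transmissions are gentler on boxes, dishwashers, and people.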

The following article goes deeper into the technical aspects of how this could happen: Re-imagining Robot Arm Design: How An Overlooked Technology can create cheaper, easier to use, and more responsive robots

RFMs may shift development work, instead of reducing it

While RFMs can be incredibly powerful at solving multiple tasks (perception, scene understanding, grasping, etc), there could still be a large amount of engineering work required to build a system that actually does useful work. Something similar happened in the move to discriminative AI. While the 1000s of lines of human-written computer vision code were no longer needed (great!), there was additional engineering beyond the models themselves (e.g. handwritten sanity checks on model outputs, data pipelines to curate and downselect training data, evaluation tools for validating model performance, and APIs to connect the model’s inputs and outputs to the rest of the system). We’ve already seen this happen with LLMs. Even with cutting-edge models like OpenAI’s GPT series or Meta’s Llama models, there is still a significant engineering effort to build a good chatbot (e.g. ChatGPT) or code-helper co-pilot on top of these models.

Getting to a 99% success rate is hard for any algorithm, and especially for AI algorithms

Building an autonomous car that detects 99% of strollers is objectively not roadworthy (i.e. it hits 1 out of every 100 strollers!). And, as we’ve seen from the self-driving world, each jump in reliability from 99% to 99.9% to 99.99% can take years of development and orders of magnitude more investment, testing, and data. Similarly, for robotics, while the demos are amazingly impressive, we still have a long road towards meaningful levels of task reliability. For instance, 99% reliability for grabbing and packing objects from warehouse shelves still means that 1 in 100 tasks is a failure, which would be totally unacceptable to any warehouse manager. And, looking at some of the work happening with cutting-edge models like Gemini Robotics, π0, and diffusion policies (all amazingly impressive work), they still describe performance with language like “Fine-tuning Gemini Robotics achieves over 70% success on 7 out of 8 tasks with at most 100 demonstrations” (source). Yes, two of the Gemini Robotics tasks did have a 100% success rate, which is already quite impressive. But, as a broader ecosystem, we nonetheless still have a long way to go.
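The arithmetic behind these reliability claims is worth spelling out, since per-task percentages hide both the absolute failure counts and how failures compound across multi-step tasks (the warehouse volume below is an illustrative assumption):

```python
def expected_failures(n_tasks, success_rate):
    """Expected number of failures if each task independently succeeds with success_rate."""
    return n_tasks * (1 - success_rate)

def chained_success(per_step_rate, n_steps):
    """Overall success rate of a task composed of n independent sequential steps."""
    return per_step_rate ** n_steps

# A hypothetical warehouse doing 10,000 picks per day at 99% per-pick reliability:
print(round(expected_failures(10_000, 0.99)))   # 100 failed picks, every single day

# Reliability also compounds: a 10-step task at 99% per step succeeds ~90% of the time.
print(round(chained_success(0.99, 10), 3))      # 0.904
```

This compounding is why per-step reliability has to climb well past 99% before long-horizon tasks (tidying a room, packing an order end-to-end) become dependable.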

Figure 7: Success rates for 8 different tasks using Google’s Gemini Robotics model, Physical Intelligence’s π0 model, and a diffusion policy model. While these are amazing, best-in-class research results, they also show how far we are from generalist models that can perform a wide range of tasks at high reliability.
Source: "Gemini Robotics: Bringing AI into the Physical World” 2025 (arxiv)

Part 2 - Closing Thoughts

The path to lower marginal-cost, mass produced robots is beginning to look plausible, driven by the same GenAI advances that have transformed how we build software and write emails. With today’s heavy investment into robot foundation models (RFMs) and task-agnostic robot hardware, we have the potential for robot hardware production and deployment to begin to scale similarly to other mass produced hardware systems (e.g. cellphones, automobiles, home appliances, etc).

That said, electromechanical systems don’t follow Moore’s Law. Motors, gearboxes, and structural components still face cost and reliability constraints that compute alone can't compensate for. And even with powerful RFMs, the surrounding infrastructure (e.g. business logic, safety wrappers, deployment tooling, etc) remains essential.

In Part 3, we’ll explore the last major bottleneck: training data. ChatGPT and other LLMs leverage massive amounts of web-scale data. Data of this scale for robots interacting with the physical world simply doesn’t exist. Finding, curating, and creating this data in an economically feasible way is the biggest bottleneck to seeing GenAI robots succeed.


Unpacking the GenAI Robot Hype: Part 1 - A New Generation of Robot Hardware, powered by GenAI