From edge devices to enterprise security, an engineer’s vision for trustworthy machine learning is setting new benchmarks

Mr. Popat’s theories on efficient AI found a demanding proving ground at Apple, where he has played a critical role in architecting the machine learning infrastructure behind some of the world’s most widely used wearable devices. Since joining the company’s AI/ML Group in 2021, Popat has spearheaded the development of novel modeling pipelines that power flagship Apple Watch features such as AssistiveTouch and Double Tap.

These are not standard implementations; they are engineering feats that require complex 3D perception and multimodal signal processing to operate within the extreme power and thermal constraints of a watch. By deploying foundation models at the edge, Popat helped turn subtle human gestures into precise digital commands for millions of users.

“The watch forces you to confront the full stack: sensing, modeling, energy, and user trust,” Popat notes. “If the feature misfires, users lose confidence fast. In my work, reliability isn’t just an optimization—it is the product.”

Securing the models themselves

As AI models migrate from secure data centers to exposed edge devices, a new threat vector has emerged: model theft. Popat identified this vulnerability early and proposed a defense mechanism, detailed in his patent filing “A Method to Prevent Capturing of Models in an Artificial Intelligence-based System.”

The filing describes an approach to detecting and disrupting “model extraction” attacks without degrading performance for legitimate users. In such an attack, an adversary repeatedly queries a deployed model and uses its outputs to train a replica, effectively stealing the proprietary model without ever touching its weights.
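The filing’s exact mechanism is not spelled out here, so the following Python sketch only illustrates the general problem under stated assumptions. It shows one generic, widely discussed pattern: watch each client’s query volume and, once it looks like systematic probing, return less information per answer (a bare label instead of full confidence scores). The class `ExtractionGuard`, its query budget, and its window length are hypothetical and are not taken from the patent.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ExtractionGuard:
    """Hypothetical per-client monitor; NOT the patented method.

    Flags clients whose query count inside a sliding time window exceeds a
    budget, then answers flagged clients with the top-1 label only instead of
    the full probability vector.
    """
    max_queries: int = 100          # assumed budget per window
    window_seconds: float = 60.0    # assumed window length
    _history: dict = field(default_factory=lambda: defaultdict(deque))

    def is_suspicious(self, client_id: str, now: float) -> bool:
        q = self._history[client_id]
        q.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window_seconds:
            q.popleft()
        return len(q) > self.max_queries

    def respond(self, client_id: str, now: float, probs: list[float]) -> dict:
        if self.is_suspicious(client_id, now):
            # Label-only answer: much less signal per query for a would-be thief.
            return {"label": max(range(len(probs)), key=probs.__getitem__)}
        # Normal traffic: confidences, rounded to two decimal places.
        return {"probs": [round(p, 2) for p in probs]}
```

Legitimate users who stay under the budget still receive confidence scores, while a client hammering the endpoint gets labels only, which makes training an accurate copy more expensive. Published defenses also examine the distribution of incoming queries, not only their count.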

This methodology served as a technical foundation for AIShield, a Bosch venture focused on protecting AI systems across their lifecycle. Public materials for AIShield describe tools that analyze deployed models for vulnerabilities and apply defenses against model stealing, data poisoning, and adversarial examples across cloud and edge environments.

“We had to start treating models as IP that could be stolen, not just code that could be copied,” Popat argues. “My goal was to design a system where an adversary can see your inputs and outputs, but can never walk away with your brain.”

Agentic systems inside the enterprise

Inside enterprises, the trust question takes yet another form. Many of the first generative AI deployments treated large language models as chat interfaces, where a single prompt would yield a single answer. For structured, high-stakes workflows, that pattern quickly showed its limits.

“Enterprises don’t want clever chat; they want reliable action,” he has said. “The gap between a fluent answer and a correctly executed task can be enormous.” His response has been to design agentic systems: architectures in which language models plan, call tools, execute actions, and refine their behavior over multiple turns, under explicit policy constraints.

In his systems, user inputs are translated into intent representations that drive multi-turn loops of clarification, tool selection, and plan refinement. A retrieval system maps each task to the appropriate tools (internal APIs, databases, external services) using explicit schemas and constraints.
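The retrieval-and-schema step can be pictured with a toy example. In this Python sketch the registry, tool names, and argument schemas are all hypothetical, and simple keyword overlap stands in for whatever retriever a production system would use. The point is the shape of the loop: pick a tool for the intent, then compare the supplied arguments against that tool’s explicit schema; anything missing becomes the next clarification question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    required_args: frozenset  # explicit schema: argument names the tool needs


# Hypothetical registry standing in for internal APIs, databases, and services.
REGISTRY = [
    Tool("lookup_order", "find order status by order id", frozenset({"order_id"})),
    Tool("refund_order", "issue refund for an order", frozenset({"order_id", "amount"})),
    Tool("search_docs", "search internal policy documents", frozenset({"query"})),
]


def retrieve_tool(intent: str, registry=REGISTRY) -> Tool:
    """Pick the tool whose description shares the most words with the intent.

    Keyword overlap is an assumed stand-in for a learned retriever.
    """
    words = set(intent.lower().split())
    return max(registry, key=lambda t: len(words & set(t.description.split())))


def missing_args(tool: Tool, args: dict) -> set:
    """Schema check: arguments still needed, which drive a clarification turn."""
    return set(tool.required_args) - set(args)
```

For example, `retrieve_tool("refund my order")` selects `refund_order`, and calling `missing_args` on it with only an `order_id` returns `{"amount"}`, which the agent would turn into a follow-up question rather than guessing.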

A separate policy-aware “LLM-as-judge” layer evaluates proposed actions for correctness, safety, and formatting before execution. “Real-world tasks aren’t single prompts,” he has said. “They’re conversations and workflows. If you want reliability, you need the AI to negotiate plans, check them against rules, and adjust when users correct it.”
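A gate of this kind can be sketched in a few lines. Here a deterministic rule function stands in for the policy-aware LLM judge (an assumption made so the example runs without a model), and the `Action` and `Verdict` types, the 500-unit refund threshold, and the `run_step` helper are all hypothetical. Whatever the judge actually is, the control flow is the same: nothing executes until a verdict approves it, and a rejection comes back as feedback the planner can use to revise its plan.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    tool: str
    args: dict


@dataclass
class Verdict:
    approved: bool
    reason: str = ""


def policy_judge(action: Action) -> Verdict:
    """Stand-in for an LLM-as-judge call: deterministic rules (assumed)."""
    if action.tool == "refund_order" and action.args.get("amount", 0) > 500:
        return Verdict(False, "refunds over 500 need human approval")
    if not all(isinstance(k, str) for k in action.args):
        return Verdict(False, "malformed arguments")
    return Verdict(True)


def run_step(action: Action,
             execute: Callable[[Action], str],
             judge: Callable[[Action], Verdict] = policy_judge) -> str:
    """Execute only approved actions; otherwise return feedback for replanning."""
    verdict = judge(action)
    if not verdict.approved:
        return f"REJECTED: {verdict.reason}"
    return execute(action)
```

Keeping the judge behind a plain callable means the rule-based stand-in can later be swapped for a model call without changing the execution path.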

Observers in security and governance circles have raised broader concerns about the complexity of such systems. They note that as AI is woven into critical infrastructure, failures or attacks can have cascading effects, and debugging multi-layered systems that mix probabilistic reasoning with deterministic code is inherently challenging. These discussions have helped push organizations toward better logging, monitoring, and explainability so that decisions in complex AI pipelines can be scrutinized after the fact.

Mr. Popat acknowledges this tension but rejects the idea that simplicity is a real option for serious deployments. “The tasks are complex, whether we admit it or not,” he has said. “The real choice is between complexity that is observable and governed, and complexity that is implicit and unaccountable.”

A benchmark written in trust

Market forecasts suggest that this complexity will only grow. The AI software market is expected to nearly quadruple in size by 2030, and forecasts for edge AI software and for AI in cybersecurity point the same way, projecting more than fourfold growth for the former and almost fourfold growth for the latter. Together, these trends point toward a world in which models operate across billions of devices and thousands of high-stakes workflows.

For engineers working at that intersection, the central question is not whether AI can be made more capable, but whether it can be made reliably so. Popat’s own work, from low-resource deployment and model security to agentic control architectures, along with reviewing roles at venues such as ICCV and Apple’s ML Summit, is framed as part of a broader effort to redefine what counts as production-ready AI.

“Trustworthy AI isn’t something you sprinkle on at the end,” he has reflected. “It’s a design constraint from the first diagram. The benchmark that matters is not a leaderboard number; it’s whether people can depend on these systems, quietly, every day.”

In his view, that benchmark runs from edge devices to enterprise platforms and into the emerging discipline of AI security. “If we get that right,” he said, “AI becomes less of a spectacle and more like infrastructure — invisible, reliable, and built to last.”