VentureBeat

  • In an industry where model size is often seen as a proxy for intelligence, IBM is charting a different course — one that values efficiency over enormity, and accessibility over abstraction.

    The 114-year-old tech giant's four new Granite 4.0 Nano models, released today, range from just 350 million to 1.5 billion parameters, a fraction of the size of their server-bound cousins from the likes of OpenAI, Anthropic, and Google.

    These models are designed to be highly accessible: the 350M variants can run comfortably on a modern laptop CPU with 8–16GB of RAM, while the 1.5B models typically require a GPU with at least 6–8GB of VRAM for smooth performance — or sufficient system RAM and swap for CPU-only inference. This makes them well-suited for developers building applications on consumer hardware or at the edge, without relying on cloud compute.

    In fact, the smallest models can even run locally in your web browser, as Joshua Lochner (aka Xenova), creator of Transformers.js and a machine learning engineer at Hugging Face, noted on the social network X.

    All the Granite 4.0 Nano models are released under the Apache 2.0 license, making them free for researchers and for enterprise or indie developers to use, including in commercial products.

    They are natively compatible with llama.cpp, vLLM, and MLX and are certified under ISO 42001 for responsible AI development — a standard IBM helped pioneer.
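    For developers who want to try one of the Nano models locally, a minimal sketch using the Hugging Face transformers library is shown below. The model identifier and chat-template usage are assumptions based on IBM's naming and standard Hugging Face conventions, not details confirmed here; check the Granite 4.0 Nano collection on Hugging Face for the exact IDs.

```python
# Minimal local-inference sketch for a Granite 4.0 Nano model (CPU-friendly).
# The model ID below is an assumption based on IBM's naming; verify it on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-h-350m"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # the 350M variant fits in laptop RAM

messages = [{"role": "user", "content": "List three uses for a small on-device language model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

    The same weights can also be served through llama.cpp, vLLM, or MLX builds, per IBM's stated compatibility.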

    But in this case, small doesn't mean less capable — it might just mean smarter design.

    These compact models are built not for data centers, but for edge devices, laptops, and local inference, where compute is scarce and latency matters.

    And despite their small size, the Nano models are showing benchmark results that rival or even exceed the performance of larger models in the same category.

    The release is a signal that a new AI frontier is rapidly forming — one not dominated by sheer scale, but by strategic scaling.

    What Exactly Did IBM Release?

    The Granite 4.0 Nano family includes four open-source models now available on Hugging Face:

    • Granite-4.0-H-1B (~1.5B parameters) – Hybrid-SSM architecture

    • Granite-4.0-H-350M (~350M parameters) – Hybrid-SSM architecture

    • Granite-4.0-1B – Transformer-based variant, parameter count closer to 2B

    • Granite-4.0-350M – Transformer-based variant

    The H-series models — Granite-4.0-H-1B and H-350M — use a hybrid state-space model (SSM) architecture that combines efficiency with strong performance, ideal for low-latency edge environments.

    Meanwhile, the standard transformer variants — Granite-4.0-1B and 350M — offer broader compatibility with tools like llama.cpp, designed for use cases where hybrid architecture isn’t yet supported.

    In practice, the transformer 1B model is closer to 2B parameters, but aligns performance-wise with its hybrid sibling, offering developers flexibility based on their runtime constraints.

    “The hybrid variant is a true 1B model. However, the non-hybrid variant is closer to 2B, but we opted to keep the naming aligned to the hybrid variant to make the connection easily visible,” explained Emma, Product Marketing lead for Granite, during a Reddit "Ask Me Anything" (AMA) session on r/LocalLLaMA.

    A Competitive Class of Small Models

    IBM is entering a crowded and rapidly evolving market of small language models (SLMs), competing with offerings like Qwen3, Google's Gemma, LiquidAI’s LFM2, and even Mistral’s dense models in the sub-2B parameter space.

    While OpenAI and Anthropic focus on models that require clusters of GPUs and sophisticated inference optimization, IBM’s Nano family is aimed squarely at developers who want to run performant LLMs on local or constrained hardware.

    In benchmark testing, IBM’s new models consistently top the charts in their class. According to data shared on X by David Cox, VP of AI Models at IBM Research:

    • On IFEval (instruction following), Granite-4.0-H-1B scored 78.5, outperforming Qwen3-1.7B (73.1) and other 1–2B models.

    • On BFCLv3 (function/tool calling), Granite-4.0-1B led with a score of 54.8, the highest in its size class.

    • On safety benchmarks (SALAD and AttaQ), the Granite models scored over 90%, surpassing similarly sized competitors.

    Overall, the Granite-4.0-1B achieved a leading average benchmark score of 68.3% across general knowledge, math, code, and safety domains.

    This performance is especially significant given the hardware constraints these models are designed for.

    They require less memory, run faster on CPUs or mobile devices, and don’t need cloud infrastructure or GPU acceleration to deliver usable results.

    Why Model Size Still Matters — But Not Like It Used To

    In the early wave of LLMs, bigger meant better — more parameters translated to better generalization, deeper reasoning, and richer output.

    But as transformer research matured, it became clear that architecture, training quality, and task-specific tuning could allow smaller models to punch well above their weight class.

    IBM is banking on this evolution. By releasing open, small models that are competitive in real-world tasks, the company is offering an alternative to the monolithic AI APIs that dominate today’s application stack.

    In fact, the Nano models address three increasingly important needs:

    1. Deployment flexibility — they run anywhere, from mobile to microservers.

    2. Inference privacy — users can keep data local with no need to call out to cloud APIs.

    3. Openness and auditability — source code and model weights are publicly available under an open license.

    Community Response and Roadmap Signals

    IBM’s Granite team didn’t just launch the models and walk away — they took to Reddit’s open source community r/LocalLLaMA to engage directly with developers.

    In an AMA-style thread, Emma (Product Marketing, Granite) answered technical questions, addressed concerns about naming conventions, and dropped hints about what’s next.

    Notable confirmations from the thread:

    • A larger Granite 4.0 model is currently in training

    • Reasoning-focused models ("thinking counterparts") are in the pipeline

    • IBM will release fine-tuning recipes and a full training paper soon

    • More tooling and platform compatibility is on the roadmap

    Users responded enthusiastically to the models’ capabilities, especially in instruction-following and structured response tasks. One commenter summed it up:

    “This is big if true for a 1B model — if quality is nice and it gives consistent outputs. Function-calling tasks, multilingual dialog, FIM completions… this could be a real workhorse.”

    Another user remarked:

    “The Granite Tiny is already my go-to for web search in LM Studio — better than some Qwen models. Tempted to give Nano a shot.”

    Background: IBM Granite and the Enterprise AI Race

    IBM’s push into large language models began in earnest in late 2023 with the debut of the Granite foundation model family, starting with models like Granite.13b.instruct and Granite.13b.chat. Released for use within its Watsonx platform, these initial decoder-only models signaled IBM’s ambition to build enterprise-grade AI systems that prioritize transparency, efficiency, and performance. The company open-sourced select Granite code models under the Apache 2.0 license in mid-2024, laying the groundwork for broader adoption and developer experimentation.

    The real inflection point came with Granite 3.0 in October 2024 — a fully open-source suite of general-purpose and domain-specialized models ranging from 1B to 8B parameters. These models emphasized efficiency over brute scale, offering capabilities like longer context windows, instruction tuning, and integrated guardrails. IBM positioned Granite 3.0 as a direct competitor to Meta’s Llama, Alibaba’s Qwen, and Google's Gemma — but with a uniquely enterprise-first lens. Later versions, including Granite 3.1 and Granite 3.2, introduced even more enterprise-friendly innovations: embedded hallucination detection, time-series forecasting, document vision models, and conditional reasoning toggles.

    The Granite 4.0 family, launched in October 2025, represents IBM’s most technically ambitious release yet. It introduces a hybrid architecture that blends transformer and Mamba-2 layers — aiming to combine the contextual precision of attention mechanisms with the memory efficiency of state-space models. This design allows IBM to significantly reduce memory and latency costs for inference, making Granite models viable on smaller hardware while still outperforming peers in instruction-following and function-calling tasks. The launch also includes ISO 42001 certification, cryptographic model signing, and distribution across platforms like Hugging Face, Docker, LM Studio, Ollama, and watsonx.ai.

    Across all iterations, IBM’s focus has been clear: build trustworthy, efficient, and legally unambiguous AI models for enterprise use cases. With a permissive Apache 2.0 license, public benchmarks, and an emphasis on governance, the Granite initiative not only responds to rising concerns over proprietary black-box models but also offers a Western-aligned open alternative to the rapid progress from teams like Alibaba’s Qwen. In doing so, Granite positions IBM as a leading voice in what may be the next phase of open-weight, production-ready AI.

    A Shift Toward Scalable Efficiency

    In the end, IBM’s release of Granite 4.0 Nano models reflects a strategic shift in LLM development: from chasing parameter count records to optimizing usability, openness, and deployment reach.

    By combining competitive performance, responsible development practices, and deep engagement with the open-source community, IBM is positioning Granite as not just a family of models — but a platform for building the next generation of lightweight, trustworthy AI systems.

    For developers and researchers looking for performance without overhead, the Nano release offers a compelling signal: you don’t need 70 billion parameters to build something powerful — just the right ones.

  • Microsoft is launching a significant expansion of its Copilot AI assistant on Tuesday, introducing tools that let employees build applications, automate workflows, and create specialized AI agents using only conversational prompts — no coding required.

    The new capabilities, called App Builder and Workflows, mark Microsoft's most aggressive attempt yet to merge artificial intelligence with software development, enabling the estimated 100 million Microsoft 365 users to create business tools as easily as they currently draft emails or build spreadsheets.

    "We really believe that a main part of an AI-forward employee, not just developers, will be to create agents, workflows and apps," Charles Lamanna, Microsoft's president of business and industry Copilot, said in an interview with VentureBeat. "Part of the job will be to build and create these things."

    The announcement comes as Microsoft deepens its commitment to AI-powered productivity tools while navigating a complex partnership with OpenAI, the creator of the underlying technology that powers Copilot. On the same day, OpenAI completed its restructuring into a for-profit entity, with Microsoft receiving a 27% ownership stake valued at approximately $135 billion.

    How natural language prompts now create fully functional business applications

    The new features transform Copilot from a conversational assistant into what Microsoft envisions as a comprehensive development environment accessible to non-technical workers. Users can now describe an application they need — such as a project tracker with dashboards and task assignments — and Copilot will generate a working app complete with a database backend, user interface, and security controls.

    "If you're right inside of Copilot, you can now have a conversation to build an application complete with a backing database and a security model," Lamanna explained. "You can make edit requests and update requests and change requests so you can tune the app to get exactly the experience you want before you share it with other users."

    The App Builder stores data in Microsoft Lists, the company's lightweight database system, and allows users to share finished applications via a simple link—similar to sharing a document. The Workflows agent, meanwhile, automates routine tasks across Microsoft's ecosystem of products, including Outlook, Teams, SharePoint, and Planner, by converting natural language descriptions into automated processes.

    A third component, a simplified version of Microsoft's Copilot Studio agent-building platform, lets users create specialized AI assistants tailored to specific tasks or knowledge domains, drawing from SharePoint documents, meeting transcripts, emails, and external systems.

    All three capabilities are included in the existing $30-per-month Microsoft 365 Copilot subscription at no additional cost — a pricing decision Lamanna characterized as consistent with Microsoft's historical approach of bundling significant value into its productivity suite.

    "That's what Microsoft always does. We try to do a huge amount of value at a low price," he said. "If you go look at Office, you think about Excel, Word, PowerPoint, Exchange, all that for like eight bucks a month. That's a pretty good deal."

    Why Microsoft's nine-year bet on low-code development is finally paying off

    The new tools represent the culmination of a nine-year effort by Microsoft to democratize software development through its Power Platform — a collection of low-code and no-code development tools that has grown to 56 million monthly active users, according to figures the company disclosed in recent earnings reports.

    Lamanna, who has led the Power Platform initiative since its inception, said the integration into Copilot marks a fundamental shift in how these capabilities reach users. Rather than requiring workers to visit a separate website or learn a specialized interface, the development tools now exist within the same conversational window they already use for AI-assisted tasks.

    "One of the big things that we're excited about is Copilot — that's a tool for literally every office worker," Lamanna said. "Every office worker, just like they research data, they analyze data, they reason over topics, they also will be creating apps, agents and workflows."

    The integration offers significant technical advantages, he argued. Because Copilot already indexes a user's Microsoft 365 content — emails, documents, meetings, and organizational data — it can incorporate that context into the applications and workflows it builds. If a user asks for "an app for Project Spartan," Copilot can draw from existing communications to understand what that project entails and suggest relevant features.

    "If you go to those other tools, they have no idea what the heck Project Spartan is," Lamanna said, referencing competing low-code platforms from companies like Google, Salesforce, and ServiceNow. "But if you do it inside of Copilot and inside of the App Builder, it's able to draw from all that information and context."

    Microsoft claims the apps created through these tools are "full-stack applications" with proper databases secured through the same identity systems used across its enterprise products — distinguishing them from simpler front-end tools offered by competitors. The company also emphasized that its existing governance, security, and data loss prevention policies automatically apply to apps and workflows created through Copilot.

    Where professional developers still matter in an AI-powered workplace

    While Microsoft positions the new capabilities as accessible to all office workers, Lamanna was careful to delineate where professional developers remain essential. His dividing line centers on whether a system interacts with parties outside the organization.

    "Anything that leaves the boundaries of your company warrants developer involvement," he said. "If you want to build an agent and put it on your website, you should have developers involved. Or if you want to build an automation which interfaces directly with your customers, or an app or a website which interfaces directly with your customers, you want professionals involved."

    The reasoning is risk-based: external-facing systems carry greater potential for data breaches, security vulnerabilities, or business errors. "You don't want people getting refunds they shouldn't," Lamanna noted.

    For internal use cases — approval workflows, project tracking, team dashboards — Microsoft believes the new tools can handle the majority of needs without IT department involvement. But the company has built "no cliffs," in Lamanna's terminology, allowing users to migrate simple apps to more sophisticated platforms as needs grow.

    Apps created in the conversational App Builder can be opened in Power Apps, Microsoft's full development environment, where they can be connected to Dataverse, the company's enterprise database, or extended with custom code. Similarly, simple workflows can graduate to the full Power Automate platform, and basic agents can be enhanced in the complete Copilot Studio.

    "We have this mantra called no cliffs," Lamanna said. "If your app gets too complicated for the App Builder, you can always edit and open it in Power Apps. You can jump over to the richer experience, and if you're really sophisticated, you can even go from those experiences into Azure."

    This architecture addresses a problem that has plagued previous generations of easy-to-use development tools: users who outgrow the simplified environment often must rebuild from scratch on professional platforms. "People really do not like easy-to-use development tools if I have to throw everything away and start over," Lamanna said.

    What happens when every employee can build apps without IT approval

    The democratization of software development raises questions about governance, maintenance, and organizational complexity — issues Microsoft has worked to address through administrative controls.

    IT administrators can view all applications, workflows, and agents created within their organization through a centralized inventory in the Microsoft 365 admin center. They can reassign ownership, disable access at the group level, or "promote" particularly useful employee-created apps to officially supported status.

    "We have a bunch of customers who have this approach where it's like, let 1,000 apps bloom, and then the best ones, I go upgrade and make them IT-governed or central," Lamanna said.

    The system also includes provisions for when employees leave. Apps and workflows remain accessible for 60 days, during which managers can claim ownership — similar to how OneDrive files are handled when someone departs.

    Lamanna argued that most employee-created apps don't warrant significant IT oversight. "It's just not worth inspecting an app that John, Susie, and Bob use to do their job," he said. "It should concern itself with the app that ends up being used by 2,000 people, and that will pop up in that dashboard."

    Still, the proliferation of employee-created applications could create challenges. Users have expressed frustration with Microsoft's increasing emphasis on AI features across its products, with some giving the Microsoft 365 mobile app one-star ratings after a recent update prioritized Copilot over traditional file access.

    The tools also arrive as enterprises grapple with "shadow IT" — unsanctioned software and systems that employees adopt without official approval. While Microsoft's governance controls aim to provide visibility, the ease of creating new applications could accelerate the pace at which these systems multiply.

    The ambitious plan to turn 500 million workers into software builders

    Microsoft's ambitions for the technology extend far beyond incremental productivity gains. Lamanna envisions a fundamental transformation of what it means to be an office worker — one where building software becomes as routine as creating spreadsheets.

    "Just like how 20 years ago you put on your resume that you could use pivot tables in Excel, people are going to start saying that they can use App Builder and workflow agents, even if they're just in the finance department or the sales department," he said.

    The numbers he's targeting are staggering. With 56 million people already using Power Platform, Lamanna believes the integration into Copilot could eventually reach 500 million builders. "Early days still, but I think it's certainly encouraging," he said.

    The features are currently available only to customers in Microsoft's Frontier Program — an early access initiative for Microsoft 365 Copilot subscribers. The company has not disclosed how many organizations participate in the program or when the tools will reach general availability.

    The announcement fits within Microsoft's larger strategy of embedding AI capabilities throughout its product portfolio, driven by its partnership with OpenAI. Under the restructured agreement announced Tuesday, Microsoft will have access to OpenAI's technology through 2032, including models that achieve artificial general intelligence (AGI) — though such systems do not yet exist. Microsoft has also begun integrating Copilot into its new companion apps for Windows 11, which provide quick access to contacts, files, and calendar information.

    The aggressive integration of AI features across Microsoft's ecosystem has drawn mixed reactions. While enterprise customers have shown interest in productivity gains, the rapid pace of change and ubiquity of AI prompts have frustrated some users who prefer traditional workflows.

    For Microsoft, however, the calculation is clear: if even a fraction of its user base begins creating applications and automations, it would represent a massive expansion of the effective software development workforce — and further entrench customers in Microsoft's ecosystem. The company is betting that the same natural language interface that made ChatGPT accessible to millions can finally unlock the decades-old promise of empowering everyday workers to build their own tools.

    The App Builder and Workflows agents are available starting today through the Microsoft 365 Copilot Agent Store for Frontier Program participants.

    Whether that future arrives depends not just on the technology's capabilities, but on a more fundamental question: Do millions of office workers actually want to become part-time software developers? Microsoft is about to find out if the answer is yes — or if some jobs are better left to the professionals.

  • GitHub is making a bold bet that enterprises don't need another proprietary coding agent: They need a way to manage all of them.

    At its Universe 2025 conference, the Microsoft-owned developer platform announced Agent HQ. The new architecture transforms GitHub into a unified control plane for managing multiple AI coding agents from competitors including Anthropic, OpenAI, Google, Cognition and xAI. Rather than forcing developers into a single agent experience, the company is positioning itself as the essential orchestration layer beneath them all.

    Agent HQ represents GitHub's attempt to apply its collaboration platform approach to AI agents. Just as the company transformed Git, pull requests and CI/CD into collaborative workflows, it's now trying to do the same with a fragmented AI coding landscape.

    The announcement marks what GitHub calls the transition from "wave one" to "wave two" of AI-assisted development. According to GitHub's Octoverse report, 80% of new developers use Copilot in their first week, and AI has helped drive a large overall increase in use of the GitHub platform.

    "Last year, the big announcements for us, and what we were saying as a company, is wave one is done, that was kind of code completion," GitHub's COO Mario Rodriguez told VentureBeat. "We're into this wave two era, [which] is going to be multimodal, it's going to be agentic and it's going to have these new experiences that will feel AI native."

    What is Agent HQ?

    GitHub already updated its GitHub Copilot coding tool for the agentic era with the debut of GitHub Copilot Agent in May.

    Agent HQ transforms GitHub into an open ecosystem that unites multiple AI coding agents on a single platform. Over the coming months, coding agents from Anthropic, OpenAI, Google, Cognition, xAI and others will become available directly within GitHub as part of existing paid GitHub Copilot subscriptions.

    The architecture maintains GitHub's core primitives. Developers still work with Git, pull requests and issues. They still use their preferred compute, whether GitHub Actions or self-hosted runners. What changes is the layer above: agents from multiple vendors can now operate within GitHub's security perimeter, using the same identity controls, branch permissions and audit logging that enterprises already trust for human developers.

    This approach differs fundamentally from standalone tools. When developers use Cursor or grant repository access to Claude, those agents typically receive broad permissions across entire repositories. Agent HQ compartmentalizes access at the branch level and wraps all agent activity in enterprise-grade governance controls.

    Mission Control: One interface for all agents

    At the heart of Agent HQ is Mission Control. It's a unified command center that appears consistently across GitHub's web interface, VS Code, mobile apps and the command line. Through Mission Control, developers can assign work to multiple agents simultaneously. They can track progress and manage permissions, all from a single pane of glass.

    The technical architecture addresses a critical enterprise concern: Security. Unlike standalone agent implementations where users must grant broad repository access, GitHub's Agent HQ implements granular controls at the platform level.

    "Our coding agent has a set of security controls and capabilities that are built natively into the platform, and that's what we're providing to all of these other agents as well," Rodriguez explained. "It runs with a GitHub token that is very locked down to what it can actually do."

    Agents operating through Agent HQ can only commit to designated branches. They run within sandboxed GitHub Actions environments with firewall protections. They operate under strict identity controls. Rodriguez explained that even if an agent goes rogue, the firewall prevents it from accessing external networks or exfiltrating data unless those protections are explicitly disabled.

    Technical differentiation: MCP integration and custom agents

    Beyond managing third-party agents, GitHub is introducing two technical capabilities that set Agent HQ apart from alternative approaches like Cursor's standalone editor or Anthropic's Claude integration.

    Custom agents via AGENTS.md files: Enterprises can now create source-controlled configuration files that define specific rules, tools and guardrails for how Copilot behaves. For example, a company could specify "prefer this logger" or "use table-driven tests for all handlers." This permanently encodes organizational standards without requiring developers to re-prompt every time.

    "Custom agents have an immense amount of product market fit within enterprises, because they could just codify a set of skills that the coordination can do, then standardize on those and get really high quality output," Rodriguez said.

    The AGENTS.md specification allows teams to version control their agent behavior alongside their code. When a developer clones a repository, they automatically inherit the custom agent rules. This solves a persistent problem with AI coding tools: Inconsistent output quality when different team members use different prompting strategies.
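    As a rough illustration of the concept (the file below is hypothetical, not GitHub's published specification), an AGENTS.md is simply version-controlled guidance that travels with the repository:

```markdown
# AGENTS.md (illustrative example only)

## Coding standards
- Prefer the team's internal logger package over ad-hoc print/console logging.
- Use table-driven tests for all handlers.

## Guardrails
- Never commit directly to main; open a pull request from a feature branch.
- Flag any new third-party dependency in the pull request description.
```

    Because the file lives in the repository, every agent run and every developer who clones the project picks up the same rules without re-prompting.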

    Native Model Context Protocol (MCP) support: VS Code now includes a GitHub MCP Registry. Developers can discover, install and enable MCP servers with a single click. They can then create custom agents that combine these tools with specific system prompts.

    This positions GitHub as the integration point between the emerging MCP ecosystem and actual developer workflows. MCP, introduced by Anthropic but rapidly gaining industry support, is becoming a de facto standard for agent-to-tool communication. By supporting the full specification, GitHub can orchestrate agents that need access to external services without each agent implementing its own integration logic.

    Plan Mode and agentic code review

    GitHub is also shipping new capabilities within VS Code itself. Plan Mode allows developers to collaborate with Copilot on building step-by-step project approaches. The AI asks clarifying questions before any code is written. Once approved, the plan can be executed either locally in VS Code or by cloud-based agents.

    The feature addresses a common failure mode in AI coding: Beginning implementation before requirements are fully understood. By forcing an explicit planning phase, GitHub aims to reduce wasted effort and improve output quality.

    More significantly, GitHub's code review feature is becoming agentic. The new implementation will use GitHub's CodeQL engine, which previously focused largely on security vulnerabilities, to identify bugs and maintainability issues. The code review agent will automatically scan agent-generated pull requests before human review. This creates a two-stage quality gate.

    "Our code review agent will be able to make calls into the CodeQL engine to then find a set of bugs," Rodriguez explained. "We're extending the engine and we're going to be able to tap into that engine also to find bugs."

    Enterprise considerations: What to do now

    For enterprises already deploying multiple AI coding tools, Agent HQ offers a path to consolidation without forcing tool elimination.

    GitHub's multi-agent approach provides vendor flexibility and reduces lock-in risk. Organizations can test multiple agents within a unified security perimeter and switch providers without retraining developers. The tradeoff is potentially less optimized experiences compared to specialized tools that tightly integrate UI and agent behavior.

    Rodriguez's recommendation is clear: Begin with custom agents. This allows enterprises to codify organizational standards that agents follow consistently. Once established, organizations can layer in additional third-party agents to expand capabilities.

    "Go and do agent coding, custom agents and start playing with that," he said. "That is a capability available tomorrow, and it allows you to really start shaping your SDLC to be personalized to you, your organization and your people."

  • Building AI for financial software requires a different playbook than consumer AI, and Intuit's latest QuickBooks release provides an example.

    The company has announced Intuit Intelligence, a system that orchestrates specialized AI agents across its QuickBooks platform to handle tasks including sales tax compliance and payroll processing. The new agents join the existing accounting and project management agents (which have also been updated), along with a unified interface that lets users query data across QuickBooks, third-party systems and uploaded files using natural language.

    The new developments follow years of investment in and improvement of Intuit's GenOS, allowing the company to build AI capabilities that reduce latency and improve accuracy.

    But the real news isn't what Intuit built — it's how the company built it, and why those design decisions will make AI more usable. The latest AI rollout represents an evolution built on hard-won lessons about what works and what doesn't when deploying AI in financial contexts.

    What the company learned is sobering: Even after its accounting agent improved transaction categorization accuracy by 20 percentage points on average, it still received complaints about errors.

    "The use cases that we're trying to solve for customers include tax and finance; if you make a mistake in this world, you lose trust with customers in buckets and we only get it back in spoonfuls," Joe Preston, Intuit's VP of product and design, told VentureBeat.

    The architecture of trust: Real data queries over generative responses

    Intuit's technical strategy centers on a fundamental design decision. For financial queries and business intelligence, the system queries actual data, rather than generating responses through large language models (LLMs).

    Also critically important: That data isn't all in one place. Intuit's technical implementation allows QuickBooks to ingest data from multiple distinct sources: native Intuit data, OAuth-connected third-party systems like Square for payments and user-uploaded files such as spreadsheets containing vendor pricing lists or marketing campaign data. This creates a unified data layer that AI agents can query reliably.

    "We're actually querying your real data," Preston explained. "That's very different than if you were to just copy, paste out a spreadsheet or a PDF and paste into ChatGPT."

    This architectural choice means that the Intuit Intelligence system functions more as an orchestration layer. It's a natural language interface to structured data operations. When a user asks about projected profitability or wants to run payroll, the system translates the natural language query into database operations against verified financial data.

    This matters because Intuit's internal research has uncovered widespread shadow AI usage. When surveyed, 25% of accountants using QuickBooks admitted they were already copying and pasting data into ChatGPT or Google Gemini for analysis.

    Intuit's approach treats AI as a query translation and orchestration mechanism, not a content generator. This reduces the hallucination risk that has plagued AI deployments in financial contexts.
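    A rough sketch of that pattern is shown below. The schema, function names and prompt are illustrative inventions, not Intuit's actual implementation; the point is that the model only emits a structured query, while the numbers come from the real ledger.

```python
# Illustrative sketch of the "query translation" pattern described above.
# The LLM never generates financial figures; it only maps a question onto a
# structured query spec, which is then executed against verified data.
# Table names, fields and the llm() callable are invented for illustration.
import json
import sqlite3

def translate_to_query(question: str, llm) -> dict:
    """Ask the model for a structured query spec (JSON), not an answer."""
    prompt = (
        "Convert the user's question into JSON with keys "
        "'category', 'start_date', 'end_date'.\nQuestion: " + question
    )
    return json.loads(llm(prompt))

def answer_from_ledger(spec: dict, conn: sqlite3.Connection) -> float:
    # The figure comes from the books themselves, not from the model.
    row = conn.execute(
        "SELECT SUM(amount) FROM transactions "
        "WHERE category = ? AND date BETWEEN ? AND ?",
        (spec["category"], spec["start_date"], spec["end_date"]),
    ).fetchone()
    return row[0] or 0.0
```

    Hallucination risk drops because the generative step is confined to producing the query spec; any number the user sees is read straight from stored records.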

    Explainability as a design requirement, not an afterthought

    Beyond the technical architecture, Intuit has made explainability a core user experience across its AI agents. This goes beyond simply providing correct answers: It means showing users the reasoning behind automated decisions.

    When Intuit's accounting agent categorizes a transaction, it doesn't just display the result; it shows the reasoning. This isn't marketing copy about explainable AI; it's actual UI displaying the data points and logic behind each decision.

    "It's about closing that trust loop and making sure customers understand the why," Alastair Simpson, Intuit's VP of design, told VentureBeat.

    This becomes particularly critical when you consider Intuit's user research: While half of small businesses describe AI as helpful, nearly a quarter haven't used AI at all. The explanation layer serves both populations: Building confidence for newcomers, while giving experienced users the context to verify accuracy.

    The design also enforces human control at critical decision points. This approach extends beyond the interface. Intuit connects users directly with human experts, embedded in the same workflows, when automation reaches its limits or when users want validation.

    Navigating the transition from forms to conversations

    One of Intuit's more interesting challenges involves managing a fundamental shift in user interfaces. Preston described it as having one foot in the past and one foot in the future.

    "This isn't just Intuit, this is the market as a whole," said Preston. "Today we still have a lot of customers filling out forms and going through tables full of data. We're investing a lot into leaning in and questioning the ways that we do it across our products today, where you're basically just filling out, form after form, or table after table, because we see where the world is headed, which is really a different form of interacting with these products."

    This creates a product design challenge: How do you serve users who are comfortable with traditional interfaces while gradually introducing conversational and agentic capabilities?

    Intuit's approach has been to embed AI agents directly into existing workflows. This means not forcing users to adopt entirely new interaction patterns. The payments agent appears alongside invoicing workflows; the accounting agent enhances the existing reconciliation process rather than replacing it. This incremental approach lets users experience AI benefits without abandoning familiar processes.

    What enterprise AI builders can learn from Intuit's approach

    Intuit's experience deploying AI in financial contexts surfaces several principles that apply broadly to enterprise AI initiatives.

    Architecture matters for trust: In domains where accuracy is critical, consider whether you need content generation or data query translation. Intuit's decision to treat AI as an orchestration and natural language interface layer dramatically reduces hallucination risk and avoids using AI as a generative system.

    Explainability must be designed in, not bolted on: Showing users why the AI made a decision isn't optional when trust is at stake. This requires deliberate UX design. It may constrain model choices.

    User control preserves trust during accuracy improvements: Intuit's accounting agent improved categorization accuracy by 20 percentage points. Yet, maintaining user override capabilities was essential for adoption.

    Transition gradually from familiar interfaces: Don't force users to abandon forms for conversations. Embed AI capabilities into existing workflows first. Let users experience benefits before asking them to change behavior.

    Be honest about what's reactive versus proactive: Current AI agents primarily respond to prompts and automate defined tasks. True proactive intelligence that makes unprompted strategic recommendations remains an evolving capability.

    Address workforce concerns with tooling, not just messaging: If AI is meant to augment rather than replace workers, provide workers with AI tools. Show them how to leverage the technology.

    For enterprises navigating AI adoption, Intuit's journey offers a clear directive. The winning approach prioritizes trustworthiness over capability demonstrations. In domains where mistakes have real consequences, that means investing in accuracy, transparency and human oversight before pursuing conversational sophistication or autonomous action.

    Simpson frames the challenge succinctly: "We didn't want it to be a bolted-on layer. We wanted customers to be in their natural workflow, and have agents doing work for customers, embedded in the workflow."

  • Enterprises looking to sell goods and services online are waiting for the backbone of agentic commerce to be hashed out, but PayPal is hoping its new features will bridge the gap.

    The payments company is launching a discoverability solution that allows enterprises to make their products available on any chat platform, regardless of the model or agent payment protocol.

    PayPal, which is a participant in Google’s Agent Payments Protocol (AP2), found that it can leverage its relationship with merchants and enterprises to help pave the way for an easier transition into agentic commerce and offer flexibility that will benefit the ecosystem. 

    Michelle Gill, PayPal's GM for small business and financial services, told VentureBeat that AI-powered shopping will continue to grow, so enterprises and brands must begin laying the groundwork early. 

    “We think that merchants who've historically sold through web stores, particularly in the e-commerce space, are really going to need a way to get active on all of these large language models (LLMs),” Gill said. “The challenge is that no one really knows how fast all of this is going to move. We’re trying to help merchants think through how to do all of this as low-touch as possible while using the infrastructure they already have without doing a bazillion integrations.”

    She added that AI shopping would also bring about “a resurgence from consumers trying to ensure their investment is protected.”

    PayPal partnered with website builder Wix, Cymbio, Commerce and Shopware to bring products to chat platforms like Perplexity.

    Agent-powered shopping 

    PayPal’s Agentic Commerce Services include two features. The first is Agent Ready, which would allow existing PayPal merchants to accept payments on AI platforms. The second is Shop Sync, which will enable companies’ product data to be discoverable through different AI chat interfaces. It takes a company’s catalog information and plugs its inventory and fulfillment data into chat platforms.

    Gill said the data goes into a central repository where AI models can ingest the information. 

    Right now, companies can access Shop Sync; Agent Ready is coming in 2026. 

    Gill said Agentic Commerce Services is a one-to-many solution that would be helpful right now, as different LLMs scrape different data sources to surface information. 

    Other benefits include:

    • Fast integration with current and future partners;

    • More product discovery than the traditional search, browse and cart experiences;

    • Preserved customer insights and relationships, with the brand retaining control over its records and communications with customers.

    Right now, the service is only available through Perplexity, but Gill said more platforms will be added soon. 

    Fragmented AI platforms 

    Agentic commerce is still very much in the early stages. AI agents are just beginning to get better at navigating a browser. While platforms like ChatGPT, Gemini and Perplexity can now surface products and services based on user queries, people cannot technically buy things from chat (yet).

    There’s a race right now to create a standard to enable agents to transact on behalf of users. Other than Google’s AP2, OpenAI and Stripe have the Agentic Commerce Protocol (ACP), and Visa recently launched its Trusted Agent Protocol.

    Beyond enabling a trust layer for agents to transact, enterprises struggle with fragmentation in agentic commerce. Different chat platforms use different models, which also interpret information in slightly different ways. Gill said PayPal learned that when it comes to working with merchants, flexibility is critical. 

    “How do you decide if you're going to spend your time integrating with Google, Microsoft, ChatGPT or Perplexity?" Gill noted. "And each one of them right now has a different protocol, a different catalog, config, a different everything. That is a lot of time to make a bet as to where you should spend your time."