Apps in ChatGPT: Still Early and the Foundation for App-Based Agentic AI
On the surface, our testing of Apps in ChatGPT showed how little value these integrations provide today. That said, our testing largely misses the point. What’s most important is that some of the biggest tech companies, including Google, Microsoft, Apple, Salesforce and Adobe, are already enabling ChatGPT to be the front end of some of their app experiences. That’s an impressive endorsement, especially considering many of them have their own LLMs that compete with GPT. This paves the way for OpenAI to make good on its prediction that it will become the platform that runs vertical AI applications.

Key Takeaways

In October, OpenAI launched “Apps in ChatGPT,” which is a shift from an app-first workflow to a chat-first workflow. It allows you to prompt third-party apps like Gmail, Dropbox and Salesforce. The idea is that you can now chat with apps, which in theory should make them easier to use.
We tested 10 of the 76 apps and found that only two were more valuable with the ChatGPT integration. While that is disappointing, it’s still early, and we expect the integrations to improve over the coming year.
Our testing largely misses the bigger picture. OpenAI has built a framework that will eventually allow third-party apps to become agentic, and we expect these app providers to eventually pay OpenAI to power those experiences.
1

Apps in ChatGPT

OpenAI’s “Apps in ChatGPT,” previously referred to as connectors, are an attempt to make ChatGPT function more like a platform, where third-party apps can be used directly inside the ChatGPT interface.

OpenAI launched Apps in ChatGPT at its DevDay last October, where Altman described this direction as enabling “a new generation of apps that are adaptive, interactive and personalized, that you can chat with.” The demos emphasized that these apps can run inside the chat conversation rather than requiring users to leave ChatGPT. Today, 76 apps can connect.

The core product pitch is both convenience and utility. On the convenience side, you can reduce the time and friction of jumping between ChatGPT and an app. On the utility side, ChatGPT can extract more relevant information from an app because the LLM sits over the app’s data. This means apps become capabilities ChatGPT can call when needed.

Conceptually, this is a shift from an app-first workflow to a chat-first workflow.

2

Testing Use Cases

We tested 10 apps most familiar to us and were largely disappointed. Here are the details:

  1. Dropbox: Score 5 out of 5

Prompt: “Find the latest file where we talked about software stocks.”

Dropbox was the best connection by a wide margin. It found the right file quickly and accurately, not just by name or folder, but by the content inside documents. This was clearly better than native Dropbox search, which is constrained to names, dates, and folder structure. This is the one connection I will continue using because it meaningfully improves discovery and retrieval.

Useful? Yes, best overall
Better than native UI? Yes

Can: find files and folders by topic or name, read and summarize docs, extract passages, compare versions, provide exact paths
Cannot: upload, move, rename, delete, create folders, change sharing or permissions

  2. Airtable: Score 4 out of 5

Prompt: “Where is my sandpaper stored”

Airtable delivered a fast and accurate response. What stood out is that it searched across multiple databases at once. In the native app, you can only search one database at a time, which makes this meaningfully better.

Useful? Yes
Better than native UI? Yes

Can: search across multiple databases simultaneously
Cannot: add or edit database entries

  3. Google Calendar: Score 2 out of 5

Prompt: “Find 30 minutes tomorrow for Gene, Brian, and Andrew to meet”

Google Calendar took several rounds of permissions and setup before it worked. Once connected, it accurately checked availability and identified a time, which makes it useful in a narrow sense. The downside is that it was still worse than simply looking at a calendar visually. The workflow was slower and more cumbersome than the native UI.

Useful? Yes, once configured
Better than native UI? No

Can: read calendars, check availability, identify conflicts, draft invite text
Cannot: create or edit events, send invites

  4. Gmail: Score 2 out of 5

Prompt: “Can you check if I sent back the latest signed document from Gene?”

Gmail search can be useful for confirming whether something happened, but this prompt exposed a weakness. It struggled with context and initially referenced an older task instead of the latest request. It corrected itself only after we walked it through the mistake. It was not better than the native experience, and Gemini inside Gmail handled the same question faster and with better context awareness.

Useful? Not really for this prompt
Better than native UI? No

Can: search with filters, read and summarize threads, confirm replies, surface attachments, pull dates and senders
Cannot: send, reply, forward, archive, delete, label, move, mark read or unread, download attachments

  5. Mailchimp: Score 1 out of 5

Prompt: “Can you help me format this week’s newsletter?”

Mailchimp behaved more like a brainstorming and copy support tool than a connection that actually does anything inside Mailchimp. It was somewhat useful in the abstract, but not for our workflow or for completing real tasks. I can see scenarios where it helps, especially when starting from scratch, but it did not materially improve how we use Mailchimp today.

Useful? Somewhat, but not for our workflow
Better than native UI? No

Can: outline campaigns, draft and rewrite copy, generate subject lines, adapt messaging by audience
Cannot: send without approval, manage subscribers or lists, change account settings, operate the platform end to end

  6. TripAdvisor: Score 1 out of 5

Prompt: “Create a weekend itinerary for July 3–5, 2026 in Brainerd, MN.”

TripAdvisor worked and produced accurate answers. The main advantage was the ability to ask follow-up questions, which helps planning. Still, it did not feel meaningfully different from standard ChatGPT travel planning, and it was less helpful than native discovery where links and browsing matter.

Useful? Yes, but limited
Better than native UI? No

Can: search hotels by location, filter and sort, surface amenities and reviews
Cannot: book travel, guarantee pricing, replace browsing and link-based discovery

  7. Zillow: Score 0 out of 5

Prompt: “Find me a 2+ bedroom house in South Minneapolis, walkable to the lakes, under $500k.”

Zillow returned accurate results, but it turned a very visual, filter-heavy product into chat. That tradeoff was negative. Home search depends on maps, images, and rapid filtering, and the ChatGPT connection does not replicate those strengths. Even though the answers were accurate, it was less useful than using Zillow directly.

Useful? Technically yes
Better than native UI? No

Can: search using criteria, suggest areas, surface Zestimate-related info
Cannot: replicate map-based browsing, guarantee listing freshness, transact, contact agents, search outside the U.S.

  8. Apple Music: Score 0 out of 5

Prompt: “Suggest bands similar to Angels and Airwaves”

It returned similar results to what you get in the app, but without links to the music. There was no added value.

Useful? No
Better than native UI? No

Don’t waste your time.

  9. Peloton: Score 0 out of 5

Prompts:
“What are the live spin classes today”
“What spin classes were posted yesterday”

For live classes, it reported fewer classes than the app. For newly posted classes, it matched the app but provided no links to actually access them. In both cases, it was strictly worse than using Peloton directly.

Useful? No
Better than native UI? No

Don’t waste your time.

  10. Target: Score 0 out of 5

Prompt: “Find me Hawaiian Punch at a store near me”

It returned a list of nearby stores and then asked me to check availability myself. That defeats the purpose of the integration.

Useful? No
Better than native UI? No

Don’t waste your time.
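
For readers who want the scorecard in one place, the results above can be tallied with a short script. This is only a sketch: the scores and "Better than native UI?" verdicts are copied from the write-ups above, while the names `results`, `APPS_AVAILABLE` and `summarize` are ours, chosen for illustration.

```python
from statistics import mean

# (score out of 5, better than native UI?) for each app, copied from the write-ups above.
results = {
    "Dropbox": (5, True),
    "Airtable": (4, True),
    "Google Calendar": (2, False),
    "Gmail": (2, False),
    "Mailchimp": (1, False),
    "TripAdvisor": (1, False),
    "Zillow": (0, False),
    "Apple Music": (0, False),
    "Peloton": (0, False),
    "Target": (0, False),
}

# Total number of apps that can connect to ChatGPT, per the article.
APPS_AVAILABLE = 76


def summarize(results, apps_available):
    """Roll the per-app scorecard up into a few headline numbers."""
    scores = [score for score, _ in results.values()]
    better = [name for name, (_, is_better) in results.items() if is_better]
    return {
        "tested": len(results),
        "coverage_pct": round(100 * len(results) / apps_available, 1),
        "mean_score": mean(scores),
        "better_than_native": better,
        "scored_zero": sum(1 for s in scores if s == 0),
    }


summary = summarize(results, APPS_AVAILABLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

On these numbers, the average score is 1.5 out of 5, four of the ten apps scored zero, and only Dropbox and Airtable beat their native UI.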

3

OpenAI is becoming a platform to enable agentic apps.

On the surface, our testing of these apps showed, at best, how early we are in applying AI to an app-based world and, at worst, how little value these integrations provide today.

That said, our testing largely misses the point. What’s most important is that today some of the biggest tech companies, including Google, Microsoft, Apple, Salesforce, and Adobe, are enabling ChatGPT to be the front end of some of their app experiences. That’s an impressive endorsement, especially considering many of them have their own LLMs that compete with GPT.

In total, 63 publishers offer 76 apps that can connect with ChatGPT. Here are 10 of the largest publishers:

[Chart: 10 of the largest app publishers. Source: chatgpt.com/apps]

The participation of these A-list companies underscores how much effort is going into agentic apps, and it paves the way for OpenAI to make good on its prediction that it will become the platform that runs vertical AI applications.
