As enterprises accelerate their use of AI, the importance of secure data sharing has never been greater. In its recent FutureScape 2026 predictions, IDC forecast that by 2028, 60% of enterprises will collaborate on data through private exchanges or data clean rooms.
With Amazon Web Services (AWS) announcing new privacy-enhancing synthetic data generation within AWS Clean Rooms, we are already starting to see that prediction take shape.
We sat down with Lynne Schneider, Research Director for Data Collaboration and Monetization, and Location & Geospatial Intelligence at IDC, to unpack this prediction, explore the impact of AWS’s announcement, and offer guidance for enterprises preparing for the next era of AI-driven data collaboration.
You predicted that by 2028, 60% of enterprises will collaborate on data through private exchanges or clean rooms. What’s driving that shift?
Over the next several years, we anticipate that the majority of global enterprises will be collaborating through some form of private data exchange or data clean room. The reason is simple: the only sustainable advantage in an AI world is data, and in particular novel combinations of data.
What frightens people is the idea that their private data might leak or reach people they never intended to share it with. That’s why data collaboration technologies, including private exchanges and clean rooms, will rise from “nice to have” to “must have.”
Amazon recently announced privacy-enhancing synthetic dataset generation within AWS Clean Rooms. How does this validate the direction you predicted?
This announcement sits at the nexus of two IDC predictions: growth in data collaboration and growth in synthetic data.
People turn to synthetic data for two reasons:
- To expand small datasets when training models.
- To add privacy protection by creating an equivalent privacy-safe dataset.
AWS’s announcement is focused on that second reason — privacy.
Before secure data collaboration was technologically feasible, people relied on contractual promises to keep shared data private. Now the technology itself enforces privacy. Synthetic data has been one way organizations protect sensitive elements (like Social Security numbers or addresses) and reduce the risk of re-identification.
What AWS has introduced is essentially a second layer of privacy protection. You bring your proprietary data into the clean room, activate the AWS service, and it generates a synthetic dataset. AWS also provides tools to measure how well that synthetic data meets your privacy requirements before you use it.
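To make that generate-then-measure loop concrete, here is a minimal sketch in plain pandas/NumPy. It assumes nothing about the actual AWS Clean Rooms API (the service exposes this capability through its own interfaces); the naive per-column resampling and the mean-comparison check below are invented stand-ins for a real generator and for the service's privacy and utility measurements.

```python
# Conceptual stand-in for the generate-then-measure workflow described above.
# Not the AWS Clean Rooms API; real generators also preserve joint structure
# across columns, which this naive per-column resampling deliberately ignores.
import numpy as np
import pandas as pd

def generate_synthetic(real: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Resample each column independently from its empirical distribution,
    breaking row-level linkage back to any real individual."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        col: rng.choice(real[col].to_numpy(), size=len(real), replace=True)
        for col in real.columns
    })

def utility_check(real: pd.DataFrame, synth: pd.DataFrame) -> pd.DataFrame:
    """Crude fidelity metric: do per-column means survive the transformation?"""
    return pd.DataFrame({"real_mean": real.mean(), "synth_mean": synth.mean()})

rng = np.random.default_rng(42)
real = pd.DataFrame({
    "age": rng.integers(18, 80, size=1000),
    "monthly_spend": rng.normal(120.0, 30.0, size=1000),
})
synth = generate_synthetic(real)
print(utility_check(real, synth))  # means should roughly match
```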
How does combining clean rooms with synthetic data expand what enterprises can safely do with AI, especially as we head into the agentic AI era?
Combining the two is really an up-leveling.
Clean rooms already support federated training and let both humans and AI agents access and combine data securely. Synthetic data adds another privacy option on top of that. Together, they allow organizations to explore more advanced AI use cases — including generative and agentic AI — without exposing raw sensitive data.
From a trust, governance, and privacy standpoint, what does it mean that enterprises can now generate synthetic datasets inside the clean room rather than relying on external tools?
When people build synthetic data today, we often see “synthetic audiences” — personal data that’s transformed for advertising or marketing applications. We’re also seeing emerging use cases in life sciences and healthcare, where the data is extremely sensitive and sometimes scarce. Synthetic data helps expand those datasets for modeling and experimentation.
The challenge is that synthetic data can go wrong in two ways:
- It may stray too far from the original data and become meaningless, or
- It may stay too close, raising re-identification risks.
Combining synthetic data generation with a clean room solves both issues. The clean room governs access and also controls what analyses can be performed. It provides an extra seal of privacy.
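The “stayed too close” failure mode has a common heuristic check: distance to closest record (DCR). The sketch below is illustrative only, not AWS’s measurement tooling; a median DCR near zero flags synthetic rows that are near-copies of real ones, while a healthier dataset sits measurably farther away.

```python
# Illustrative distance-to-closest-record (DCR) check for re-identification
# risk: how far is each synthetic row from its nearest real neighbor?
import numpy as np

def distance_to_closest_record(real: np.ndarray, synth: np.ndarray) -> np.ndarray:
    """Euclidean distance from each synthetic row to its nearest real row."""
    diffs = synth[:, None, :] - real[None, :, :]   # shape (n_synth, n_real, d)
    dists = np.sqrt((diffs ** 2).sum(axis=2))      # pairwise distances
    return dists.min(axis=1)                       # nearest real row per synth row

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
too_close = real + rng.normal(scale=0.01, size=real.shape)  # near-copy: risky
independent = rng.normal(size=(500, 4))                     # freshly sampled

print(f"median DCR, near-copy data: {np.median(distance_to_closest_record(real, too_close)):.3f}")
print(f"median DCR, independent data: {np.median(distance_to_closest_record(real, independent)):.3f}")
```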
What should enterprises start doing now to prepare for this shift, both in terms of data strategy and AI readiness?
Enterprises should start by identifying what kinds of data they need to make their AI, analytics, or decision intelligence more effective.
For example:
If you’re forecasting demand for a product and weather impacts that demand, you may need to combine:
- A general LLM
- Your enterprise’s historical demand data
- External weather data (public or partner-provided)
- Logistics partner data about fleet availability
Each party may hold sensitive information they don’t want to expose. Clean rooms allow you to combine all those pieces securely.
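As a hypothetical sketch of that forecasting example (the table names, columns, and values are all invented), the combined feature table could be built like this; inside a clean room, each merge would run as an approved query under the collaboration’s analysis rules, so no party ever sees another’s raw rows:

```python
# Hypothetical demand-forecasting join across three parties' tables.
# In a clean room this would be a governed query; pandas merges stand in here.
import pandas as pd

demand = pd.DataFrame({   # your enterprise's historical demand
    "date": ["2025-06-01", "2025-06-02"], "region": ["NW", "NW"],
    "units_sold": [1200, 950],
})
weather = pd.DataFrame({  # external weather provider's data
    "date": ["2025-06-01", "2025-06-02"], "region": ["NW", "NW"],
    "avg_temp_f": [61.0, 55.0], "precip_in": [0.0, 0.4],
})
fleet = pd.DataFrame({    # logistics partner's fleet availability
    "date": ["2025-06-01", "2025-06-02"], "region": ["NW", "NW"],
    "trucks_available": [42, 37],
})

features = (demand
            .merge(weather, on=["date", "region"])
            .merge(fleet, on=["date", "region"]))
print(features)  # joined features, ready to feed a forecasting model alongside an LLM
```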
Is there anything else enterprises should know about the direction this market is heading?
We have some great examples of how enterprises are benefiting from data collaboration in a recent IDC report: From Adoption to Advantage: Experiences of Data Cleanroom Innovators.
There was an initial period when “data clean rooms” were a popular buzzword — the same way “AI” is today. Many organizations wanted to say they were doing it. But once you get past the check-the-box phase, you need to prove the value.
This research highlights 11 different use cases, along with the challenges, outcomes, and guidance on how companies are realizing value through data collaboration technologies.
See more predictions shaping the agentic AI era. Explore the full IDC FutureScape insights.