Apple is adopting a new strategy for training its AI models, one that prioritizes user privacy by avoiding the collection or copying of content from iPhones or Macs. The company intends to improve features such as email summaries by combining synthetic data (artificially created data that simulates user behavior) with differential privacy, so that personal emails and messages are never disclosed. For users participating in Apple’s Device Analytics program, devices will compare synthetic email-like messages against a small sample of real user content stored locally.
This process identifies the synthetic messages that most closely match the user sample. Only aggregate information is transmitted back to Apple, while the actual user data stays on the device. The approach lets Apple improve text generation tasks without real user content ever leaving the device. Apple already applies differential privacy in existing features such as Genmoji.
In this application, the company anonymously gathers general trends related to popular prompts without associating any specific prompt with individual users or devices. Looking ahead, Apple plans to expand this method to enhance other Apple Intelligence features such as Image Playground, Image Wand, and Writing Tools. To tackle more complex tasks like email summarization, Apple generates thousands of synthetic messages, which are analyzed through numerical representations, or ‘embeddings,’ reflecting language, tone, and topic.
Participating user devices compare these embeddings with locally stored samples, sharing only the chosen match without revealing the content itself. By collecting frequently selected synthetic embeddings, Apple aims to refine its training data and improve AI outputs while safeguarding user privacy. This system is currently being rolled out in beta versions of iOS 18.5, iPadOS 18.5, and macOS 15.5.
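The matching step described above can be illustrated with a minimal Python sketch. It is not Apple's implementation: the function names, the use of cosine similarity, the k-ary randomized response mechanism, and the `epsilon` value are all assumptions chosen for illustration, since the article does not say how the comparison or the privacy noise is actually computed.

```python
import math
import random
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_synthetic_index(synthetic_embeddings, local_embeddings):
    """On-device step: index of the synthetic embedding closest to any local sample."""
    best_index, best_score = 0, -math.inf
    for i, synth in enumerate(synthetic_embeddings):
        score = max(cosine_similarity(synth, local) for local in local_embeddings)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def randomized_report(true_index, num_candidates, epsilon, rng):
    """Assumed privacy step (k-ary randomized response, epsilon > 0).

    Reports the true index with probability e^eps / (e^eps + k - 1),
    otherwise a uniformly chosen different index.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_candidates - 1)
    if rng.random() < keep_prob:
        return true_index
    others = [i for i in range(num_candidates) if i != true_index]
    return rng.choice(others)


def estimate_counts(reports, num_candidates, epsilon):
    """Server step: debias the noisy tallies into per-index count estimates."""
    n = len(reports)
    denom = math.exp(epsilon) + num_candidates - 1
    p = math.exp(epsilon) / denom  # probability of reporting the true index
    q = 1.0 / denom                # probability of reporting one specific other index
    observed = Counter(reports)
    return [(observed.get(i, 0) - n * q) / (p - q) for i in range(num_candidates)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-dimensional embeddings; real embeddings would be much larger.
    synthetic = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    local_samples = [[0.9, 0.1]]  # never leaves the device in this sketch
    choice = closest_synthetic_index(synthetic, local_samples)
    reports = [randomized_report(choice, len(synthetic), 1.0, rng) for _ in range(1000)]
    print(estimate_counts(reports, len(synthetic), 1.0))
```

In this sketch only the (possibly randomized) index of the chosen synthetic message is reported, never the local text or its embedding, and the server works from tallies across many devices. A smaller `epsilon` adds more noise to each report and so requires more participating devices for the same accuracy.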
With this approach, Apple is making a public commitment to balance user privacy against improved model performance as it develops its AI features.