Actually, really liked the Apple Intelligence announcement. It must be a very exciting time at Apple as they layer AI on top of the entire OS. A few of the major themes.
Step 1 Multimodal I/O. Enable text/audio/image/video capability, both read and write. These are the native human APIs, so to speak.
Step 2 Agentic. Allow all parts of the OS and apps to inter-operate via "function calling"; kernel process LLM that can schedule and coordinate work across them given user queries.
Step 3 Frictionless. Fully integrate these features in a highly frictionless, fast, "always on", and contextual way. No going around copy pasting information, prompt engineering, or etc. Adapt the UI accordingly.
Step 4 Initiative. Don't perform a task given a prompt, anticipate the prompt, suggest, initiate.
Step 5 Delegation hierarchy. Move as much intelligence as you can on device (Apple Silicon very helpful and well-suited), but allow optional dispatch of work to cloud.
Step 6 Modularity. Allow the OS to access and support an entire and growing ecosystem of LLMs (e.g. ChatGPT announcement).
Step 7 Privacy. <3
We're quickly heading into a world where you can open up your phone and just say stuff. It talks back and it knows you. And it just works. Super exciting and as a user, quite looking forward to it.
https://x.com/karpathy/status/1800242310116262150?s=46
That’s why it’s on the OS-level. For example, for text, it seems to work in any text app that uses the standard text input api, which Apple controls.
User activates the “AI overlay” on the OS, not in the app, OS reads selected text from App and sends text suggestions back.
The App is (possibly) unaware that AI has been used / activated, and has not received any user information.
Of course, if you don’t trust the OS, don’t use this. And I’m 100% speculating here based on what we saw for the macOS demo.