3 comments

  • bensyverson · 2 hours ago
    I recently experimented with Apple's Foundation Models framework, and I came away impressed by the speed and accuracy of the LLM. You can't ask it to build you a web app, but it can reliably translate a written instruction into tool use within your native app. I think there's a lot of merit to Apple's approach of using specialist tiny models like Ferret-UI Lite, though I don't think we'll see the full fruits of their labor for another year or two.

    But it's a vision I can get behind, where basic tasks like transcription, computer use, in-app tool use, and image understanding are local, secure, and private.
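    For anyone curious what "translate a written instruction into tool use" looks like in practice, here's a minimal sketch assuming the FoundationModels API shape (`ReminderStore` is a hypothetical app-side store, not part of the framework):

    ```swift
    import FoundationModels

    // A tool the on-device model can invoke when a user instruction calls for it.
    struct AddReminderTool: Tool {
        let name = "addReminder"
        let description = "Creates a reminder with the given title."

        @Generable
        struct Arguments {
            @Guide(description: "The reminder title")
            var title: String
        }

        func call(arguments: Arguments) async throws -> ToolOutput {
            // ReminderStore is a stand-in for your app's own model layer.
            ReminderStore.shared.add(title: arguments.title)
            return ToolOutput("Added reminder: \(arguments.title)")
        }
    }

    // The session routes natural-language instructions to registered tools.
    let session = LanguageModelSession(tools: [AddReminderTool()])
    let response = try await session.respond(to: "Remind me to water the plants tomorrow")
    ```

    The model decides whether and how to call the tool from the typed `Arguments` schema, which is what makes it reliable for this narrow job even though it's a tiny on-device model.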
  • brudgers · 2 days ago
    Direct link to the paper: https://arxiv.org/pdf/2509.26539
  • w10-1 · 1 hour ago
    I'm disappointed that they are taking the long way around, with screenshots and visual recognition.

    Apple GUIs have underlying accessibility annotations that, if surfaced, would make UI manipulation easy for LLMs.

    "Back in the day" (the 1990s) Apple had Virtual User, basically a Lisp derivative that reported UI state as S-expressions (like a web DOM) and allowed scripts to manipulate settings and perform UI actions.

    With such a curated DOM/model and selective UI inputs, they could manage privacy and safety, opening up LLM control to users who would otherwise never trust a machine.

    I hope they're working on that approach and training models for it. It's one way they could distinguish the Apple platform as being more controllable, with safety and permissions built into the subsystems instead of giving the LLM full control over UI input.
    • CharlesW · 38 minutes ago
      > I'm disappointed that they are taking the long way around, with screenshots and visual recognition.

      This strikes me as more of a universal fallback than Apple choosing vision *instead of* a structured control plane. It nicely complements the layers Apple has been building for years: App Intents, Shortcuts, Spotlight/Siri surfaces, etc. Those are essentially curated action graphs with explicit parameters, validation, and user consent, which is much closer to your "DOM with safety rails" ideal.

      All iOS app developers should now be building "App Intents first". Vision-based awareness is a nice safety net for users of apps whose devs haven't yet realized where this is all obviously going.
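      To make the "curated action graph" point concrete, here's a rough sketch of an App Intent, following the AppIntents framework's shape (`TaskStore` is a hypothetical stand-in for an app's model layer):

      ```swift
      import AppIntents

      // An explicit, parameterized action the system (Siri, Shortcuts, Spotlight)
      // can surface and invoke without any screen scraping.
      struct AddTaskIntent: AppIntent {
          static var title: LocalizedStringResource = "Add Task"
          static var description = IntentDescription("Adds a task to your list.")

          @Parameter(title: "Task Name")
          var taskName: String

          func perform() async throws -> some IntentResult & ProvidesDialog {
              // TaskStore is a stand-in for the app's own model layer.
              TaskStore.shared.add(taskName)
              return .result(dialog: "Added \(taskName) to your list.")
          }
      }
      ```

      Because the parameters are declared and typed, the system can validate input and ask for user confirmation before anything runs, which is exactly the safety-rails property the parent comment wants.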