Core ML was never designed for LLMs
Core ML was built for classification and detection models: fixed-size inputs, single-shot inference, .mlmodel bundles optimized for the Neural Engine. Running a large language model through Core ML required workarounds. Tokenization was awkward. Streaming token generation was not first-class. Memory management for multi-billion-parameter models was manual. Agent tool calling did not exist.
At WWDC 2026, Apple replaced Core ML with Core AI, a framework built from scratch for LLMs: async inference, streaming token output, large model memory footprints, and third-party model integration without .mlmodel lock-in. iOS 27 and macOS 27 'Golden Gate' ship with the new framework. For defense AI on Apple Silicon, the hacks needed to run Gemma or Llama give way to a supported, optimized runtime with ahead-of-time compilation and Python tools for PyTorch conversion.
Third-party models plug in natively
Core AI supports third-party models without forcing them into the .mlmodel format. Llama, Mistral, Gemma, or any compatible model plugs in directly. The LanguageModel protocol lets applications swap between Apple's on-device model, Claude, Gemini, or a custom model with a single line change.
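The single-line swap is easiest to see in code. The sketch below is a Python analogy of the pattern, not Apple's Swift API: TextModel, OnDeviceModel, RemoteModel, and answer are hypothetical names invented for illustration, and the two backends are stand-ins that only transform the prompt.

```python
from typing import Iterator, Protocol


class TextModel(Protocol):
    """Hypothetical interface: anything that can stream text for a prompt."""

    def generate(self, prompt: str) -> Iterator[str]:
        ...


class OnDeviceModel:
    """Stand-in for a local model; it echoes the prompt word by word."""

    def generate(self, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            yield word


class RemoteModel:
    """Stand-in for a hosted model; it upper-cases the prompt word by word."""

    def generate(self, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            yield word.upper()


def answer(model: TextModel, prompt: str) -> str:
    """Collect the streamed pieces into one string."""
    return " ".join(model.generate(prompt))


# Swapping backends means changing only the line that constructs the model.
model: TextModel = OnDeviceModel()
print(answer(model, "status of mesh"))
```

Because the calling code depends only on the interface, replacing OnDeviceModel() with RemoteModel() changes the backend without touching answer or anything built on it.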
This aligns with EdgeLance's compute routing architecture. The local tier runs on the Neural Engine via Core AI. The base tier routes to a nearby GPU server. The cloud tier reaches external providers when policy allows. EdgeLance can use Apple's optimized runtime for local inference without being locked to Apple's models. The same mission pack that loads Gemma 2B on a MacBook today can load it through Core AI and gain native memory management and hardware acceleration.
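The three tiers can be sketched as a small routing function. This is a minimal illustration, not EdgeLance's actual router: Tier, RoutingContext, and choose_tier are hypothetical names, and the local-then-base-then-cloud preference order is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    LOCAL = "local"   # on-device inference
    BASE = "base"     # nearby GPU server
    CLOUD = "cloud"   # external provider


@dataclass
class RoutingContext:
    local_can_serve: bool   # the on-device model can handle this request
    base_reachable: bool    # the nearby GPU server is reachable
    cloud_allowed: bool     # policy permits external providers


def choose_tier(ctx: RoutingContext) -> Optional[Tier]:
    """Pick the closest tier that can serve the request (assumed order)."""
    if ctx.local_can_serve:
        return Tier.LOCAL
    if ctx.base_reachable:
        return Tier.BASE
    if ctx.cloud_allowed:
        return Tier.CLOUD
    return None  # nothing available: the caller must queue or refuse


# A disconnected node with a capable local model stays on-device.
print(choose_tier(RoutingContext(True, False, False)))
```

The policy check guards only the cloud tier, matching the description above: the request leaves the local network only when policy allows it.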
Foundation Models 3 is natively multimodal
Apple Foundation Models 3 ships with two tiers: a 3B Core model that runs entirely on-device, and a 20B Advanced model using mixture-of-experts with 1-4B parameters active per request. Both accept text and image input and integrate with Apple's Vision framework for OCR, barcode scanning, and object recognition.
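The mixture-of-experts figures are easier to read as a fraction of the full model. The arithmetic below uses only the numbers quoted above (20B total, 1 to 4B active) and assumes both count parameters in billions; active_fraction is a helper name made up for the example.

```python
TOTAL_PARAMS_B = 20.0          # Advanced model size, in billions
ACTIVE_RANGE_B = (1.0, 4.0)    # parameters active per request, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for a single request."""
    return active_b / total_b


low = active_fraction(ACTIVE_RANGE_B[0], TOTAL_PARAMS_B)
high = active_fraction(ACTIVE_RANGE_B[1], TOTAL_PARAMS_B)
print(f"Active share per request: {low:.0%} to {high:.0%}")
```

On these figures, a request touches between one twentieth and one fifth of the Advanced model's parameters.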
Consider a patrol that photographs a document, scans a vehicle plate, and asks a question about the scene. All three inputs are processed by one model running locally on a MacBook or iPhone, with no separate pipeline per modality. EdgeLance already runs multi-model stacks on Apple Silicon. AFM 3 adds a baseline multimodal capability that ships with every Apple device, reducing the minimum viable AI loadout for a tactical node.
MCP goes platform-wide
Model Context Protocol extends across iOS 27 and macOS 27. MCP is the open standard for connecting AI models to external tools and data sources. Any application can expose capabilities that system AI invokes through structured tool calls.
EdgeLance services (mesh status, threat analysis, mission context, evidence queries, fleet state) can be exposed as MCP tools that Apple's system AI calls natively. An operator asking Siri 'what is the current threat picture' could invoke EdgeLance's threat analyzer through MCP without opening the app. MCP is a published protocol, and EdgeLance already implements structured tool interfaces.
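A structured tool call in MCP is a JSON-RPC 2.0 message: tools/list advertises the available tools and tools/call invokes one by name. The sketch below handles both for a single tool. The tool name threat_picture, its canned reply, and the handle function are hypothetical, and the initialization handshake and transport layer that a real MCP server needs are omitted.

```python
import json

# Hypothetical tool registry; the name and placeholder reply are illustrative.
TOOLS = {
    "threat_picture": {
        "description": "Summarize the current threat picture.",
        "inputSchema": {"type": "object", "properties": {}},
        "handler": lambda args: "No active threats reported (placeholder data).",
    }
}


def error(req_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Answer one JSON-RPC request for tools/list or tools/call."""
    req = json.loads(raw)
    method, req_id = req.get("method"), req.get("id")

    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": tool["description"],
             "inputSchema": tool["inputSchema"]}
            for name, tool in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = req.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return error(req_id, -32602, "Unknown tool")
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(req_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "threat_picture", "arguments": {}}}
print(handle(json.dumps(call)))
```

The handler behind each tool is an ordinary function, so a service that already has a structured interface mainly needs a name, a description, and an input schema to be listed.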
What this changes for EdgeLance and defense AI
Apple did not build Core AI for the military. They built it for consumer apps and Siri. But the result (local LLM inference, third-party model support, multimodal processing, MCP tool calling, ahead-of-time compilation) is exactly what defense developers have been hacking together with custom MLX code for two years. Now there is a supported framework.
Core AI replaces custom MLX integration with a supported framework. The LanguageModel protocol aligns with existing compute routing. MCP turns EdgeLance services into system-level AI tools. AFM 3 provides baseline multimodal capability on every device. watchOS 27 ships with improved health tracking that feeds EdgeLance's biometric readiness pipeline.
Xcode 27 runs code completion on the local Neural Engine first, routing to cloud only when needed. For defense dev teams in SCIFs, that means AI-assisted development without a cloud connection. The five-thousand-dollar ISR stack just got a better runtime without any hardware changes.