Apple AI research examines spatial reasoning, ASL annotation


Apple hasn’t abandoned spatial computing, judging by its research studies.

Apple’s interest in AI models and their applications in spatial computing shows no signs of slowing down, even as some claim the Apple Vision Pro is dead.

In April 2026, one report argued that the Apple Vision Pro was an outright failure and that, as a result, we’d never see a successor product. That claim, though it always seemed unreasonable, has since been called into question.

Even though the company’s Vision Products Group may have seen some changes, there’s ultimately still hope for a new generation of the Apple Vision Pro. Apple’s AI research suggests the company hasn’t abandoned its spatial-related projects.

On the contrary, new studies posted on the Apple Machine Learning blog explore the use of LLMs in sign language annotation, 3D head modeling, and more. Apple’s researchers also developed a new benchmarking system to evaluate the spatial-functional intelligence of LLMs.

Benchmarking spatial-functional intelligence for multimodal LLMs

The paper titled “From Where Things Are to What They’re For: Benchmarking Spatial-Functional Intelligence for Multimodal LLMs” outlines a new testing and grading system for MLLMs.

Collage of household task examples with photos of rooms and appliances, multiple-choice questions, and labeled sections on counting, reasoning, layout inference, functional association, operation planning, and troubleshooting.

Apple’s researchers developed a benchmarking framework that tests the spatial reasoning capabilities of MLLMs. Image Credit: Apple

As the study explains, to mimic human understanding of a space and its objects, AI models rely on two distinct structures: “a spatial representation that captures object layouts and relational structure, and a functional representation that encodes affordances, purposes, and context-dependent usage.”

In other words, a multimodal LLM needs to understand the geometry of a particular space, along with the purpose and location of the objects inside it. Apple’s researchers say that existing benchmarking methods, such as VSI-Bench, only test the former, largely ignoring the latter.

To combat this, they developed the Spatial-Functional Intelligence Benchmark, abbreviated as SFI-Bench. It’s described as a video-based benchmark with 1,555 expert-annotated questions derived from 134 indoor video scans.

As for what SFI-Bench tests specifically, the study explains this in a fairly straightforward manner:

“Beyond spatial cognition, SFI-Bench incorporates functional and knowledge-grounded reasoning, probing whether models understand what objects in the scene are for, how they are operated, and how failures can be diagnosed.”

In other words, the benchmark tests if AI models comprehend what an object is, where it’s located, how it’s used, what it’s used for, and how it can be fixed.
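To make the format concrete, a multiple-choice benchmark like SFI-Bench can be scored with a per-category accuracy harness. The sketch below is purely illustrative: the question wording, category names, and `model_predict` stand-in are hypothetical, since Apple hasn’t published the benchmark’s data schema, and a real harness would feed video frames to an actual MLLM.

```python
from collections import defaultdict

# Hypothetical SFI-Bench-style items: each pairs a scene with a
# multiple-choice question in one of the benchmark's task categories.
questions = [
    {"category": "counting",
     "question": "How many chairs surround the dining table?",
     "choices": ["2", "3", "4", "6"], "answer": "4"},
    {"category": "functional association",
     "question": "Which object in the kitchen is used to reheat food?",
     "choices": ["kettle", "microwave", "toaster", "fridge"],
     "answer": "microwave"},
    {"category": "troubleshooting",
     "question": "The TV shows no signal. What should be checked first?",
     "choices": ["remote batteries", "HDMI cable", "wall paint", "couch position"],
     "answer": "HDMI cable"},
]

def model_predict(item):
    """Stand-in for an MLLM call; a real harness would also pass video frames."""
    return item["choices"][0]  # naive baseline: always pick the first choice

def score(items, predict):
    per_category = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for item in items:
        correct, total = per_category[item["category"]]
        per_category[item["category"]] = [
            correct + (predict(item) == item["answer"]), total + 1]
    return {cat: c / t for cat, (c, t) in per_category.items()}

print(score(questions, model_predict))
```

With the naive first-choice baseline every category scores 0.0 here; swapping `model_predict` for a real model call is the only change a working harness would need.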

Diagram of a living room navigation task: video scan frames, a 3D rendered room map with colored paths and markers, and annotated text explaining questions, reasoning steps, and correct versus incorrect answers

Apple’s AI researchers tested how well LLMs understand the world around them. Image Credit: Apple.

If this sounds familiar, it’s because Google has had tools with this type of spatial awareness since at least 2024. At its I/O conference that same year, Google’s AI model correctly identified an object in front of it as a record player and even suggested how to repair the device.

In practice, SFI-Bench would serve to test similar and more advanced AI models. Tests mentioned in the study include asking an LLM to identify the largest subset of same-brand bottles on a cabinet, to cancel the current program on a washing machine, and to explain what a TV remote is used for.

Apple’s researchers tested several open-source and proprietary AI models with their SFI-Bench framework. Unsurprisingly, Google Gemini 3.1 Pro achieved the best overall result, with OpenAI’s GPT-5.4-High in second place and Gemini 3.1 Flash-Lite in third.

However, the study notes that “Across all models, global conditional counting emerges as a key bottleneck, revealing persistent limitations in compositional and logical reasoning.”

In other words, most current MLLMs “struggle with spatial memory, functional knowledge integration, and linking perception to external knowledge.” Still, the study noted that models with internet access performed better, relative to offline-only models.

As for potential applications within iOS, we could see Apple unveil a version of Siri with both spatial and contextual awareness. This would make sense, given that the company has partnered with Google for Apple Intelligence features.

It remains to be seen if and when that would debut, though, or how well the AI might perform.

Using AI models for sign language annotation

In a separate study, dubbed “Bootstrapping Sign Language Annotations with Sign Language Models,” Apple’s researchers explored how AI could be used to annotate sign language videos.

Diagram comparing text and sign alignment for sign language recognition, with labeled timelines, colored frame-score grids, and stacked neural-network blocks showing multi-scale dilated convolutions, self-attention, and separate one-hand and two-hand branches

Apple’s researchers explored using AI for ASL annotation. Image Credit: Apple

The company’s research team says it developed a “pseudo-annotation pipeline that takes signed video and English as input and outputs a ranked set of likely annotations, including time intervals, for glosses, fingerspelled words, and sign classifiers.”

In doing so, they seek to reduce the time and cost of annotating hundreds of hours of sign language manually. This approach involved creating “simple yet effective baseline fingerspelling and ISR models, achieving state-of-the-art on FSBoard (6.7% CER) and on ASL Citizen datasets (74% top-1 accuracy).”
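The character error rate (CER) quoted above is the standard edit-distance metric for recognition tasks: the minimum number of character insertions, deletions, and substitutions needed to turn the model’s output into the reference, divided by the reference length. As a rough sketch (not Apple’s evaluation code), it can be computed like this:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, 1)

# A fingerspelling recognizer misreading one letter of a nine-letter name:
# one substitution out of nine reference characters.
print(round(cer("APPLESEED", "APPLESEAD"), 3))
```

A 6.7% CER, as reported on FSBoard, means roughly one character in fifteen is wrong after alignment.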

Apple’s researchers developed nearly 500 manual English-to-gloss annotations, which they validated through back translation, alongside manual and pseudo-annotations covering over 300 hours of ASL STEM Wiki and 7.5 hours of FLEURS-ASL.

For testing, Claude Sonnet 4.5 was given a gloss-to-English variation of a prompt and had to translate it from manual ASL STEM Wiki annotations to the reference English text that signers interpreted.

The study notes that “Errors were predominantly in cases where a sentence does not have any fingerspelling.” While additional work remains to be done, the researchers say their “approach for fingerspelling recognition and isolated sign recognition can be trained with modest GPU resources and could also be used for further iteration on pseudo annotation pipelines.”

As for why Apple is researching this, it could have something to do with the long-rumored camera-equipped AirPods. Perhaps the company plans to expand its Live Translation feature to include sign language.

3D Gaussian head reconstruction from multi-view captures

Another study called “Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures” explores how head models can be made from images with the help of AI.

Flowchart of a neural network reconstructing a woman's 3D head from multiple photos, showing foreground and background ResNet encoders, transformer blocks, Gaussian decoders, and rendered versus ground-truth outputs

Apple’s AI researchers explored how LLMs can be used to create 3D head models from multi-view captures. Image Credit: Apple.

Apple’s researchers developed “HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups.”

In essence, the study explores how different head views can be converted into Gaussian blobs and then into 3D models through a series of encoders and decoders.
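For context, each Gaussian “blob” a decoder emits in splatting-style pipelines is typically a small bundle of parameters; a head model is just a large collection of them that a renderer projects and alpha-blends per pixel. The sketch below shows a generic 3D Gaussian parameterization, not Apple’s HeadsUp code, and the `nose_tip` values are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    """One splat, as decoders in Gaussian-splatting pipelines typically emit it."""
    position: tuple[float, float, float]          # center in 3D space
    scale: tuple[float, float, float]             # per-axis extent of the ellipsoid
    rotation: tuple[float, float, float, float]   # unit quaternion (w, x, y, z)
    opacity: float                                # 0..1 blending weight
    color: tuple[float, float, float]             # RGB; real pipelines often use
                                                  # spherical-harmonic coefficients

# A single (hypothetical) Gaussian near the tip of a reconstructed nose.
nose_tip = Gaussian3D(
    position=(0.0, 0.02, 0.11),
    scale=(0.004, 0.004, 0.003),
    rotation=(1.0, 0.0, 0.0, 0.0),
    opacity=0.9,
    color=(0.8, 0.6, 0.55),
)
print(nose_tip.opacity)
```

A feed-forward method like the one described would regress tens of thousands of such parameter bundles directly from the input images, rather than optimizing them per subject.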

To test their image-to-3D-model method, those behind the study used “an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets.” The 3D head models were also animated using expression blendshapes.

Overall, the study explains that “HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization.”

In terms of practical applications, the study could be related to the Apple Vision Pro and its Persona feature. Apple may be looking for ways to improve how expressions are rendered, or how faces themselves are captured and rendered within visionOS.

There may also be hardware or comfort-related applications. During the development of the headset, AppleInsider was told that the company included various 3D head types alongside Apple Vision Pro models.

Time will tell what Apple does with its researchers’ findings. While we have to wait and see what its next product will be, one thing is for sure: the company isn’t backing down when it comes to AI and spatial computing.

Apple is set to announce iOS 27 and its corresponding OS updates at WWDC 2026, which will begin on June 8.




Recent Reviews


Payments are at the heart of any accounting and bookkeeping firm. But what happens when your clients don’t pay on time? The cost isn’t just financial. There’s often an emotional toll, a drain on time, and a real barrier to growth.

We surveyed 800 small-to-medium business (SMB) decision-makers across Australia and New Zealand to better understand the state of late payments today, and the findings are powerful.

The GoCardless Pursuing Payments 2025 report uncovers the true impact of late payments and what you can do to break the cycle.

1. The pursuit of payments is still a time drain for many businesses

Over a quarter of small businesses report spending up to an hour every single week just chasing down late payments.

Think about that – a full hour of every work week, gone. That’s an hour that could be spent onboarding new clients, innovating, or simply focusing on what you do best. Instead, it’s lost to the frustrating and awkward task of debt collection.

Unfortunately, the problem isn’t getting any better. Nearly half of SMBs are waiting longer for payments now than they were just 12 months ago (48% in Australia and 51% in New Zealand). And with rising living costs, it’s no surprise that 59% are worried this trend will only get worse.

2. Late payments take a financial and emotional toll

While the time sink is bad enough, the financial and emotional impact can be far-reaching.

41% of Australian SMBs and 35% of New Zealand SMBs report that their payments are, on average, more than 14 days overdue. And these delayed payments inflict a substantial financial hit, with 15% of SMBs in both countries losing up to $1,000 every month.

Our research also showed the heavy emotional cost. Chasing money creates tension with customers, causes stress, and makes business owners feel anxious and frustrated. It’s a vicious cycle that can distract from your day-to-day business and core purpose.

3. Bad cash flow is bad for growth

Delayed payments often mean poor cash flow and can result in businesses having to put a hold on future plans. Here are a few growth-stunting actions Australia and New Zealand SMBs have been forced to take due to late payments:

  • Ending their relationship with the late payer
  • Increasing the price for their customers
  • Being late paying their suppliers
  • Postponing the rollout of a new product or service
  • Closing their business

4. Late payments don’t have to be inevitable

So, what’s the solution? The good news is that SMBs are hungry for change. Two-thirds of the businesses we surveyed said they’re interested in using new technology to get a handle on late payments.

That’s where technology comes in. By adopting modern methods like bank payments with GoCardless (think payments made directly from one bank account to another, including BECS Direct Debit and PayTo), you can create, schedule, and collect payments for your client invoices on their due date – all from your existing Xero setup.

It’s time to put a stop to the endless admin, reduce costly payment failures, and get paid up to 47% faster. Connect GoCardless to Xero to automate invoice payments, and take back control of your business’s cash flow and growth. 
