Vendor news · 5/7/26 · Industry

Anthropic ships Claude 4.6: what tenant reps actually get.

Long-document accuracy and reliable tool use both crossed a meaningful threshold. Here's the release-notes lens for working brokers.

Unsplash · neural network abstract


Anthropic shipped Claude 4.6 last week. The general-purpose announcement covered coding, agents, and computer use. We sat with the model for ten days running real CRE workloads against it. Here's the version of the release notes a tenant rep actually cares about.

Long-document accuracy is the headline

What leases need, and what most models have historically lacked, is sustained accuracy across very long documents. A typical office lease is 60-120 pages once you count the rider and exhibits. The clauses that matter are scattered: base rent in section 3, the escalation formula in section 4, and the cap on that escalation buried in exhibit B-2.

Earlier models would crush the first 30 pages and then start getting sloppy. By the time they hit the rider, you'd see field values that referenced the wrong section, or just-plausible-enough hallucinations that didn't trace back to source. 4.6 holds it together across the whole document. We've stopped seeing the "context fade" failure mode in our internal evals.
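One way to catch field values that cite the wrong section is to require a supporting quote for every extracted field and check that the quote really appears in the cited section. The snippet below is a minimal sketch of that idea, not a description of any vendor's harness; the field names, section labels, and lease text are all made up for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps don't break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_fields(fields: dict, sections: dict) -> list:
    """Return names of fields whose quote is missing from the cited section."""
    problems = []
    for name, field in fields.items():
        quote = normalize(field.get("quote", ""))
        section_text = normalize(sections.get(field.get("section"), ""))
        # An empty quote or a quote not found in the cited section is a failure.
        if not quote or quote not in section_text:
            problems.append(name)
    return problems


# Hypothetical lease text keyed by section label.
sections = {
    "Section 3": "Base Rent shall be $42.00 per rentable square foot per year.",
    "Section 4": "Base Rent shall escalate annually by the CPI increase.",
    "Exhibit B-2": "Annual escalation shall not exceed 3% in any lease year.",
}

# Hypothetical model output: each field cites a section and a supporting quote.
fields = {
    "base_rent": {
        "value": "$42.00/RSF",
        "section": "Section 3",
        "quote": "$42.00 per rentable square foot",
    },
    "escalation_cap": {
        "value": "3%",
        "section": "Section 4",  # wrong: the cap actually lives in Exhibit B-2
        "quote": "shall not exceed 3%",
    },
}

print(ungrounded_fields(fields, sections))
```

Here the escalation cap is flagged because its quote does not appear in the section it cites, which is the kind of error that tends to show up late in a long document.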

Tool use is good enough to actually deploy

The other quietly significant change is tool use. We've been building a Model Context Protocol server that exposes the DealDesk pipeline (open deals, comps, abstracts, LOIs) to Claude, so a broker can sit in Claude Desktop, say "draft a counter LOI for the 1500 Market deal," and have it pull the abstract, the market comps, and the firm's house-style template, then write the draft back into the deal record.

On 4.5, this kind of multi-step tool use was reliable maybe 80% of the time. The other 20% failed badly: the model would invoke the wrong tool, invent a field name, or stop midway through. On 4.6 we're seeing reliability in the 95-97% range. That's the difference between "interesting demo" and "actually shipping it to brokers."
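The three failure modes above (wrong tool, invented field name, stopping midway) can each be checked mechanically from a log of tool calls. The following is a rough sketch under stated assumptions: the tool names, argument names, and recorded runs are hypothetical and are not the actual DealDesk tool surface.

```python
# Hypothetical tool registry: tool name -> (required args, optional args).
TOOLS = {
    "get_abstract": ({"deal_id"}, set()),
    "get_comps": ({"deal_id"}, {"radius_miles"}),
    "write_draft": ({"deal_id", "body"}, set()),
}


def validate_call(name, args):
    """Return a list of problems with one tool call; empty means it looks valid."""
    if name not in TOOLS:
        return [f"unknown tool: {name}"]
    required, optional = TOOLS[name]
    given = set(args)
    problems = [f"missing argument: {a}" for a in sorted(required - given)]
    problems += [f"unexpected argument: {a}" for a in sorted(given - required - optional)]
    return problems


def run_succeeded(calls, expected_tools):
    """A run passes if every call validates and the expected tools ran in order."""
    if any(validate_call(name, args) for name, args in calls):
        return False
    return [name for name, _ in calls] == expected_tools


expected = ["get_abstract", "get_comps", "write_draft"]
deal = {"deal_id": "1500-market"}
draft = {"deal_id": "1500-market", "body": "Draft counter LOI text"}

# Hypothetical recorded runs of the same request.
runs = [
    [("get_abstract", deal), ("get_comps", deal), ("write_draft", draft)],  # clean
    [("get_abstract", deal), ("get_comps", {"deal": "1500-market"}),
     ("write_draft", draft)],                                               # invented field
    [("get_abstract", deal)],                                               # stopped midway
    [("get_abstract", deal), ("get_listing", deal), ("write_draft", draft)],  # wrong tool
]

passed = sum(run_succeeded(run, expected) for run in runs)
print(f"{passed}/{len(runs)} runs passed")
```

A reliability percentage like the ones quoted above is this pass count divided by the number of runs, measured over a much larger set of requests.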

What's not different

It's still a language model. It still doesn't know what your firm did last Tuesday unless someone told it. It still has a knowledge cutoff and will confidently get the wrong answer on a question that requires fresh data if you don't wire up the right tools. None of that is solved by 4.6.

If you're shopping for AI tools right now, the question to ask vendors is not "what model do you use?" That's table stakes. The question is "what does your evaluation harness measure, and can I see the numbers on my own document corpus?" The vendors who can answer that are the ones who've actually crossed the threshold this release lets you cross.

How we're using it

We rolled 4.6 to 100% of DealDesk's lease abstraction traffic the day it shipped. The LOI generator and space report builder follow this week. If you want to see the difference on a document of your own, the free tier is three abstracts a month, no card. Drop in a lease your team has already abstracted by hand and check the diff. That's the eval we trust the most.
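For readers who want to run that check themselves, comparing a hand abstract to a model abstract amounts to a field-by-field comparison. This is an illustrative sketch only: the field names and values are invented, and a real comparison would also need to normalize dates, currency formats, and free-text fields before comparing.

```python
def diff_abstracts(hand: dict, model: dict) -> dict:
    """Compare two abstracts field by field; return only the fields that disagree."""
    diffs = {}
    for field in sorted(set(hand) | set(model)):
        hand_value, model_value = hand.get(field), model.get(field)
        if hand_value != model_value:
            diffs[field] = {"hand": hand_value, "model": model_value}
    return diffs


# Hypothetical abstracts of the same lease.
hand = {
    "base_rent_psf": 42.00,
    "escalation": "CPI",
    "escalation_cap_pct": 3.0,
    "term_months": 120,
}
model = {
    "base_rent_psf": 42.00,
    "escalation": "CPI",
    "escalation_cap_pct": None,  # the model missed the cap in the exhibit
    "term_months": 120,
}

diffs = diff_abstracts(hand, model)
agreement = 1 - len(diffs) / len(set(hand) | set(model))

for field, values in diffs.items():
    print(f"{field}: hand={values['hand']!r} model={values['model']!r}")
print(f"field agreement: {agreement:.0%}")
```

A disagreement does not say which side is right. Each one still needs to be traced back to the lease itself.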
