June 7, 2026 | Research

Nobody Fully Understands a 1,260-Page Congressional Bill.
Not Human, Not Even AI. That Is About to Change.

975,394 tokens cross-referenced by Atlas

We scanned the entire National Defense Authorization Act for Fiscal Year 2026. All 1,260 pages. 975,394 tokens. 3.86 million characters.

Not a summary. Not the first 50 pages. The whole bill, end to end, with every section cross-referenced against every other section.

1,260 pages scanned
975,394 tokens processed
336 sections analyzed
14B model parameters

Why no AI model can do this in a single pass

The biggest models on the planet advertise 1M-token context windows. This bill is just under 1M tokens. So in theory, a top-tier model could fit it, with almost no room left over for instructions or output. In practice, it cannot do what we did.

Context windows have a dirty secret called "lost in the middle." Models attend most reliably to the beginning and the end of their context. Recall of what sits in between degrades, and researchers have documented this extensively. A provision on page 30 and a contradicting clause on page 1,200 sit so far apart in the context that attention mechanisms fail to connect them reliably. You can fit a million tokens in the window and still miss the interactions between them.

Fitting the text is not the same as understanding it.

Why chunking fails

The standard workaround is chunking -- break the document into pieces and analyze each piece separately. Most AI document tools do this. The problem is obvious: once you chunk, each piece becomes an island. The model processes chunk 1, forgets it, and moves to chunk 2.

A provision buried on page 47 that quietly guts a protection established on page 1,147? Those chunks never meet. No model connects them. The very structure that makes a 1,260-page bill effective at hiding things is the same structure that defeats every chunking approach.
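The isolation described above can be shown with a toy example. This is a minimal sketch, not the behavior of any particular tool: the token list, the marker words, and the chunk size of 100 are all made up for illustration.

```python
def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive, non-overlapping chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def chunk_index_of(chunks, word):
    """Return the index of the first chunk containing `word`, or None."""
    for i, chunk in enumerate(chunks):
        if word in chunk:
            return i
    return None


# Toy document (illustrative only): one marker near the start, one near the end.
tokens = ["filler"] * 1000
tokens[40] = "EXEMPTION"    # stands in for the early provision
tokens[950] = "PROTECTION"  # stands in for the late protection it undercuts

chunks = chunk_tokens(tokens, chunk_size=100)
exemption_chunk = chunk_index_of(chunks, "EXEMPTION")
protection_chunk = chunk_index_of(chunks, "PROTECTION")

# A model that analyzes one chunk at a time never sees both markers together.
same_chunk = exemption_chunk == protection_chunk
```

Overlapping chunks would help only when the two passages are near each other, which is not the case for provisions hundreds of pages apart.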

Why RAG does not solve this

Retrieval-Augmented Generation embeds chunks into vectors and runs similarity search to pull "relevant" context. It is probabilistic -- it guesses which chunks might be related based on how similar they look in embedding space.

Legal language is adversarial to this approach. A clause that cites "Section 1043(b)(2)" and the paragraph it amends can share almost no semantic similarity in vector space. They use different terminology. They describe different things. The only connection between them is a section number reference -- and embedding similarity is not built to find it. RAG has no reason to retrieve one when processing the other.

How Atlas solves this

Atlas is not RAG. It is a deterministic retrieval engine.

Every chunk of the bill gets ingested into Atlas as it is parsed. When chunk 280 references "Section 1043," Atlas performs an exact lookup and returns the actual text of Section 1043 from wherever it lives in the document. No embedding similarity. No probabilistic guessing. The referenced text is retrieved and handed to the model alongside the current chunk. Every time. With zero degradation regardless of how far apart the sections are in the original document.
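The lookup described above can be sketched in a few lines. This is an illustration of the general idea only, not Atlas's actual implementation or API: the regex, the function names, and the placeholder section texts are all assumptions.

```python
import re

# Hypothetical pattern: matches "Section 1043" (also inside "Section 1043(b)(2)")
# and captures the section number.
SECTION_REF = re.compile(r"\bSection (\d+)\b")


def build_index(sections):
    """Map section number -> full text. `sections` is an iterable of (number, text)."""
    return {number: text for number, text in sections}


def resolve_references(chunk_text, index):
    """Return {section number: text} for each cited section found in the index,
    in order of first mention. This is an exact key lookup, not a similarity search."""
    resolved = {}
    for number in SECTION_REF.findall(chunk_text):
        if number in index and number not in resolved:
            resolved[number] = index[number]
    return resolved


def build_prompt(chunk_text, index):
    """Place the referenced section texts ahead of the chunk being analyzed."""
    parts = [f"[Referenced Section {n}]\n{t}"
             for n, t in resolve_references(chunk_text, index).items()]
    parts.append(f"[Current chunk]\n{chunk_text}")
    return "\n\n".join(parts)


# Placeholder data, not real bill language.
index = build_index([("1043", "PLACEHOLDER TEXT A"), ("2101", "PLACEHOLDER TEXT B")])
chunk = "Notwithstanding Section 1043(b)(2), and subject to Section 9999, funds may be used."
resolved = resolve_references(chunk, index)
prompt = build_prompt(chunk, index)
```

Because the result depends only on the chunk text and the index, the same inputs always produce the same prompt. A reference to a section that is not in the index (here, 9999) is simply left unresolved, which a real system would presumably want to log.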

The model doing the analysis is Qwen2.5-14B running locally on consumer hardware with a 64K context window. It sees roughly 6.5% of the bill at a time. But it does not need to see more. Atlas provides the cross-references on demand. The model analyzes what is in front of it. Atlas tells it what is connected to it.
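The 6.5% figure is simple arithmetic, shown here as a short sketch. It assumes "64K" means 64,000 tokens; with 65,536 the share comes out closer to 6.7%.

```python
import math

BILL_TOKENS = 975_394     # from the scan statistics above
CONTEXT_TOKENS = 64_000   # assumption: "64K" read as 64,000 tokens

# Largest share of the bill that fits in the window at once.
share = CONTEXT_TOKENS / BILL_TOKENS

# Lower bound on the number of passes if every window were filled with bill text.
min_passes = math.ceil(BILL_TOKENS / CONTEXT_TOKENS)

print(f"{share:.2%} of the bill per window, at least {min_passes} passes")
```

In practice the number of passes would be higher than this lower bound, since each window also has to hold the retrieved sections, the instructions, and the model's output.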

Together, Atlas and a 14B model on consumer hardware cover a million tokens with cross-referencing that even the most expensive models with the largest context windows cannot match -- because retrieval beats attention at scale.

Why the NDAA is 1,260 pages

The NDAA is not long because defense policy is complicated. It is long because length is the strategy.

Provisions are buried. Funding is authorized in one section and the constraints on that funding appear 800 pages later. Exemptions reference subsections of amendments to previous years' acts. Corporate subsidies are wrapped in national security language. Spending caps are established early and then quietly raised in later sections that few people read.

The bill is structurally designed so that no single reader -- human or AI -- can hold the full picture at once. That is not a limitation of the reader. That is the point.

Atlas does not hold the full picture at once either. It does not need to. It holds every piece, knows where every piece is, and retrieves exactly the right pieces for whatever is being analyzed right now. Deterministic. Auditable. Same query, same data, same answer, every time.

What the scan produces

A section-by-section breakdown of the entire bill in plain English. What each provision actually does. Who benefits. Who pays. How provisions scattered across 1,260 pages interact with each other. Every finding is color-coded by impact on the average citizen:

Green -- directly helps citizens. Red -- directly hurts citizens. Yellow -- mixed or uncertain. Blue -- worth knowing. Gray -- procedural.
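One way to represent findings like these is sketched below. The five categories come from the list above. The `Finding` fields and the grouping helper are hypothetical and are not the format of the downloadable results.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    GREEN = "directly helps citizens"
    RED = "directly hurts citizens"
    YELLOW = "mixed or uncertain"
    BLUE = "worth knowing"
    GRAY = "procedural"


@dataclass(frozen=True)
class Finding:
    section: str                  # section number, e.g. "1043"
    summary: str                  # plain-English description of the provision
    impact: Impact
    related_sections: tuple = ()  # sections this provision interacts with


def group_by_impact(findings):
    """Bucket findings by impact category, keeping every category present."""
    grouped = {impact: [] for impact in Impact}
    for finding in findings:
        grouped[finding.impact].append(finding)
    return grouped


# Placeholder finding, not an actual result from the scan.
example = Finding("1043", "placeholder summary", Impact.RED, ("2101",))
grouped = group_by_impact([example])
```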

Cross-references that a human analyst would need weeks to map. Cross-references that no existing AI tool can make, because each tool either cannot fit the document or cannot connect the pieces once it chunks it.

📄 Download the full NDAA FY2026 scan results (DOCX)

This is not the first time

Before the NDAA, we used Atlas to scan the entire Mozilla Firefox source repository -- 44 million tokens of C, C++, JavaScript, and Rust -- with a 9B model on a single GPU. Atlas gave that 9B model total recall across the entire codebase. It found 72 confirmed vulnerabilities, including 4 multi-step exploit chains that spanned multiple files and directories. Cross-file interactions that no model could see within a single context window.

The congressional scanner is the same architecture applied to a different problem. Code security and legislative analysis have nothing in common on the surface. But the underlying challenge is identical: a document too large for any context window, where the important interactions happen between pieces that are far apart. Atlas does not care what the content is. It retrieves by reference, not by topic.

The numbers

975,394 tokens. One 14B model. One retrieval engine. Consumer GPUs. No cloud. No API costs. No context window large enough to matter -- and it did not need one.

The model did not need to be bigger. It needed better infrastructure around it.

What this changes

Until now, nobody could read a 1,260-page bill and trace every cross-reference. Not a citizen. Not a politician. Not a team of congressional staffers. Not a law firm billing $800 an hour. The bills were unreadable by design, and everyone just accepted that.

This changes that. A citizen can upload the bill their representative voted on and find out what it actually does -- all of it, not just the press release version. A politician can upload the bill they are being asked to vote on and see what is buried on page 1,100 before they sign on page 1. Deterministic cross-referencing across every section, on every page, in plain English.

That capability did not exist before. Not from any lab, at any price, at any scale.

A note to investors

If you read this and do not understand what you are looking at, that is fine. Keep walking. Let another one pass you by.

If you do understand what it means to have a deterministic retrieval engine that gives any model unbounded cross-referenced memory -- for legal, for compliance, for intelligence, for legislation, for any domain where documents are too large and too interconnected for any context window -- then my signature below is a direct line. This is worth a conversation.


apollo@saiql.ai | ShipItClean.com | SAIQL.ai

Want to reach out for some reason, whatever that might be? My name is Apollo, and I am @ SAIQL.ai