CSARmetrics
Crawl Systems Analysis & Research · Proposed Discipline · v0.1

The discipline of Crawl Gap analysis.

CSARmetrics is the input → output system for crawl behavior. Most crawl analytics counts events without explaining them. CSARmetrics models the system that produces those events — and finds the gaps where attention has gone missing.
§ 01

What it is.

CSARmetrics — Crawl Systems Analysis & Research — is the discipline of Crawl Gap analysis: the input → output system for crawl behavior. A site is the input. Crawler behavior on that site is the output. The gap between what the system should produce and what it actually produces is where every actionable insight lives.

It rests on a single shift in perspective: stop measuring what happened; start modeling what should have happened. Once that baseline exists, the gap becomes the unit of analysis — and the unnoticed signal that most crawl analytics misses entirely.

A CSARmetric analysis asks four questions of any crawl:

  1. Where did the crawler go?
  2. Where should it have gone?
  3. Why is the gap what it is?
  4. What change would close it most efficiently?
§ 02 · Why it matters
Without a baseline, crawl data cannot tell you what is wrong.
The Crawl Gap introduces that baseline.
Scale
Sites are too large for intuition. Crawl behavior must be modeled.
Multiplicity
Googlebot, Bingbot, GPTBot, ClaudeBot — different crawlers behave differently, and matter differently.
Data
Logs, edge telemetry, and render signals now provide enough resolution to observe behavior as a system.
Five years ago, crawl could be described. Now it can be modeled.
§ 03

One example: the Crawl Gap, computed.

Twelve product-detail URLs from a hypothetical kitchenware catalog. Same template, same depth, same position class. A crawler model fit on the site's log history yields a position baseline of 0.40 crawls/day — the rate a generic, freely substitutable URL in this position would receive.

What follows is what a CSARmetric analysis actually looks like. The chart shows observed rates against the position baseline. The shaded gap is the Crawl Gap. The decomposition table below tells you why each URL has the CG it does — and where to intervene.

The Crawl Gap is not the bar. It is the gap between the bar and the baseline.

Definition
Crawl Gap (CG)
CG(u, t) = observed_crawls(u, t) − E[crawls | baseline URL in position(u), window t]
In plain terms: how far above or below the position baseline a URL is performing.
The baseline is specific to the URL's position class and role. It is not the site-wide average.
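As a rough illustration of the definition, the sketch below computes CG for a single URL. It assumes both terms are expressed as rates in crawls/day, as in the example's figures. The `UrlObservation` type, the `crawl_gap` function, and the count of 43 crawls are hypothetical, chosen only so the rate lands near the 1.43 crawls/day reported for the cast-iron skillet URL. The baseline of 0.40 is passed in as a given; fitting the crawler model that produces it is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlObservation:
    """Hypothetical container for one URL's crawl count in one window."""
    url: str
    crawls: int        # crawler hits on this URL within the window
    window_days: int   # length of the observation window, in days


def crawl_gap(obs: UrlObservation, baseline_rate: float) -> float:
    """Return CG in crawls/day: observed rate minus the position baseline."""
    observed_rate = obs.crawls / obs.window_days
    return observed_rate - baseline_rate


# Assumed count: 43 Googlebot hits over the 30-day window (about 1.43/day).
obs = UrlObservation("/products/cast-iron-skillet-12in", crawls=43, window_days=30)
print(round(crawl_gap(obs, baseline_rate=0.40), 2))
```

A positive result means the URL is crawled more often than a generic URL in the same position; a negative result means it is crawled less often.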
Figure 1
This is what a CSARmetric analysis actually looks like.
Observed crawl frequency vs position baseline
Position class: product-detail, depth 2 · Crawler: Googlebot · Window: 30 days
[Bar chart. Legend: Observed, Position baseline (E), + Gap, − Gap. Y-axis: crawls / day, 0.0 to 1.6. Dashed baseline at E[crawls | baseline] = 0.40.]
URLs shown: cast-iron-skillet-12in, dutch-oven-7qt, chefs-knife-8in, wooden-cutting-board, stainless-mixing-bowls, silicone-spatula-set, measuring-cups-glass, copper-saucepan-2qt, bamboo-steamer-basket, enameled-french-skillet, ceramic-mortar-pestle, heirloom-tomato-seeds.
Each bar is one URL in the same position class. The dashed line is the position baseline: what a generic, freely substitutable URL in this role would receive in this system. The shaded band between a bar and the baseline is that URL's Crawl Gap.
Figure 2 · Decomposition
CG is decomposable. The decomposition is the diagnosis.
Two URLs can have the same negative CG for completely different reasons. The decomposition into structural (sCG), temporal (tCG), render/response (rCG), and historical (hCG) components tells you which intervention will actually shift the outcome.
All values in crawls/day. Obs = observed rate, Exp = position baseline.

URL                                  Obs    Exp    CG      sCG     tCG     rCG     hCG     Diagnosis
/products/cast-iron-skillet-12in     1.43   0.40   +1.03   +0.62   +0.31   +0.08   +0.02   Structurally winning
/products/silicone-spatula-set       0.42   0.40   +0.02   +0.02    0.00    0.00    0.00   Performing as expected
/products/enameled-french-skillet    0.14   0.40   −0.26   −0.05   −0.03   −0.18    0.00   Render/response leak
/products/heirloom-tomato-seeds      0.09   0.40   −0.31   −0.18   −0.08   −0.04   −0.01   Structural neglect
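The table's arithmetic can be checked mechanically. The sketch below copies the four rows, confirms that the components sum to CG = Obs − Exp within rounding, and reports the largest-magnitude component for each URL. Picking the largest component is an assumed rule of thumb for illustration, not a method stated in the specification. The names `ROWS`, `COMPONENTS`, and `dominant_component` are hypothetical.

```python
import math

# Rows copied from the decomposition table: (Obs, Exp, sCG, tCG, rCG, hCG).
ROWS = {
    "/products/cast-iron-skillet-12in": (1.43, 0.40, 0.62, 0.31, 0.08, 0.02),
    "/products/silicone-spatula-set": (0.42, 0.40, 0.02, 0.00, 0.00, 0.00),
    "/products/enameled-french-skillet": (0.14, 0.40, -0.05, -0.03, -0.18, 0.00),
    "/products/heirloom-tomato-seeds": (0.09, 0.40, -0.18, -0.08, -0.04, -0.01),
}
COMPONENTS = ("sCG", "tCG", "rCG", "hCG")


def dominant_component(row):
    """Return (CG, component name, component value) for one table row."""
    obs, exp, *parts = row
    cg = obs - exp
    # The four components should add up to the total gap, within rounding.
    assert math.isclose(sum(parts), cg, abs_tol=0.005)
    # Assumed heuristic: the component with the largest magnitude dominates.
    name, value = max(zip(COMPONENTS, parts), key=lambda pair: abs(pair[1]))
    return round(cg, 2), name, value


for url, row in ROWS.items():
    print(url, dominant_component(row))
```

Under this heuristic the enameled-french-skillet row is dominated by rCG and the heirloom-tomato-seeds row by sCG, which lines up with the diagnoses in the table. A near-zero row such as silicone-spatula-set still returns a component, so a real analysis would likely want a threshold before naming a cause.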
Reading the gaps
cast-iron-skillet-12in (CG +1.03) is winning on link structure (sCG +0.62). This is the system working. No intervention required.
enameled-french-skillet (CG −0.26) is bleeding crawl attention through render cost (rCG −0.18). Internal links won't fix this. The render path will.
"A negative gap is not a signal to investigate. It is the investigation."
— Crawl Gap specification, §VIII
About CSARmetrics

CSARmetrics (Crawl Systems Analysis & Research) is a proposed discipline for modeling and measuring how crawlers behave as systems. It is open by design. The metrics are computable by anyone with log data; the methods are meant to be argued, refined, and improved in public.

The discipline begins when the arguing begins. And it begins with a number practitioners can disagree about.