Crawl Gap (CG) — v0.1 Specification

The first standard metric of CSARmetrics.

I. Why a standard metric matters

A field becomes real when its practitioners can argue about a number.

A discipline coheres around a unit. Once a standard metric exists, every other conversation in the discipline can be anchored to it. Practitioners can compare across positions. Buyers can allocate attention. Researchers can argue.

CSARmetrics needs such a unit.

Not because a single number captures everything. It does not. Mature standard metrics in any field have known weaknesses, contested formulations, and competing public versions that disagree on edge cases. None of that prevents them from becoming the most useful single metric in their disciplines. The disagreement is the field.

This document specifies Crawl Gap (CG) — proposed as the foundational metric of CSARmetrics. It is not finished. It is meant to start the argument.

II. What CG measures

The Crawl Gap is the difference between a URL's observed crawl frequency and the crawl frequency a baseline URL in the same position would receive in the same system, expressed in crawl events per unit time.

In practical terms: CG tells you which pages are getting more or less crawl attention than their position predicts.

In one sentence:

How many crawl events does this URL earn — or fail to earn — compared to a baseline URL in its position?

A URL with a 30-day CG of +12.4 received 12.4 more crawl events in that month than a baseline URL would have received in the same system.
A URL with CG = -8.1 received 8.1 fewer.
A URL with CG ≈ 0 is performing as the system would expect for its class.

CG is calculated per URL, per crawler agent, per time window. It can be aggregated to templates, sections, subdomains, or the whole property by summing.
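
As an illustration of the arithmetic, the sketch below computes CG per URL and then aggregates to the template level by summing. The URLs, crawl counts, and baseline values are invented for the example, and the helper name crawl_gap is hypothetical; the baselines are simply given here rather than derived.

```python
from collections import defaultdict

def crawl_gap(observed: float, baseline: float) -> float:
    """CG = observed crawl events minus the position baseline, same window."""
    return observed - baseline

# Hypothetical 30-day data for one crawler agent:
# (url, template, observed crawls, position baseline)
rows = [
    ("/p/red-shoe", "product", 20.0, 7.6),
    ("/p/blue-shoe", "product", 3.0, 7.6),
    ("/c/shoes", "category", 30.0, 38.1),
]

# Per-URL CG.
per_url = {url: crawl_gap(obs, base) for url, _, obs, base in rows}

# Template-level CG, obtained by summing the per-URL values.
by_template = defaultdict(float)
for url, template, obs, base in rows:
    by_template[template] += crawl_gap(obs, base)

for url, cg in per_url.items():
    print(f"{url}: {cg:+.1f}")
for template, cg in by_template.items():
    print(f"{template} (sum): {cg:+.1f}")
```

Note that the two product URLs partly cancel each other in the template sum, which is one reason to keep the per-URL values alongside any aggregate.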

Without a baseline, crawl data cannot tell you what is wrong. The Crawl Gap introduces that baseline.

III. The five requirements (and how CG meets them)

Any candidate standard metric for CSARmetrics must satisfy five conditions.

1. Single number per URL.
CG is one number. Decomposable, but reportable as a single value.

2. Comparable across positions.
A product page's CG and a category page's CG are on the same scale, because the position adjustment is built into the baseline.

3. Meaningful zero.
Zero CG means "performing as a baseline URL would in this position." Not "average for the site" — baseline for the position. This distinction is what gives the metric diagnostic power.

4. Real units.
CG is measured in crawl events per time window, not in abstract points. A CG of +5/month means five additional crawl events per month, observable in logs.

5. Decomposable.
CG is a sum of contributions from four components: structural (link graph), temporal (freshness signals), technical (render and response cost), and historical (past crawl pattern). Each can be examined separately.

IV. The position baseline — the most important definition

Most "SEO scores" fail because they compare against the wrong reference point.

A naive metric compares against the site average. This is wrong because it tells you a URL is below average without telling you whether that is a problem. A large share of every site is below average by definition.

A better metric compares against expected crawl frequency given the URL's features. This is the foundation of CSARmetrics, but it is not yet the position baseline.

The position baseline is the crawl frequency a URL would receive if it were the most generic, freely substitutable URL in its position class.

Concretely: imagine the URL stripped of its specific link equity, its specific freshness signals, and its specific historical performance, and replaced with a placeholder URL of the same template, in the same site section, with default values across all controllable features. The crawl frequency of that placeholder is the position baseline for that position class.

CG measures how much the actual URL outperforms (or underperforms) that placeholder.

The reason for this construction is that the site average is a moving target: it changes as the site changes, and it is contaminated by the same gaps you are trying to find. The position baseline is a stable floor. Comparing against it gives you a reference point that does not move when the rest of the system moves.

V. The components of CG

CG is the sum of four components, each representing a distinct dimension of why a crawler attends to a URL more or less than the position baseline.

CG = sCG + tCG + rCG + hCG

sCG — Structural CG. Contribution from link graph position. Inlink count, inlink quality, distance from key entry points, sitemap inclusion.
tCG — Temporal CG. Contribution from freshness signals. Update frequency, lastmod accuracy, change magnitude.
rCG — Render/Response CG. Contribution from technical efficiency. Server response time, render cost, error rate, redirect chains. (This component can be negative even when others are positive — a slow URL gets crawled less even if the structural signals say it should be crawled more.)
hCG — Historical CG. Contribution from past crawl pattern. Crawler memory effects, freshness of last successful crawl, established cadence.

The decomposition matters because it tells you why a URL has the CG it does. A URL with CG = +15 driven entirely by sCG is winning on link structure. A URL with the same CG driven by tCG is winning on freshness. The interventions to reinforce or correct each are different.
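
A minimal sketch of the decomposition as a data structure, assuming the four component values have already been estimated somehow (this specification does not yet say how). The class name CGComponents and the dominant helper are hypothetical, and the numbers are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CGComponents:
    s: float  # sCG, structural
    t: float  # tCG, temporal
    r: float  # rCG, render/response
    h: float  # hCG, historical

    @property
    def total(self) -> float:
        """CG = sCG + tCG + rCG + hCG."""
        return self.s + self.t + self.r + self.h

    def dominant(self) -> str:
        """Name of the component with the largest absolute contribution."""
        parts = {"sCG": self.s, "tCG": self.t, "rCG": self.r, "hCG": self.h}
        return max(parts, key=lambda name: abs(parts[name]))

# Two hypothetical URLs with the same total CG but different drivers.
link_driven = CGComponents(s=15.0, t=0.0, r=0.0, h=0.0)
fresh_but_slow = CGComponents(s=1.0, t=18.0, r=-5.0, h=1.0)

for label, c in [("link_driven", link_driven), ("fresh_but_slow", fresh_but_slow)]:
    print(label, c.total, c.dominant())
```

Both URLs report CG = +15, but the second carries a negative rCG that the headline number hides, which is the case the parenthetical on rCG describes.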

VI. How CG is computed

The general form is:

CG(u, t) = observed_crawls(u, t) − E[crawls | baseline URL in position(u), window t]

In other words: how far above or below the position baseline a URL is performing.

Computing the expectation requires three inputs:

A model of crawler behavior for the agent in question (Googlebot, GPTBot, ClaudeBot — each has its own model). The model takes URL features and returns expected crawl frequency.
A position-class definition for the URL. Position is a tuple of (template type, site section, depth class, content type). Two URLs are in the same position class if they share all four values.
A position baseline for that class. This is the modeled crawl frequency for a URL with default values across all controllable features in that position class.
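
The position-class tuple described above can be sketched as a small value type, with URLs grouped by equality on all four fields. The field values ("product", "deep", and so on) and the URLs are illustrative assumptions, not a proposed taxonomy.

```python
from collections import defaultdict
from typing import NamedTuple

class Position(NamedTuple):
    template: str
    section: str
    depth_class: str
    content_type: str

# Hypothetical URLs with hand-assigned positions.
urls = {
    "/shoes/red-runner": Position("product", "shoes", "deep", "html"),
    "/shoes/blue-runner": Position("product", "shoes", "deep", "html"),
    "/shoes/": Position("category", "shoes", "shallow", "html"),
}

# Group URLs into position classes: same tuple, same class.
classes = defaultdict(list)
for url, position in urls.items():
    classes[position].append(url)

for position, members in classes.items():
    print(position, members)
```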

In practice, the model is fit on observed crawl data across the site (or a multi-site corpus, for a more stable baseline). The position baseline is the model's prediction for a URL with neutral feature values.

Implementation note: the most robust position baseline is computed by holding out the bottom decile of URLs in each position class — those with the weakest controllable features — and using their observed mean crawl frequency as the baseline floor. This is a non-parametric alternative to the modeled approach and is more resistant to model misspecification.
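
A sketch of the bottom-decile approach, assuming each URL already has a single score summarizing the strength of its controllable features. How that score is built is not specified here, so feature_score is a hypothetical input and the sample data is synthetic.

```python
from statistics import mean

def bottom_decile_baseline(urls):
    """Baseline floor for ONE position class.

    urls: list of (feature_score, observed_crawls) pairs.
    Returns the mean observed crawls of the weakest 10% by feature score
    (at least one URL).
    """
    ranked = sorted(urls, key=lambda pair: pair[0])
    k = max(1, len(ranked) // 10)
    return mean(crawls for _, crawls in ranked[:k])

# Synthetic class of 20 URLs: scores 1..20, crawls rising with score.
sample = [(score, 2.0 + score) for score in range(1, 21)]

baseline = bottom_decile_baseline(sample)

# CG of every URL in the class against that floor.
cg = [crawls - baseline for _, crawls in sample]
print(baseline, cg[:3], cg[-1])
```

With few URLs per class the decile is only one or two pages, so the floor becomes noisy; that ties this choice to the granularity question in Section X.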

VII. Time window

CG is always reported with a time window. Common windows:

30-day CG. The default. Smooths short-term volatility. Useful for monthly reporting.
90-day CG. The right window for trend analysis. Reduces noise from individual crawl events.
7-day CG. Useful only for monitoring during active interventions or troubleshooting.
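
The observed side of each window can be sketched as a count over crawl timestamps taken from logs. The function name crawls_in_window, the end date, and the hit times are assumptions for the example; the baseline must be computed over the same window before subtracting.

```python
from datetime import datetime, timedelta

def crawls_in_window(timestamps, end, days):
    """Count crawl events in the half-open window (end - days, end]."""
    start = end - timedelta(days=days)
    return sum(1 for ts in timestamps if start < ts <= end)

# Hypothetical crawl hits for one URL and one agent, as days before `end`.
end = datetime(2024, 6, 30)
hits = [end - timedelta(days=d) for d in (1, 3, 10, 25, 40, 80, 100)]

# Observed counts for the three common windows.
counts = {days: crawls_in_window(hits, end, days) for days in (7, 30, 90)}
print(counts)
```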

A URL's CG is meaningful only in comparison to (a) its CG in prior windows, (b) the CG of comparable URLs in the same window, or (c) a target CG for the position class.

A single CG number with no window and no comparison group is not a finding. It is a number.

VIII. What CG is good for

Diagnosis. A URL with sustained negative CG is being under-crawled relative to what its features would predict. The decomposition tells you why. The intervention is targeted at the specific component.

A sustained negative gap is more than a prompt to investigate. Its decomposition is the first step of the investigation.

Prioritization. Aggregated CG by template or section identifies where crawl budget is being misallocated. A category template with an average 30-day CG of -4.2 across 200 URLs is leaking 840 crawl events per month relative to baseline (200 × 4.2). That number can be argued about.
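
The prioritization arithmetic can be sketched as a ranking of templates by total CG, where the total is the mean CG multiplied by the URL count. Only the category row reflects the figures in the text; the other templates and their numbers are invented for contrast.

```python
# Template summaries: (template, mean 30-day CG, URL count).
# "category" uses the figures from the text; the rest are hypothetical.
templates = [
    ("category", -4.2, 200),
    ("product", 0.6, 5000),
    ("article", -1.1, 350),
]

# Total crawl events gained or lost per month relative to baseline.
totals = {name: mean_cg * count for name, mean_cg, count in templates}

# Most negative first: the biggest leaks head the list.
ranked = sorted(totals.items(), key=lambda item: item[1])

for name, total in ranked:
    print(f"{name}: {total:+.0f} crawl events/month")
```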

Intervention measurement. When a structural change is made — internal linking, template revision, redirect cleanup — the change in CG before and after the intervention (with proper counterfactuals; see Section IX) is the measurement of effect.
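
One common way to build the counterfactual is a difference-in-differences comparison against untouched URLs in the same position class. This specification does not prescribe that design; it is used here only as an example, with invented CG values and a hypothetical helper name.

```python
from statistics import mean

def did_effect(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences estimate of an intervention's effect on CG.

    Subtracts the change seen in untouched control URLs from the change
    seen in treated URLs, so site-wide drift is not credited to the
    intervention.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Hypothetical 30-day CG values before and after an internal-linking change.
effect = did_effect(
    treated_before=[-4.0, -6.0],
    treated_after=[1.0, -1.0],
    control_before=[-3.0, -5.0],
    control_after=[-2.0, -4.0],
)
print(effect)
```

Here the treated URLs improved by 5 crawl events per month, but the controls improved by 1 over the same period, so only 4 is attributable to the change under this design's assumptions.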

Cross-crawler comparison. A URL might have positive Googlebot CG and negative GPTBot CG. The asymmetry is informative. Different crawlers respond to different features. Knowing which crawlers are under-attending to which URLs is the foundation of multi-crawler optimization.
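
A small sketch of how such asymmetries might be surfaced from per-crawler CG values. The URLs and numbers are invented, and the threshold of 1.0 crawl events is an arbitrary assumption to ignore near-zero noise.

```python
# Hypothetical 30-day CG per URL and crawler agent.
cg_by_agent = {
    "/guide/setup": {"Googlebot": 6.2, "GPTBot": -3.1},
    "/blog/launch": {"Googlebot": 2.0, "GPTBot": 1.5},
    "/docs/api": {"Googlebot": -1.0, "GPTBot": -4.0},
}

def asymmetric_urls(table, threshold=1.0):
    """URLs where one agent is above +threshold and another is below -threshold."""
    flagged = []
    for url, by_agent in table.items():
        values = by_agent.values()
        if max(values) > threshold and min(values) < -threshold:
            flagged.append(url)
    return flagged

print(asymmetric_urls(cg_by_agent))
```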

IX. What CG is not good for

CG is a positional metric. It tells you how a URL is doing relative to a baseline. It does not tell you:

Whether the baseline itself is correct. If the entire site's crawl behavior is broken, every URL might have CG ≈ 0 and the site might still be in trouble. CG is meaningful within a system; it does not validate the system.
Whether crawl attention translates to indexation, ranking, or traffic. CG measures the input to the search system, not the output. A URL with high CG may still fail to rank for unrelated reasons.
Causation from intervention to CG change without a counterfactual. A CG improvement after an intervention is suggestive, not conclusive. CSARmetric Principle 3 (no counterfactual, no causality) applies.

A practitioner who treats CG as the answer to every question will misuse it. CG is a starting point for inquiry, not a substitute for it.

X. Open questions

CG as specified here is version 0.1. The following are unresolved and should be the subject of practitioner debate:

Position class granularity. How fine-grained should position classes be? Too coarse and the baseline is meaningless; too fine and there are too few comparable URLs to compute a stable expectation.
Multi-crawler aggregation. Should there be a single composite CG across all crawlers, or always per-crawler? The composite is convenient; the per-crawler view is more honest. Likely both, with the per-crawler view as the source of truth.
Treatment of new URLs. A URL with no crawl history has no observed value. Should its CG be undefined, or should the model produce an expected CG with appropriate uncertainty?
Position baseline definition. The bottom-decile approach in Section VI is one option. Modeled defaults are another. The choice has real consequences and deserves explicit comparison on real data.
Variance and confidence. Every CG value should carry a confidence interval. The methodology for that interval — bootstrapping, model uncertainty, or both — is not yet standardized.
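
As one illustration of the bootstrapping option, the sketch below computes a percentile bootstrap interval for the mean CG of a template. It is not a proposed standard: the resample count, the 95% level, the fixed seed, and the CG values are all assumptions, and it ignores model uncertainty in the baseline entirely.

```python
import random
from statistics import mean

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical 30-day CG values for ten URLs on one template.
template_cg = [-6.0, -5.5, -4.8, -4.2, -4.0, -3.9, -3.5, -3.0, -2.6, -1.5]

point = mean(template_cg)
lower, upper = bootstrap_ci(template_cg)
print(f"mean CG {point:.2f}, interval [{lower:.2f}, {upper:.2f}]")
```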

These are the questions that will define the next several years of CSARmetric work. None of them has a settled answer. The right move is to publish, disagree, and iterate.

XI. The standard, in brief

Crawl Gap (CG). The number of crawl events a URL receives, per time window, in excess of (or below) the crawl events a baseline URL in the same position class would receive. Decomposable into structural, temporal, render/response, and historical components. Reported per crawler agent, per time window, with confidence intervals.

That is the metric.

XII. What comes next

CG needs three things to become a real standard:

Reference implementations. At least two independent open-source implementations, so practitioners can compare results and identify bugs in either.
Public benchmarks. A handful of public datasets (anonymized server logs, known interventions, known outcomes) where CG can be computed and compared against ground truth.
Peer review in practice. Working CSARmetricians publishing CG values for their own sites and arguing about whether the numbers make sense.

The first version will be wrong about something. That is the nature of standard metrics. The right response to the first wrong version of CG is the second version, not no version.

The discipline begins when the arguing begins.

And it begins with a number practitioners can disagree about.

Crawl Gap (CG). Proposed v0.1. The first standard metric of CSARmetrics.