Query the Internet: A Web-Native Future for TAM Building
If almost every potential customer lives on the open web, why are we still building lead lists from closed, lagging databases?
Gafar Akinkunmi
GTM Engineer
GTM motions (ABM, cold email, cold calling, partnerships) stand on a critical foundation: a comprehensive and accurate company list. Getting that right is where a surprising amount of budget and energy goes. Many teams rotate through Sales Navigator, Apollo, ZoomInfo, burn about $1k per month, and still end up frustrated by list quantity and quality.
This friction is not because teams are sloppy. It is structural. We have been mining derivatives of the web instead of the web itself.
The Internet Is Already the Dataset
For years, the indexed HTML of public web pages (content and code) has been available to researchers and builders. It powers search, training corpora, and countless analyses. Buried in that HTML are signals GTM teams actually use:
- Tech fingerprints (commerce platforms, analytics, payments, chat widgets)
- On-page proof (pricing pages, compliance badges, partner galleries, PDP features)
- Structured hints (schema.org, sitemaps, JSON-LD)
- Behavioral cues (careers pages, "book a demo" flows, localization, store finders)
- Documents in the open (PDF spec sheets, whitepapers, case studies)
From Results to Source Evidence
Traditional search answers with links. GTM needs proof. A web-native approach (Web-Native TAM, or WNT) flips the workflow:
1. Write an evidence query that states what should be present in the HTML or code.
2. Run it across indexed pages.
3. Return companies with the matching snippet, the page URL, and a confidence score.
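The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual WNT tooling: the `Match` record, the `run_evidence_query` function, the in-memory `PAGES` dictionary standing in for an index, the example fingerprints, and the naive confidence score (the share of required patterns found on the page) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Match:
    domain: str
    url: str
    snippet: str
    confidence: float


def run_evidence_query(pages, required_patterns, context=40):
    """Scan pages (url -> html) for regex patterns and return evidence.

    Confidence here is simply the share of patterns found on the page,
    which is an assumption for illustration only.
    """
    compiled = [re.compile(p, re.IGNORECASE) for p in required_patterns]
    results = []
    for url, html in pages.items():
        hits = [m for m in (rx.search(html) for rx in compiled) if m]
        if not hits:
            continue
        # Keep a short window of HTML around the first hit as proof.
        first = hits[0]
        start = max(first.start() - context, 0)
        end = min(first.end() + context, len(html))
        results.append(
            Match(
                domain=urlparse(url).netloc,
                url=url,
                snippet=html[start:end],
                confidence=len(hits) / len(compiled),
            )
        )
    return results


# Hypothetical stand-in for an index of public pages.
PAGES = {
    "https://shop.example/products/tea": (
        '<script src="https://cdn.shopify.com/s/app.js"></script>'
        '<div class="recharge-widget">Subscribe and save</div>'
    ),
    "https://blog.example/post": "<p>We wrote about subscriptions.</p>",
}

matches = run_evidence_query(PAGES, [r"cdn\.shopify\.com", r"recharge"])
for m in matches:
    print(m.domain, m.url, round(m.confidence, 2))
```

A real run would swap `PAGES` for a crawl or index source and replace the scoring with something tuned against reviewed samples.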
Operator to Evidence Examples
- site:shopify.com "Recharge"
- filetype:pdf "ISO 9001"
- "SOC 2" site:example.com
- site:.fr "HR software"
What an Evidence Query Sounds Like
"UK ecommerce stores on Shopify with Recharge or Bold Subscriptions referenced in HTML and free shipping on a PDP."
"EU B2B SaaS with SOC 2 language on a trust or compliance page and a Book a demo CTA."
"Manufacturers listing ISO 9001 in a downloadable PDF and a distributor locator page."
"Marketplaces mentioning Stripe Connect in docs or job posts and showing a vendor onboarding flow."
"HR SaaS companies in France."
The output is not just a company name. It is the domain, the matched evidence, and where that evidence was found. That makes downstream work such as enrichment, routing, and outreach faster and less error-prone.
Why This Shift Matters
Coverage beats catalogues
The web includes the long tail your competitors do not see.
Freshness by default
Signals are observed where they change first.
Fewer debates
You can show the snippet that justified inclusion.
Lower lock-in
Queries are portable and results are transparent.
Web-Native TAM: A Working Definition
Web-Native TAM (WNT) is the practice of defining and discovering your market by querying public web evidence, not vendor lists. It treats the internet as the source of truth and requires every match to be verifiable, reproducible, and traceable to a page.
Five Principles of Web-Native TAM
1. Evidence first
Every inclusion has human-readable proof such as HTML, text, or a document.
2. Transparent scoring
Confidence should be visible and adjustable, not a black box.
3. Reproducible queries
Same inputs produce the same outputs. Queries are versioned and shareable.
4. Portable outputs
Results export in open formats such as JSON or CSV with attached evidence.
5. Ethical collection
Public pages only, respect robots.txt, no logins or gated content.
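The robots.txt part of the last principle can be checked before any page is fetched. Below is a minimal sketch using Python's standard `urllib.robotparser`; the `allowed_urls` helper, the user agent string, and the sample robots.txt content are hypothetical, and the check covers robots.txt only, not logins or gated content.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "wnt-example-bot"  # hypothetical user agent name


def allowed_urls(robots_txt, urls, user_agent=USER_AGENT):
    """Return only the URLs that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /cart
"""

CANDIDATES = [
    "https://shop.example/pricing",
    "https://shop.example/account/orders",
    "https://shop.example/cart",
]

ok = allowed_urls(ROBOTS_TXT, CANDIDATES)
print(ok)
```

In practice the robots.txt text would be downloaded once per domain and cached, so every candidate URL on that domain is filtered against the same rules.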
Roles and Workflow Without Reshuffling Your Org
This is not a reorg. It is a new muscle. Write queries and review evidence.
GTM Engineer or Ops
writes and tunes queries, sets evidence thresholds, maintains patterns.
Analyst or SDR Enablement
samples results, audits evidence, flags false positives and negatives.
Sales or Marketing
works from verified lists and can point to the evidence on a call.
RevOps
pipes outputs into CRM or CDP and measures downstream performance by evidence cohort.
Accuracy and Quality: the Minimum You Should Do
Goal: ship lists that are correct enough to use without cleanup.
1. Accuracy basics
Precision: of the companies we found, how many are truly a fit. Target at least 90%.
Recall: of all the valid companies that exist, how many we captured. Nice to improve, but precision comes first for outbound.
2. Tighten your pattern
Add a must include rule and a must not include rule.
Include example: HTML contains stripe.js and the word Connect.
Exclude example: URL contains docs. or blog. if you do not want those pages.
Keep 10 positive examples and 10 negative examples. Test your query against them before a full run.
3. Fix common errors
False positives (wrong matches): pattern too loose or matching vendor docs.
Fix: add excludes, require two signals at once, prefer product or pricing pages.
False negatives (missed good accounts): vendor markup changed or language varies.
Fix: add synonyms and translations. “Book a demo” can appear as “Request a demo” or “Demander une démo.”
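One way to handle the synonym and translation fix is to match a list of known variants instead of a single phrase. The sketch below uses only the three variants named above; the `DEMO_CTA_VARIANTS` list and the `has_demo_cta` helper are hypothetical names, and a real list would be longer.

```python
import re

# Variants taken from the text above; extend per market and language.
DEMO_CTA_VARIANTS = ["book a demo", "request a demo", "demander une démo"]

# One case-insensitive pattern that matches any of the variants.
DEMO_CTA = re.compile(
    "|".join(re.escape(v) for v in DEMO_CTA_VARIANTS),
    re.IGNORECASE,
)


def has_demo_cta(text):
    """Return True if the text contains any known demo call to action."""
    return DEMO_CTA.search(text) is not None


print(has_demo_cta("Request a Demo today"))
print(has_demo_cta("Contact sales"))
```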
Treat Web-Native TAM like data work. Write tests, keep fixtures, version your queries.
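As a rough illustration of what such a test can look like, the sketch below encodes the include and exclude rules from the example above and scores them against labeled fixtures. The function names and the four fixtures are made up for the example, and four fixtures is far fewer than the 10 positive and 10 negative examples recommended above.

```python
import re


def matches_pattern(url, html):
    """Must include: stripe.js and the word Connect in the HTML.
    Must not include: docs. or blog. in the URL."""
    if "docs." in url or "blog." in url:
        return False
    return "stripe.js" in html and re.search(r"\bConnect\b", html) is not None


def precision_recall(fixtures):
    """Score the pattern against (url, html, expected) fixtures."""
    tp = fp = fn = 0
    for url, html, expected in fixtures:
        predicted = matches_pattern(url, html)
        if predicted and expected:
            tp += 1
        elif predicted and not expected:
            fp += 1
        elif not predicted and expected:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up fixtures: (url, html, is this a real fit?)
FIXTURES = [
    ("https://market.example/sell",
     '<script src="stripe.js"></script> Payouts via Stripe Connect', True),
    ("https://docs.vendor.example/connect",
     "Load stripe.js, then follow the Connect onboarding guide", False),
    ("https://shop.example/checkout",
     '<script src="stripe.js"></script> Connect with us on social', False),
    ("https://bazaar.example/vendors",
     "Vendor payouts are handled by Stripe Connect", True),
]

precision, recall = precision_recall(FIXTURES)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these fixtures the pattern produces one false positive (a checkout page that says "Connect with us") and one false negative (a marketplace that never loads stripe.js on the matched page), which is the kind of result that should block a full run until the pattern is tightened.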
Where This Goes Next
As more GTM teams adopt web-native practices, we will see:
- Standardized evidence fields such as signal, snippet, selector, url, observed_at, confidence.
- Query exchanges. Public recipes for common ICPs with tests.
- More honest benchmarks. Not "we found 10k companies" but "we found 8,432 with verifiable evidence."
- Faster GTM experimentation. New hypotheses tested in hours, not quarters.
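To make the first item concrete, here is a hedged sketch of an evidence record that uses the field names listed above and exports to JSON and CSV. The `Evidence` dataclass, the field types, the helper functions, and the sample values are assumptions for illustration, not a proposed standard.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Evidence:
    signal: str        # name of the fingerprint or rule that fired
    snippet: str       # the matched text or HTML
    selector: str      # where on the page it was found
    url: str
    observed_at: str   # ISO 8601 timestamp (assumed format)
    confidence: float


def to_json(records):
    """Serialize evidence records as a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2)


def to_csv(records):
    """Serialize evidence records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Evidence)])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Sample record with made-up values.
records = [
    Evidence(
        signal="soc2_mention",
        snippet="We are SOC 2 Type II compliant",
        selector="main p",
        url="https://saas.example/trust",
        observed_at="2025-01-15T09:30:00Z",
        confidence=0.9,
    )
]

print(to_json(records))
print(to_csv(records))
```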
An Open Invitation
We're open-sourcing the tooling we've been building nights and weekends to make this practical for GTM teams. Join in and contribute fingerprints, test cases, critiques, and better ideas.
The internet is the dataset. Let's query it together.