Methodology & Sources — The Pharma Closeout

How the data is built.

Every score is traceable to primary sources. Here is exactly how the data, the sentiment, and the outcomes are produced — and where the limits are.


What this is

The Deal Intelligence database is a structured record of roughly 1,284 pharma and biotech business-development transactions — acquisitions, licensing deals, collaborations, and option agreements — spanning the early 1990s through 2026.

Each deal carries three things: an Announcement Sentiment score that captures how the market reacted when the deal was struck, an Outcome Score that grades how the deal actually aged once results were in, and a body of sourced facts — deal value, structure, assets, revenue, and milestones — each tied to a traceable source URL.

Deals tracked: ~1,284
Coverage window: 1990s–2026
Source tiers: 7
Refresh cadence: 3×/wk

Announcement Sentiment

Announcement Sentiment is a 0–100 reading of how the street received a deal at the moment it was announced. It is computed only when there are at least four sentiment-tagged sources — analyst notes, trade-press coverage, and primary press releases — each individually tagged Bullish, Neutral, or Bearish. Until a deal clears that four-source threshold, its sentiment is shown as pending rather than guessed.
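As a rough illustration of the threshold rule above, the sketch below returns no score until four tagged sources exist. The tag-to-number mapping (Bullish = 100, Neutral = 50, Bearish = 0), the simple average, and the function name are assumptions made for this example; the aggregation formula itself is not published here.

```python
from typing import Optional, Sequence

MIN_TAGGED_SOURCES = 4  # threshold stated in the methodology

# ASSUMPTION: illustrative tag values only; the real aggregation is not published.
TAG_VALUE = {"Bullish": 100.0, "Neutral": 50.0, "Bearish": 0.0}


def announcement_sentiment(tags: Sequence[str]) -> Optional[float]:
    """Return a 0-100 score, or None (pending) below the source threshold."""
    if len(tags) < MIN_TAGGED_SOURCES:
        return None  # displayed as "pending", never guessed
    # ASSUMPTION: plain average of the per-source tag values.
    return sum(TAG_VALUE[tag] for tag in tags) / len(tags)
```

With three tags the function returns None, which corresponds to the pending state; with four or more it returns a number in the 0–100 range.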

This metric measures the market's reaction, not our opinion. It is a mirror held up to the analyst and trade community as the deal landed. We surface the underlying reviews so you can read the consensus yourself.

One honest caveat: announcement-day coverage skews optimistic. Press releases are written by the dealmakers, and sell-side notes near a signing tend to extend the benefit of the doubt. A high Announcement Sentiment means the deal sounded good on day one — not that it was good. That is precisely what the Outcome Score exists to test.

Scores are banded into verbal tiers — Exceptional (90+), Strong (75–89), Adequate (60–74), Weak (40–59), and Failed (below 40).
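The banding can be written as a small lookup. This is a minimal sketch using the cut-offs listed above; the function name is hypothetical, and the handling of fractional scores (for example, 89.5 falling into Strong) is an assumption, since the bands are stated in whole numbers.

```python
def sentiment_tier(score: float) -> str:
    """Map a 0-100 Announcement Sentiment score to its verbal tier."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Exceptional"
    if score >= 75:
        return "Strong"
    if score >= 60:
        return "Adequate"
    if score >= 40:
        return "Weak"
    return "Failed"
```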


Outcome Score

The Outcome Score grades how a deal actually aged. It is assessed at the 5-, 10-, and 15-year anniversaries of the deal, with each window backed by sourced facts: what was reported, approved, filed, or written down. The score unlocks five years after close; before then there is not yet enough verified history to grade fairly, so the deal shows no Outcome Score.
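To make the unlock rule concrete, here is a minimal sketch that lists which assessment windows a deal has reached on a given date. It assumes anniversaries are counted from the close date and that a February 29 close rolls to February 28 in non-leap years; both choices, and the function names, are illustrative rather than taken from the pipeline.

```python
from datetime import date
from typing import List

ASSESSMENT_YEARS = (5, 10, 15)  # windows named in the methodology


def anniversary(close: date, years: int) -> date:
    """Return the date `years` years after `close`."""
    try:
        return close.replace(year=close.year + years)
    except ValueError:
        # ASSUMPTION: a Feb 29 close date maps to Feb 28 in non-leap years.
        return close.replace(year=close.year + years, day=28)


def unlocked_windows(close: date, as_of: date) -> List[int]:
    """Return the assessment windows (in years) reached by `as_of`."""
    return [y for y in ASSESSMENT_YEARS if anniversary(close, y) <= as_of]
```

An empty list means the deal is still inside its first five years and shows no outcome.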

The score is a weighted blend of four dimensions, then adjusted by a deal-difficulty factor:

Strategic Fit (~25%)

Does the deal align with the buyer's stated strategy, therapeutic-area focus, and portfolio gaps? Weighted higher for platform and licensing deals.

Financial Return (~35%)

Post-deal revenue, margin contribution, and IRR versus upfront plus milestones paid. Weighted highest for launched-product deals.

Asset Performance (~25%)

Clinical readouts, approvals, label expansions, and pipeline progression of the acquired assets. Weighted highest for pre-clinical deals.

Value Realization (~15%)

Integration execution, talent retention, milestone achievement, and synergies captured versus targeted.

Those base weights shift by the deal's lifecycle stage — pre-clinical, Phase II/III, launched product, platform, or licensing — so a pre-clinical acquisition is judged mostly on whether the science worked, while a launched-product deal is judged mostly on the money. The weighted total is then multiplied by a deal-difficulty factor (0.8–1.2): pulling off a hard, complex integration counts for more than coasting through an easy one.
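The arithmetic can be sketched as follows, using only the approximate base weights above. The stage-specific weight shifts are not published in this document, so the sketch accepts an optional replacement weight table rather than guessing one. The dimension keys, the 0–100 scale for each dimension, and the decision not to cap the result are all assumptions for illustration.

```python
from typing import Mapping

# Approximate base weights from the rubric; stage-specific shifts are not modeled.
BASE_WEIGHTS = {
    "strategic_fit": 0.25,
    "financial_return": 0.35,
    "asset_performance": 0.25,
    "value_realization": 0.15,
}


def outcome_score(
    dimensions: Mapping[str, float],
    difficulty: float,
    weights: Mapping[str, float] = BASE_WEIGHTS,
) -> float:
    """Blend dimension scores by weight, then apply the difficulty factor."""
    if not 0.8 <= difficulty <= 1.2:
        raise ValueError("difficulty factor must be between 0.8 and 1.2")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    blended = sum(weights[name] * dimensions[name] for name in weights)
    # ASSUMPTION: the result is left uncapped; the rubric does not say
    # whether scores above 100 are clamped.
    return blended * difficulty
```

For example, dimension scores of 80, 70, 90, and 60 blend to 76; a difficulty factor of 1.1 lifts that to about 83.6, while 0.8 would pull it down to about 60.8.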

Outcome tiers read as plain verdicts — Outperformed, Met Thesis, Tracking, Underperformed, and Failed.


Source hierarchy

Facts are researched primary-source-first. Higher tiers must be exhausted before falling to lower ones, and when two sources conflict, the higher tier wins. Different facts demand different minimum tiers — deal value and close date must come from an SEC filing; pipeline assets must come from a company's own pipeline page or its archived equivalent.

Tier | Source | What it provides / when it's used
1 | Company IR / Pipeline Page | The company's own pipeline, products, and investor pages. First action for every deal; authoritative for the asset list and asset status.
2 | SEC / Regulatory Filings | 8-K, 10-K, EDGAR. Definitive for deal value, close date, deal structure, and segment revenue.
3 | Wayback Machine | Recovers a defunct or acquired company's pipeline page as it stood at deal time. The historical-deal workhorse.
4 | ClinicalTrials.gov | Authoritative for active clinical programs, trial phase, and indications.
5 | Company Press Releases | Milestone achievements, earnings releases, and stated deal rationale.
6 | Industry Publications | Analyst sentiment, competitive context, and market framing. The right source for the Announcement Sentiment reviews.
7 | General Web Search | Gap-filling only, after Tiers 1–6 are exhausted. Flagged as lower confidence.

Field-to-tier minimums are enforced: revenue figures must come from a dated filing — never “analysts estimate $X” — and asset statuses are verified against Tier 1 or Tier 4 within six months of insertion.
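One way to picture the two rules together (higher tier wins, and some fields only accept certain tiers) is the sketch below. The field names, the Claim record, and the allowed-tier sets are an illustrative reading of the text above, not the pipeline's actual schema; the example values in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ALL_TIERS = set(range(1, 8))  # tiers 1-7; tier 1 is the highest

# ASSUMPTION: allowed tiers per field, read from the prose above.
ALLOWED_TIERS = {
    "deal_value": {2},          # must come from an SEC filing
    "close_date": {2},          # must come from an SEC filing
    "revenue": {2},             # must come from a dated filing
    "pipeline_assets": {1, 3},  # pipeline page or its archived equivalent
}


@dataclass(frozen=True)
class Claim:
    field: str
    value: str
    tier: int


def resolve(field: str, claims: Iterable[Claim]) -> Optional[Claim]:
    """Pick the highest-tier claim that meets the field's tier rule."""
    allowed = ALLOWED_TIERS.get(field, ALL_TIERS)
    eligible = [c for c in claims if c.field == field and c.tier in allowed]
    # Lowest tier number = highest tier; None means "leave blank and flag".
    return min(eligible, key=lambda c: c.tier, default=None)
```

If a press release (Tier 5) and an 8-K (Tier 2) disagree on deal value, resolve returns the 8-K figure. If only a trade-press figure exists, it returns None and the field stays blank.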


Confidence & provenance

Every record carries its own confidence level, derived from how many and how strong its sources are. This is what lets you decide how much weight to put on a given number.

Confirmed

Backed by two or more primary sources — company website, SEC filing, or ClinicalTrials.gov. Auto-published.

Reported

One primary source plus at least one press release. Solid, single-anchor sourcing. Auto-published.

Estimated

Only trade-press or general web sources. Held for human review before it is trusted, not auto-published.

Unknown

No source located. The field is left blank and flagged for review — never filled with a guess.
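The four levels can be read as a simple decision rule. The sketch below is one possible reading, with hypothetical source-kind labels. It treats only the three named source types as primary, and it assumes that combinations the rubric does not spell out (such as a single primary source with no press release) fall to the more cautious Estimated level.

```python
from typing import Iterable

# Source kinds named as primary in the rubric (labels are hypothetical).
PRIMARY = {"company_website", "sec_filing", "clinicaltrials_gov"}
PRESS_RELEASE = "press_release"


def confidence_level(source_kinds: Iterable[str]) -> str:
    """Classify a record from the kinds of sources backing it."""
    kinds = list(source_kinds)
    if not kinds:
        return "Unknown"
    primary_count = sum(1 for kind in kinds if kind in PRIMARY)
    if primary_count >= 2:
        return "Confirmed"
    if primary_count == 1 and PRESS_RELEASE in kinds:
        return "Reported"
    # ASSUMPTION: anything else, including uncovered cases, is Estimated.
    return "Estimated"


def auto_published(level: str) -> bool:
    """Only Confirmed and Reported records skip human review."""
    return level in {"Confirmed", "Reported"}
```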

Each value is also marked primary-sourced versus model-inferred, and timeline events carry a “Last verified” stamp. That stamp is the date a primary source was last successfully checked — pipeline data is re-verified on a 90-day cycle, clinical-trial status every 30 days, financials every 90, and sentiment every 180. If a refresh can't reach a primary source, the stamp is not advanced; the system never invents a timestamp to look fresher than it is.
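The re-verification cycle and the stamp rule can be sketched like this. The intervals are the ones stated above; the dictionary keys and function names are hypothetical, and whether a check falls due on or after the interval boundary is an assumption.

```python
from datetime import date, timedelta

# Re-verification intervals in days, as stated in the methodology.
REVERIFY_DAYS = {
    "pipeline": 90,
    "clinical_trial": 30,
    "financials": 90,
    "sentiment": 180,
}


def is_due(kind: str, last_verified: date, today: date) -> bool:
    """True once the interval for this data kind has elapsed."""
    # ASSUMPTION: a check is due on the day the interval is reached.
    return today - last_verified >= timedelta(days=REVERIFY_DAYS[kind])


def next_stamp(last_verified: date, today: date, source_reached: bool) -> date:
    """Advance the "Last verified" stamp only after a successful check."""
    return today if source_reached else last_verified
```

If a refresh fails to reach the primary source, next_stamp returns the old date unchanged, so the record keeps showing when it was last actually verified.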


AI disclosure

We believe you can only trust a number if you know how it was made. So, plainly:

The database is assembled by an AI research pipeline that runs three times a week against the primary sources above. Every enrichment pass is bounded by automated QA gates (source-URL requirements, tier minimums, and dated-filing checks for financials), and low-confidence records are held for editorial review before they are published.

We never fabricate. When a fact can't be traced to a real source, it is left blank and flagged — not guessed, not back-filled from a model's memory, not softened into “analysts estimate.” A missing number is more useful than a confident wrong one.

If you spot something off, tell us. Corrections are welcome and they make the database better — reach us via the About page or on LinkedIn.


Methodology v1.0 · last updated 2026-06-11. Scoring rubric, source-tier definitions, and confidence criteria are versioned; material changes will be noted here.