
pytrials-v2

A modern, fully typed Python SDK for the ClinicalTrials.gov API v2: Pydantic models, async pagination, a validating query builder, rate limiting, and DataFrame integration. Aims to be the default Python library for clinical-trial data.



Overview

A modern, fully typed Python SDK for the ClinicalTrials.gov API v2. It turns responses from the messy public API into analysis-ready, validated data, saving analysts and engineers days of glue code. The goal is to be the default Python library for working with clinical-trial data programmatically.

Why it exists

The v2 API (JSON, token pagination, OpenAPI 3.0) launched in 2024, and the XML-based v1 API was retired. There is still no well-designed Python SDK for v2. The existing options are partial ports, JavaScript toys, or LLM-oriented MCP servers. pytrials-v2 fills the gap with a developer-first design.

What you can do with it

Competitive intelligence: pull every trial a sponsor is running, filter by phase and status, and export for analysis.
Site selection: find facilities running trials for a condition in a target geography from the locations data.
Patient-recruitment feasibility: search recruiting trials by condition, location, age range, and healthy-volunteer status.
Investigator identification: surface the officials and contacts tied to trials in a therapeutic area.
Trial landscape and pipeline tracking: count trials by phase and status, and follow a specific intervention across phases, sorted by most recent update.
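Once studies are flattened into rows, the landscape and pipeline-tracking use cases reduce to grouping and sorting. A minimal plain-Python sketch, assuming hypothetical row fields (`phase`, `status`, `interventions`, `last_update`) rather than the SDK's actual output schema:

```python
from collections import Counter
from typing import Any

# Hypothetical flattened row: {"phase": str, "status": str,
#   "interventions": list[str], "last_update": date or ISO string, ...}
# Assumes one phase string per row for simplicity.
Row = dict[str, Any]


def landscape(rows: list[Row]) -> Counter[tuple[str, str]]:
    """Count trials per (phase, status) pair."""
    return Counter((row["phase"], row["status"]) for row in rows)


def track_intervention(rows: list[Row], intervention: str) -> list[Row]:
    """Trials whose intervention names contain `intervention`, newest update first."""
    needle = intervention.casefold()
    hits = [
        row
        for row in rows
        if any(needle in name.casefold() for name in row["interventions"])
    ]
    return sorted(hits, key=lambda row: row["last_update"], reverse=True)
```

Because the result is a `Counter`, asking for a (phase, status) pair that never occurs returns 0 instead of raising `KeyError`, which keeps landscape tables free of special cases.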

What makes it different

Full Pydantic v2 models for every response: real IDE autocomplete, no dict-digging.
A QueryBuilder that validates status, phase, and sort values before any request is sent.
Async auto-pagination that handles pageToken transparently.
DataFrame-ready output that flattens the nested study structure for analysis.
Date normalization across the API's inconsistent formats, plus built-in 50 req/min rate limiting.
API design informed by how regulatory professionals, CROs, and clinical-data teams actually query trial data.
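Token pagination and a 50 requests/minute limit fit together naturally: acquire a rate-limit slot, fetch a page, yield its studies, and repeat while the response carries a `nextPageToken`. The sketch below assumes a caller-supplied async `fetch_page` function (hypothetical; any async HTTP client would do) and uses a sliding-window limiter, which may differ from how pytrials-v2 actually implements either feature.

```python
import asyncio
import time
from collections import deque
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import Any

Study = dict[str, Any]
# Hypothetical: any coroutine that GETs /api/v2/studies with these params
# and returns the decoded JSON page.
FetchPage = Callable[[dict[str, str]], Awaitable[dict[str, Any]]]


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls: int = 50, period: float = 60.0) -> None:
        self.max_calls = max_calls
        self.period = period
        self._stamps: deque[float] = deque()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget calls that have slid out of the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                # Sleep until the oldest call leaves the window, then re-check.
                await asyncio.sleep(self.period - (now - self._stamps[0]))


async def iter_studies(
    fetch_page: FetchPage,
    params: dict[str, str],
    limiter: SlidingWindowLimiter,
) -> AsyncIterator[Study]:
    """Yield every study across pages, following nextPageToken until it is absent."""
    token: str | None = None
    while True:
        await limiter.acquire()
        page_params = dict(params)
        if token:
            page_params["pageToken"] = token
        page = await fetch_page(page_params)
        for study in page.get("studies", []):
            yield study
        token = page.get("nextPageToken")
        if not token:
            return
```

Callers then write `async for study in iter_studies(...)` and never see a token. Sharing one limiter instance across every request keeps the 50/min budget global rather than per loop.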
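Flattening is mostly a matter of walking the nested `protocolSection` modules of a v2 study record. This sketch pulls the same columns the live demo shows; the field paths reflect the v2 JSON as we understand it, and the output keys (`nct_id`, `site_count`, and so on) are hypothetical names, not the SDK's schema.

```python
from typing import Any


def flatten_study(study: dict[str, Any]) -> dict[str, Any]:
    """Collapse one nested v2 study record into a flat, analysis-ready row."""
    proto = study.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    status = proto.get("statusModule", {})
    design = proto.get("designModule", {})
    sponsor = proto.get("sponsorCollaboratorsModule", {}).get("leadSponsor", {})
    locations = proto.get("contactsLocationsModule", {}).get("locations") or []
    return {
        "nct_id": ident.get("nctId"),
        "title": ident.get("briefTitle"),
        "status": status.get("overallStatus"),
        # Phases arrive as a list (e.g. combined Phase 2/3 trials); join them.
        "phase": "/".join(design.get("phases") or []) or None,
        "lead_sponsor": sponsor.get("name"),
        "site_count": len(locations),
    }
```

Missing modules degrade to `None` or 0 instead of raising, and a list of these dicts can go straight into `pandas.DataFrame(rows)`.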

Roadmap

v0.1 Core (shipped): client, search and get, core models, error handling, published to PyPI.
v0.2 Ergonomics: QueryBuilder, async paginator, stats endpoints, rate limiting.
v0.3 Data science: DataFrame integration, docs site, 90%+ coverage.
v1.0 Stable: full results-section models, CLI, notebook examples.

Status: v0.1 is live on PyPI (pip install pytrials-v2). It ships a typed client, Pydantic models, and search and get operations, fully tested, with ruff linting and strict mypy type-checking in CI. The full design and roadmap live in the repo's PROJECT_PLAN.md.

Live demo

A browser-based search playground that needs no install. Set a condition, status, and phase, and it queries the live ClinicalTrials.gov API and returns analysis-ready rows (NCT ID, title, status, phase, lead sponsor, site count) along with the total number of matching trials. It is built with Marimo and exported to WebAssembly, so it runs entirely client-side. Try it at pytrials.pyaarproject.org.

Mixing in Open Payments (Sunshine Act)

A companion analysis built on the SDK overlays clinical-trial volume from ClinicalTrials.gov with industry-to-physician payments from CMS Open Payments (the Sunshine Act disclosure program) for the major pharma sponsors, with a toggle between 2023 and 2024. It shows which companies rank high on both measures at once. For example, AbbVie reports about $188M in 2023 physician payments alongside roughly 1,400 trials, while Pfizer leads on trial count (6,000+) with mid-tier payments. Company names are matched across the two datasets with a curated map; Bristol Myers Squibb, for instance, reports its payments largely under E.R. Squibb & Sons. The payments are legally mandated disclosures of financial relationships, not evidence of wrongdoing. Explore it at pytrials.pyaarproject.org/overlap.
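The curated name map can be sketched as a normalization step (casefold, strip punctuation and corporate suffixes) followed by an alias lookup, after which the two datasets join on the canonical key. The suffix list, the single alias entry, and ranking by summed ranks are all illustrative assumptions; how the overlap view actually scores companies isn't specified here.

```python
import re
from collections.abc import Mapping

SUFFIXES = frozenset({"inc", "llc", "corp", "corporation", "co", "ltd", "company", "lp"})
# Hypothetical curated alias map, keyed and valued by normalized names:
# E.R. Squibb & Sons reports payments on behalf of Bristol Myers Squibb.
ALIASES = {"er squibb sons": "bristol myers squibb"}


def normalize_name(raw: str) -> str:
    """Casefold, drop punctuation, and strip common corporate suffixes."""
    text = raw.casefold().replace(".", "").replace("&", " ")
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(w for w in text.split() if w not in SUFFIXES)


def canonical(raw: str) -> str:
    key = normalize_name(raw)
    return ALIASES.get(key, key)


def _totals(values: Mapping[str, float]) -> dict[str, float]:
    """Sum values whose raw names collapse to the same canonical company."""
    out: dict[str, float] = {}
    for raw, value in values.items():
        key = canonical(raw)
        out[key] = out.get(key, 0) + value
    return out


def _ranks(values: Mapping[str, float]) -> dict[str, int]:
    """Rank 0 is the largest value."""
    ordered = sorted(values, key=lambda k: values[k], reverse=True)
    return {name: i for i, name in enumerate(ordered)}


def overlap(
    trials: Mapping[str, float], payments: Mapping[str, float]
) -> list[tuple[str, float, float]]:
    """Companies present in both datasets, best combined rank first."""
    t, p = _totals(trials), _totals(payments)
    common = t.keys() & p.keys()
    rt = _ranks({k: t[k] for k in common})
    rp = _ranks({k: p[k] for k in common})
    ordered = sorted(common, key=lambda k: (rt[k] + rp[k], k))
    return [(k, t[k], p[k]) for k in ordered]
```

Ranking rather than raw magnitudes keeps trial counts and dollar amounts comparable without choosing a scale; companies that appear in only one dataset drop out of the overlap, which is exactly where a missing alias entry would show up.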