Methodology
How ShipSleuth turns messy public GitHub activity into an honest due-diligence (DD) read.
The goal is not to pretend public GitHub is the whole truth. The goal is to make the visible surface more legible, comparable, and harder to misuse.
Optional private supplement
ShipSleuth defaults to public-only analysis, but users can optionally enter self-reported private activity — additional commits, PRs, repos, releases, active days, and lines changed — to produce a more complete picture. When supplemented:
- Public and private metrics are combined into unified totals
- Percentile rankings and scores are recalculated against the combined data
- The coverage section clearly labels which stats come from public repos and which were self-reported
- Shareable stat cards reflect the supplemented data when the user chooses to include it
Private supplement values are self-reported and not verified by ShipSleuth. They are meant to reduce the structural blind spot of private work, not to replace proper diligence. The analysis always indicates when private data has been included.
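Here is a minimal sketch of how public and self-reported values might be merged while keeping each stat's provenance for the coverage section. The metric keys, the `MetricValue` type, and the cap on combined active days are assumptions for illustration, not ShipSleuth's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical metric keys; ShipSleuth's real field names may differ.
METRICS = ("commits", "merged_prs", "repos", "releases", "active_days", "lines_changed")


@dataclass
class MetricValue:
    total: int
    public: int
    private: int  # self-reported, not verified

    @property
    def source_label(self) -> str:
        """Provenance label for the coverage section."""
        if self.private == 0:
            return "public"
        if self.public == 0:
            return "self-reported"
        return "public + self-reported"


def combine(
    public: Dict[str, int],
    private: Optional[Dict[str, int]] = None,
    window_days: int = 90,
) -> Dict[str, MetricValue]:
    """Merge public metrics with an optional self-reported private supplement."""
    private = private or {}
    combined: Dict[str, MetricValue] = {}
    for name in METRICS:
        pub = public.get(name, 0)
        priv = max(0, private.get(name, 0))  # ignore negative self-reports
        total = pub + priv
        if name == "active_days":
            # Public and private days may overlap, so a plain sum can overcount;
            # as a simple guard, cap at the window length (assumption).
            total = min(total, window_days)
        combined[name] = MetricValue(total=total, public=pub, private=priv)
    return combined
```

Keeping the public and private parts alongside the total is what lets the coverage section label each stat after the scores have been recalculated.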
Calculator-first
Earlier drafts experimented with composite scoring and leaderboard-style presentation. ShipSleuth now prioritizes direct metrics and context to avoid false precision.
- Author commits: Visible commit volume from human accounts (bot accounts excluded) inside the window.
- Merged PRs: A cleaner signal of integrated public work than commit count alone.
- Contributor breadth: How many visible contributors and repos share the public activity footprint.
- Releases: Visible shipping artifacts suggesting that something reached a public milestone.
- Active days: Distinct days with at least one push event inside the window.
- Lines changed: Weekly code adds + deletes via GitHub code frequency. Noisy, because it includes all authors and generated code.
- Merge velocity: Median hours from PR open to merge for author-attributed PRs merged in-window.
- Bus factor: Minimum number of contributors needed to account for half of author commits.
- Concentration: Whether one repo or one actor dominates the visible public output.
- Confidence: How trustworthy the visible sample looks after caps, failures, and truncation.
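The definitions of active days, merge velocity, and bus factor translate directly into small functions. This sketch assumes timestamps are already normalized to UTC and that PRs have already been filtered to author-attributed, in-window merges. `top_share` is a hypothetical stand-in for the concentration check, since the exact measure isn't specified here.

```python
from datetime import date, datetime
from statistics import median
from typing import Dict, List, Optional, Tuple


def active_days(pushes: List[datetime], start: date, end: date) -> int:
    """Distinct calendar days with at least one push in [start, end] (UTC assumed)."""
    return len({ts.date() for ts in pushes if start <= ts.date() <= end})


def merge_velocity_hours(prs: List[Tuple[datetime, datetime]]) -> Optional[float]:
    """Median open-to-merge time in hours; each item is (opened_at, merged_at).

    Assumes the caller already kept only author-attributed PRs merged in-window.
    """
    hours = [(merged - opened).total_seconds() / 3600 for opened, merged in prs]
    return median(hours) if hours else None


def bus_factor(commits_by_author: Dict[str, int]) -> int:
    """Fewest contributors whose commits cover at least half of all commits."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0
    covered = 0
    for rank, count in enumerate(sorted(commits_by_author.values(), reverse=True), 1):
        covered += count
        if 2 * covered >= total:
            return rank
    return len(commits_by_author)  # not reached when total > 0


def top_share(output_by_key: Dict[str, int]) -> float:
    """Hypothetical concentration measure: share held by the largest repo or actor."""
    total = sum(output_by_key.values())
    return max(output_by_key.values()) / total if total else 0.0
```

Taking the median rather than the mean keeps one long-lived PR from dominating merge velocity.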
Percentile anchor calibration
Where the threshold numbers come from
When ShipSleuth says “Top ~0.1%” for a metric, it interpolates your value against a table of anchor thresholds. Every metric except lines changed uses anchors derived from real GH Archive data. Here is how each table was derived.
Data source
Anchors are derived from GH Archive data queried via ClickHouse Playground (March 2026, covering the preceding 90 days). GH Archive captures all public GitHub events — pushes, PRs, releases, etc. — and ClickHouse provides free, zero-auth SQL access to the full dataset.
Population baseline
GitHub reports 100M+ total accounts (Octoverse 2023). Most are inactive. Querying GH Archive for distinct actors with at least 1 public PushEvent in the last 90 days yields ~8.21M accounts. Filtering out known bot/CI accounts (dependabot, renovate, github-actions, etc.) brings the human pool to ~8.19M — bots account for only ~0.3% of push-active accounts.
Last refreshed: March 2026.
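Here is a sketch of the population filter, assuming events in GH Archive's JSON shape (`type`, `actor.login`). The bot list is illustrative and far shorter than a real exclusion list. GitHub App accounts are caught by their `[bot]` login suffix.

```python
from typing import Iterable, Set

# Illustrative exclusions only; a production list would be much longer.
KNOWN_BOT_LOGINS = {"dependabot", "renovate", "github-actions"}


def is_bot(login: str) -> bool:
    name = login.lower()
    # GitHub App accounts carry a "[bot]" suffix, e.g. "dependabot[bot]".
    return name.endswith("[bot]") or name in KNOWN_BOT_LOGINS


def human_push_actors(events: Iterable[dict]) -> Set[str]:
    """Distinct non-bot logins with at least one PushEvent."""
    return {
        e["actor"]["login"]
        for e in events
        if e.get("type") == "PushEvent" and not is_bot(e["actor"]["login"])
    }
```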
Commits (humanCommits)
GH Archive counts PushEvents, not individual commits. Each push contains ~1-3 commits on average, so we apply a ×2 multiplier. Raw GH Archive percentiles (push events): P50=4, P90=31, P99=201, P99.99=9,954.
- 8 commits (Top ~50%): GH Archive P50=4 pushes ×2. The median developer pushes ~4 times in 90 days.
- 25 commits (Top ~25%): P75=12 pushes ×2. Roughly two commits per week.
- 60 commits (Top ~10%): P90=31 pushes ×2. Four to five commits per week: a consistent contributor.
- 115 commits (Top ~5%): P95=57 pushes ×2. More than one commit per day on average, a sustained open-source pace.
- 400 commits (Top ~1%): P99=201 pushes ×2. Among the most active public contributors.
- 1,500 commits (Top ~0.1%): P99.9=746 pushes ×2. Extremely prolific, often monorepo or multi-project workflows.
- 20,000 commits (Top ~0.01%): P99.99=9,954 pushes ×2. Roughly the top ~800 accounts globally; may include automated but human-attributed workflows.
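The commit anchors are the push percentiles multiplied by 2 and then rounded. The rounding rule isn't stated, so this sketch only checks that each published anchor sits within a few percent of the raw ×2 figure.

```python
# Raw GH Archive push-event percentiles quoted above (90-day window).
PUSH_PERCENTILES = {50: 4, 75: 12, 90: 31, 95: 57, 99: 201, 99.9: 746, 99.99: 9_954}
PUSHES_TO_COMMITS = 2  # rough commits-per-push ratio; real workflows vary

# Published (rounded) commit anchors, keyed by the same percentile.
COMMIT_ANCHORS = {50: 8, 75: 25, 90: 60, 95: 115, 99: 400, 99.9: 1_500, 99.99: 20_000}


def raw_commit_estimate(percentile: float) -> int:
    return PUSH_PERCENTILES[percentile] * PUSHES_TO_COMMITS


def rounding_gap(percentile: float) -> float:
    """Relative distance between the published anchor and the raw x2 estimate."""
    raw = raw_commit_estimate(percentile)
    return abs(COMMIT_ANCHORS[percentile] - raw) / raw


for p in COMMIT_ANCHORS:
    print(f"P{p}: raw {raw_commit_estimate(p)}, anchor {COMMIT_ANCHORS[p]}, "
          f"gap {rounding_gap(p):.1%}")
```

The largest gap is at P75 (24 raw versus the 25 anchor), so the rounding is well inside the uncertainty of the ×2 multiplier itself.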
Merged PRs (mergedPullRequests)
GH Archive PullRequestEvent(action=closed). Only ~151k actors closed any PRs out of 8.19M pushers (~1.8%). Percentiles are among PR-active users: P50=1, P90=3, P99=12, P99.99=11,292.
- 1 merged PR (Top ~50%): GH Archive P50=1 closed PR. Most PR users close just one in 90 days.
- 2 merged PRs (Top ~25%): P75=2. Uses the PR workflow semi-regularly.
- 3 merged PRs (Top ~10%): P90=3. Consistent PR contributor.
- 4 merged PRs (Top ~5%): P95=4. Active reviewer and contributor.
- 12 merged PRs (Top ~1%): P99=12. Heavy PR throughput, typically across multiple repos.
- 90 merged PRs (Top ~0.1%): P99.9=90. Among the most active mergers on GitHub.
- 1,500 merged PRs (Top ~0.01%): Raw P99.99=11,292; the anchor is set well below that value. Near the ceiling for human-driven PR closes.
Lines changed (linesChanged)
Not available in GH Archive, so this metric keeps estimated anchors based on GitHub's code frequency API (weekly adds + deletes). The API covers all authors in weeks overlapping the window, so this metric is noisier than commit counts; large refactors and generated code inflate it.
- 10k lines (Top ~50%): ~110 lines/day. Light but steady code changes. (Estimated; no GH Archive data.)
- 30k lines (Top ~25%): ~333/day. Regular feature development.
- 80k lines (Top ~10%): ~890/day. Heavy development or multiple active projects.
- 200k lines (Top ~5%): ~2.2k/day. Major features, migrations, or multiple concurrent repos.
- 500k lines (Top ~1%): ~5.6k/day. Often includes generated code, large refactors, or monorepo changes.
- 1M lines (Top ~0.1%): ~11k/day. Almost certainly includes codegen, migrations, or vendor updates.
- 2M lines (Top ~0.01%): ~22k/day. Extreme outlier: major infrastructure or generated code.
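Here is a sketch of the weekly aggregation, assuming rows shaped like GitHub's repository code-frequency statistics (`[week_start_unix_seconds, additions, deletions]`, with deletions reported as negative numbers). Any week that touches the window counts in full, which is one reason the metric is noisy at the window edges. `start` and `end` must be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence

WEEK = timedelta(days=7)


def lines_changed(rows: Sequence[Sequence[int]], start: datetime, end: datetime) -> int:
    """Sum additions + |deletions| for every week overlapping [start, end).

    Rows are [week_start_unix_seconds, additions, deletions]; whole weeks count
    even when they only partly overlap, and all authors are included.
    """
    total = 0
    for week_ts, additions, deletions in rows:
        week_start = datetime.fromtimestamp(week_ts, tz=timezone.utc)
        if week_start < end and week_start + WEEK > start:
            total += additions + abs(deletions)
    return total


def per_day(total_lines: int, window_days: int = 90) -> float:
    """The per-day figure used in the table above, e.g. 10k / 90 is about 111."""
    return total_lines / window_days
```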
Active repos
GH Archive distinct repos per human actor. P50=1, P75=2, P90=4, P95=6, P99=12, P99.99=118. Most devs push to just 1 repo. Maintaining 12+ active repos in 90 days puts you in the Top ~1%.
Releases
GH Archive ReleaseEvents. Only ~210k actors publish any releases (~2.6% of active devs). Among publishers: P50=1, P75=3, P90=5, P95=9, P99=26, P99.99=1,691.
Active days
GH Archive distinct push dates per actor. P50=2, P75=5, P90=11, P95=19, P99=44, P99.99=91. The median developer pushes on just 2 distinct days per quarter. Active days are capped at the window length: 91 active days is Top ~0.01% for a 90-day window and means pushing on essentially every day, weekends included.
Contributors
GH Archive distinct human pushers per repo owner. Over 90% of owners have just 1 contributor (themselves). P99=4, P99.9=10, P99.99=45. Having 5+ distinct contributors puts an owner in the Top ~1%.
How interpolation works
Your value is placed between the two nearest anchors. The percentile axis is interpolated in log-space (not linearly) because developer activity follows a power-law distribution — a small number of developers are orders of magnitude more active than the median. Log-interpolation respects this shape.
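A minimal sketch of the lookup follows. The text says only the percentile axis is log-scaled, so this version assumes the position between two anchors is measured linearly in value. Values below the first anchor return `None` because how they are reported isn't stated. `top_percent` and the anchor-list layout are hypothetical names.

```python
import math
from typing import List, Optional, Tuple

# (threshold, "Top ~X%") pairs for commits in a 90-day window, from the table above.
COMMIT_ANCHORS: List[Tuple[float, float]] = [
    (8, 50.0), (25, 25.0), (60, 10.0), (115, 5.0),
    (400, 1.0), (1_500, 0.1), (20_000, 0.01),
]


def top_percent(value: float, anchors: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate "Top X%" by interpolating log(X) between the two nearest anchors.

    Assumes anchors are sorted by ascending value and that the position between
    two anchors is linear in value; only the percentile axis is log-scaled.
    """
    if value < anchors[0][0]:
        return None  # below the lowest anchor; reporting for this case is unspecified
    if value >= anchors[-1][0]:
        return anchors[-1][1]  # clamp at the top anchor
    for (v_lo, p_lo), (v_hi, p_hi) in zip(anchors, anchors[1:]):
        if v_lo <= value <= v_hi:
            t = (value - v_lo) / (v_hi - v_lo)
            log_p = math.log(p_lo) + t * (math.log(p_hi) - math.log(p_lo))
            return math.exp(log_p)
    raise ValueError("anchors must be sorted by ascending value")
```

For example, 257.5 commits sits halfway between the 115 and 400 anchors, so the estimate is the geometric mean of 5% and 1%, about Top ~2.24%. Linear interpolation would give Top ~3%.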
For non-90-day windows, the value thresholds are scaled proportionally (e.g., a 30-day window scales thresholds to 1/3). The percentile axis stays the same — “Top ~1%” always means Top ~1% regardless of window length.
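Window scaling only touches the value column; the percentile labels pass through unchanged. `scale_anchors` is a hypothetical helper shown as a sketch.

```python
from typing import List, Tuple

BASE_WINDOW_DAYS = 90


def scale_anchors(
    anchors: List[Tuple[float, float]], window_days: int
) -> List[Tuple[float, float]]:
    """Scale anchor values proportionally to the window; percentiles stay fixed."""
    factor = window_days / BASE_WINDOW_DAYS
    return [(value * factor, top_pct) for value, top_pct in anchors]


# Top ~1% / Top ~0.1% commit anchors from the 90-day table.
ninety_day = [(400, 1.0), (1_500, 0.1)]
thirty_day = scale_anchors(ninety_day, 30)  # values shrink to 1/3, labels unchanged
```

With a 30-day window, the Top ~1% commit threshold drops from 400 to about 133.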
Limitations and honesty
- Anchors are derived from real GH Archive data, but GH Archive only captures public GitHub events. Private repo activity is invisible.
- GH Archive counts PushEvents, not individual commits. The ×2 multiplier for commits is an approximation — actual commit-per-push ratios vary by workflow.
- PR “closed” events include both merges and rejections. The real merged-PR distribution may differ.
- The real distribution shifts over time as GitHub grows. We plan to refresh anchors periodically via automated ClickHouse queries.
- Monorepo teams, squash-merge policies, and CI bot patterns can inflate or deflate raw counts.
- The “Top ~X%” claim means: “among ~8.2M human accounts active on public GitHub in the last 90 days, we estimate your visible activity places you roughly in the top X%.” It is not an exact ranking.
- You can verify every step: click “View the math” on any analysis result to see your value, the scaled anchors, and the exact interpolation.
Overall score
The composite score is a weighted average of 7 log-scaled dimensions (volume 30%, breadth 19%, consistency 18%, releases 10%, recency 10%, collaboration 8%, concentration 5%). The score itself is then mapped to a percentile tier using a separate set of anchors calibrated against the expected score distribution. A score of 85+ maps to Top 1%; a score of 30 maps to Top 50%.
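Here is a sketch of the weighted average using the published weights. How each dimension is log-scaled isn't specified, so `log_scaled` and its `cap` parameter are assumptions. The score-to-tier mapping is left out because only two of its anchors (30 and 85) are given.

```python
import math
from typing import Dict

# Published dimension weights; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "volume": 0.30, "breadth": 0.19, "consistency": 0.18, "releases": 0.10,
    "recency": 0.10, "collaboration": 0.08, "concentration": 0.05,
}
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def log_scaled(value: float, cap: float) -> float:
    """Hypothetical 0-100 log scaling; `cap` is the raw value that maps to 100."""
    if value <= 0:
        return 0.0
    return min(100.0, 100 * math.log1p(value) / math.log1p(cap))


def composite(dimension_scores: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each assumed to lie in 0-100."""
    return sum(w * dimension_scores[name] for name, w in WEIGHTS.items())
```

Because the weights sum to 1, a perfect score in every dimension yields 100, and volume alone can contribute at most 30 points.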