---
title: 'GitHub Data Connector Deployment Guide'
sidebar_label: 'Deployment Guide'
description: 'Operating guide for the GitHub data connector in production: PATs, rate limits, pagination, and observability.'
sidebar_position: 10
pagination_prev: null
pagination_next: null
tags:
---
Production operating guide for the GitHub data connector covering authentication, GitHub API rate limits, and operational tuning.
The GitHub connector uses the GitHub REST and GraphQL APIs with a personal access token (PAT) or GitHub App installation token.
| Parameter | Description |
|---|---|
| `github_token` | PAT or installation token. Use `${secrets:...}` to resolve from a secret store. |
Tokens must be sourced from a secret store in production. Scope the PAT to the minimum required permissions:
- `repo` scope.
- `repo` (private) or `public_repo` (public).
- `read:org`.

For long-running deployments, prefer GitHub App installation tokens over user PATs: they can have higher rate limits (up to 15,000 requests/hr, versus 5,000 requests/hr per authenticated user) and are not tied to a specific user account.
GitHub's REST API rate limits:
| Auth mode | Limit |
|---|---|
| Unauthenticated | 60 requests/hr per IP |
| Authenticated (PAT) | 5,000 requests/hr |
| GitHub App installation | Up to 15,000 requests/hr |
| Enterprise Server (typical) | Configurable |
The connector respects GitHub's `Retry-After` and `X-RateLimit-Reset` headers and backs off accordingly. When the remaining budget falls below a small threshold, requests pause until the next reset window.
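
The pause decision described above can be sketched as follows. This is an illustration only, not the connector's actual code: the function name `seconds_until_retry` and the `min_remaining` threshold are hypothetical, and the sketch assumes header names have already been normalized to the casing shown and that `Retry-After` carries a number of seconds.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    min_remaining: int = 10,  # hypothetical threshold, not a documented default
) -> float:
    """Return how long to pause before the next request (0.0 means send now)."""
    now = time.time() if now is None else now

    # An explicit Retry-After (assumed to be in seconds) takes priority.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))

    # Otherwise pause until the reset time once the remaining budget is low.
    # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is not None and reset is not None:
        if int(remaining) < min_remaining:
            return max(0.0, float(reset) - now)

    return 0.0
```

Passing `now` explicitly keeps the function deterministic, which makes the threshold behavior easy to check without waiting on a real reset window.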
GitHub returns at most 100 items per page. Datasets backed by high-volume endpoints (e.g., `repos.commits` on a monorepo) may take many hours to hydrate initially. Use incremental acceleration with a `since` filter where possible.
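
To get a rough sense of initial hydration time, the page count can be divided by the hourly request budget. The helper below is a back-of-the-envelope sketch; `estimate_hydration_hours` is a hypothetical name, and it assumes exactly one request per page and that the whole budget is available to this one dataset, so treat the result as a lower bound.

```python
import math


def estimate_hydration_hours(
    total_items: int,
    requests_per_hour: int,
    page_size: int = 100,  # maximum page size noted above
) -> float:
    """Lower-bound estimate of hours needed to page through total_items."""
    pages = math.ceil(total_items / page_size)
    return pages / requests_per_hour


# Example: 2,000,000 commits at 5,000 requests/hr
# -> 20,000 pages -> 4.0 hours
print(estimate_hydration_hours(2_000_000, 5_000))
```

Endpoints that need extra requests per item, or deployments where several datasets share one token, will take proportionally longer.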
Transient `5xx` responses are retried with exponential backoff up to a bounded retry count. Permanent errors (`401 Unauthorized`, `404 Not Found`, `422 Validation Failed`) surface immediately.
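
The retry policy above can be illustrated with a minimal sketch. The names (`fetch_with_retry`, `backoff_delay`), the base delay, the cap, and the retry count are all assumptions chosen for the example, not the connector's actual settings. Jitter is omitted to keep the example deterministic.

```python
import time
from typing import Callable


def is_retryable(status: int) -> bool:
    """Only transient server errors (5xx) are retried."""
    return 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2^attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def fetch_with_retry(
    send: Callable[[], int],  # hypothetical callable returning an HTTP status
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-5xx status or retries run out."""
    status = send()
    for attempt in range(max_retries):
        if not is_retryable(status):
            break  # success or permanent error (401, 404, 422): surface now
        sleep(backoff_delay(attempt))
        status = send()
    return status
```

Injecting `sleep` lets a test record the delays instead of waiting; for example, a sequence of `503, 502, 200` sleeps for 1 and then 2 seconds before returning 200.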
`github.com`; lower for GitHub Enterprise Server on the same network.

The GitHub connector does not register connector-specific dataset-level instruments in the current release. Monitor via:

- Query metrics (`query_duration_ms`, `query_processed_rows`, `query_failures_total`) from `runtime.metrics`.
- `resilient_http` instrumentation.
- `/settings/tokens` for token-level quota tracking.

See Component Metrics for general configuration.
GitHub API calls participate in task history through the HTTP client's span. Each page fetch is a child of the enclosing `sql_query` or `accelerated_table_refresh` task.
| Symptom | Likely cause | Resolution |
|---|---|---|
| `401 Bad credentials` | PAT expired, revoked, or wrong value. | Rotate the PAT; update the secret store. |
| `403 rate limit exceeded` | Primary hourly rate limit hit. | Increase refresh interval; switch to GitHub App auth for higher quota; use incremental refresh with `since`. |
| `403 Secondary rate limit` | Burst of concurrent requests tripped abuse detection. | Reduce concurrent refresh; connector will back off automatically. |
| `404 Not Found` on a private repo | Token lacks `repo` scope. | Regenerate PAT with `repo` scope. |
| Very slow initial hydration | Large dataset + strict rate limit. | Run first refresh off-peak; use since/updated_since for incremental refreshes. |
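
When triaging failures from logs, the symptoms in the table can be told apart from the status code, response headers, and error message. The sketch below is one way to do that; `classify_failure` and its return labels are hypothetical, the message match on "secondary rate limit" follows the symptom text in the table, and treating `429` like `403` is an assumption based on GitHub's documented rate-limit behavior rather than anything the connector guarantees.

```python
from typing import Mapping


def classify_failure(status: int, headers: Mapping[str, str], message: str = "") -> str:
    """Map a failed GitHub API response to a row of the troubleshooting table."""
    if status == 401:
        return "bad_credentials"
    if status == 404:
        # On a private repo this usually means the token lacks `repo` scope.
        return "not_found_or_missing_scope"
    if status in (403, 429):
        if "secondary rate limit" in message.lower():
            return "secondary_rate_limit"
        # A primary limit is exhausted when no requests remain in the window.
        if headers.get("X-RateLimit-Remaining") == "0":
            return "primary_rate_limit"
        return "forbidden"
    return "other"
```

For example, a `403` with `X-RateLimit-Remaining: 0` and no secondary-limit message maps to `primary_rate_limit`, which points at the refresh interval or auth mode rather than at concurrency.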