forge/plan.md
Jason Hall 4dc1b58f2f initial commit
Signed-off-by: Jason Hall <imjasonh@gmail.com>
2026-05-07 20:02:59 -04:00

# Self-Hosted Forgejo on GCP: Complete Plan
A declarative, low-cost, low-maintenance plan for running a personal Forgejo instance on Google Cloud Platform using Container-Optimized OS, Caddy for HTTPS, and IAP for admin access.
## Goals and constraints
- **Cost**: minimize monthly spend; target ~$2-4/month
- **Maintenance**: minimal ongoing effort; OS and app patches should apply automatically
- **Security**: minimal attack surface; no public SSH; principle of least privilege for service accounts
- **Reproducibility**: entire stack defined in code; `terraform apply` from a clean project produces a working instance
- **Personal scale**: low traffic, single user, occasional pushes
## Architectural decisions
| Decision | Choice | Rationale |
|---|---|---|
| Compute | e2-micro VM in us-west1, us-central1, or us-east1 | Always-free tier covers the full month |
| OS | Container-Optimized OS (COS) | Read-only root, automatic patching by Google, minimal attack surface, container-first |
| Database | SQLite on persistent disk | Free, sufficient for personal scale, simple to back up |
| Repo storage | Local persistent disk | Fast, reliable, survives VM replacement |
| TLS | Caddy with Let's Encrypt | Auto-renewing certs with one-line config |
| Git access | HTTPS only with personal access token | No SSH port conflicts, no client-side gcloud setup |
| Admin SSH | IAP TCP forwarding | Public port 22 closed; SSH via authenticated Google tunnel |
| App updates | Watchtower with pinned major version tag | Patch updates automatic; major upgrades deliberate |
| OS updates | COS auto-update | Google manages OS patching |
| Backups | Nightly SQLite snapshot + repo tarball to GCS | Survives disk loss, accidental deletion, and zone failure (the bucket is regional) |
| Secrets | Google Secret Manager, fetched at boot | Out of Terraform state, out of git, encrypted at rest |
| Infrastructure | Terraform | Declarative, replayable, well-documented for GCP |
| VM bootstrap | cloud-init via instance metadata | Native COS support, idempotent on VM replacement |
## Cost estimate
| Item | Monthly cost |
|---|---|
| e2-micro VM (always-free region) | $0 |
| 30 GB standard persistent disk (boot + data combined under 30 GB free tier) | $0 |
| Static external IP attached to running VM | ~$2.92 |
| GCS storage for backups (~1 GB, 30-day retention) | ~$0.05 |
| Secret Manager (2 secrets, low access volume) | ~$0.06 |
| Cloud DNS (optional; can use registrar's DNS) | $0.20 or $0 |
| Egress beyond 1 GB free | $0-2 depending on usage |
| **Total** | **~$3-5/month** |
Set a billing budget alert at $10/month to catch surprises early. GCP has no hard spending limit.
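
As a rough check on the total, the line items above can be summed for the cheapest case (registrar DNS, no billable egress) and the most expensive case (Cloud DNS plus the upper egress estimate, read here as $2). This is a minimal sketch using `awk`; the figures are the estimates from the table, not quoted prices, and `cost_range` is a hypothetical helper name.

```bash
#!/bin/bash
# Sum the monthly estimates from the cost table (USD).
# low  = static IP + GCS + Secret Manager (registrar DNS, no billable egress)
# high = low + Cloud DNS + upper egress estimate
cost_range() {
  awk 'BEGIN {
    low  = 2.92 + 0.05 + 0.06
    high = low + 0.20 + 2.00
    printf "%.2f %.2f\n", low, high
  }'
}

cost_range
```

Both ends of the range sit well under the $10 budget alert, which is what makes that threshold useful as an early warning.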
## Network exposure
| Port | Protocol | Source | Purpose |
|---|---|---|---|
| 80 | TCP | 0.0.0.0/0 | Caddy HTTP → HTTPS redirect, ACME HTTP-01 challenge |
| 443 | TCP | 0.0.0.0/0 | Caddy HTTPS → Forgejo |
| 22 | TCP | 35.235.240.0/20 (IAP only) | Admin SSH via IAP tunnel |
| All others | — | — | Default deny |
## Repository layout
```
forgejo-infra/
├── terraform/
│   ├── main.tf                  # VM, disk, instance config
│   ├── network.tf               # Firewall rules, static IP
│   ├── iam.tf                   # Service account, IAP bindings
│   ├── secrets.tf               # Secret Manager references (values out-of-band)
│   ├── backups.tf               # GCS bucket, lifecycle rules
│   ├── dns.tf                   # Optional Cloud DNS record
│   ├── variables.tf
│   ├── outputs.tf
│   └── versions.tf
├── cloud-init/
│   └── user-data.yaml.tpl       # Systemd units, container startup, backup timer
├── config/
│   └── Caddyfile.tpl            # TLS reverse proxy config
├── scripts/
│   ├── bootstrap-secrets.sh     # One-time: generate and upload secrets
│   ├── backup.sh                # Run on VM via systemd timer
│   ├── restore.sh               # Manual recovery from GCS tarball
│   └── test-restore.sh          # Verify a backup is restorable
├── docs/
│   ├── runbook.md               # Common operations, troubleshooting
│   └── disaster-recovery.md     # Step-by-step recovery procedures
├── .gitignore
└── README.md
```
## Terraform: key resources
### main.tf
```hcl
resource "google_compute_disk" "forgejo_data" {
  name = "forgejo-data"
  type = "pd-standard"
  size = 20
  zone = var.zone
  lifecycle { prevent_destroy = true }
}

resource "google_compute_instance" "forgejo" {
  name         = "forgejo"
  machine_type = "e2-micro"
  zone         = var.zone
  tags         = ["forgejo"]
  boot_disk {
    initialize_params {
      image = "cos-cloud/cos-stable"
      size  = 10
      type  = "pd-standard"
    }
  }
  attached_disk {
    source      = google_compute_disk.forgejo_data.id
    device_name = "forgejo-data"
  }
  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.forgejo.address
    }
  }
  metadata = {
    user-data = templatefile("${path.module}/../cloud-init/user-data.yaml.tpl", {
      domain            = var.domain
      forgejo_image     = var.forgejo_image
      caddy_image       = var.caddy_image
      gcs_backup_bucket = google_storage_bucket.backups.name
      project_id        = var.project_id
    })
    google-logging-enabled = "true"
    cos-update-strategy    = "update_enabled"
    enable-oslogin         = "TRUE"
  }
  service_account {
    email  = google_service_account.forgejo.email
    scopes = ["cloud-platform"]
  }
  allow_stopping_for_update = true
}
```
### network.tf
```hcl
resource "google_compute_address" "forgejo" {
  name   = "forgejo-ip"
  region = var.region
}

resource "google_compute_firewall" "https" {
  name      = "allow-https"
  network   = "default"
  direction = "INGRESS"
  allow {
    protocol = "tcp"
    ports    = ["80", "443"]
  }
  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["forgejo"]
}

resource "google_compute_firewall" "iap_ssh" {
  name      = "allow-iap-ssh"
  network   = "default"
  direction = "INGRESS"
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
  source_ranges = ["35.235.240.0/20"]
  target_tags   = ["forgejo"]
}
```
### iam.tf
```hcl
resource "google_service_account" "forgejo" {
  account_id   = "forgejo-vm"
  display_name = "Forgejo VM service account"
}

resource "google_secret_manager_secret_iam_member" "forgejo_secrets" {
  for_each  = toset(["forgejo-secret-key", "forgejo-internal-token"])
  project   = var.project_id
  secret_id = each.value
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.forgejo.email}"
}

resource "google_storage_bucket_iam_member" "backups_writer" {
  bucket = google_storage_bucket.backups.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.forgejo.email}"
}

resource "google_iap_tunnel_instance_iam_member" "ssh_admin" {
  project  = var.project_id
  zone     = var.zone
  instance = google_compute_instance.forgejo.name
  role     = "roles/iap.tunnelResourceAccessor"
  member   = "user:${var.admin_email}"
}

# osAdminLogin rather than osLogin: the runbook commands use sudo,
# which plain roles/compute.osLogin does not grant.
resource "google_project_iam_member" "ssh_os_login" {
  project = var.project_id
  role    = "roles/compute.osAdminLogin"
  member  = "user:${var.admin_email}"
}
```
### backups.tf
```hcl
resource "google_storage_bucket" "backups" {
  name                        = "${var.project_id}-forgejo-backups"
  location                    = var.region
  storage_class               = "STANDARD"
  uniform_bucket_level_access = true
  lifecycle_rule {
    condition { age = 30 }
    action { type = "Delete" }
  }
  versioning { enabled = false }
}
```
### secrets.tf
```hcl
# Secrets are created out-of-band by scripts/bootstrap-secrets.sh
# This file only declares them as data sources and grants access (in iam.tf)
data "google_secret_manager_secret" "secret_key" {
  secret_id = "forgejo-secret-key"
}

data "google_secret_manager_secret" "internal_token" {
  secret_id = "forgejo-internal-token"
}
```
### variables.tf
```hcl
variable "project_id" { type = string }

# A single-line block may hold only one argument, so blocks with a
# default are written in multi-line form.
variable "region" {
  type    = string
  default = "us-central1"
}

variable "zone" {
  type    = string
  default = "us-central1-a"
}

variable "domain" { type = string }
variable "admin_email" { type = string }

variable "forgejo_image" {
  type    = string
  default = "codeberg.org/forgejo/forgejo:11"
}

variable "caddy_image" {
  type    = string
  default = "caddy:2-alpine"
}
```
### outputs.tf
```hcl
output "static_ip" {
  value       = google_compute_address.forgejo.address
  description = "Point your domain's A record at this address"
}

output "ssh_command" {
  value       = "gcloud compute ssh forgejo --zone=${var.zone} --tunnel-through-iap"
  description = "Admin SSH via IAP tunnel"
}
```
## Cloud-init: user-data.yaml.tpl
```yaml
#cloud-config
write_files:
  # systemd requires a mount unit to be named after its escaped mount path
  # (systemd-escape -p --suffix=mount /mnt/disks/forgejo-data).
  - path: /etc/systemd/system/mnt-disks-forgejo\x2ddata.mount
    content: |
      [Unit]
      Description=Mount Forgejo data disk
      Before=docker.service
      [Mount]
      What=/dev/disk/by-id/google-forgejo-data
      Where=/mnt/disks/forgejo-data
      Type=ext4
      Options=defaults,nofail
      [Install]
      WantedBy=multi-user.target
  - path: /var/lib/forgejo/Caddyfile
    content: |
      ${domain} {
        reverse_proxy forgejo:3000
        encode gzip
      }
  - path: /var/lib/forgejo/fetch-secrets.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      # Parse JSON with sed and base64 so the script does not depend on
      # a Python interpreter being present on the host.
      TOKEN=$(curl -sf -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
        | sed -n 's/.*"access_token" *: *"\([^"]*\)".*/\1/p')
      fetch() {
        curl -sf -H "Authorization: Bearer $TOKEN" \
          "https://secretmanager.googleapis.com/v1/projects/${project_id}/secrets/$1/versions/latest:access" \
          | sed -n 's/.*"data" *: *"\([^"]*\)".*/\1/p' | base64 -d
      }
      mkdir -p /run
      umask 077
      {
        echo "FORGEJO__security__SECRET_KEY=$(fetch forgejo-secret-key)"
        echo "FORGEJO__security__INTERNAL_TOKEN=$(fetch forgejo-internal-token)"
      } > /run/forgejo-secrets.env
  - path: /etc/systemd/system/forgejo-stack.service
    content: |
      [Unit]
      Description=Forgejo + Caddy + Watchtower
      After=network-online.target docker.service
      RequiresMountsFor=/mnt/disks/forgejo-data
      Wants=network-online.target
      [Service]
      Type=oneshot
      RemainAfterExit=true
      # COS mounts /var noexec, so scripts are run through bash.
      ExecStartPre=/bin/bash /var/lib/forgejo/fetch-secrets.sh
      ExecStartPre=-/usr/bin/docker network create web
      # Remove containers left from a previous start so the names are free.
      ExecStartPre=-/usr/bin/docker rm -f caddy forgejo watchtower
      ExecStart=/usr/bin/docker run -d --name caddy --network web \
        -p 80:80 -p 443:443 \
        -v /mnt/disks/forgejo-data/caddy:/data \
        -v /var/lib/forgejo/Caddyfile:/etc/caddy/Caddyfile:ro \
        --restart=unless-stopped \
        ${caddy_image}
      ExecStart=/usr/bin/docker run -d --name forgejo --network web \
        -e FORGEJO__server__DISABLE_SSH=true \
        -e FORGEJO__server__ROOT_URL=https://${domain}/ \
        -e FORGEJO__service__DISABLE_REGISTRATION=true \
        -e FORGEJO__database__DB_TYPE=sqlite3 \
        --env-file /run/forgejo-secrets.env \
        -v /mnt/disks/forgejo-data/forgejo:/data \
        --restart=unless-stopped \
        ${forgejo_image}
      ExecStart=/usr/bin/docker run -d --name watchtower \
        -v /var/run/docker.sock:/var/run/docker.sock \
        --restart=unless-stopped \
        containrrr/watchtower --cleanup --schedule "0 0 4 * * *"
      ExecStop=/usr/bin/docker stop watchtower forgejo caddy
      [Install]
      WantedBy=multi-user.target
  - path: /var/lib/forgejo/backup.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      STAMP=$(date -u +%Y%m%dT%H%M%SZ)
      DATA_ROOT=/mnt/disks/forgejo-data
      # Take a consistent SQLite snapshot, then archive the whole data tree.
      docker exec forgejo sqlite3 /data/gitea/gitea.db ".backup '/data/gitea/snapshot.db'"
      tar czf /tmp/forgejo-$STAMP.tar.gz -C "$DATA_ROOT" forgejo
      docker run --rm -v /tmp:/tmp google/cloud-sdk:slim \
        gsutil cp /tmp/forgejo-$STAMP.tar.gz gs://${gcs_backup_bucket}/
      rm /tmp/forgejo-$STAMP.tar.gz
      docker exec forgejo rm -f /data/gitea/snapshot.db
  - path: /etc/systemd/system/forgejo-backup.service
    content: |
      [Unit]
      Description=Backup Forgejo to GCS
      After=forgejo-stack.service
      Requires=forgejo-stack.service
      [Service]
      Type=oneshot
      ExecStart=/bin/bash /var/lib/forgejo/backup.sh
  - path: /etc/systemd/system/forgejo-backup.timer
    content: |
      [Unit]
      Description=Nightly Forgejo backup
      [Timer]
      OnCalendar=*-*-* 03:30:00
      Persistent=true
      [Install]
      WantedBy=timers.target
runcmd:
  - mkdir -p /mnt/disks/forgejo-data
  - if ! blkid /dev/disk/by-id/google-forgejo-data; then mkfs.ext4 -F /dev/disk/by-id/google-forgejo-data; fi
  - systemctl daemon-reload
  - systemctl enable --now 'mnt-disks-forgejo\x2ddata.mount'
  - mkdir -p /mnt/disks/forgejo-data/forgejo /mnt/disks/forgejo-data/caddy
  - systemctl enable --now forgejo-stack.service
  - systemctl enable --now forgejo-backup.timer
```
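
The `payload.data` extraction that `fetch-secrets.sh` performs can be exercised offline against a sample response before trusting it on the VM. The following is a sketch using only `sed` and `base64`; the `extract_secret` name and the sample JSON are illustrative assumptions, and a real `versions/latest:access` response may carry additional fields.

```bash
#!/bin/bash
# extract_secret: read a Secret Manager access response on stdin and print
# the base64-decoded payload. Hypothetical helper, for illustration only.
extract_secret() {
  # Match the "data" key exactly; "dataCrc32c" does not match this pattern.
  sed -n 's/.*"data" *: *"\([^"]*\)".*/\1/p' | base64 -d
}

# Sample response shaped like the access endpoint's output (values made up).
sample='{
  "name": "projects/123/secrets/forgejo-secret-key/versions/1",
  "payload": {
    "data": "aHVudGVyMg==",
    "dataCrc32c": "0"
  }
}'

printf '%s\n' "$sample" | extract_secret
```

Because the decoded value is captured with `$(...)` in the real script, any trailing newline stored in the secret is dropped before it is written to the env file.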
## Bootstrap procedure
### One-time setup (before first `terraform apply`)
1. **Create the GCP project** and enable required APIs:
```bash
gcloud services enable \
  compute.googleapis.com \
  secretmanager.googleapis.com \
  iap.googleapis.com \
  storage.googleapis.com
```
2. **Generate and upload secrets** (`scripts/bootstrap-secrets.sh`):
```bash
#!/bin/bash
set -euo pipefail
for SECRET in forgejo-secret-key forgejo-internal-token; do
  if ! gcloud secrets describe "$SECRET" >/dev/null 2>&1; then
    openssl rand -hex 32 | gcloud secrets create "$SECRET" --data-file=-
    echo "Created $SECRET"
  else
    echo "$SECRET already exists, skipping"
  fi
done
```
3. **Configure Terraform variables** in `terraform.tfvars`:
```hcl
project_id = "your-project-id"
domain = "git.yourdomain.com"
admin_email = "you@yourdomain.com"
```
### First deploy
```bash
cd terraform/
terraform init
terraform plan
terraform apply
```
Note the `static_ip` output. Point your domain's A record at it. Wait for DNS propagation (a few minutes typically).
### Forgejo first-run installer
Visit `https://yourdomain` in a browser. Forgejo's installer will appear. Configure:
- Database: SQLite3 (path `/data/gitea/gitea.db`)
- Site title: whatever you want
- Server domain: your domain
- Server base URL: `https://yourdomain/`
- Disable self-registration: yes
- Create the admin user
After this, the installer is locked. Subsequent VM replacements (terraform-driven) will keep the database and skip the installer.
### Generate a personal access token
In Forgejo: Settings → Applications → Generate New Token. Scope it minimally (read/write repository is usually enough). Configure your local git client:
```bash
git config --global credential.helper store
# On first push, enter your username and the PAT as the password.
# Note: `store` saves it unencrypted in ~/.git-credentials.
```
## Operations
### Admin SSH
```bash
gcloud compute ssh forgejo --zone=us-central1-a --tunnel-through-iap
```
### Inspect containers
```bash
docker ps
docker logs forgejo
docker logs caddy
journalctl -u forgejo-stack.service
```
### Force an update of containers
```bash
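# Run Watchtower once, immediately, instead of waiting for the 04:00 schedule
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --run-once --cleanup
# or
docker pull codeberg.org/forgejo/forgejo:11
sudo systemctl restart forgejo-stack.service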
```
### Run a manual backup
```bash
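# On the VM (run through bash because COS mounts /var noexec):
sudo bash /var/lib/forgejo/backup.sh
# From a workstation with the Cloud SDK installed:
gsutil ls gs://YOUR_PROJECT-forgejo-backups/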
```
### Restore from backup (`scripts/restore.sh`)
```bash
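#!/bin/bash
set -euo pipefail
BACKUP=$1 # e.g. forgejo-20260507T033000Z.tar.gz
sudo systemctl stop forgejo-stack.service
# The COS host has no gsutil, so download through the cloud-sdk container,
# the same way backup.sh uploads.
sudo docker run --rm -v /tmp:/tmp google/cloud-sdk:slim \
  gsutil cp "gs://YOUR_PROJECT-forgejo-backups/$BACKUP" /tmp/
sudo rm -rf /mnt/disks/forgejo-data/forgejo
sudo tar xzf "/tmp/$BACKUP" -C /mnt/disks/forgejo-data/
sudo systemctl start forgejo-stack.service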
```
### Major version upgrade of Forgejo
1. Read the [Forgejo release notes](https://codeberg.org/forgejo/forgejo/releases) for breaking changes
2. Take a manual backup
3. Update the `forgejo_image` variable in Terraform (e.g. `codeberg.org/forgejo/forgejo:12`)
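4. `terraform apply -replace=google_compute_instance.forgejo`. A metadata change on its own is applied in place, so force a replacement to boot a fresh VM with the new image tag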
5. The persistent disk persists; first boot will run any DB migrations
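
Watchtower follows whatever tag a container was started with, so the split between automatic patch updates and deliberate major upgrades depends on every image carrying an explicit, non-`latest` tag. A small sketch of that check follows; `image_tag` and `is_pinned` are hypothetical helper names, and digest-style references are not handled specially.

```bash
#!/bin/bash
# image_tag: print the tag of an image reference, or nothing if untagged.
image_tag() {
  local ref=${1##*/}   # drop registry/namespace, which may contain a port colon
  case "$ref" in
    *:*) printf '%s\n' "${ref##*:}" ;;
  esac
}

# is_pinned: succeed only if the reference has an explicit tag other than "latest".
is_pinned() {
  local tag
  tag=$(image_tag "$1")
  [ -n "$tag" ] && [ "$tag" != "latest" ]
}

for img in codeberg.org/forgejo/forgejo:11 caddy:2-alpine containrrr/watchtower; do
  if is_pinned "$img"; then echo "pinned:   $img"; else echo "floating: $img"; fi
done
```

Note that `containrrr/watchtower` is started without a tag in the stack unit, so a check like this would flag it as floating.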
## Disaster recovery
### Scenario: VM is unrecoverable
`terraform apply` recreates the VM. The persistent disk has `prevent_destroy`, so it survives. Forgejo comes back up with all data intact.
### Scenario: Persistent disk is corrupted or deleted
1. Remove `prevent_destroy` from the data disk resource (if needed)
2. `terraform apply` to create a fresh disk
3. SSH in and run the restore script with the latest GCS backup
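
Backup names embed a fixed-width UTC timestamp (`forgejo-YYYYMMDDTHHMMSSZ.tar.gz`), so sorting them as plain text also sorts them by time. The sketch below picks the newest name from a bucket listing; `latest_backup` is a hypothetical helper and the bucket name in the sample is made up.

```bash
#!/bin/bash
# latest_backup: read object names (one per line, with or without a gs://
# prefix) on stdin and print the basename of the newest backup tarball.
latest_backup() {
  sed -n -e 's#.*/##' -e '/^forgejo-[0-9]\{8\}T[0-9]\{6\}Z\.tar\.gz$/p' \
    | LC_ALL=C sort | tail -n 1
}

# Offline demonstration with a sample listing.
printf '%s\n' \
  "gs://example-forgejo-backups/forgejo-20260506T033000Z.tar.gz" \
  "gs://example-forgejo-backups/forgejo-20260507T033000Z.tar.gz" \
  "gs://example-forgejo-backups/notes.txt" | latest_backup
```

In practice the listing would come from `gsutil ls` on the backup bucket, and the printed name is what `restore.sh` expects as its argument.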
### Scenario: Whole project is lost
1. Create a new GCP project
2. Run `bootstrap-secrets.sh` in the new project. This generates new secrets, so anything encrypted with the old `SECRET_KEY` (such as 2FA) will need to be set up again; repos and basic data are unaffected
3. Update `project_id` in tfvars
4. `terraform apply`
5. Manually copy the latest backup tarball from old project's GCS bucket to new one (do this BEFORE deleting the old project)
6. Run restore script
**Note**: rotating `SECRET_KEY` invalidates 2FA tokens and some encrypted fields. For a true bit-exact recovery, also back up the secrets to a password manager you control.
### Scenario: Backup itself is corrupt
This is why we test restores. `scripts/test-restore.sh` should:
1. Spin up a temporary VM (or use a local Docker setup)
2. Restore the latest backup
3. Verify Forgejo starts and at least one repo is browsable
4. Tear down
Run this monthly; set a calendar reminder so it actually happens.
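
A cheap first stage for such a test is an archive-level check that needs no VM at all: confirm the tarball is readable and contains the SQLite snapshot that `backup.sh` writes to `forgejo/gitea/snapshot.db`. This is only a sketch of that first stage, with `verify_backup` as a hypothetical helper name; it does not replace starting Forgejo and browsing a repo.

```bash
#!/bin/bash
# verify_backup: return 0 if the tarball can be listed and contains the
# SQLite snapshot at the path backup.sh produces; non-zero otherwise.
verify_backup() {
  local tarball=$1 listing
  listing=$(tar tzf "$tarball" 2>/dev/null) || return 1
  printf '%s\n' "$listing" | grep -qx 'forgejo/gitea/snapshot.db'
}

# Usage: verify_backup /tmp/forgejo-20260507T033000Z.tar.gz && echo "archive looks sane"
```

A truncated upload or a backup taken while the snapshot step failed would both be caught here, before any time is spent provisioning a temporary VM.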
## Security checklist
- [x] Public SSH (port 22 from 0.0.0.0/0) blocked at firewall
- [x] Admin SSH only via IAP tunnel
- [x] OS Login enabled (no SSH keys in metadata)
- [x] HTTPS-only; HTTP redirects to HTTPS via Caddy
- [x] Forgejo registration disabled
- [x] Service account has minimum required permissions (Secret Manager read for two specific secrets, Storage write to one specific bucket)
- [x] Secrets in Secret Manager, not in Terraform state or git
- [x] COS auto-updates enabled for OS patching
- [x] Watchtower for application patch updates
- [x] Major version upgrades pinned (no `:latest`)
- [x] Billing budget alert at $10/month
- [x] Backups encrypted at rest in GCS (default), 30-day retention
- [ ] **Manual: enable 2FA on your GCP account** (the IAP gate is only as strong as your Google login)
- [ ] **Manual: enable 2FA on your Forgejo admin account** after first login
- [ ] **Manual: store secret values in a password manager** for cross-project recovery
## Maintenance schedule
| Frequency | Task |
|---|---|
| Continuous | Watchtower handles app patch updates; COS handles OS patches |
| Daily | Automatic backup at 03:30 UTC |
| Monthly | Run `test-restore.sh` to verify backups are restorable |
| Monthly | Review GCP billing for anomalies |
| Quarterly | Review Forgejo release notes; consider major version upgrade |
| Annually | Rotate `SECRET_KEY` and `INTERNAL_TOKEN` (requires care; see Forgejo docs) |
| Annually | Review IAM bindings; remove anything unused |
## Open questions and future work
- **Email notifications**: Forgejo can send issue/PR emails. Easiest path is configuring SMTP via a free-tier transactional email provider (e.g. Brevo, SendGrid). Not covered here; add as `FORGEJO__mailer__*` env vars when needed.
- **Forgejo Actions (CI)**: Runs on dedicated runners. The e2-micro is too small to host runners. If wanted, run a runner on a separate cheap host or skip CI.
- **Repo size growth**: the 20 GB data disk holds a lot of personal repos but isn't infinite. Monitor with a simple disk-usage alert. Persistent disks can be grown online on GCP; the ext4 filesystem then needs `resize2fs`, and capacity beyond the 30 GB free-tier allowance is billed.
- **Multiple users**: this design assumes one user. Adding more is fine (Forgejo handles it natively) but reconsider the registration-disabled and HTTPS-token approach if multiple humans need access.
- **Geographic redundancy**: not in scope. Backups in GCS are regional; for multi-region durability use a multi-region bucket (slightly more expensive).
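
The disk-usage alert mentioned under repo size growth could start as a small script run from a systemd timer. The sketch below parses POSIX `df -P` output; `disk_usage_pct` and `check_disk` are hypothetical names, and the 80% default threshold is an arbitrary choice for illustration.

```bash
#!/bin/bash
# disk_usage_pct: print the used percentage (integer, no % sign) of the
# filesystem that holds the given path.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk: print a warning and return non-zero when usage is at or
# above the threshold (default 80).
check_disk() {
  local path=$1 threshold=${2:-80} pct
  pct=$(disk_usage_pct "$path")
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $path is $pct% full (threshold $threshold%)"
    return 1
  fi
}

# Usage on the VM: check_disk /mnt/disks/forgejo-data 80
```

Since the VM already sets `google-logging-enabled`, a warning line written to the journal would also be available in Cloud Logging, where a log-based alert could pick it up.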
## Appendix: useful references
- [Forgejo documentation](https://forgejo.org/docs/)
- [Forgejo Docker image](https://codeberg.org/forgejo/-/packages/container/forgejo/)
- [Container-Optimized OS overview](https://cloud.google.com/container-optimized-os/docs/concepts/features-and-benefits)
- [IAP for TCP forwarding](https://cloud.google.com/iap/docs/using-tcp-forwarding)
- [Caddy documentation](https://caddyserver.com/docs/)
- [GCP free tier](https://cloud.google.com/free/docs/free-cloud-features)
- [Watchtower](https://containrrr.dev/watchtower/)