forge/docs/runbook.md
Jason Hall 15ea287728 add budget alert and nightly OS-update reboot
- $10/month project budget via google_billing_budget, alerts to admin_email
- forgejo-reboot.timer at 04:30 UTC applies staged COS updates
- relocate cloud-init scripts to /var/lib/google/forgejo (COS noexec on /var)
- runbook: updated zone, script paths, added "How updates work" section

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 20:35:58 -04:00


Runbook

Common operations against the running Forgejo VM.

Admin SSH

Public port 22 is closed. Use IAP tunneling:

gcloud compute ssh forgejo --zone=us-east1-b --tunnel-through-iap

Your Google account needs:

  • roles/iap.tunnelResourceAccessor on the instance (granted by Terraform via var.admin_email)
  • roles/compute.osLogin on the project (same)
  • 2FA on the Google account (manual, but strongly recommended — IAP is only as strong as your login)

Inspect the stack

docker ps                                  # caddy, forgejo, watchtower expected
docker logs --tail 200 forgejo
docker logs --tail 200 caddy
docker logs --tail 200 watchtower
journalctl -u forgejo-stack.service -n 200
journalctl -u forgejo-backup.service -n 50
systemctl list-timers forgejo-backup.timer
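To turn the "caddy, forgejo, watchtower expected" comment into a quick check, the following is a minimal sketch. It assumes the three container names from the comment above; `missing_containers` is a hypothetical helper name, not something shipped in the repo.

```shell
#!/usr/bin/env bash
# Read running container names (one per line) on stdin and print
# any of the expected names that are absent.
missing_containers() {
  local running name
  running=$(cat)
  for name in caddy forgejo watchtower; do
    # -x matches the whole line, so "forgejo" does not match "forgejo-old".
    grep -qx "$name" <<<"$running" || echo "$name"
  done
}

# Demo with a canned listing in which watchtower is not running.
printf '%s\n' caddy forgejo | missing_containers   # prints "watchtower"
```

On the VM, feed it the live list with `docker ps --format '{{.Names}}' | missing_containers`; empty output means all three containers are up.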

Restart the stack

sudo systemctl restart forgejo-stack.service

Single container only:

docker restart forgejo

How updates work

| Layer | Mechanism | Schedule |
|---|---|---|
| Host OS (COS) | cos-update-strategy=update_enabled stages updates onto the inactive A/B partition; a reboot applies them. | Applied on the nightly reboot below. |
| Forgejo & Caddy patch updates | Watchtower pulls new image digests for the pinned tags (forgejo:11, caddy:2-alpine). | 04:00 UTC daily (inside the watchtower container; cron 0 0 4 * * *). |
| Forgejo major version (e.g. 11→12) | Bump var.forgejo_image in tfvars and terraform apply. The VM is replaced, the data disk persists, and first boot runs DB migrations. | Manual / deliberate. |
| Watchtower itself | Runs containrrr/watchtower with no tag (so latest, not pinned); self-updates with --cleanup. | 04:00 UTC daily. |
| Backups | forgejo-backup.service via timer. | 03:30 UTC daily. |
| Reboot to apply COS updates | forgejo-reboot.service runs shutdown -r +0. Containers come back via forgejo-stack.service + --restart=unless-stopped. | 04:30 UTC daily. ~30-60 s downtime. |

Nightly order: backup at 03:30 → container update check at 04:00 → reboot at 04:30. The backup always lands before the container update and the reboot, so a bad update can be rolled back from the copy in GCS.
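The ordering guarantee depends only on the three schedule times, so it is easy to sanity-check if a timer is ever moved. A minimal sketch, assuming the UTC times quoted above; `to_minutes` and `in_order` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Convert HH:MM to minutes since midnight.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
to_minutes() {
  local h=${1%%:*} m=${1##*:}
  echo $(( 10#$h * 60 + 10#$m ))
}

# Succeed only if every time is strictly later than the one before it.
in_order() {
  local prev=-1 t cur
  for t in "$@"; do
    cur=$(to_minutes "$t")
    (( cur > prev )) || return 1
    prev=$cur
  done
}

# backup, container update check, reboot
if in_order 03:30 04:00 04:30; then
  echo "order ok: backup, container update, reboot"
else
  echo "order broken: an update or reboot could precede the backup" >&2
fi
```

If a schedule changes in Terraform or cloud-init, rerunning the check with the new times shows whether the backup still comes first.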

Disable the nightly reboot

If the reboot ever causes trouble, turn it off without affecting backups or container updates:

gcloud compute ssh forgejo --zone=us-east1-b --tunnel-through-iap \
  --command='sudo systemctl disable --now forgejo-reboot.timer'

Re-enable with enable --now instead of disable --now. Cloud-init will re-enable it on the next VM replacement regardless.

Update containers immediately

Watchtower pulls new images at 04:00 UTC by default. To force an update now:

docker exec watchtower kill -s SIGHUP 1
# or, manually:
docker pull codeberg.org/forgejo/forgejo:11
sudo systemctl restart forgejo-stack.service
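Another way to force a check is a one-off Watchtower run, which does not depend on any binaries being present inside the long-running watchtower container. This is a sketch rather than what the repo's scripts do: `update_now` is a hypothetical helper name, and it assumes the Docker socket is at the default path. `--run-once` and `--cleanup` are Watchtower's own flags; trailing arguments restrict the run to the named containers.

```shell
#!/usr/bin/env bash
# Start a throwaway Watchtower that checks once, updates, then exits.
# Arguments: container names to limit the update to (none = all containers).
update_now() {
  docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    containrrr/watchtower --run-once --cleanup "$@"
}

# Example, run on the VM:
#   update_now forgejo caddy
```

Naming the containers keeps the one-off run from also trying to update the resident watchtower container.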

Run a backup on demand

sudo /var/lib/google/forgejo/backup.sh
gsutil ls gs://YOUR_PROJECT-forgejo-backups/
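To confirm the new backup landed, it helps to pick the newest object out of the listing. The sketch below assumes object names follow the `forgejo-<UTC timestamp>.tar.gz` pattern seen in the restore example in this runbook; `latest_backup` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read a `gsutil ls` listing on stdin and print the newest backup's file name.
# UTC timestamps in YYYYMMDDTHHMMSSZ form sort correctly as plain text.
latest_backup() {
  grep -E 'forgejo-[0-9]{8}T[0-9]{6}Z\.tar\.gz$' | sort | tail -n 1 | sed 's#.*/##'
}

# Demo with a canned listing.
printf '%s\n' \
  'gs://YOUR_PROJECT-forgejo-backups/forgejo-20260506T033000Z.tar.gz' \
  'gs://YOUR_PROJECT-forgejo-backups/forgejo-20260507T033000Z.tar.gz' |
  latest_backup   # prints forgejo-20260507T033000Z.tar.gz
```

Against the real bucket: `gsutil ls gs://YOUR_PROJECT-forgejo-backups/ | latest_backup`. The printed name is in the form the restore example passes to restore.sh.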

Restore from a backup

scripts/restore.sh is in the repo, not on the VM. Copy it over and run:

gcloud compute scp scripts/restore.sh forgejo:/tmp/restore.sh \
  --zone=us-east1-b --tunnel-through-iap
gcloud compute ssh forgejo --zone=us-east1-b --tunnel-through-iap \
  --command='sudo bash /tmp/restore.sh forgejo-20260507T033000Z.tar.gz'

For a clean-environment dry run, use scripts/test-restore.sh from your workstation — it pulls the latest backup, boots Forgejo against it in a throwaway container, and probes the API.

Forgejo major version upgrade

  1. Read the release notes for breaking changes.
  2. Take a manual backup (sudo /var/lib/google/forgejo/backup.sh).
  3. Bump forgejo_image in terraform.tfvars (e.g. codeberg.org/forgejo/forgejo:12).
  4. terraform apply — replaces the VM. The data disk persists; first boot runs DB migrations.
  5. Watch docker logs forgejo to confirm migrations and startup.

Resize the data disk

GCP supports online disk growth:

gcloud compute disks resize forgejo-data --zone=us-east1-b --size=40

Then on the VM:

sudo resize2fs /dev/disk/by-id/google-forgejo-data

Afterward, update size = 40 in terraform/main.tf so the Terraform configuration matches the resized disk.
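Persistent disks can be grown but not shrunk, so a small guard before resizing avoids a rejected request. A minimal sketch: `is_growth` is a hypothetical helper, and the 20 GB starting size is an assumed value for illustration only.

```shell
#!/usr/bin/env bash
# Succeed only when the requested size is a strict increase.
# Arguments: current size in GB, requested size in GB (whole numbers).
is_growth() {
  local current=$1 requested=$2
  (( requested > current ))
}

current_gb=20     # assumed for the demo; read the real value from gcloud
requested_gb=40
if is_growth "$current_gb" "$requested_gb"; then
  echo "ok to resize: ${current_gb} GB -> ${requested_gb} GB"
else
  echo "refusing: ${requested_gb} GB is not larger than ${current_gb} GB" >&2
fi
```

The current size can be read with `gcloud compute disks describe forgejo-data --zone=us-east1-b --format='value(sizeGb)'`.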

Rotate secrets

# Add a new version (the latest is read at boot):
openssl rand -hex 32 | gcloud secrets versions add forgejo-secret-key --data-file=-
sudo systemctl restart forgejo-stack.service

Rotating SECRET_KEY invalidates 2FA and some encrypted DB fields. Read the Forgejo docs before rotating.

Cost / billing watch

  • A $10/month project budget is managed by terraform/budget.tf. Email alerts at 50%, 90%, 100% (current spend) and 100% (forecasted) go to admin_email. Adjust the threshold via budget_amount_usd in tfvars.
  • Skim the billing report monthly. Egress is the most likely surprise.
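For reference, the alert percentages translate into dollar amounts as follows. This is a small arithmetic sketch that assumes a whole-dollar budget; `threshold_usd` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the spend (in dollars) at which a percentage alert fires.
# Arguments: budget in whole dollars, threshold percent.
threshold_usd() {
  local budget=$1 pct=$2
  local cents=$(( budget * pct ))   # dollars * percent = cents
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

budget_amount_usd=10   # the value set in tfvars
for pct in 50 90 100; do
  echo "${pct}% alert at \$$(threshold_usd "$budget_amount_usd" "$pct")"
done
# 50% alert at $5.00
# 90% alert at $9.00
# 100% alert at $10.00
```

Rerun it with a different `budget_amount_usd` to see where the alerts would land before changing tfvars.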