# Runbook

Common operations against the running Forgejo VM.

## Admin SSH

Public port 22 is closed. Use IAP tunneling:

```bash
gcloud compute ssh forgejo --zone=us-east1-b --tunnel-through-iap
```

Your Google account needs:

- `roles/iap.tunnelResourceAccessor` on the instance (granted by Terraform via `var.admin_email`)
- `roles/compute.osLogin` on the project (also granted by Terraform via `var.admin_email`)
- 2FA on the Google account (set up manually, but strongly recommended: IAP is only as strong as your login)

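
Most remote commands in this runbook repeat the same instance name, zone, and IAP flag. A small wrapper can cut that repetition. The sketch below is an illustration, not something shipped in the repo: the function name `fj_ssh` and the variables `FJ_INSTANCE`, `FJ_ZONE`, and `FJ_DRY_RUN` are hypothetical, while the defaults come from the command shown in this section.

```bash
# Hypothetical helper: run a command on the Forgejo VM through IAP.
# FJ_INSTANCE and FJ_ZONE default to the values used in this runbook.
fj_ssh() {
  # With arguments, join them into one remote command; without, open a shell.
  if [ "$#" -gt 0 ]; then
    set -- "--command=$*"
  fi
  set -- gcloud compute ssh "${FJ_INSTANCE:-forgejo}" \
    "--zone=${FJ_ZONE:-us-east1-b}" --tunnel-through-iap "$@"
  if [ -n "${FJ_DRY_RUN:-}" ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

With this loaded, `fj_ssh` opens an interactive session and `fj_ssh sudo systemctl restart forgejo-stack.service` runs a single remote command. Setting `FJ_DRY_RUN=1` prints the `gcloud` invocation without executing it.
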
## Inspect the stack

```bash
docker ps  # caddy, forgejo, watchtower expected
docker logs --tail 200 forgejo
docker logs --tail 200 caddy
docker logs --tail 200 watchtower
journalctl -u forgejo-stack.service -n 200
journalctl -u forgejo-backup.service -n 50
systemctl list-timers forgejo-backup.timer
```

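
The `docker ps` check can be scripted instead of eyeballed. The sketch below assumes the three container names from the comment in this section (`caddy`, `forgejo`, `watchtower`) and reads running container names from stdin, one per line. The function name `missing_containers` is hypothetical.

```bash
# Hypothetical check: print every expected container that is absent from
# the names read on stdin. Returns non-zero if anything is missing.
missing_containers() {
  mc_running="$(cat)"
  mc_missing=0
  for mc_name in caddy forgejo watchtower; do
    # -x matches the whole line, -F treats the name as a literal string.
    if ! printf '%s\n' "$mc_running" | grep -qxF "$mc_name"; then
      printf 'missing: %s\n' "$mc_name"
      mc_missing=1
    fi
  done
  return "$mc_missing"
}
```

On the VM this could be fed with `docker ps --format '{{.Names}}' | missing_containers`.
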
## Restart the stack

```bash
sudo systemctl restart forgejo-stack.service
```

Single container only:

```bash
docker restart forgejo
```

## How updates work

| Layer | Mechanism | Schedule |
|---|---|---|
| Host OS (COS) | `cos-update-strategy=update_enabled` stages updates onto the inactive A/B partition; a reboot applies them. | Applied on the nightly reboot below. |
| Forgejo & Caddy patch updates | Watchtower pulls new image digests for the pinned tags (`forgejo:11`, `caddy:2-alpine`). | 04:00 UTC daily (inside the watchtower container; six-field cron `0 0 4 * * *`, seconds first). |
| Forgejo major version (e.g. 11→12) | Bump `var.forgejo_image` in tfvars and run `terraform apply`. The VM is replaced, the data disk persists, and first boot runs DB migrations. | Manual / deliberate. |
| Watchtower itself | Runs `containrrr/watchtower` with no tag, so it tracks `latest` rather than a pinned version. It updates itself, and `--cleanup` removes the superseded image. | 04:00 UTC daily. |
| Backups | `forgejo-backup.service` via timer. | 03:30 UTC daily. |
| Reboot to apply COS updates | `forgejo-reboot.service` runs `shutdown -r +0`. Containers come back via `forgejo-stack.service` + `--restart=unless-stopped`. | 04:30 UTC daily. ~30–60s downtime. |

Nightly order (UTC): backup at 03:30 → container update check at 04:00 → reboot at 04:30. The backup is scheduled ahead of both the container update and the reboot, so a bad update can be rolled back from GCS.

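
To make the nightly ordering concrete, the sketch below hard-codes the three times from this section and, given a UTC time as `HH:MM`, prints which job runs next. The function name `next_nightly_job` is hypothetical, and the constants would need a manual update if the timers change.

```bash
# Hypothetical lookup: which scheduled job comes next after a given UTC time?
next_nightly_job() {
  nj_h="${1%%:*}"
  nj_m="${1##*:}"
  # Strip one leading zero so 08 and 09 are not read as invalid octal.
  nj_now=$(( ${nj_h#0} * 60 + ${nj_m#0} ))
  if   [ "$nj_now" -lt 210 ]; then echo "backup at 03:30"            # 3*60+30
  elif [ "$nj_now" -lt 240 ]; then echo "container update at 04:00"  # 4*60
  elif [ "$nj_now" -lt 270 ]; then echo "reboot at 04:30"            # 4*60+30
  else                             echo "backup at 03:30 (tomorrow)"
  fi
}
```

For the current time, call it as `next_nightly_job "$(date -u +%H:%M)"`.
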
### Disable the nightly reboot

If the reboot ever causes trouble, turn it off without affecting backups or container updates:

```bash
gcloud compute ssh forgejo --zone=us-east1-b --tunnel-through-iap \
  --command='sudo systemctl disable --now forgejo-reboot.timer'
```

Re-enable with `enable --now` instead of `disable --now`. Cloud-init will re-enable it on the next VM replacement regardless.

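## Update containers immediately

Watchtower pulls new images at 04:00 UTC by default. To force an update now:

```bash
# One-off Watchtower pass (the watchtower image ships no shell or `kill`
# binary, so signalling it through `docker exec` is not an option):
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --run-once --cleanup
# or, manually:
docker pull codeberg.org/forgejo/forgejo:11
sudo systemctl restart forgejo-stack.service
```
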
## Run a backup on demand

```bash
sudo /var/lib/google/forgejo/backup.sh
gsutil ls gs://YOUR_PROJECT-forgejo-backups/
```

## Restore from a backup

`scripts/restore.sh` is in the repo, not on the VM. Copy it over and run:

```bash
gcloud compute scp scripts/restore.sh forgejo:/tmp/restore.sh \
  --zone=us-east1-b --tunnel-through-iap
gcloud compute ssh forgejo --zone=us-east1-b --tunnel-through-iap \
  --command='sudo bash /tmp/restore.sh forgejo-20260507T033000Z.tar.gz'
```

For a clean-environment dry run, use `scripts/test-restore.sh` from your workstation. It pulls the latest backup, boots Forgejo against it in a throwaway container, and probes the API.

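
Backup names such as `forgejo-20260507T033000Z.tar.gz` embed a fixed-width UTC timestamp, so a plain byte-order sort is also a chronological sort. The sketch below uses that to pick the newest archive from a listing on stdin. It assumes every backup follows the `forgejo-<timestamp>.tar.gz` pattern shown in this section; the function name `latest_backup` is hypothetical.

```bash
# Hypothetical helper: print the newest backup file name from a listing.
latest_backup() {
  # Reduce each line to its basename, keep only backup archives,
  # then take the last entry in byte order.
  sed 's#.*/##' \
    | grep -E '^forgejo-[0-9]{8}T[0-9]{6}Z\.tar\.gz$' \
    | LC_ALL=C sort \
    | tail -n 1
}
```

One possible use is `gsutil ls gs://YOUR_PROJECT-forgejo-backups/ | latest_backup`, passing the result to `restore.sh` in place of a hand-typed file name.
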
## Forgejo major version upgrade

1. Read the [release notes](https://codeberg.org/forgejo/forgejo/releases) for breaking changes.
2. Take a manual backup (`sudo /var/lib/google/forgejo/backup.sh`).
3. Bump `forgejo_image` in `terraform.tfvars` (e.g. `codeberg.org/forgejo/forgejo:12`).
4. Run `terraform apply`, which replaces the VM. The data disk persists; first boot runs DB migrations.
5. Watch `docker logs forgejo` to confirm migrations and startup.

## Resize the data disk

GCP supports online disk growth:

```bash
gcloud compute disks resize forgejo-data --zone=us-east1-b --size=40
```

Then on the VM:

```bash
sudo resize2fs /dev/disk/by-id/google-forgejo-data
```

Update `size = 40` in `terraform/main.tf` afterward to keep state in sync.

## Rotate secrets

```bash
# Add a new version (the latest is read at boot):
openssl rand -hex 32 | gcloud secrets versions add forgejo-secret-key --data-file=-
sudo systemctl restart forgejo-stack.service
```

Rotating `SECRET_KEY` invalidates 2FA and some encrypted DB fields. Read the Forgejo docs before rotating.

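
`openssl rand -hex 32` emits 32 random bytes as 64 lowercase hexadecimal characters. Because a malformed value would only surface at the next boot, it can help to validate the string before adding it as a secret version. The sketch below is one way to do that; the function name `is_hex_key` is hypothetical, and the 64-character length is specific to the `openssl` command in this section.

```bash
# Hypothetical check: succeed only for exactly 64 characters from 0-9 and a-f.
is_hex_key() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'
}
```

A guarded rotation could then read `key="$(openssl rand -hex 32)"; is_hex_key "$key" && printf '%s\n' "$key" | gcloud secrets versions add forgejo-secret-key --data-file=-`.
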
## Cost / billing watch

- A $10/month project budget is managed by `terraform/budget.tf`. Email alerts at 50%, 90%, and 100% of current spend, plus 100% of forecasted spend, go to `admin_email`. Adjust the budget amount via `budget_amount_usd` in tfvars.
- Skim the billing report monthly. Egress is the most likely surprise.

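
For reference, the current-spend alert percentages translate into dollar amounts as follows. The sketch below only does the arithmetic for a given monthly budget; the function name `budget_alerts` is hypothetical, and the percentages are the ones listed in this section.

```bash
# Hypothetical helper: print the dollar amount behind each alert threshold.
budget_alerts() {
  # $1 is the monthly budget in USD, e.g. the value of budget_amount_usd.
  for ba_pct in 50 90 100; do
    awk -v b="$1" -v p="$ba_pct" \
      'BEGIN { printf "%d%% -> $%.2f\n", p, b * p / 100 }'
  done
}
```

Calling `budget_alerts 10` lists the thresholds for the default budget; pass a different amount after changing `budget_amount_usd`.
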