# Disaster recovery

What to do when things go wrong, in rough order of severity.

## Prerequisite: verify backups are real

Before you need them. Run monthly:

```bash
./scripts/test-restore.sh
```

This pulls the latest GCS backup, boots Forgejo against it in a throwaway local container, and probes the API. If it fails, fix backups before you have an actual incident.

## VM is unreachable but the disk is fine

Symptoms: Forgejo doesn't load, `gcloud compute ssh ... --tunnel-through-iap` times out, but the `forgejo-data` disk and the `forgejo-ip` static IP both still exist.

Recovery:

```bash
cd terraform
terraform apply -replace=google_compute_instance.forgejo
```

The data disk has `prevent_destroy = true` and is reattached; cloud-init re-bootstraps the stack against the existing data. The static IP is preserved, so DNS keeps working.

## Persistent disk is corrupted or accidentally deleted

1. (If still present and corrupt) remove `prevent_destroy` from `google_compute_disk.forgejo_data`, then `terraform apply` to destroy and recreate. **Re-add `prevent_destroy` immediately afterward.**
2. SSH to the VM.
3. `sudo /var/lib/forgejo/restore.sh .tar.gz` (restores from GCS into the fresh disk).

## Whole GCP project is lost

Worst case, but recoverable from GCS-side backups *if* you copied them out before deleting the project.

1. **Before deleting the old project**: copy the latest backup to durable storage you control.

   ```bash
   gsutil cp gs://OLD_PROJECT-forgejo-backups/forgejo-LATEST.tar.gz ~/Backups/
   ```

2. Create a new GCP project, enable APIs.
3. `./scripts/bootstrap-secrets.sh`. This generates *new* `SECRET_KEY` and `INTERNAL_TOKEN` values. If you saved the originals to a password manager, manually upload those instead so encrypted DB fields survive (see below).
4. Update `project_id` in `terraform.tfvars`.
5. `terraform apply`.
6. Upload the saved tarball to the new bucket: `gsutil cp ~/Backups/forgejo-LATEST.tar.gz gs://NEW_PROJECT-forgejo-backups/`.
7. SSH to the VM and run `restore.sh`.

### Preserving SECRET_KEY across projects

Forgejo uses `SECRET_KEY` to encrypt some DB fields (2FA tokens, OAuth tokens, mirror credentials). Rotating it leaves repos and accounts intact but breaks those features. For bit-exact recovery, save the secrets to a password manager when you first create them:

```bash
gcloud secrets versions access latest --secret=forgejo-secret-key
gcloud secrets versions access latest --secret=forgejo-internal-token
```

To restore them in a new project, *skip* `bootstrap-secrets.sh` and create the secrets manually with the saved values:

```bash
echo -n "OLD_SECRET_KEY_VALUE" | gcloud secrets create forgejo-secret-key \
  --replication-policy=automatic --data-file=-
echo -n "OLD_INTERNAL_TOKEN_VALUE" | gcloud secrets create forgejo-internal-token \
  --replication-policy=automatic --data-file=-
```

## Backup itself is corrupt

This is what `scripts/test-restore.sh` exists to catch *before* an incident. If the latest is corrupt, list older versions:

```bash
gsutil ls -l gs://YOUR_PROJECT-forgejo-backups/
```

Backups are kept 30 days (lifecycle rule in `backups.tf`). Within that window, fall back to an earlier nightly tarball.

If all backups in the bucket are corrupt, there is no recovery beyond what's still on the data disk. This is why monthly verification matters.

## Domain / DNS lost

The static IP (`google_compute_address.forgejo`) is reserved separately from the VM and persists across VM replacements. You only lose it if you `terraform destroy` or manually release it.

To re-point: set your registrar's A record (or Cloud DNS if `manage_dns = true`) to the value of `terraform output static_ip`. Caddy will re-issue a Let's Encrypt cert automatically once DNS resolves and ports 80/443 are reachable. ACME state lives on the data disk (`/mnt/disks/forgejo-data/caddy`), so existing certs survive VM replacements within their validity period.

## Compromise / suspected intrusion

1. Cut public network access immediately:

   ```bash
   gcloud compute firewall-rules update allow-https --disabled
   ```

   (Or `terraform` it: temporarily set `source_ranges` to your IP only.)
2. SSH in via IAP, snapshot evidence: `docker logs forgejo > /tmp/forensics.log 2>&1` (container stderr is otherwise not captured), copy `/mnt/disks/forgejo-data/forgejo` aside.
3. Rotate every secret: `forgejo-secret-key`, `forgejo-internal-token`, all Forgejo user passwords + PATs, your Google account password.
4. Review `gcloud logging read 'resource.type=gce_instance'` for unexpected access.
5. If unsure of the compromise vector, treat the disk as tainted: nuke the VM and restore from a backup taken *before* the suspected breach.
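
Picking a backup taken before the suspected breach (step 5 above) means comparing upload times in the bucket listing. A minimal sketch of that filter follows. The function name `latest_backup_before` is hypothetical and not part of this repo. It assumes the usual `gsutil ls -l` row layout (size, ISO 8601 UTC timestamp, object URL) and object names without spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repo: print the newest backup object
# uploaded strictly before a cutoff time.
# Reads `gsutil ls -l` output on stdin. The cutoff is an ISO 8601 UTC
# timestamp such as 2024-05-03T00:00:00Z.
latest_backup_before() {
  local cutoff="$1"
  # Object rows look like: <size> <timestamp> <gs:// URL>.
  # The trailing TOTAL line has no gs:// URL in column 3, so it is skipped.
  # Appending "" forces a string comparison; ISO 8601 UTC timestamps of
  # equal precision order correctly as plain strings.
  awk -v cutoff="$cutoff" \
    '$3 ~ /^gs:\/\// && ($2 "") < (cutoff "") { print $2, $3 }' \
    | LC_ALL=C sort \
    | tail -n 1 \
    | cut -d" " -f2
}

# Example (needs gsutil and bucket access, so it is left commented out):
# gsutil ls -l gs://YOUR_PROJECT-forgejo-backups/ | latest_backup_before 2024-05-03T00:00:00Z
```

The function prints nothing if no object predates the cutoff. Upload time is only a proxy for when the data was captured, so leave a margin before the earliest suspicious activity rather than cutting it close.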