# Self-Hosted Forgejo on GCP: Complete Plan
A declarative, low-cost, low-maintenance plan for running a personal Forgejo instance on Google Cloud Platform using Container-Optimized OS, Caddy for HTTPS, and IAP for admin access.
## Goals and constraints
- **Cost**: minimize monthly spend; target ~$2-4/month
- **Maintenance**: minimal ongoing effort; OS and app patches should apply automatically
- **Security**: minimal attack surface; no public SSH; principle of least privilege for service accounts
- **Reproducibility**: entire stack defined in code; `terraform apply` from a clean project produces a working instance
- **Personal scale**: low traffic, single user, occasional pushes
## Architectural decisions
| Decision | Choice | Rationale |
|---|---|---|
| Compute | e2-micro VM in us-west1, us-central1, or us-east1 | Always-free tier covers the full month |
| OS | Container-Optimized OS (COS) | Read-only root, automatic patching by Google, minimal attack surface, container-first |
| Database | SQLite on persistent disk | Free, sufficient for personal scale, simple to back up |
| Repo storage | Local persistent disk | Fast, reliable, survives VM replacement |
| TLS | Caddy with Let's Encrypt | Auto-renewing certs with one-line config |
| Git access | HTTPS only with personal access token | No SSH port conflicts, no client-side gcloud setup |
| Admin SSH | IAP TCP forwarding | Public port 22 closed; SSH via authenticated Google tunnel |
| App updates | Watchtower with pinned major version tag | Patch updates automatic; major upgrades deliberate |
| OS updates | COS auto-update | Google manages OS patching |
| Backups | Nightly SQLite snapshot + repo tarball to GCS | Survives disk loss, accidental deletion, zone failure (the bucket is regional) |
| Secrets | Google Secret Manager, fetched at boot | Out of Terraform state, out of git, encrypted at rest |
| Infrastructure | Terraform | Declarative, replayable, well-documented for GCP |
| VM bootstrap | cloud-init via instance metadata | Native COS support, idempotent on VM replacement |
## Cost estimate
| Item | Monthly cost |
|---|---|
| e2-micro VM (always-free region) | $0 |
| 30 GB standard persistent disk (boot + data combined under 30 GB free tier) | $0 |
| Static external IP attached to running VM | ~$2.92 |
| GCS storage for backups (~1 GB, 30-day retention) | ~$0.05 |
| Secret Manager (2 secrets, low access volume) | ~$0.06 |
| Cloud DNS (optional; can use registrar's DNS) | $0.20 or $0 |
| Egress beyond 1 GB free | $0-2 depending on usage |
| **Total** | **~$3-5/month** |

Set a billing budget alert at $10/month to catch surprises early. GCP has no hard spending limit.
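
As a rough cross-check of the total, the line items in the table can be summed directly. This is only a sketch: the function name `monthly_total` is made up for illustration, the prices are the estimates from the table rather than quoted rates, and egress is a parameter because it depends on usage.

```bash
# Sum the monthly line items from the cost table (USD).
# Usage: monthly_total EGRESS_USD [DNS_USD]   (DNS defaults to 0.20)
monthly_total() {
  egress="$1"
  dns="${2:-0.20}"
  # 2.92 = static IP, 0.05 = GCS backups, 0.06 = Secret Manager
  awk -v e="$egress" -v d="$dns" \
    'BEGIN { printf "%.2f\n", 2.92 + 0.05 + 0.06 + d + e }'
}

echo "no extra egress:  $(monthly_total 0)"
echo "2 USD of egress:  $(monthly_total 2)"
echo "registrar DNS:    $(monthly_total 0 0)"
```

The first two calls bracket the range quoted in the table; the third shows the floor when Cloud DNS is skipped.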
## Network exposure
| Port | Protocol | Source | Purpose |
|---|---|---|---|
| 80 | TCP | 0.0.0.0/0 | Caddy HTTP → HTTPS redirect, ACME HTTP-01 challenge |
| 443 | TCP | 0.0.0.0/0 | Caddy HTTPS → Forgejo |
| 22 | TCP | 35.235.240.0/20 (IAP only) | Admin SSH via IAP tunnel |
| All others | — | — | Default deny |
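
When reading firewall or SSH logs, it can help to confirm that a source address really falls inside the IAP range from the table. The helper below is an illustrative sketch (the name `in_iap_range` is not part of the plan); it relies on the fact that a /20 starting at 35.235.240.0 covers third octets 240 through 255.

```bash
# Succeed (exit 0) if the IPv4 address is inside 35.235.240.0/20.
in_iap_range() {
  ip="$1"
  # Reject empty input, non-digit characters, and empty octets.
  case "$ip" in
    ''|*[!0-9.]*|*..*|.*|*.) return 1 ;;
  esac
  old_ifs="$IFS"
  IFS=.
  set -- $ip          # split into octets: $1 $2 $3 $4
  IFS="$old_ifs"
  [ "$#" -eq 4 ] || return 1
  [ "$1" = 35 ] && [ "$2" = 235 ] \
    && [ "$3" -ge 240 ] && [ "$3" -le 255 ] && [ "$4" -le 255 ]
}

for ip in 35.235.241.7 203.0.113.10; do
  if in_iap_range "$ip"; then
    echo "$ip: inside the IAP range"
  else
    echo "$ip: outside the IAP range"
  fi
done
```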
## Repository layout
```
forgejo-infra/
├── terraform/
│   ├── main.tf                 # VM, disk, instance config
│   ├── network.tf              # Firewall rules, static IP
│   ├── iam.tf                  # Service account, IAP bindings
│   ├── secrets.tf              # Secret Manager references (values out-of-band)
│   ├── backups.tf              # GCS bucket, lifecycle rules
│   ├── dns.tf                  # Optional Cloud DNS record
│   ├── variables.tf
│   ├── outputs.tf
│   └── versions.tf
├── cloud-init/
│   └── user-data.yaml.tpl      # Systemd units, container startup, backup timer
├── config/
│   └── Caddyfile.tpl           # TLS reverse proxy config
├── scripts/
│   ├── bootstrap-secrets.sh    # One-time: generate and upload secrets
│   ├── backup.sh               # Run on VM via systemd timer
│   ├── restore.sh              # Manual recovery from GCS tarball
│   └── test-restore.sh         # Verify a backup is restorable
├── docs/
│   ├── runbook.md              # Common operations, troubleshooting
│   └── disaster-recovery.md    # Step-by-step recovery procedures
├── .gitignore
└── README.md
```
## Terraform: key resources
### main.tf
```hcl
resource "google_compute_disk" "forgejo_data" {
  name = "forgejo-data"
  type = "pd-standard"
  size = 20
  zone = var.zone

  lifecycle { prevent_destroy = true }
}

resource "google_compute_instance" "forgejo" {
  name         = "forgejo"
  machine_type = "e2-micro"
  zone         = var.zone
  tags         = ["forgejo"]

  boot_disk {
    initialize_params {
      image = "cos-cloud/cos-stable"
      size  = 10
      type  = "pd-standard"
    }
  }

  attached_disk {
    source      = google_compute_disk.forgejo_data.id
    device_name = "forgejo-data"
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.forgejo.address
    }
  }

  metadata = {
    user-data = templatefile("${path.module}/../cloud-init/user-data.yaml.tpl", {
      domain            = var.domain
      forgejo_image     = var.forgejo_image
      caddy_image       = var.caddy_image
      gcs_backup_bucket = google_storage_bucket.backups.name
      project_id        = var.project_id
    })
    google-logging-enabled = "true"
    cos-update-strategy    = "update_enabled"
    enable-oslogin         = "TRUE"
  }

  service_account {
    email  = google_service_account.forgejo.email
    scopes = ["cloud-platform"]
  }

  allow_stopping_for_update = true
}
```
### network.tf
```hcl
resource "google_compute_address" "forgejo" {
  name   = "forgejo-ip"
  region = var.region
}

resource "google_compute_firewall" "https" {
  name      = "allow-https"
  network   = "default"
  direction = "INGRESS"

  allow {
    protocol = "tcp"
    ports    = ["80", "443"]
  }

  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["forgejo"]
}

resource "google_compute_firewall" "iap_ssh" {
  name      = "allow-iap-ssh"
  network   = "default"
  direction = "INGRESS"

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["35.235.240.0/20"]
  target_tags   = ["forgejo"]
}
```
### iam.tf
```hcl
resource "google_service_account" "forgejo" {
  account_id   = "forgejo-vm"
  display_name = "Forgejo VM service account"
}

resource "google_secret_manager_secret_iam_member" "forgejo_secrets" {
  for_each  = toset(["forgejo-secret-key", "forgejo-internal-token"])
  project   = var.project_id
  secret_id = each.value
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.forgejo.email}"
}

resource "google_storage_bucket_iam_member" "backups_writer" {
  bucket = google_storage_bucket.backups.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.forgejo.email}"
}

resource "google_iap_tunnel_instance_iam_member" "ssh_admin" {
  project  = var.project_id
  zone     = var.zone
  instance = google_compute_instance.forgejo.name
  role     = "roles/iap.tunnelResourceAccessor"
  member   = "user:${var.admin_email}"
}
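
# roles/compute.osLogin gives a non-root login only. The operations in this
# plan use sudo, which needs roles/compute.osAdminLogin.
resource "google_project_iam_member" "ssh_os_login" {
  project = var.project_id
  role    = "roles/compute.osAdminLogin"
  member  = "user:${var.admin_email}"
}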
```
### backups.tf
```hcl
resource "google_storage_bucket" "backups" {
  name                        = "${var.project_id}-forgejo-backups"
  location                    = var.region
  storage_class               = "STANDARD"
  uniform_bucket_level_access = true

  lifecycle_rule {
    condition { age = 30 }
    action { type = "Delete" }
  }

  versioning { enabled = false }
}
```
### secrets.tf
```hcl
# Secrets are created out-of-band by scripts/bootstrap-secrets.sh
# This file only declares them as data sources and grants access (in iam.tf)
data "google_secret_manager_secret" "secret_key" {
  secret_id = "forgejo-secret-key"
}

data "google_secret_manager_secret" "internal_token" {
  secret_id = "forgejo-internal-token"
}
```
### variables.tf
```hcl
variable "project_id" { type = string }

# HCL allows only one argument in a single-line block, so blocks that set
# both type and default are written out over several lines.
variable "region" {
  type    = string
  default = "us-central1"
}

variable "zone" {
  type    = string
  default = "us-central1-a"
}

variable "domain" { type = string }
variable "admin_email" { type = string }

variable "forgejo_image" {
  type    = string
  default = "codeberg.org/forgejo/forgejo:11"
}

variable "caddy_image" {
  type    = string
  default = "caddy:2-alpine"
}
```
### outputs.tf
```hcl
output "static_ip" {
  value       = google_compute_address.forgejo.address
  description = "Point your domain's A record at this address"
}

output "ssh_command" {
  value       = "gcloud compute ssh forgejo --zone=${var.zone} --tunnel-through-iap"
  description = "Admin SSH via IAP tunnel"
}
```
## Cloud-init: user-data.yaml.tpl
```yaml
#cloud-config
write_files:
  # systemd requires a mount unit to be named after its mount point:
  # systemd-escape -p --suffix=mount /mnt/disks/forgejo-data
  - path: /etc/systemd/system/mnt-disks-forgejo\x2ddata.mount
    content: |
      [Unit]
      Description=Mount Forgejo data disk
      Before=docker.service
      [Mount]
      What=/dev/disk/by-id/google-forgejo-data
      Where=/mnt/disks/forgejo-data
      Type=ext4
      Options=defaults,nofail
      [Install]
      WantedBy=multi-user.target
  - path: /var/lib/forgejo/Caddyfile
    content: |
      ${domain} {
        reverse_proxy forgejo:3000
        encode gzip
      }
  - path: /var/lib/forgejo/fetch-secrets.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      # Assumes python3 is available on the host image; if it is not, do the
      # JSON parsing another way (for example inside a container).
      set -euo pipefail
      TOKEN=$(curl -sf -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
        | python3 -c "import sys,json;print(json.load(sys.stdin)['access_token'])")
      fetch() {
        curl -sf -H "Authorization: Bearer $TOKEN" \
          "https://secretmanager.googleapis.com/v1/projects/${project_id}/secrets/$1/versions/latest:access" \
          | python3 -c "import sys,json,base64;print(base64.b64decode(json.load(sys.stdin)['payload']['data']).decode())"
      }
      # Assign first so that a failed fetch aborts the script under set -e.
      SECRET_KEY=$(fetch forgejo-secret-key)
      INTERNAL_TOKEN=$(fetch forgejo-internal-token)
      umask 077
      {
        echo "FORGEJO__security__SECRET_KEY=$SECRET_KEY"
        echo "FORGEJO__security__INTERNAL_TOKEN=$INTERNAL_TOKEN"
      } > /run/forgejo-secrets.env
  - path: /etc/systemd/system/forgejo-stack.service
    content: |
      [Unit]
      Description=Forgejo + Caddy + Watchtower
      After=network-online.target docker.service
      Wants=network-online.target
      RequiresMountsFor=/mnt/disks/forgejo-data
      [Service]
      Type=oneshot
      RemainAfterExit=true
      # Scripts are run through bash in case /var is mounted noexec.
      ExecStartPre=/bin/bash /var/lib/forgejo/fetch-secrets.sh
      ExecStartPre=-/usr/bin/docker network create web
      # Remove containers left from a previous start so that
      # "docker run --name" does not fail on a name conflict.
      ExecStartPre=-/usr/bin/docker rm -f caddy forgejo watchtower
      ExecStart=/usr/bin/docker run -d --name caddy --network web \
        -p 80:80 -p 443:443 \
        -v /mnt/disks/forgejo-data/caddy:/data \
        -v /var/lib/forgejo/Caddyfile:/etc/caddy/Caddyfile:ro \
        --restart=unless-stopped \
        ${caddy_image}
      ExecStart=/usr/bin/docker run -d --name forgejo --network web \
        -e FORGEJO__server__DISABLE_SSH=true \
        -e FORGEJO__server__ROOT_URL=https://${domain}/ \
        -e FORGEJO__service__DISABLE_REGISTRATION=true \
        -e FORGEJO__database__DB_TYPE=sqlite3 \
        --env-file /run/forgejo-secrets.env \
        -v /mnt/disks/forgejo-data/forgejo:/data \
        --restart=unless-stopped \
        ${forgejo_image}
      ExecStart=/usr/bin/docker run -d --name watchtower \
        -v /var/run/docker.sock:/var/run/docker.sock \
        --restart=unless-stopped \
        containrrr/watchtower --cleanup --schedule "0 0 4 * * *"
      ExecStop=/usr/bin/docker stop watchtower forgejo caddy
      [Install]
      WantedBy=multi-user.target
  - path: /var/lib/forgejo/backup.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      STAMP=$(date -u +%Y%m%dT%H%M%SZ)
      # Take a consistent SQLite snapshot, then archive the whole data directory.
      docker exec forgejo sqlite3 /data/gitea/gitea.db ".backup '/data/gitea/snapshot.db'"
      tar czf /tmp/forgejo-$STAMP.tar.gz -C /mnt/disks/forgejo-data forgejo
      docker run --rm -v /tmp:/tmp google/cloud-sdk:slim \
        gsutil cp /tmp/forgejo-$STAMP.tar.gz gs://${gcs_backup_bucket}/
      rm /tmp/forgejo-$STAMP.tar.gz
      docker exec forgejo rm -f /data/gitea/snapshot.db
  - path: /etc/systemd/system/forgejo-backup.service
    content: |
      [Unit]
      Description=Backup Forgejo to GCS
      After=forgejo-stack.service
      Requires=forgejo-stack.service
      [Service]
      Type=oneshot
      ExecStart=/bin/bash /var/lib/forgejo/backup.sh
  - path: /etc/systemd/system/forgejo-backup.timer
    content: |
      [Unit]
      Description=Nightly Forgejo backup
      [Timer]
      OnCalendar=*-*-* 03:30:00
      Persistent=true
      [Install]
      WantedBy=timers.target
runcmd:
  - mkdir -p /mnt/disks/forgejo-data
  - if ! blkid /dev/disk/by-id/google-forgejo-data; then mkfs.ext4 -F /dev/disk/by-id/google-forgejo-data; fi
  - systemctl daemon-reload
  - systemctl enable --now 'mnt-disks-forgejo\x2ddata.mount'
  - mkdir -p /mnt/disks/forgejo-data/forgejo /mnt/disks/forgejo-data/caddy
  - systemctl enable --now forgejo-stack.service
  - systemctl enable --now forgejo-backup.timer
```
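
The `fetch` helper in `fetch-secrets.sh` pulls the base64-encoded `payload.data` field out of the Secret Manager `:access` response and decodes it. The sketch below shows that decode step in isolation on a hand-written sample response. It swaps the script's `python3` one-liner for `sed` and `base64` purely for illustration; the function name `decode_secret_payload` and the sample value are made up.

```bash
# Read a Secret Manager ":access" JSON response on stdin and print the
# decoded secret. The pattern matches the exact key "data" only, so a
# sibling field such as "dataCrc32c" is ignored.
decode_secret_payload() {
  sed -n 's/.*"data"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | base64 -d
}

# Hand-written sample body; "aHVudGVyMg==" is base64 for "hunter2".
sample='{
  "name": "projects/123/secrets/forgejo-secret-key/versions/1",
  "payload": {
    "data": "aHVudGVyMg=="
  }
}'

printf '%s\n' "$sample" | decode_secret_payload
echo
```

A `sed` pattern is far more brittle than a real JSON parser, so treat this as a way to see the shape of the response rather than a replacement for the script.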
## Bootstrap procedure
### One-time setup (before first `terraform apply`)
1. **Create the GCP project** and enable required APIs:
   ```bash
   gcloud services enable \
     compute.googleapis.com \
     secretmanager.googleapis.com \
     iap.googleapis.com \
     storage.googleapis.com
   ```
2. **Generate and upload secrets** (`scripts/bootstrap-secrets.sh`):
   ```bash
   #!/bin/bash
   set -euo pipefail
   for SECRET in forgejo-secret-key forgejo-internal-token; do
     if ! gcloud secrets describe "$SECRET" >/dev/null 2>&1; then
       openssl rand -hex 32 | gcloud secrets create "$SECRET" --data-file=-
       echo "Created $SECRET"
     else
       echo "$SECRET already exists, skipping"
     fi
   done
   ```
3. **Configure Terraform variables** in `terraform.tfvars`:
   ```hcl
   project_id  = "your-project-id"
   domain      = "git.yourdomain.com"
   admin_email = "you@yourdomain.com"
   ```
### First deploy
```bash
cd terraform/
terraform init
terraform plan
terraform apply
```
Note the `static_ip` output. Point your domain's A record at it. Wait for DNS propagation (a few minutes typically).
### Forgejo first-run installer
Visit `https://yourdomain` in a browser. Forgejo's installer will appear. Configure:
- Database: SQLite3 (path `/data/gitea/gitea.db`)
- Site title: whatever you want
- Server domain: your domain
- Server base URL: `https://yourdomain/`
- Disable self-registration: yes
- Create the admin user

After this, the installer is locked. Subsequent VM replacements (terraform-driven) will keep the database and skip the installer.
### Generate a personal access token
In Forgejo: Settings → Applications → Generate New Token. Scope it minimally (read/write repository is usually enough). Configure your local git client:
```bash
git config --global credential.helper store
# On first push, enter username and the PAT as password; it'll be saved.
```
## Operations
### Admin SSH
```bash
gcloud compute ssh forgejo --zone=us-central1-a --tunnel-through-iap
```
### Inspect containers
```bash
docker ps
docker logs forgejo
docker logs caddy
journalctl -u forgejo-stack.service
```
### Force an update of containers
```bash
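# Run a one-off Watchtower check (the Watchtower image has no shell or kill binary to exec into)
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once --cleanup
# or pull by hand and recreate the containers
docker pull codeberg.org/forgejo/forgejo:11
sudo systemctl stop forgejo-stack.service
docker rm caddy forgejo watchtower
sudo systemctl start forgejo-stack.service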
```
### Run a manual backup
```bash
sudo bash /var/lib/forgejo/backup.sh
# List the uploaded tarballs (run where gsutil is installed, e.g. your workstation)
gsutil ls gs://YOUR_PROJECT-forgejo-backups/
```
### Restore from backup (`scripts/restore.sh`)
```bash
#!/bin/bash
set -euo pipefail
BACKUP=$1 # e.g. forgejo-20260507T033000Z.tar.gz
sudo systemctl stop forgejo-stack.service
# Run gsutil from the cloud-sdk container, as backup.sh does
sudo docker run --rm -v /tmp:/tmp google/cloud-sdk:slim \
  gsutil cp "gs://YOUR_PROJECT-forgejo-backups/$BACKUP" /tmp/
sudo rm -rf /mnt/disks/forgejo-data/forgejo
sudo tar xzf "/tmp/$BACKUP" -C /mnt/disks/forgejo-data/
sudo systemctl start forgejo-stack.service
```
### Major version upgrade of Forgejo
1. Read the [Forgejo release notes](https://codeberg.org/forgejo/forgejo/releases) for breaking changes
2. Take a manual backup
3. Update the `forgejo_image` variable in Terraform (e.g. `codeberg.org/forgejo/forgejo:12`)
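4. `terraform apply`. Changing the variable only updates the instance's `user-data` metadata in place; Terraform does not replace the VM for a metadata change
5. SSH in, remove the old containers (`docker rm -f caddy forgejo watchtower`), and reboot so that cloud-init re-renders the units with the new tag
6. The data disk is untouched; the first start of the new version runs any DB migrations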
## Disaster recovery
### Scenario: VM is unrecoverable
`terraform apply` recreates the VM. The persistent disk has `prevent_destroy`, so it survives. Forgejo comes back up with all data intact.
### Scenario: Persistent disk is corrupted or deleted
1. Remove `prevent_destroy` from the data disk resource (if needed)
2. `terraform apply` to create a fresh disk
3. SSH in and run the restore script with the latest GCS backup
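
Picking "the latest GCS backup" in step 3 can be done mechanically, because the `YYYYMMDDTHHMMSSZ` stamps written by `backup.sh` sort chronologically as plain text. The helper below is a sketch with a made-up name (`latest_backup`); it expects bare object names, one per line, on stdin.

```bash
# Print the newest backup object name from a list read on stdin.
# Lines that do not look like backup.sh output are ignored.
latest_backup() {
  grep -E '^forgejo-[0-9]{8}T[0-9]{6}Z\.tar\.gz$' | LC_ALL=C sort | tail -n 1
}

printf '%s\n' \
  forgejo-20260506T033000Z.tar.gz \
  forgejo-20260507T033000Z.tar.gz \
  forgejo-20260430T033000Z.tar.gz \
  | latest_backup
```

`gsutil ls` prints full `gs://bucket/name` URLs, so strip the prefix first (for example with `sed 's|.*/||'`) before piping the listing into the function.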
### Scenario: Whole project is lost
1. Create a new GCP project
2. Run bootstrap-secrets.sh in the new project (generates new secrets — DB tables encrypted with the old SECRET_KEY for things like 2FA will need re-setup, but repos and basic data are fine)
3. Update `project_id` in tfvars
4. `terraform apply`
5. Manually copy the latest backup tarball from old project's GCS bucket to new one (do this BEFORE deleting the old project)
6. Run restore script

**Note**: rotating `SECRET_KEY` invalidates 2FA tokens and some encrypted fields. For a true bit-exact recovery, also back up the secrets to a password manager you control.
### Scenario: Backup itself is corrupt
This is why we test restores. `scripts/test-restore.sh` should:
1. Spin up a temporary VM (or use a local Docker setup)
2. Restore the latest backup
3. Verify Forgejo starts and at least one repo is browsable
4. Tear down

Run this monthly. Calendar reminder.
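
Before spinning anything up, `test-restore.sh` could start with a cheap structural check on the downloaded tarball. The function below is a hedged sketch of that first step only, with a hypothetical name (`verify_backup_tarball`). It confirms the archive is readable and contains the database path that `backup.sh` archives; it does not prove that Forgejo will start from it.

```bash
# Succeed (exit 0) if the tarball can be listed and contains the SQLite
# database at the path used by backup.sh (forgejo/gitea/gitea.db).
verify_backup_tarball() {
  tarball="$1"
  # A corrupt or truncated archive makes tar fail here.
  listing=$(tar tzf "$tarball" 2>/dev/null) || return 1
  printf '%s\n' "$listing" | grep -qxF 'forgejo/gitea/gitea.db'
}
```

A typical call would be `verify_backup_tarball /tmp/forgejo-20260507T033000Z.tar.gz && echo "archive looks sane"`, followed by the full restore and browse test described above.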
## Security checklist
- [x] Public SSH (port 22 from 0.0.0.0/0) blocked at firewall
- [x] Admin SSH only via IAP tunnel
- [x] OS Login enabled (no SSH keys in metadata)
- [x] HTTPS-only; HTTP redirects to HTTPS via Caddy
- [x] Forgejo registration disabled
- [x] Service account has minimum required permissions (Secret Manager read for two specific secrets, object access limited to one specific bucket)
- [x] Secrets in Secret Manager, not in Terraform state or git
- [x] COS auto-updates enabled for OS patching
- [x] Watchtower for application patch updates
- [x] Major version upgrades pinned (no `:latest`)
- [x] Billing budget alert at $10/month
- [x] Backups encrypted at rest in GCS (default), 30-day retention
- [ ] **Manual: enable 2FA on your GCP account** (the IAP gate is only as strong as your Google login)
- [ ] **Manual: enable 2FA on your Forgejo admin account** after first login
- [ ] **Manual: store secret values in a password manager** for cross-project recovery
## Maintenance schedule
| Frequency | Task |
|---|---|
| Continuous | Watchtower handles app patch updates; COS handles OS patches |
| Daily | Automatic backup at 03:30 UTC |
| Monthly | Run `test-restore.sh` to verify backups are restorable |
| Monthly | Review GCP billing for anomalies |
| Quarterly | Review Forgejo release notes; consider major version upgrade |
| Annually | Rotate `SECRET_KEY` and `INTERNAL_TOKEN` (requires care; see Forgejo docs) |
| Annually | Review IAM bindings; remove anything unused |
## Open questions and future work
- **Email notifications**: Forgejo can send issue/PR emails. Easiest path is configuring SMTP via a free-tier transactional email provider (e.g. Brevo, SendGrid). Not covered here; add as `FORGEJO__mailer__*` env vars when needed.
- **Forgejo Actions (CI)**: Runs on dedicated runners. The e2-micro is too small to host runners. If wanted, run a runner on a separate cheap host or skip CI.
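- **Repo size growth**: the 20 GB data disk holds a lot of personal repos but isn't infinite. Monitor with a simple disk-usage alert. A persistent disk can be resized online on GCP, but the ext4 filesystem must then be grown with `resize2fs`, and capacity beyond the 30 GB free-tier allowance is billed.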
- **Multiple users**: this design assumes one user. Adding more is fine (Forgejo handles it natively) but reconsider the registration-disabled and HTTPS-token approach if multiple humans need access.
- **Geographic redundancy**: not in scope. Backups in GCS are regional; for multi-region durability use a multi-region bucket (slightly more expensive).
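
For the repo size growth item, a "simple disk-usage alert" could be as small as the sketch below, run from a timer. The function names and the 80% threshold are assumptions chosen for illustration; on the VM the path to check would be `/mnt/disks/forgejo-data`, while the demo line uses `/` so that it runs anywhere.

```bash
# Print the used percentage (integer, no % sign) of the filesystem
# that holds the given path, using POSIX df output.
disk_use_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage is at or above the threshold.
over_threshold() {
  [ "$1" -ge "$2" ]
}

pct=$(disk_use_pct /)
if over_threshold "$pct" 80; then
  echo "WARNING: disk is ${pct}% full"
else
  echo "OK: disk is ${pct}% full"
fi
```

Where the warning goes (a log line, an email, a Cloud Monitoring metric) is left open here.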
## Appendix: useful references
- [Forgejo documentation](https://forgejo.org/docs/)
- [Forgejo Docker image](https://codeberg.org/forgejo/-/packages/container/forgejo/)
- [Container-Optimized OS overview](https://cloud.google.com/container-optimized-os/docs/concepts/features-and-benefits)
- [IAP for TCP forwarding](https://cloud.google.com/iap/docs/using-tcp-forwarding)
- [Caddy documentation](https://caddyserver.com/docs/)
- [GCP free tier](https://cloud.google.com/free/docs/free-cloud-features)
- [Watchtower](https://containrrr.dev/watchtower/)