Rails as a GPU Orchestration Layer with RunPod
You don’t need a PhD in infrastructure to run GPU workloads. You need a credit card, a RunPod account, and something to coordinate the work. Rails is that something.
RunPod in 30 seconds
RunPod rents you GPUs on demand. Pick a GPU (A100, H100, RTX 4090, whatever your wallet tolerates), deploy a container, and you’re running. No reserved instances, no spot instance anxiety, no 47-click AWS console journey. You pay by the second for what you use.
They offer two flavors:
- GPU Pods — persistent containers with a GPU attached. SSH in, run whatever you want.
- Serverless — send a request, get a response. RunPod handles cold starts, scaling, and queuing. You bring a Docker image with your model.
Serverless is where it gets interesting for Rails developers.
The architecture
[Rails App] → [RunPod Serverless API] → [GPU Worker]
     ↑                                        |
     └──────────── webhook / polling ─────────┘
Your Rails app is the brain. It knows what work needs doing, who requested it, and what to do with the results. RunPod is the muscle. It runs your model on a GPU and hands back the output.
Kicking off a job
RunPod’s serverless API is just an HTTP POST. From Rails:
# app/services/runpod_client.rb
#
# Thin wrapper around RunPod's serverless /run endpoint.
# Uses the http gem (http.rb) and reads the API key from Rails credentials.
class RunpodClient
  ENDPOINT = "https://api.runpod.ai/v2/%<endpoint_id>s/run"

  def initialize(endpoint_id:)
    @url = format(ENDPOINT, endpoint_id:)
  end

  def run(input)
    response = HTTP
      .auth("Bearer #{Rails.application.credentials.runpod_api_key}")
      .post(@url, json: { input: })

    JSON.parse(response.body.to_s)
  end
end
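RunPod also exposes a per-job status endpoint on the same base URL (the poll job below requests it inline). If you would rather keep all of the RunPod URLs in one place, a status method is a natural addition to this class. A sketch, assuming you also stash the endpoint id in initialize (e.g. @endpoint_id = endpoint_id):

  # Sketch only: wrap GET /status/:job_id alongside #run.
  STATUS_ENDPOINT = "https://api.runpod.ai/v2/%<endpoint_id>s/status/%<job_id>s"

  def status(job_id)
    response = HTTP
      .auth("Bearer #{Rails.application.credentials.runpod_api_key}")
      .get(format(STATUS_ENDPOINT, endpoint_id: @endpoint_id, job_id:))

    JSON.parse(response.body.to_s)
  end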
Use it anywhere:
client = RunpodClient.new(endpoint_id: "your-endpoint-id")
result = client.run(prompt: "Explain monads like I'm five", max_tokens: 256)
# => { "id" => "abc-123", "status" => "IN_QUEUE" }
That’s it. Your job is queued. RunPod spins up a GPU worker (or routes to a warm one), runs your model, and the result is available when it’s done.
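One thing the rest of this post takes for granted: somewhere to record that job id and the eventual output. Here is a minimal sketch of the AiTask record the polling job in the next section writes to. The model name, columns, and bang methods are placeholders for whatever your app already uses, not anything RunPod requires.

# A possible migration for the task record (jsonb assumes Postgres):
#
#   create_table :ai_tasks do |t|
#     t.references :user, null: false, foreign_key: true
#     t.string :runpod_job_id
#     t.string :status, null: false, default: "queued"
#     t.jsonb :output
#     t.text :error_message
#     t.timestamps
#   end

# app/models/ai_task.rb
class AiTask < ApplicationRecord
  belongs_to :user

  # Called when RunPod reports COMPLETED.
  def complete!(output)
    update!(status: "completed", output: output)
  end

  # Called when RunPod reports FAILED, or when we give up waiting.
  def fail!(error)
    update!(status: "failed", error_message: error)
  end
end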
Polling for results from a background job
GPU inference isn’t instant. Offload the waiting to Sidekiq:
# app/jobs/runpod_poll_job.rb
class RunpodPollJob < ApplicationJob
  queue_as :queued_task

  MAX_ATTEMPTS = 30 # ~1 minute of polling at 2-second intervals; raise this if cold starts need longer

  def perform(task_id, runpod_job_id, endpoint_id, attempts: 0)
    response = HTTP
      .auth("Bearer #{Rails.application.credentials.runpod_api_key}")
      .get("https://api.runpod.ai/v2/#{endpoint_id}/status/#{runpod_job_id}")

    result = JSON.parse(response.body.to_s)

    case result["status"]
    when "COMPLETED"
      AiTask.find(task_id).complete!(result["output"])
    when "FAILED"
      AiTask.find(task_id).fail!(result["error"])
    else
      # Still IN_QUEUE or IN_PROGRESS: give up after MAX_ATTEMPTS, otherwise check again shortly.
      return AiTask.find(task_id).fail!("Timed out waiting for RunPod") if attempts >= MAX_ATTEMPTS

      self.class.set(wait: 2.seconds).perform_later(task_id, runpod_job_id, endpoint_id, attempts: attempts + 1)
    end
  end
end
Rails tracks the task lifecycle. RunPod does the heavy lifting. Clean separation.
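For completeness, here is one way the pieces could fit together end to end: create the task, submit the job, remember RunPod's job id, and hand the waiting to the poll job. The controller name, the has_many :ai_tasks association, and the runpod_endpoint_id credential key are assumptions from this sketch, not RunPod conventions.

# app/controllers/generations_controller.rb
class GenerationsController < ApplicationController
  def create
    # Assumes User has_many :ai_tasks (see the AiTask sketch above).
    task = current_user.ai_tasks.create!(status: "queued")

    endpoint_id = Rails.application.credentials.runpod_endpoint_id # assumed credential key
    response = RunpodClient.new(endpoint_id:).run(
      prompt: params[:prompt],
      max_tokens: 256
    )

    task.update!(runpod_job_id: response["id"])
    RunpodPollJob.perform_later(task.id, response["id"], endpoint_id)

    redirect_to task # or render JSON, broadcast over Turbo, etc.
  end
end

If polling bothers you, the other arrow in the architecture diagram is also an option: RunPod's serverless API lets you supply a webhook URL when submitting a job and will POST the result back to it when the work finishes, turning the Sidekiq loop into a plain controller action. Check RunPod's docs for the exact field and payload.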
Why RunPod is great
Straight up — after dealing with AWS SageMaker endpoints, GCP Vertex, and self-managed GPU boxes, RunPod is a breath of fresh air:
- Pricing is transparent. Per-second billing, no surprise egress fees that make you question your career choices.
- Serverless scales to zero. No traffic, no cost. Cold starts are surprisingly fast.
- The dashboard is simple. You can see your workers, logs, and spend without opening twelve tabs.
- Community templates. Want to deploy Stable Diffusion, Whisper, or an LLM? There’s probably a one-click template already.
- You bring your own Docker image. No vendor lock-in. If your container runs locally, it runs on RunPod.
Why Rails as the orchestration layer
Your Rails app already has users, auth, billing, background jobs, and a database. It already knows who wants what. Adding GPU orchestration is just another API call and another Sidekiq job.
You don’t need a separate Python microservice to talk to your GPU provider. You don’t need Celery. You don’t need a FastAPI sidecar that “just handles the AI part” and somehow becomes the most critical service in your stack.
Keep the brain in Rails. Let RunPod be the GPU.