Cloud infrastructure for startups and SMEs: a practical guide
Cloud infrastructure is the foundation your product runs on, scales on, and fails on. Most mistakes are not technical. They come from architecture decisions made too early: a Kubernetes cluster no one needs, a vendor you can't leave, or data sitting in a US region that quietly breaks your GDPR promise. This guide gives teams of five to fifty people an honest walk through the building blocks of modern cloud infrastructure: multi-cloud versus single cloud, EU data sovereignty, Infrastructure as Code, containers and Kubernetes, CI/CD, observability, security, and cost optimization. No hype, just the trade-offs that actually matter.
Key takeaways
- Build for the next 18 months, not for scale that's still years away — premature infrastructure costs more than late infrastructure.
- Multi-cloud capable means building portably (containers, open standards), not necessarily running on several clouds at once — true parallel operation doubles your complexity.
- Keep personal data in EU regions from day one (Hetzner, GCP EU, AWS Frankfurt) and sign the DPA before any real customer data flows.
- Infrastructure as Code with Terraform and Ansible makes your environment reproducible, reviewable, and handoff-ready — the payoff shows up at the first incident.
- Containers always, Kubernetes rarely: under 50,000 users with one team, a VM with Docker Compose or a managed platform is usually enough.
- Cost optimization is a regular, targeted pass aimed at big levers — but never trade away expensive engineering time to shave small cloud bills.
What cloud infrastructure actually covers
Cloud infrastructure is more than "a server in the cloud." It is compute (virtual machines, containers, serverless functions), storage (block, object, and database storage), networking (load balancers, private networks, DNS, TLS), and the services that hold it together: deployment pipelines, secrets management, monitoring, and backups. For a startup, the decisive question is not "which of these building blocks exist" but "which do we need now, and which in eighteen months."
The most common mistake early teams make is building infrastructure for scale that is still years away. A two-person team with one product and fewer than 50,000 users does not need a multi-region cluster with a service mesh. It needs a solid, boring foundation: one or two VMs, a managed database, automated backups, and a deploy script the whole team understands. The opposite mistake is just as expensive — clicking everything together by hand in a web console, so that no one can later reconstruct how the current state came to be.
Good cloud infrastructure has three properties. It is reproducible (you can rebuild the entire environment from code), it is observable (you see what's happening before your customers do), and it is handoff-ready (a new engineer understands it in days, not months). These three properties cost a little more discipline up front and pay off from your first incident onward. Everything else in this guide serves those three goals — treat any tool or pattern that doesn't move you toward them with suspicion.
Multi-cloud or single cloud: the honest decision
"Multi-cloud" usually means two different things that deserve to be kept separate. One is portable architecture: you build so that you could, in theory, switch providers without rewriting your product. The other is genuine parallel operation across several clouds at once. The first is almost always smart. The second is almost never necessary until you have a concrete requirement that forces it.
Our stance is multi-cloud capable without crowning any single provider. AWS Frankfurt, Azure, Google Cloud EU regions, and Hetzner each have clear strengths depending on the workload: Hetzner is several times cheaper for raw compute, while the hyperscalers offer deeper managed services and global regions. In practice this often means compute-heavy, predictable workloads on Hetzner, and specialized managed services (a particular database, a specific AI service) on whichever hyperscaler fits, in an EU region. The skill is drawing that line deliberately, rather than binding your entire stack to one vendor's proprietary services out of convenience.
The most expensive mistake is accidental vendor lock-in: you adopt a dozen provider-specific services, and three years later moving off is a six-month project. You avoid it with simple principles — containers instead of proprietary runtimes, open standards for databases and messaging, and Infrastructure as Code that isn't built entirely around one vendor. Real multi-cloud parallel operation, by contrast, roughly doubles your operational complexity; it earns its keep only when you have concrete sovereignty, resilience, or negotiating requirements — not as a default posture.
EU data sovereignty and GDPR as an architecture decision
For companies in Germany, Austria, and Switzerland, and across the EU, where data lives is not a footnote; it is an early architecture decision. GDPR does not blanket-require "servers in Germany," but transferring personal data to third countries such as the US is legally complex and, for many of your customers, a deal-breaker. The pragmatic path is to keep data in EU regions from day one: Hetzner (Germany and Finland), Google Cloud and AWS in Frankfurt or other EU locations, Azure in EU regions.
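One way to make the EU-regions rule checkable rather than aspirational is a small allow-list test over your resource inventory. The sketch below is illustrative only: the inventory format and the `non_eu_resources` helper are hypothetical, and the region identifiers are a non-exhaustive sample (for example `eu-central-1` for AWS Frankfurt, `europe-west3` for Google Cloud Frankfurt, and `fsn1`, `nbg1`, `hel1` for Hetzner's German and Finnish locations) that you would adjust to the regions you actually allow.

```python
# Illustrative, non-exhaustive allow-list of EU regions per provider.
# Extend or trim it to match your own policy.
EU_REGIONS = {
    "aws": {"eu-central-1"},                        # Frankfurt
    "gcp": {"europe-west3"},                        # Frankfurt
    "azure": {"germanywestcentral", "westeurope"},  # Frankfurt, Netherlands
    "hetzner": {"fsn1", "nbg1", "hel1"},            # Falkenstein, Nuremberg, Helsinki
}


def non_eu_resources(inventory: list[dict]) -> list[str]:
    """Return names of resources whose region is not on the EU allow-list."""
    offenders = []
    for resource in inventory:
        allowed = EU_REGIONS.get(resource["provider"], set())
        if resource["region"] not in allowed:
            offenders.append(resource["name"])
    return offenders


# Hypothetical inventory, for example exported from your IaC state.
inventory = [
    {"name": "app-vm", "provider": "hetzner", "region": "fsn1"},
    {"name": "customer-db", "provider": "aws", "region": "eu-central-1"},
    {"name": "legacy-bucket", "provider": "aws", "region": "us-east-1"},
]

print(non_eu_resources(inventory))
```

A check like this covers only the first of the sovereignty questions, where the data physically sits. It says nothing about who can legally access the data or who operates the platform.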
Data sovereignty is more than the location of the disk. Three layers matter: where the data physically sits, who can legally access it, and who operates the underlying platform. A European provider like Hetzner gives you the cleanest position here; a hyperscaler in an EU region is a good compromise when you genuinely need one of its managed services. For every external provider that processes personal data, you need a data processing agreement (DPA) — signed before the first real customer data flows, not after.
If you integrate AI features, the topic sharpens. A prompt sent to a US provider can contain personal data, and the EU AI Act adds transparency and documentation requirements. A deliberate separation helps: process sensitive data in the EU, self-host open-source models where it makes sense (for example, Ollama on your own infrastructure), and for every external AI service check whether EU regions and a clean data processing agreement are available. Sovereignty sometimes costs a little convenience — but in sales it becomes a selling point rather than a risk you have to explain away.
Infrastructure as Code: why clicking by hand is expensive
Infrastructure as Code (IaC) means your entire environment — networks, servers, databases, permissions — is described as versioned code, rather than existing as the accumulated result of clicks in a web console. The two tools we reach for in almost every project are Terraform for provisioning resources and Ansible for configuring what runs inside the servers. Terraform says "these five servers, this network, this database should exist"; Ansible says "these servers should run exactly this software in exactly this configuration."
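The declarative idea behind "these resources should exist" can be sketched in a few lines: describe the desired state, compare it with what is actually there, and derive the changes. This is a conceptual illustration only, not how Terraform is implemented; `Resource` and `plan` are hypothetical names, and the resource names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str  # e.g. "server", "network", "database"
    name: str


def plan(desired: set[Resource], actual: set[Resource]) -> dict[str, list[Resource]]:
    """Return what to create and destroy to move the actual state to the desired one."""
    def sort_key(resource: Resource) -> tuple[str, str]:
        return (resource.kind, resource.name)

    return {
        "create": sorted(desired - actual, key=sort_key),
        "destroy": sorted(actual - desired, key=sort_key),
    }


# Desired state: five servers, one network, one database.
desired = {Resource("server", f"web-{i}") for i in range(1, 6)} | {
    Resource("network", "private"),
    Resource("database", "main"),
}

# Actual state: one expected server plus a box someone clicked together by hand.
actual = {Resource("server", "web-1"), Resource("server", "old-test-box")}

changes = plan(desired, actual)
for action, resources in changes.items():
    for resource in resources:
        print(action, resource.kind, resource.name)
```

The hand-made box shows up as a pending "destroy", which is the point: anything that exists outside the code becomes visible as drift.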
The payoff shows up at the first serious incident. When a region goes down or an environment is compromised, the difference between "we rebuild everything from code in an hour" and "we try to reconstruct what someone clicked a year ago" is existential. IaC also makes change reviewable: every infrastructure change goes through a pull request, gets reviewed, and lives in git history. It's the same discipline that version control brought to application code — applied to your infrastructure.
There is a real trade-off: IaC requires up-front investment. For a weekend project, a Terraform setup is overkill. But the moment more than one person touches the infrastructure, or you run a production environment your business depends on, clicking by hand becomes the more expensive option — the bill is just deferred until no one remembers why a firewall rule exists. A sensible starting point is modest: Terraform state in a remote backend, separate staging and production environments, and all secrets kept out of the code. Grow the setup as the team and the stakes grow, not before.
Containers and Kubernetes: when to skip k8s
Containers are almost always the right call. Docker packages your application with all its dependencies into a reproducible artifact that runs the same on any machine and in any cloud. That kills the whole class of "but it worked on my machine" problems, and it doubles as your best insurance against vendor lock-in, because a container image isn't tied to any provider. Start with containers — the question is never whether, but what you orchestrate them with.
Kubernetes is excellent technology and, at the same time, the most common piece of premature infrastructure we remove from startup stacks. The honest test has four questions. Do multiple teams deploy independently and get in each other's way without isolation? Is your load genuinely spiky, with 10x differences between quiet and peak? Do compliance or enterprise customers demand real workload isolation? Are you already running ten or more services that need orchestrated rollouts and self-healing? Two or more yeses, and Kubernetes is worth a conversation. Zero or one, and it isn't.
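The four-question test above is simple enough to write down as code, which can help keep the discussion honest in a planning meeting. A minimal sketch, with hypothetical field names that map one-to-one to the four questions:

```python
from dataclasses import dataclass


@dataclass
class KubernetesCheck:
    teams_block_each_other: bool  # multiple teams deploy independently and collide
    load_spikes_10x: bool         # 10x difference between quiet and peak load
    isolation_required: bool      # compliance or enterprise customers demand it
    ten_or_more_services: bool    # orchestrated rollouts and self-healing needed

    def yes_count(self) -> int:
        return sum([
            self.teams_block_each_other,
            self.load_spikes_10x,
            self.isolation_required,
            self.ten_or_more_services,
        ])

    def worth_a_conversation(self) -> bool:
        # Two or more yeses: Kubernetes is worth discussing. Zero or one: it isn't.
        return self.yes_count() >= 2


# Example: one team, one product, occasionally spiky load.
early_startup = KubernetesCheck(
    teams_block_each_other=False,
    load_spikes_10x=True,
    isolation_required=False,
    ten_or_more_services=False,
)
print(early_startup.yes_count(), early_startup.worth_a_conversation())
```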
Most teams before their Series A answer all four with no and are better served by a VM plus Docker Compose, or a managed container platform (Google Cloud Run, AWS App Runner), at a fraction of the operational cost. Kubernetes realistically costs a small team four to eight hours a week in upgrades, YAML upkeep, and ingress debugging: up to a fifth of a full-time engineer spent on complexity the product doesn't need yet. The reassuring part: when the need does arrive, migrating from a clean Compose setup to Kubernetes takes weeks, not months. Deferring the decision costs you almost nothing.
CI/CD and observability: see it before your customer does
A CI/CD pipeline automates the path from a git commit to running code in production — with tests, build, and deployment in between. With GitLab CI or GitHub Actions you set it up in a day, and it pays off daily. Without a pipeline, every deployment is a manual, error-prone ritual that only one person really knows how to perform; with one, it's a reproducible, tested process anyone can trigger. A deployment people are afraid to run rarely gets run — and rare deployments are risky deployments, because too many changes pile up between them.
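Stripped of tooling, a pipeline is an ordered list of stages where a failure stops everything after it, so a broken test run can never reach the deploy step. The sketch below shows only that gating logic; `run_pipeline` and the stage callables are hypothetical stand-ins, and in GitLab CI or GitHub Actions the same structure would be expressed in the platform's own configuration rather than in Python.

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names of the stages that were started).
    """
    started = []
    for name, step in stages:
        started.append(name)
        if not step():
            return False, started
    return True, started


# Hypothetical stages; in a real pipeline each would call your test runner,
# image build, and deploy command.
ok, started = run_pipeline([
    ("test", lambda: True),
    ("build", lambda: True),
    ("deploy", lambda: True),
])
print(ok, started)
```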
Observability answers the question "what is my system doing right now?" — ideally before a customer calls. The three pillars are metrics (Prometheus and Grafana show utilization, latency, and error rates over time), logs (structured and centrally searchable), and alerts (that wake you at night when something is broken — and only then). The failure mode of small teams is rarely "too little monitoring"; it's "too many alerts, so no one takes them seriously anymore." Start with a few meaningful signals: is the service reachable, what's the error rate, how are latency and free disk space.
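The "few meaningful signals" advice can be sketched as a single evaluation function. All thresholds below are hypothetical placeholders to tune for your own service, and in practice these rules would usually live in your monitoring system's alerting configuration rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    reachable: bool
    error_rate: float          # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float      # 95th percentile response time
    free_disk_fraction: float  # 0.0 to 1.0

# Hypothetical thresholds; tune them to your own service.
MAX_ERROR_RATE = 0.05
MAX_P95_LATENCY_MS = 1500.0
MIN_FREE_DISK_FRACTION = 0.10


def alerts(signals: Signals) -> list[str]:
    """Return the alerts worth waking someone for, and nothing else."""
    if not signals.reachable:
        # When the service is down, the other signals add only noise.
        return ["service unreachable"]
    found = []
    if signals.error_rate > MAX_ERROR_RATE:
        found.append("error rate too high")
    if signals.p95_latency_ms > MAX_P95_LATENCY_MS:
        found.append("latency too high")
    if signals.free_disk_fraction < MIN_FREE_DISK_FRACTION:
        found.append("disk almost full")
    return found


print(alerts(Signals(True, 0.01, 300.0, 0.04)))
```

Returning a single alert when the service is unreachable is a deliberate choice in this sketch: it keeps one outage from fanning out into several pages.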
The connecting idea is that CI/CD and observability together shrink your deploy anxiety. When you can deploy quickly and immediately see whether something went wrong, every individual deployment gets smaller and safer. That loop — small changes, shipped automatically, observed instantly — is the difference between a team that ships weekly and a team that holds its breath before every release. It is also the cheapest reliability upgrade most startups never get around to setting up.
Cost optimization without false economy
Cloud costs drift quietly because every individual decision looks cheap and only the sum shows up at month's end. The three biggest line items are usually over-provisioned compute (VMs sized for their worst day and idle 90 percent of the time), forgotten resources (test environments, orphaned volumes, old snapshots), and data transfer (especially expensive on hyperscalers when data leaves the cloud). A simple cost-optimization pass every few months (rightsizing, cleanup, reserved capacity for predictable load) often delivers the largest savings at very little risk.
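A rightsizing pass can start from something as plain as a list of VMs with their average utilization and monthly cost. The sketch below assumes you can export those two numbers from your monitoring and billing; the `Vm` type, the 10 percent threshold, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    avg_cpu_utilization: float  # 0.0 to 1.0 over the review period
    monthly_cost_eur: float


def rightsizing_candidates(vms: list[Vm], max_avg_utilization: float = 0.10) -> list[Vm]:
    """Return mostly idle VMs, most expensive first, so the big levers come first."""
    idle = [vm for vm in vms if vm.avg_cpu_utilization <= max_avg_utilization]
    return sorted(idle, key=lambda vm: vm.monthly_cost_eur, reverse=True)


# Made-up sample data for illustration.
fleet = [
    Vm("api", 0.45, 60.0),
    Vm("batch-worker", 0.08, 240.0),
    Vm("old-staging", 0.02, 35.0),
]

for vm in rightsizing_candidates(fleet):
    print(vm.name, vm.monthly_cost_eur)
```

Average CPU is a crude signal. A VM that is idle on average may still need its size for a short daily peak, so treat the output as a list of questions to ask, not as a list of machines to shrink.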
The biggest strategic lever is choosing the provider per workload. Raw compute is several times cheaper on Hetzner than on the hyperscalers; a predictable, compute-heavy service there instead of on AWS can cut the bill dramatically without giving up anything that matters. Conversely, a hyperscaler's managed service is worth paying for when it saves you weeks of operational work — the higher list price is then a good investment, not waste. The point isn't "always pick the cheapest box"; it's matching each workload to where it's actually cheapest to run and operate.
The most important warning: don't optimize away your engineering time to shave cloud bills. If an engineer spends a week saving 200 euros a month, that was a bad trade. At a startup, the most expensive resource is almost always your team's time, not the cloud invoice. Cost optimization is a regular, targeted pass aimed at a few large levers, not a permanent state of fear where no one spins up a test environment because it costs money. For a sense of scale, building a simple MVP runs 25,000–40,000 EUR and a full SaaS MVP with AI runs 50,000–120,000 EUR; against those numbers, ongoing infrastructure is usually the smaller line.
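The "week of engineering for 200 euros a month" trade is easy to quantify as a payback period. The day rate below is a purely hypothetical assumption; substitute your own fully loaded cost.

```python
def payback_months(engineer_days: float, day_cost_eur: float, monthly_saving_eur: float) -> float:
    """Months until the cloud savings have paid back the engineering time spent."""
    if monthly_saving_eur <= 0:
        raise ValueError("monthly_saving_eur must be positive")
    return engineer_days * day_cost_eur / monthly_saving_eur


# One working week (5 days) at an assumed 600 EUR per day, saving 200 EUR per month.
months = payback_months(engineer_days=5, day_cost_eur=600, monthly_saving_eur=200)
print(months)  # 15.0
```

Under that assumption the optimization takes well over a year to pay for itself, before counting what the engineer could have shipped in the same week.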
Frequently asked questions about cloud infrastructure
Multi-cloud or a single provider — which is better for a startup?
For most startups, a well-considered single-cloud or Hetzner-plus-hyperscaler approach is the right call, as long as you build portably (containers, open standards, Infrastructure as Code). True multi-cloud parallel operation roughly doubles your operational complexity and only pays off when you have concrete requirements around sovereignty, resilience, or negotiating leverage — not as a default.
Is my cloud infrastructure GDPR-compliant if I use AWS or Google Cloud?
It can be, provided you choose EU regions (such as AWS Frankfurt or GCP in the EU), sign a data processing agreement, and don't transfer personal data to third countries. A European provider like Hetzner gives you the cleanest position on data sovereignty; a hyperscaler in an EU region is a good compromise when you genuinely need one of its managed services.
Does my startup need Kubernetes?
Usually not. If you have one product, one team, and fewer than roughly 50,000 users with no compliance requirement for isolation, a VM with Docker Compose or a managed container platform carries the same load at a fraction of the operational cost. Kubernetes earns its complexity only with multiple teams, highly variable load, or genuine isolation requirements.
What does building cloud infrastructure cost?
It depends heavily on scope. As a reference point: a simple MVP including infrastructure runs 25,000–40,000 EUR, and a full SaaS MVP with AI runs 50,000–120,000 EUR. Ongoing cloud costs are usually the smaller line item: a lean base with one or two VMs and a managed database typically costs somewhere between a few tens and a few hundred euros per month.
Why should I use Infrastructure as Code instead of just clicking in the console?
Because clicking by hand becomes the more expensive option the moment more than one person touches the infrastructure or you run a production environment. Infrastructure as Code with Terraform and Ansible makes your environment reproducible (rebuild from code in hours), reviewable (every change as a pull request), and handoff-ready. The payoff shows up at the first outage.
How do I keep my cloud costs under control?
With a regular, targeted optimization pass: rightsize resources, clean up forgotten test environments and volumes, cover predictable load with reserved capacity, and run compute-heavy workloads on a cheaper provider like Hetzner. Crucially, never trade away expensive engineering time to shave small cloud bills — at a startup, team time is almost always the more expensive resource.
