
Deploying a Free Tier VM on Oracle Cloud with Terraform: A Debugging Story

A step-by-step walkthrough of every error encountered, why it happened, and how it was fixed — written for those new to Oracle Cloud Infrastructure (OCI) and Terraform.

🗺️ Background: What Are We Even Doing?

Before diving into the errors, let's establish what all the moving pieces are.

What is Oracle Cloud Infrastructure (OCI)?

OCI is Oracle's cloud computing platform — think of it like renting space in a giant, remote data centre. Instead of buying a physical server, you ask Oracle to spin up a virtual machine (VM) for you. Oracle's Always Free Tier allows you to run two small VMs at no cost, forever. That's what we're trying to create here.

What is Terraform?

Terraform is an Infrastructure-as-Code (IaC) tool made by HashiCorp. Instead of clicking through a web UI to create resources, you write configuration files (.tf files) that describe what you want, and Terraform figures out how to build it. Think of it like writing a recipe — Terraform is the chef that follows it.

What is the OCI Terraform Provider?

Terraform needs a translator to talk to each cloud. The OCI provider (oracle/oci) is that translator for Oracle Cloud. It takes your Terraform instructions and converts them into OCI API calls behind the scenes.

How Does Terraform Talk to OCI?

OCI uses API key authentication. Think of it like a physical key and a lock:

  • You generate a private key (kept secret on your machine — the key)

  • You upload a public key to OCI (the lock)

  • OCI also stores a fingerprint — a short unique identifier of the lock, so you can reference it without exposing the full key

When Terraform wants to create something in OCI, it signs the request with your private key. OCI checks the signature against the registered public key. If they match, the request is allowed.
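The sign-then-verify idea can be tried locally with openssl. This is only a simplified sketch: it signs a plain text file rather than a real OCI request, the function name `sign_and_verify` and the file names are made up for illustration, and it does not reproduce OCI's actual request-signing scheme.

```shell
# Illustrative only: sign a file with a private key, verify with the public key.
# The function name and file names are hypothetical, not part of OCI or Terraform.
sign_and_verify() {
  workdir="$(mktemp -d)"

  # The "key": a private key that never leaves your machine.
  openssl genrsa -out "$workdir/private.pem" 2048 2>/dev/null

  # The "lock": the public half, which is what you would upload.
  openssl rsa -in "$workdir/private.pem" -pubout -out "$workdir/public.pem" 2>/dev/null

  # A stand-in for a request body.
  printf 'example request\n' > "$workdir/request.txt"

  # The client signs with the private key.
  openssl dgst -sha256 -sign "$workdir/private.pem" \
    -out "$workdir/request.sig" "$workdir/request.txt"

  # The server verifies with the public key only.
  openssl dgst -sha256 -verify "$workdir/public.pem" \
    -signature "$workdir/request.sig" "$workdir/request.txt"
  status=$?

  rm -rf "$workdir"
  return "$status"
}

sign_and_verify   # prints: Verified OK
```

If the public key did not belong to the private key that produced the signature, the last openssl call would report a verification failure instead, which is the local equivalent of a rejected request.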


🗂️ Project Structure

The Terraform project had the following files:

terraform_learn/
├── compute.tf          # Defines the VM instance
├── variables.tf        # Declares input variable names
├── terraform.tfvars    # Provides the actual values (credentials, OCIDs)
├── iam_policy.tf       # IAM permission policies
└── outputs.tf          # What to print after a successful apply

The workflow was:

terraform plan -out=tfplan.out # Dry-run: what WOULD be created?

terraform apply tfplan.out # Actually create it


🐛 Error 1 — Wrong Authentication Method

The Error

401-NotAuthenticated, The required information to complete authentication was not provided

What is a 401 Error?

A 401 Unauthorized error means "I don't know who you are." OCI rejected the request entirely before even checking permissions. It's like arriving at a hotel and not having any ID at all — you never even get to the front desk.

What Caused It?

The Terraform provider was configured with:

auth = "CloudShell"

OCI Cloud Shell is Oracle's browser-based terminal. It has its own built-in authentication — when you use it interactively, OCI automatically knows who you are through your browser session. But when Terraform runs, it isn't a browser session. Setting auth = "CloudShell" tells Terraform to use that browser-session method, which simply doesn't work for programmatic API calls.

The Fix

Change the auth method to APIKey, which uses the cryptographic key pair:

auth = "APIKey"
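For context, here is roughly where that line lives. The original provider block is not shown in this post, so the sketch below is an assumption: it writes an illustrative provider.tf into a scratch directory, and the variables `tenancy_ocid` and `region` (plus the helper name `write_provider_tf`) are guesses at a typical layout rather than the project's real file.

```shell
# Hypothetical helper: writes an illustrative provider.tf into the given directory.
write_provider_tf() {
  dir="$1"
  cat > "$dir/provider.tf" <<'EOF'
provider "oci" {
  auth             = "APIKey"              # the value used in this walkthrough
  tenancy_ocid     = var.tenancy_ocid      # assumed variable name
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path
  region           = var.region            # assumed variable name
}
EOF
}

scratch="$(mktemp -d)"
write_provider_tf "$scratch"
cat "$scratch/provider.tf"
```

The quoted heredoc delimiter ('EOF') stops the shell from touching anything inside the block, so the Terraform expressions are written out exactly as typed.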


🐛 Error 2 — Private Key Path Not Resolving

The Error

401-NotAuthenticated, could not find private key

What is a Private Key Path?

When using API key auth, Terraform needs to find the private key file on disk. The path in terraform.tfvars was:

private_key_path = "~/.oci/oci_api_key.pem"

The ~ symbol is a shell shorthand for your home directory (e.g., /home/fongyang). It's like writing "my house" instead of "123 Main Street" — everyone in a conversation understands it, but a computer program needs the full address.

What Caused It?

The OCI Terraform provider does not expand the ~ shorthand. It tried to find a file literally named ~/.oci/oci_api_key.pem — which doesn't exist. The tilde was never substituted.

The Fix

Use the absolute path:

private_key_path = "/home/fongyang/.oci/oci_api_key.pem"
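To see why the quoted ~ never becomes a real path, and to get the absolute form without typing it by hand, a small helper like the one below can be used. The function name `expand_tilde` is hypothetical, and the sketch only handles the plain `~` and `~/...` forms.

```shell
# Hypothetical helper: expands a leading "~" or "~/" the way an interactive shell would.
expand_tilde() {
  case "$1" in
    "~")   printf '%s\n' "$HOME" ;;
    # ${1#??} drops the first two characters, i.e. the literal "~/".
    "~/"*) printf '%s\n' "$HOME/${1#??}" ;;
    *)     printf '%s\n' "$1" ;;
  esac
}

# Inside quotes the shell leaves "~" alone, just as a config file would.
quoted="~/.oci/oci_api_key.pem"
echo "as written: $quoted"
echo "expanded:   $(expand_tilde "$quoted")"
```

Inside .tf files, Terraform's built-in pathexpand function does a similar job, but terraform.tfvars accepts only literal values, so the absolute path is the simplest choice there.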


🐛 Error 3 — Fingerprint Mismatch

The Error

401-NotAuthenticated, The fingerprint claimed by the user does not match

What is a Fingerprint?

A fingerprint is a short hash derived from your public key — a condensed identifier. Think of it like a person's face vs. their driver's license number. The face (public key) contains all the information, but the license number (fingerprint) is a quick reference. OCI uses the fingerprint to look up which public key to use when verifying a request.

What Caused It?

The fingerprint in terraform.tfvars:

fingerprint = "16:30:10:a2:b7:3f:a7:19:67:b6:d5:f4:8b:0d:d4:b1"

...did not match the fingerprint of the private key file being used (/home/fongyang/.oci/oci_api_key.pem). This happens when:

  • Multiple API keys have been generated over time

  • The .tfvars file was copied from a different setup

  • The wrong key file is referenced

It's like trying to open a lock with the right brand of key, but it's the key to a different lock.

The Fix

Re-derive the correct fingerprint directly from the key file on disk and update the config:

openssl pkey -in /home/fongyang/.oci/oci_api_key.pem -pubout -outform DER \
  | openssl dgst -md5 -c | awk '{print $2}'

Then paste the result into terraform.tfvars.
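To avoid comparing 16 hex pairs by eye, the same openssl pipeline can be wrapped in a small check. This is a sketch under assumptions: the function names `key_fingerprint` and `fingerprint_matches` are invented here, and the script takes the last field of the openssl output so that it works whether or not the digest line carries an "MD5" prefix.

```shell
# Hypothetical helper: prints the MD5 fingerprint of the public half of a key file.
key_fingerprint() {
  openssl pkey -in "$1" -pubout -outform DER 2>/dev/null \
    | openssl dgst -md5 -c \
    | awk '{print $NF}'
}

# Hypothetical helper: compares a key file against the fingerprint in your config.
# Usage: fingerprint_matches <key_file> <configured_fingerprint>
fingerprint_matches() {
  actual="$(key_fingerprint "$1")"
  if [ "$actual" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: key file has $actual"
    return 1
  fi
}

# Example call, using the path from this post and the value from your tfvars:
# fingerprint_matches /home/fongyang/.oci/oci_api_key.pem "<fingerprint from terraform.tfvars>"
```

A "mismatch" line means the key file on disk and the fingerprint in terraform.tfvars belong to different API keys, which is exactly the situation behind this error.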


🐛 Error 4 — The Main Culprit: Shape Quota of Zero in the Wrong Availability Domain

This was the trickiest bug, hidden behind a misleading error message.

The Error

404-NotAuthorizedOrNotFound, Authorization failed or requested resource not found.

Operation: LaunchInstance

Why This Error is Misleading

A 404 normally means "the thing you're looking for doesn't exist" — like a broken web link. OCI uses the same 404 code for both authorization failures and resource-not-found situations. This made the error look like a permissions problem when it was actually something else entirely.

The Investigation Path

Because the error said "authorization", we first investigated IAM:

What is IAM?

Identity and Access Management (IAM) is OCI's permission system. It answers: "Is this user allowed to do this action on this resource?" Think of it like a building's security badge system — different badges grant access to different floors.

In OCI, IAM is controlled by Policy statements like:

ALLOW GROUP Administrators TO MANAGE all-resources IN TENANCY

What Did We Find?

  • The user fongyang is in the Administrators group ✅

  • The Administrators group has full tenancy-level access ✅

  • IAM was never the problem

We then investigated other possible causes:

| Check | Result |
|---|---|
| Shape VM.Standard.E2.1.Micro exists? | ✅ Available |
| Subnet exists and is AVAILABLE? | ✅ Regional subnet, no AD restriction |
| Image Oracle-Linux-8.10 found? | ✅ Image ID resolved correctly |
| Compartment is ACTIVE? | ✅ Active |
| Shape quota per Availability Domain? | ❌ QUOTA = 0 in AD-1! |

The Real Root Cause: Service Limits per Availability Domain

What is an Availability Domain (AD)?

An Availability Domain is an isolated data centre within an OCI region. us-ashburn-1 (Northern Virginia) has three ADs:

us-ashburn-1
├── Mbag:US-ASHBURN-AD-1   ← Terraform was targeting this
├── Mbag:US-ASHBURN-AD-2   ← Only AD with capacity
└── Mbag:US-ASHBURN-AD-3

Think of ADs like three separate buildings in the same city. Each building has its own inventory. If Building 1 is out of stock of a particular item, you have to go to Building 2.

What is a Service Limit?

OCI puts caps on how many of each resource type you can create in each location. For the Free Tier shape VM.Standard.E2.1.Micro, here were the limits per AD:

| Availability Domain | Shape | Limit | Note |
|----------------------|--------------------------|-------|----------------------|
| Mbag:US-ASHBURN-AD-1 | vm-standard-e2-1-micro | 0 | No capacity! |
| Mbag:US-ASHBURN-AD-2 | vm-standard-e2-1-micro | 2 | 2 free VMs allowed |
| Mbag:US-ASHBURN-AD-3 | vm-standard-e2-1-micro | 0 | No capacity! |

Only AD-2 had quota. The Terraform config was using availability_domains[0] — which is index zero, resolving to AD-1. OCI responded with a 404 because from its perspective: "You can't create this here — the quota is zero."

The Fix

A one-line change in compute.tf:

# BEFORE — targets AD-1 (quota = 0)

availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name

# AFTER — targets AD-2 (quota = 2)

availability_domain = data.oci_identity_availability_domains.ads.availability_domains[1].name

Why index [1] and not [2]? Arrays in programming are zero-indexed — they start counting at 0, not 1. So index [0] = AD-1, [1] = AD-2, [2] = AD-3.
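Rather than working the index out by hand, the limits table can be fed to a one-line filter. The sketch below assumes you have reduced the limits output to "AD-name limit" pairs, one per line, in the same order the Terraform data source lists the ADs; the function name `first_ad_with_limit` is hypothetical.

```shell
# Hypothetical helper: reads "AD-name limit" lines on stdin and prints the
# zero-based index and name of the first AD whose limit is above zero.
first_ad_with_limit() {
  awk '$2 > 0 { print NR - 1, $1; exit }'
}

# The limits observed in this post, in AD order.
printf '%s\n' \
  'Mbag:US-ASHBURN-AD-1 0' \
  'Mbag:US-ASHBURN-AD-2 2' \
  'Mbag:US-ASHBURN-AD-3 0' \
  | first_ad_with_limit
# prints: 1 Mbag:US-ASHBURN-AD-2
```

The first number is the value to put inside the square brackets in compute.tf. In another tenancy the usable AD may well be a different one, so it is worth re-running the check instead of copying the index.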


✅ Final Result

After the fix:

oci_core_instance.free_vm: Creating...

oci_core_instance.free_vm: Still creating... [10s elapsed]

oci_core_instance.free_vm: Still creating... [20s elapsed]

oci_core_instance.free_vm: Still creating... [30s elapsed]

oci_core_instance.free_vm: Creation complete after 36s

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

instance_public_ip = "132.145.171.179"

The VM was live in 36 seconds. 🎉


📚 Key Lessons Learned

1. OCI's 404 is a Dual-Purpose Error

404-NotAuthorizedOrNotFound covers both missing resources and permission failures. Don't assume it's always an IAM problem — check resource existence, quotas, and availability domain constraints too.
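As a memory aid, the checks from this post can be folded into a tiny triage helper. It is only a sketch of the checklist used here, not an exhaustive map of OCI error codes, and the function name `triage_oci_error` is made up.

```shell
# Hypothetical helper: maps the error prefixes seen in this post to what to check first.
triage_oci_error() {
  case "$1" in
    401-NotAuthenticated*)
      echo "check auth method, private_key_path, fingerprint" ;;
    404-NotAuthorizedOrNotFound*)
      echo "check IAM policy, resource existence, service limits per AD" ;;
    *)
      echo "unrecognized: reproduce the call with the OCI CLI" ;;
  esac
}

triage_oci_error "404-NotAuthorizedOrNotFound, Authorization failed or requested resource not found."
# prints: check IAM policy, resource existence, service limits per AD
```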

2. Always Check Service Limits Per AD

When a shape isn't launching, run:

oci limits value list \
  --compartment-id <tenancy_ocid> \
  --service-name compute \
  --all \
  --query "data[?name=='<shape-limit-name>']" \
  --output table

Pay attention to the availability-domain column: a limit of 0 means you cannot launch that shape in that AD, whatever your IAM permissions allow.

3. The ~ Shorthand Doesn't Always Work

In configuration files processed by applications (not the shell itself), always use absolute paths:

# Find your home directory

echo $HOME

4. Keep Fingerprints in Sync

If you regenerate or rotate API keys, always update all three together in terraform.tfvars:

  • fingerprint

  • private_key_path

  • user_ocid

They are a matched set — like a lock, key, and serial number.
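A quick pre-flight check can catch a stale or unusable key path before Terraform does. The sketch below assumes terraform.tfvars uses simple `name = "value"` lines, as in the snippets in this post; the function names `tfvars_get` and `check_key_path` are hypothetical.

```shell
# Hypothetical helper: prints the quoted value of a simple `name = "value"` line.
# Usage: tfvars_get <name> <tfvars_file>
tfvars_get() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p" "$2" \
    | head -n 1
}

# Hypothetical helper: checks that private_key_path is absolute and points at a file.
check_key_path() {
  path="$(tfvars_get private_key_path "$1")"
  case "$path" in
    /*)
      if [ -f "$path" ]; then
        echo "ok: $path"
      else
        echo "missing file: $path"
        return 1
      fi
      ;;
    *)
      echo "not an absolute path: $path"
      return 1
      ;;
  esac
}

# Example call from the project directory:
# check_key_path terraform.tfvars
```

The same `tfvars_get` call works for `fingerprint` and `user_ocid`, which makes it easy to print all three values side by side after rotating a key.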

5. Bypass Terraform to Isolate Errors

When Terraform gives a vague error, replicate the exact same call using the OCI CLI directly. The CLI often returns the same error but in a context that's easier to reason about:

oci compute instance launch \
  --compartment-id <compartment_ocid> \
  --availability-domain "<ad_name>" \
  --shape "<shape_name>" \
  --subnet-id <subnet_ocid> \
  --image-id <image_ocid> \
  --display-name "test-vm-cli" \
  --auth api_key


🔍 Full Error Resolution Timeline

terraform apply
├── ❌ Error 1: auth = "CloudShell" → Fix: auth = "APIKey"
├── ❌ Error 2: ~ not expanded in key path → Fix: use absolute path
├── ❌ Error 3: Fingerprint mismatch → Fix: re-derive from key file
└── ❌ Error 4: 404 on LaunchInstance
    ├── 🔎 Checked: IAM policy ✅ Admin group, full access
    ├── 🔎 Checked: Subnet ✅ AVAILABLE, regional
    ├── 🔎 Checked: Image ✅ Oracle Linux 8.10 found
    ├── 🔎 Checked: Compartment ✅ ACTIVE
    └── 🔎 Checked: Shape quota per AD
        ├── AD-1: quota = 0 ← Terraform targeted here ❌
        ├── AD-2: quota = 2 ← Fix: use index [1] ✅
        └── AD-3: quota = 0 ❌


Debugging cloud infrastructure often feels like peeling an onion — each layer reveals a new issue underneath. The key is to systematically eliminate possibilities, trust your diagnostic commands, and never assume an error message means exactly what it says.