The AWS API Gateway Impossibility (the story of an API migration)

Nicola Apicella
Published in CodeX, Mar 24, 2022

At work we had to migrate an API Gateway from one AWS account to another. The migration had to happen without downtime and be gradual, slowly shifting traffic from the old API to the new one. Since we do not control the clients, we can’t ask them to gradually shift their calls to a new endpoint.
We thought it would be easy, and we were quite wrong.

The idea was to point our existing domain name (app.domain.name.com in the following examples) to two different API Gateways, the old one and the new one. We were quite sure that could be achieved with some simple DNS changes. This is how we thought we would approach it:

  • Duplicate the API Gateway in the new account
  • Add a weight to the existing DNS A record (in the Route53 hosted zone) which points to the old API Gateway
  • Insert a new weighted A record in the same Route53 hosted zone which resolves to the new API Gateway

In other words, we wanted to move from this:

+--------+--------+---------------------+-------------------+
| Record | Weight | Source              | Destination       |
+========+========+=====================+===================+
| A      | -      | app.domain.name.com | API-GW-domain-old |
+--------+--------+---------------------+-------------------+

To this:

+--------+--------+---------------------+-------------------+
| Record | Weight | Source              | Destination       |
+========+========+=====================+===================+
| A      | 255    | app.domain.name.com | API-GW-domain-old |
+--------+--------+---------------------+-------------------+
| A      | 255    | app.domain.name.com | API-GW-domain-new |
+--------+--------+---------------------+-------------------+

Giving both records the same weight (255 in the example) makes Route53 select between them with equal probability. Changing the weights over time is how we expected to slowly shift traffic from the old API to the new one. What follows is the result of trying to implement what is described above: failed attempts and misunderstandings about the API Gateway and Route53 integration.
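To make the weighting concrete, here is a minimal Python sketch of the plan. It assumes Route53 picks a weighted record with probability equal to its weight divided by the sum of all weights for that name, and it builds the kind of change entry that could be passed in a `ChangeBatch` to boto3's `change_resource_record_sets`. The helper names, the set identifiers and the target values are hypothetical placeholders, not what we actually deployed.

```python
def traffic_share(weights):
    """Fraction of DNS answers each weighted record is expected to receive."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}


def weighted_alias_change(name, set_id, weight, target_dns, target_zone_id):
    """Build one UPSERT entry for a weighted alias A record.

    target_dns and target_zone_id are placeholders for the API Gateway
    domain name and its hosted zone ID (assumed values, supplied by the caller).
    """
    if not 0 <= weight <= 255:
        raise ValueError("Route 53 weights must be between 0 and 255")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,  # distinguishes records sharing a name
            "Weight": weight,
            "AliasTarget": {
                "HostedZoneId": target_zone_id,
                "DNSName": target_dns,
                "EvaluateTargetHealth": False,
            },
        },
    }


# Equal weights split traffic evenly; lowering "old" shifts traffic to "new".
even = traffic_share({"old": 255, "new": 255})
shifted = traffic_share({"old": 192, "new": 64})
```

With equal weights each record gets half of the answers; with 192 and 64 the new API would get a quarter of them.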

We first created the new API in the new AWS account, only to stumble upon a CloudFormation failure. The API Gateway custom domain resource failed because a custom domain with that name already existed in another account: the old API already had an API Gateway custom domain resource for app.domain.name.com. It turns out the domain assigned to a regional endpoint must be unique.

Cool, so that does not work. We tried another approach: create the new API Gateway and assign it a different custom domain, one that differs only in the leftmost subdomain. Namely:

+--------+--------+-------------------------+-------------------------+
| Record | Weight | Source                  | Destination             |
+========+========+=========================+=========================+
| A      | 255    | app.domain.name.com     | api-old.domain.name.com |
+--------+--------+-------------------------+-------------------------+
| A      | 255    | app.domain.name.com     | api-new.domain.name.com |
+--------+--------+-------------------------+-------------------------+
| A      | -      | api-old.domain.name.com | API-GW-domain-old       |
+--------+--------+-------------------------+-------------------------+
| A      | -      | api-new.domain.name.com | API-GW-domain-new       |
+--------+--------+-------------------------+-------------------------+

Having the API domains differ only in the leftmost subdomain makes it simple to create a certificate that covers them both (*.domain.name.com).
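The reason one certificate is enough is that a wildcard matches exactly one leftmost label. The following is a simplified sketch of that matching rule; the helper name `wildcard_covers` is hypothetical, and real TLS libraries handle more cases (for example internationalized names) than this does.

```python
def wildcard_covers(pattern, hostname):
    """Return True if a certificate name such as '*.domain.name.com' covers hostname.

    Simplified rule: the wildcard stands for exactly one leftmost label.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The leftmost label must be non-empty; the rest must match exactly.
        return host_labels[0] != "" and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


covered = [
    host
    for host in (
        "app.domain.name.com",
        "api-old.domain.name.com",
        "api-new.domain.name.com",
        "domain.name.com",
    )
    if wildcard_covers("*.domain.name.com", host)
]
```

Under this rule the three subdomains are covered, while the bare domain.name.com is not.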

So we updated the certificate, deployed the new API in the new account (this time it worked), and created the DNS records described in the table above. It was time to verify that it was working. That is when we realized that accessing the old API via “api-old.domain.name.com” worked, accessing the new API via “api-new.domain.name.com” worked… but accessing the app via “app.domain.name.com” did not. We were expecting a round robin between the old and the new domain; what we got instead was the always eloquent API GW error message: {"message":"Forbidden"}.
Why though? It seems the API Gateway service checks the Host header of the request and expects it to be one of the API Gateway custom domains. In other words, the following does not work:

> curl https://app.domain.name.com/demo
{"message":"Forbidden"}

But this does:

> curl -H "Host: api-old.domain.name.com" https://app.domain.name.com/demo
{"message":"Hello, World!"}
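The behavior we observed can be modeled with a few lines of Python. This is only a toy model of what the two curl calls suggest, not API Gateway's actual implementation; the `handle` function and the `CUSTOM_DOMAINS` mapping are hypothetical names introduced for illustration.

```python
# Custom domains configured on the two gateways (from the DNS table above).
CUSTOM_DOMAINS = {
    "api-old.domain.name.com": "old API",
    "api-new.domain.name.com": "new API",
}


def handle(host_header, custom_domains=CUSTOM_DOMAINS):
    """Toy model: serve the request only if Host is a known custom domain."""
    api = custom_domains.get(host_header.lower())
    if api is None:
        # DNS may well resolve to the gateway, but the Host header is unknown.
        return 403, {"message": "Forbidden"}
    return 200, {"message": "Hello, World!"}


status_app, body_app = handle("app.domain.name.com")
status_old, body_old = handle("api-old.domain.name.com")
```

In this model the DNS record for app.domain.name.com gets the request to the right place, but the request is rejected because no custom domain with that name exists on the gateway that receives it.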

Finally, we thought about using a wildcard custom domain. That does not work either because, as the docs say:

You can’t create a wildcard custom domain name if a different AWS account has created a custom domain name that conflicts with the wildcard custom domain name.

The docs also say that it is possible to request an exception for that. The lesson learned is that pointing a domain name to two API Gateways is impossible. So how do you migrate it? Some options are:

  • have the new API issue a 301 redirect to the old one
  • have the old API proxy to the new one
  • deploy a proxy which shifts traffic from the old API to the new one
  • request an exception so that the wildcard domain can be used in two different AWS accounts
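As an illustration of the proxy option, the core of such a proxy is just a weighted choice of upstream per request. The sketch below shows only that decision, assuming the two custom domains from the earlier table as upstreams; `pick_upstream` and `new_share` are hypothetical names, and forwarding the request (with the Host header rewritten to the chosen upstream) is left out.

```python
import random

OLD_API = "https://api-old.domain.name.com"
NEW_API = "https://api-new.domain.name.com"


def pick_upstream(new_share, rng=random):
    """Choose the upstream for one request.

    new_share is the fraction of traffic (0.0 to 1.0) sent to the new API.
    rng only needs a random() method returning a float in [0, 1).
    """
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must be between 0.0 and 1.0")
    return NEW_API if rng.random() < new_share else OLD_API


# Simulate 10,000 requests with a quarter of the traffic on the new API.
rng = random.Random(42)
picks = [pick_upstream(0.25, rng) for _ in range(10_000)]
observed_new_share = picks.count(NEW_API) / len(picks)
```

Raising `new_share` step by step from 0.0 to 1.0 gives the gradual shift we originally hoped to get from DNS weights.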

Originally published at https://dev.to on March 24, 2022.


Sr. software dev engineer at Amazon. Golang, Java and container enthusiast. Love automation in general. Opinions are my own.