Our Journey with AWS API Gateway and Serverless: the Good, the Bad, and the Ugly
Introduction
Amazon Web Services offers a large variety of products, which makes it easy for companies to relocate legacy applications from on-premises hosting to Elastic Beanstalk and move asynchronous communication onto SQS or Kinesis streams without much hassle. In this blog post, I'd like to draw your attention to our process, journey, and experiences. You'll also find the nuts and bolts of the AWS products we used along the way. Lastly, I won't clutter this post with technical setup code, since there are plenty of examples available on the internet.
Big Bang, that’s how we began
We needed to migrate our existing business flow and extend it with new requirements. The task was to collect customer ad reports from our Android, iOS, and web apps. Our classifieds business treats platform safety as a top priority for customers, so customer input, speed, and error-free systems matter a great deal to us.
Since the functional and non-functional requirements were straightforward, we kept the technical and business flow simple. For starters, we didn't want our customers waiting for a simple transaction to complete, so we imposed only basic validations on the API schema and returned a canned response as a delivery confirmation. The inner details of each report, and the full business requirements, would be handled asynchronously later.
The rest was on us. As the engineering team, we wanted to come up with a compact design and application. As the typical product development phases kicked in, we drew flow diagrams, collected requirements, and discussed. We knew what to do, and decided to offload the whole flow onto AWS WAF, API Gateway, SQS, and Lambda, rather than going with a heavyweight legacy Beanstalk or K8s setup.
Good
Simplification comes first. With this approach, we can easily reason about and focus on our business delivery: we simply point at the reporting business and work. The whole setup is compact and robust. Since we are fully abstracted away from infrastructure work and the inner details of the integrations, our business flow runs seamlessly and we only worry about application errors. Shorter deployments and direct execution of Lambda functions are bonuses on top of the simplification.
Cost is another aspect worth mentioning. First things first: the whole setup described here, including other cached endpoints, KMS, and so on, costs us around $40 to $50 per month in production. Cloud providers are famous for high bills, but in our experience this amounts to a fraction of what running the same workload in a heavyweight K8s or Beanstalk environment would cost. Secondly, spending zero time on infrastructure operation is another point in its favor.
Bad and Ugly
Testing opportunities are limited. In this bundle, we don't get the chance to run an integration test from the API level, because the API and SQS execution are offloaded to AWS. If we want to try out corner cases in the custom code defined in the Swagger API document, say the mapping template that propagates our message to SQS, there is unfortunately no way to imitate it locally. We have to upload the code, have it executed on AWS premises, observe the behavior, and check the outcome in the logs.
In this context, we missed having proper black-box-ish integration tests from the perspective of a mobile client that sends a payload with certain characters to the REST endpoint and sees the transformations across API Gateway, SQS, and Lambda. Essentially, we are limited to emulating the flow starting from the Lambda handler and running all the way down to the end of the Lambda execution, out to another SQS queue, database, Kinesis stream, etc.
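To make that limitation concrete, here is a minimal Python sketch of the only kind of test available to us: invoking a Lambda handler directly with a hand-built SQS event. The handler, field names, and payloads are illustrative assumptions, not our production code; the point is what the test boundary excludes.

```python
import json

def handler(event, context):
    # Hypothetical report handler: parse each SQS record body as JSON
    # and collect the report IDs. Stands in for the real Lambda code.
    processed = []
    for record in event["Records"]:
        report = json.loads(record["body"])
        processed.append(report["reportId"])
    return {"processed": processed}

def make_sqs_event(payloads):
    # Build a synthetic SQS event envelope around the given payloads.
    # This fakes exactly the part a true integration test should cover:
    # the API Gateway mapping and SQS delivery are emulated away.
    return {"Records": [{"body": json.dumps(p)} for p in payloads]}

if __name__ == "__main__":
    event = make_sqs_event([{"reportId": "r-1"}, {"reportId": "r-2"}])
    print(handler(event, None))  # {'processed': ['r-1', 'r-2']}
```

Everything upstream of the handler, the Swagger mapping template included, never runs in such a test; that is precisely the gap described above.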
Debugging and investigating between AWS products is quite a challenge. Observing issues, or even hoping to find them in the logs, can be daunting. Let me give you some context and an example situation. At the initial service launch, the entire integration was working. Then we randomly observed the following error in the logs, with no further details:
Gateway response type: DEFAULT_5XX with status code: 500
Well, in the middle of a production issue, that's frustrating, right? That's what we thought as well. One of the actions we took was to enable the X-Ray service on API Gateway, which ultimately enriched the log details of the AWS products involved. We started seeing more granular and detailed entries:
Endpoint request body after transformations: Action=SendMessage&MessageBody={...}
X-ray Tracing ID : Root=5-g42he9e5b-42429534629dee84e2223c531
Received response. Status: 403, Integration latency: 4 ms
Method completed with status: 500
Gateway response type: DEFAULT_5XX with status code: 500
Gateway response headers: {Access-Control-Allow-Origin=*, Access-Control-Allow-Headers=Content-Type, x-amzn-ErrorType=InternalServerErrorException}
Gateway response body: {"message": "Internal server error"}
Execution failed due to configuration error: No match for output mapping and no default output mapping configured. Endpoint Response Status Code: 403
Endpoint response body before transformations: {"Error":{"Code":"AccessDenied","Message":"Access to the resource https://sqs.eu-west-2.amazonaws.com/4512112121/app_queue is denied.","Type":"Sender"},"RequestId":"92211-7867-56a5-853f-69b678ghj2766"}
With those new details in hand, we started eliminating possible root causes, going through messages that had been processed and messages that had failed. It didn't take long to realize that messages containing particular symbols, such as '&' or '$', were causing the issue.
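The failure is easy to reproduce outside AWS. The mapping template splices the raw JSON into an application/x-www-form-urlencoded body (as the "Endpoint request body after transformations" log line above shows), so a literal '&' in the payload splits the body into extra form fields. A small Python sketch of the effect, with an illustrative payload:

```python
import json
from urllib.parse import parse_qs

# What the template effectively produced: raw JSON concatenated
# into a form-encoded body, with no escaping applied.
payload = '{"text":"cats & dogs"}'
body = "Action=SendMessage&MessageBody=" + payload

# Any form parser splits on '&', so MessageBody is truncated mid-JSON:
parsed = parse_qs(body)
print(parsed["MessageBody"])  # ['{"text":"cats ']

# The truncated fragment no longer parses downstream:
try:
    json.loads(parsed["MessageBody"][0])
except json.JSONDecodeError:
    print("truncated JSON no longer parses")
```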
After some brainstorming and an intensive internet search, we had to do something about those messages between API Gateway and SQS; we intended to wrap them somehow. Our first shot was URL encoding:
requestParameters:
  integration.request.header.Content-Type: "'application/x-www-form-urlencoded'"
requestTemplates:
  application/json: >
    Action=SendMessage&MessageBody={"$util.urlEncode($input.json('$'))"}
passthroughBehavior: "never"
type: "aws"
This solution did work: messages with those and other symbols passed from API Gateway into SQS. However, we ended up with a different problem on our hands. Some messages from clients contained special characters that were effectively obfuscated by the URL encoding. The result was a big mess: Lambda was unable to parse the messages. Back to square one, we changed the message encoding to base64, which led to our final working setup:
requestParameters:
  integration.request.header.Content-Type: "'application/x-www-form-urlencoded'"
requestTemplates:
  application/json: >
    Action=SendMessage&MessageBody={"$util.base64Encode($input.json('$'))"}
passthroughBehavior: "never"
type: "aws"
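On the consuming side, this means the Lambda has to undo the encoding before parsing. A minimal Python sketch of that decode step, assuming the standard SQS event shape; the handler and payload names are illustrative, not our production code:

```python
import base64
import json

def handler(event, context):
    # With a base64Encode mapping template, each SQS record body
    # arrives as a base64 string; decode it back to the original
    # JSON before parsing.
    reports = []
    for record in event["Records"]:
        raw = base64.b64decode(record["body"]).decode("utf-8")
        reports.append(json.loads(raw))
    return reports

if __name__ == "__main__":
    # Simulate what the mapping template produces, using the exact
    # characters that broke the URL-encoded variant for us.
    payload = {"title": "Price: $5 & up"}
    body = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    print(handler({"Records": [{"body": body}]}, None))  # symbols survive intact
```

Base64 is symbol-agnostic by design, so '&', '$', and anything else round-trip unchanged, at the cost of one extra decode in the handler.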