Posts

Managing Microservices Timeouts

Context: Service request timeouts usually get little scrutiny when designing a solution. That is reasonable for simple solutions (1-2 daisy-chained microservices) or monoliths. However, it changes once we start looking at 5+ daisy-chained microservices, as the diagram below illustrates.
Fig 1. All microservices have the same timeouts
Now one could argue that we could obtain a baseline of how long, on average, Microservice 5 takes to process the request and generate a response, and configure each of the microservices accordingly, as follows:
Fig 2. Each microservice has a slightly higher timeout than its downstream
That would indeed work well under a straightforward, theoretical happy-case scenario. However, most experienced SREs would agree that what matters is not the happy-case scenario, but how the system handles all sorts of stress loads. Even if we can come up with the golden number for the baseline of Microservice 5, this approa...
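The staggered configuration described for Fig 2 can be sketched as simple arithmetic. This is a minimal illustration, not the post's own configuration: the function name `staggered_timeouts`, the 1000 ms baseline and the 200 ms margin per hop are all hypothetical values chosen for the example.

```python
from typing import List


def staggered_timeouts(baseline_ms: int, margin_ms: int, chain_length: int) -> List[int]:
    """Return one timeout per service, ordered from the first service in the
    chain to the last (deepest) one.

    Assumption: the deepest service gets the measured baseline, and every hop
    upstream adds one fixed margin so that callers always wait slightly longer
    than the service they call.
    """
    if chain_length < 1:
        raise ValueError("chain_length must be at least 1")
    return [baseline_ms + margin_ms * (chain_length - 1 - i) for i in range(chain_length)]


# Hypothetical numbers: 5 daisy-chained services, 1000 ms baseline, 200 ms margin.
for number, timeout in enumerate(staggered_timeouts(1000, 200, 5), start=1):
    print(f"Microservice {number}: {timeout} ms")
```

The sketch only encodes the happy-case arithmetic; it says nothing about how the chain behaves under stress load, which is the concern raised above.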

API Versioning with Document DB

Context
You have a document database, let's say DynamoDB. It holds about 10 million person-name records, for example:
The data is accessible through a highly scalable API and is consumed by 100+ other services. Over the course of a few years, some significant changes require several iterations of the data and the API, but all versions have to be maintained because of various legacy clients. The iterations are summarised as follows:
V2: Suffixes are now mandatory
V3: Title is now forbidden
V4: Givenname changes to Firstname
V5: Tony Stark Jr is now the supreme leader of Earth; all Stark last names must be capitalised
Forces
You only have one Document DB
All versions must be maintained
Solution
V2: Suffixes are now mandatory
Adding a new field is quite straightforward: we simply add it to the Document DB. V1: leave it unchanged, or update it to make sure that it does not return Suffix.
V3: Title is now forbidden
Removing a field is trickier, since we have to maintain backward compatib...
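One way to picture serving several API versions from a single stored document is a read-side projection per version. The sketch below is only an illustration under assumptions, not the solution the post goes on to describe: the field names (`title`, `givenname`, `lastname`, `suffix`), the sample record and the `project` function are hypothetical, and it assumes each version keeps the rules of the versions before it. V5 is left out because the excerpt does not spell out the capitalisation rule.

```python
from typing import Any, Dict

# Hypothetical stored document; the real schema is not shown in the excerpt.
STORED: Dict[str, Any] = {
    "title": "Mr",
    "givenname": "Tony",
    "lastname": "Stark",
    "suffix": "Jr",
}


def project(doc: Dict[str, Any], version: int) -> Dict[str, Any]:
    """Return the view of `doc` that a client of the given API version sees."""
    view = dict(doc)  # copy, so the stored document is never mutated
    if version < 2:
        # V1 predates Suffix, so it must not return it.
        view.pop("suffix", None)
    if version >= 3:
        # V3 onwards: Title is forbidden.
        view.pop("title", None)
    if version >= 4 and "givenname" in view:
        # V4 onwards: Givenname is exposed as Firstname.
        view["firstname"] = view.pop("givenname")
    return view


for v in (1, 2, 3, 4):
    print(f"V{v}: {project(STORED, v)}")
```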

Investment Loan: PI vs IO

Everyone who has delved into investment property will have heard the mantra: "For investment property, take an Interest Only loan instead of a Principal and Interest loan to maximize your deduction". I did, and I followed that mantra blindly until one day it dawned on me that I had never checked the math. So one night I checked the math, and here are the results:
Assumption 1: 100% LVR for easier calculation. The LVR does not affect the final nett income difference between PI and IO, and that difference is all we are focusing on.
Assumption 2: Both PI and IO have the same interest rate. In reality, PI usually has a lower interest rate, which would further increase the final nett income.
Assumption 3: Tax bracket set at the 30% level, meaning every dollar of income from the investment property falls under the 30% tax bracket.
Assumption 4: No capital gain over the span of 10 years. Capital gain is ignored since it doesn't affect the final nett income d...
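The interest side of this comparison can be reproduced with the standard amortisation formula. The sketch below is not the post's spreadsheet: the loan amount, the 5% rate and the 30-year term are hypothetical inputs, while the 30% tax rate, the equal interest rate for both loans and the 10-year horizon follow the assumptions listed above. It assumes a non-zero interest rate and monthly repayments.

```python
from typing import Tuple


def io_interest(principal: float, annual_rate: float, years: int) -> float:
    """Interest paid on an Interest Only loan: the balance never reduces."""
    return principal * annual_rate * years


def pi_interest(principal: float, annual_rate: float, term_years: int, years: int) -> Tuple[float, float]:
    """Interest paid over `years` on a Principal and Interest loan, plus the
    remaining balance, using monthly repayments over `term_years`."""
    r = annual_rate / 12                            # monthly rate, assumed > 0
    n = term_years * 12                             # total number of repayments
    payment = principal * r / (1 - (1 + r) ** -n)   # standard amortised repayment
    balance = principal
    total_interest = 0.0
    for _ in range(years * 12):
        interest = balance * r
        total_interest += interest
        balance -= payment - interest               # the rest of the repayment reduces principal
    return total_interest, balance


# Hypothetical inputs: $500,000 loan at 5% over 30 years; 30% tax, 10-year horizon.
PRINCIPAL, RATE, TERM, HORIZON, TAX = 500_000.0, 0.05, 30, 10, 0.30

io_paid = io_interest(PRINCIPAL, RATE, HORIZON)
pi_paid, pi_balance = pi_interest(PRINCIPAL, RATE, TERM, HORIZON)

print(f"IO interest paid: {io_paid:,.0f}  tax deduction value: {io_paid * TAX:,.0f}")
print(f"PI interest paid: {pi_paid:,.0f}  tax deduction value: {pi_paid * TAX:,.0f}")
print(f"PI principal repaid after {HORIZON} years: {PRINCIPAL - pi_balance:,.0f}")
```

Only 30 cents of every dollar of interest comes back as a deduction, so the numbers are easier to compare when the deduction value is printed next to the interest actually paid.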

Spring Boot Reactive API Part 2

Spring Boot Reactive: Improving CPU bound performance using Scheduler
In part 1, we saw that Spring Boot Reactive doesn't really improve performance when the WAITING time is spent on CPU-bound / CPU-intensive tasks. This article shows a quick way to improve the speed through specific configuration.
Setup
Spring Boot's embedded Tomcat is reconfigured to use only 10 worker threads, so that we can clearly see the performance limitation and the effect of the tweaks. Gatling is configured to simulate 30 concurrent requests with a request timeout of 30000 milliseconds. The API's internal process calls native Java code that calculates the value of PI over 1 million iterations.
Default / No scheduler
The main reason why CPU-bound tasks do not improve (or even degrade) with high parallelism is the costly overhead of low-level context switching. In the scenario below, 10 main threads are all being used to calculate PI values, and because there are fewer CPUs than threads, a lot of context...

Spring Boot Reactive API

Spring Boot 2: Using Spring WebFlux & Tomcat for Reactive APIs
Spring WebFlux is a somewhat new library (introduced with Spring Framework 5 in 2017 and available in Spring Boot 2) that can provide a massive improvement in performance compared to traditional synchronous calls. To quote the Spring documentation itself:
Why was Spring WebFlux created? Part of the answer is the need for a non-blocking web stack to handle concurrency with a small number of threads and scale with fewer hardware resources. Servlet 3.1 did provide an API for non-blocking I/O. However, using it leads away from the rest of the Servlet API, where contracts are synchronous (Filter, Servlet) or blocking (getParameter, getPart). This was the motivation for a new common API to serve as a foundation across any non-blocking runtime. That is important because of servers (such as Netty) that are well-established in the async, non-blocking space.
Further reading: https://docs.spring.io/spring-framework/docs/current/reference/html/web...

Inner Joined Data Breaches

Optus Breach and Medibank Breach: A Breach Made in Hell
Problem: In the past few weeks there has been a whirlwind of data breach news, first from Optus and then from Medibank. Anyone in Australia knows that these are two gigantic service providers, serving a huge portion of the country's population. The stolen data is therefore horrendously massive and, when combined, extremely dangerous. Let's have a look at the Optus vs Medibank data stolen according to the news:
As we can see, the data from either the Optus breach or the Medibank breach alone provides ONE PRIMARY ID, which is sufficient for a hacker to:
Register a new phone number
Open a Buy Now Pay Later account
Open a crypto account in a Centralized Exchange
etc (not going to list all of them for security reasons)
To make things even worse, if someone got their hands on both of them and joined them together, they would basically produce a complete identity of a person with TWO PRIMARY ID documents. With those documents, a ha...

Transactional Outbox Pattern

Transactional Outbox Pattern, Why? (part 1)
Background
In microservice and event-driven architecture, the transactional outbox pattern is crucial for keeping the state of data (or "aggregate", as per Domain Driven Design) consistent across two or more persistent stores, especially when 2 Phase Commit is neither available nor desired. You can read an excellent explanation of the pattern here. This article is divided into two parts: part 1 discusses why we need it, and part 2 discusses how we can implement it.
In my experience designing solutions with the transactional outbox pattern, I have repeatedly come across similar questions challenging why we have to do it in such a complex way, and why we can't just reverse the order, use a try-catch clause, a database transaction, etc. Based on those frequent conversations, here is my attempt to explain why we can't solve the problem with ordering, a try-catch clause or transactions. Suppose we have a system where we create a person record, which will...
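For readers who have not met the pattern before, here is a minimal sketch of its core idea: the business row and the event row are written in the same local database transaction, and a separate relay publishes the stored events afterwards. It is not the implementation from part 2. The table names, the `PersonCreated` event type and the `create_person` / `relay` functions are hypothetical, and SQLite stands in for whatever database the real system uses.

```python
import json
import sqlite3
from typing import Callable


def init(conn: sqlite3.Connection) -> None:
    """Create the business table and the outbox table (hypothetical schema)."""
    with conn:
        conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        conn.execute(
            "CREATE TABLE outbox ("
            "id INTEGER PRIMARY KEY, event_type TEXT NOT NULL, "
            "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
        )


def create_person(conn: sqlite3.Connection, name: str) -> int:
    """Insert the person and its event atomically: both rows commit or neither does."""
    with conn:  # commits on success, rolls back if anything inside raises
        cur = conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        person_id = cur.lastrowid
        payload = json.dumps({"id": person_id, "name": name})
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("PersonCreated", payload),
        )
    return person_id


def relay(conn: sqlite3.Connection, publish: Callable[[str], None]) -> None:
    """Publish unpublished events in order and mark each one once it is sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # if this raises, the row stays unpublished and is retried later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))


conn = sqlite3.connect(":memory:")
init(conn)
create_person(conn, "Ada")
relay(conn, print)  # `print` stands in for a real message broker client
```

Because an event is marked as published only after `publish` returns, a crash between the two steps causes the event to be sent again on the next run, so consumers of such a relay should tolerate duplicates.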