Thursday, May 19, 2016

Immutable pointers - a pattern for modular design to help achieve microservices

Modular design has a lot of benefits, including:
  • making it easier to predict the impacts from a change
  • helping developers work in parallel
But, it is much much much harder than people think.  For a start, the abstractions have to be at the right level.  Too high and they can become meaningless and too low and they cease to be abstractions, as they will end up having way too many dependencies (efferent and afferent)

In a recent green field project, which was in the area of digital commerce, I was intent on achieving good modular design in the back end for the reasons outlined above and because the project had potential to grow into a platform (more than likely microservices) that would be used by multiple teams. To achieve the modular design, after some white boarding the back end was split up into a bunch of conceptual components to achieve shopping functionality.  The core components are listed below.


  • Shopping component - Core shopping functionality e.g. Shopping Cart management
  • User component - user management, names, addresses etc
  • Purchases - details about previous transactions
  • Merchant - Gateway to merchant API to get information about merchant products etc. 
  • Favourites - Similar to an Amazon wishlist
Now, every software project uses the word "component" differently. In this project, a component was strictly defined as something that did something useful and contained:
  • domain classes 
  • services
  • a database schema  (could be its own database, but the idea was to isolate it from any other components persistence) 
  • its own configuration
  • its own dedicated tests
  • its own exception codes domain 
The outside world could access a component via a bunch of ReSTful endpoints. 

Any component could be individually packaged, deployed etc.  Now, the astute out there will 
be thinking,  "this sounds like microservices", well they almost were.  For this project, 
some of them were co-located but they were architected so that deploying them out into 
individual  deployed artefacts (and hence a microservices approach would be easy). 

Ok, to reiterate, the goal was achieve a very clean modular design.  This meant, that I didn't want 
any dependencies from one component's database scheme to another and for this blog post we are only going to focus on how that aspect of the modularity was achieved. 

Now, looking at the above components, it doesn't take long to see that isn't going to be so easy.  For example: 
  • A shopping cart (in the Shopping component) will have a reference to a User (in the User component)
  • A shopping cart item (Shopping component) will have a reference to a Product (Merchant component)
  • A Shopping cart (Shopping component) will have a reference to a shipping Address (User component

So the challenge of achieving modularity in the persistence tier should now be becoming more clear. References of some sort across components need to be persisted.  Immediately, any developer will ask, 
"Wait a sec, if we just use foreign keys for these inter schema references we get ACID and referential 
integrity for free!"  True. But then you are loosing modularity and introducing coupling.  Say you want to move your products (in the Merchant Component) away from a relational database and use something like Elastic or Mongo DB instead - to leverage their searching capabilities.  That foreign key isn't so useful now is it?  

Ok, so first of all in looking for a solution here, I thought about all the references that were across 
components to see if there was anything in common with them.  One thing that was obvious was that 
they were generally all immutable in nature.  For example:
  • When a Cart Item (Shopping component) points to a Product (Merchant component) it points to that product only.  It never changes to point to another product.  
  • When a Shopping Cart (Shopping component) points to a User (User component), it is also immutable.  My shopping cart is always mine, it never changes to be someone else's.
So I was now starting to think about preferences:
  1. Avoid cross component dependencies if you can (this should be kinda obvious)
  2. If you have to have them, strive for immutable references.
So, next up was to have a name for this type of relationship - which I was calling "Immutable pointer" in design and architecture documents. But, for actual code I needed something more succint. The database schema was already using "id" for primary keys and "{name_other_relationship}_id" for foreign keys. So I decided all cross component relationships were named as the the name of the entity being pointed to and "ref".  

So, some concrete examples:
  • userRef   (ShoppingCart pointing to the user)
  • productRef (CartItem pointing to the product)
  • shippingAddressRef (ShoppingCart pointing to the ShippingAddress)
This meant anytime anyone saw something like "xyzRef" in code, schema or logfiles they knew it was a cross component reference. In case it wasn't obvious, Ref was an abbreviation for Reference.

Next up was to decide on the format for the actual Refs.  I took a bit of inspirational from that thing call the internet, which of course has similar concepts where abstractions: web sites of web pages contain immutable pointers to other abstractions: hyperlinks to web pages in other web sites.


So similar to hyperlinks and URLs, the refs would follow a hierarchical name spacing format.
Some good team input from senior technical member suggested to continue the inspiration from the Web and start the hierarchical names with cp://.  CP for Commerce Platform the name of the project. This was analogous to http://  I thought this was good idea as it indicates that our platform generated the reference.  Again, this meant they stood out in logfiles etc and could be differentiated from any downstream components also using hierarchical type references but in a different context. 

The key point of the ref was that when generated it should of course be unique.  To achieve this 
a mixture of database primary keys or something unique about the data (e.g. product skus were used)

So some examples: 
  • userRef -> cp://user/{uuid of user}
  • productRef -> cp://merchant/{uuid of merchant}/product/{sku of product}
  • cardRef -> cp://user/{uuid of user}/card/{uuid of card}

Always immutable?

As stated, the first preference was to avoid the cross component reference. The second preference
was to use the immutable pointer (ref pattern). However, what about an edge case where the cross component reference could be mutable.  Could this happen?  Well it could.  Easiest to explain by an example.   

Every shopping cart doesn't just have a User, it also has a selected shipping address where the contents purchased will be shipped to. In the domain model, the user's address lived in the User component.  But unlike the other cross component references the shipping address could change. Consider your Amazon shopping cart.  Imagine you are on the checkout screen with your selected card and your selected address, but before you proceed to checkout you go into your user preferences and delete your card and address.  This project had to facilitate similar scenarios.  So the User deletes the address that a shopping card id pointing to what should happen?

So for inspiration for solution here, we can look to the various NoSQL patterns and in particular 
one of the most popular is eventually consistency. What this says is that unlike ACID you don't 
always need consistency straight away, all the time.  In certain cases it is okay to allow inconsistency
on the basis that the system is able to reconcile itself. 

So in this case:
  1. The shopping cart is pointing to a specific shopping address using an addressRef. 
  2. The user deletes that address by hitting a ReST endpoint in the user component.
  3. This means the shopping cart will point to an address that doesn't exist. The system is inconsistent.
  4. The next time the user reads the shopping cart, in the request handling the shopping component asks the user component if the address with this address ref sill exists and if it doesn't removes the pointer.  
  5. This system is now consistent
So with the architect hat on it is really important we get this all right. Otherwise the goal 
of modular design falls apart. 

In this case, it is worth reiterating the strategy one more time:
  1. Avoid cross component references.  Doesn't matter how great your pattern is, if you have a lot of cross component references,  it is more than likely you have got the component abstractions at too low a level.
  2. Favour immutability.  In general immutability means less code paths, less edge cases, less complexity in code. 
  3. Eventual consistency.  
Software Architecture is about trade offs, and finding the right balance.  In this case, it was the balance between clean modular design but not going over board with it so that it become impossible to achieve. 

For anyone trying to do microservices, I would strongly recommend trying to master how you would do modular design in your architecture first.  If you can't do this, when you add in the complexity of the network things get very complicated. 

  

No comments:

Post a Comment