Web Apps: Design & architecture tips
In a previous post (my take on solution design), I mentioned that I would share a few tips on application design & architecture. Here is a post with a few raw design pointers. I learned some of them from my own experience and some by looking at other people's implementations. I will update this post with more tips as and when I find time, and I will also try to categorize them under different architectural pillars.
Here we go…
The biggest sources of failure in a system are usually people, not machines or networks. Keep the number of components that require human intervention to a minimum.
Automation works well with homogeneous and modular systems. Look for automation opportunities both in your current systems and in the new ones you design.
Whatever is a pain to humans should be automated.
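If scaling out is an easy & affordable option, choose it over scaling up. Scaling out does more than add capacity: the extra instances also give you redundancy, so losing one machine does not take the service down. If you only need to scale a little, go vertical.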
Avoid cascading failures unless you absolutely intend them. Failures will happen, so treat them as the norm and make sure a failing component does not drag down everything that depends on it. Synchronous operations are the usual cause of cascading failures. The more your system relies on asynchronous operations, the more likely it is that a failure stays isolated instead of taking other components down with it.
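To make the isolation point concrete, here is a minimal sketch using Python's asyncio, assuming a page that needs one essential call and one optional call. `fetch_profile`, `fetch_recommendations` and `build_page` are hypothetical names. Note that async alone does not isolate anything: the isolation comes from the timeouts and from deciding explicitly which failures may propagate.

```python
import asyncio


async def fetch_profile(user_id: int) -> dict:
    # Hypothetical essential downstream call; here it simply succeeds.
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Ada"}


async def fetch_recommendations(user_id: int) -> list:
    # Hypothetical optional downstream call; here it simulates an outage.
    raise ConnectionError("recommendation service unavailable")


async def build_page(user_id: int) -> dict:
    # Run both calls concurrently, each bounded by a timeout so a slow
    # dependency cannot hold the page hostage. return_exceptions=True hands
    # failures back as values instead of raising the first one.
    profile, recs = await asyncio.gather(
        asyncio.wait_for(fetch_profile(user_id), timeout=1.0),
        asyncio.wait_for(fetch_recommendations(user_id), timeout=1.0),
        return_exceptions=True,
    )
    if isinstance(profile, BaseException):
        # The profile is essential, so this failure is allowed to propagate.
        raise profile
    if isinstance(recs, BaseException):
        # Recommendations are optional: degrade instead of failing the page.
        recs = []
    return {"profile": profile, "recommendations": recs}


if __name__ == "__main__":
    print(asyncio.run(build_page(42)))
```

Here the failing recommendation call leaves the page with an empty recommendations list instead of turning into an error for the whole page.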
As experts say, treat your servers like cattle, not like pets. If one of them isn’t doing well, kill it and spin up another. Don’t fall in love with a server. No emotions.
Write probes that periodically check security, health & load (CPU/memory/application/databases), kill machines that are unhealthy or don't adhere to best practices, and spin up new ones in their place.
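A probe like the one described above could be sketched as follows. It assumes the metrics have already been collected (for example from a monitoring agent, an application health endpoint and a compliance scan); the `HostMetrics` fields, the thresholds and the function names are all hypothetical. The probe only decides which hosts to replace; terminating and relaunching them is left to your cloud or orchestration tooling.

```python
from dataclasses import dataclass


@dataclass
class HostMetrics:
    host: str
    cpu_percent: float
    memory_percent: float
    app_healthy: bool   # e.g. result of an application health endpoint
    patched: bool       # e.g. result of a security/compliance check


# Hypothetical thresholds; tune them for your own workload.
CPU_LIMIT = 90.0
MEMORY_LIMIT = 85.0


def unhealthy_reasons(m: HostMetrics) -> list[str]:
    """Return the reasons a host should be replaced (an empty list means healthy)."""
    reasons = []
    if m.cpu_percent > CPU_LIMIT:
        reasons.append("cpu")
    if m.memory_percent > MEMORY_LIMIT:
        reasons.append("memory")
    if not m.app_healthy:
        reasons.append("application")
    if not m.patched:
        reasons.append("security")
    return reasons


def probe(fleet: list[HostMetrics]) -> list[str]:
    """Return the hosts to terminate; the caller then launches replacements."""
    return [m.host for m in fleet if unhealthy_reasons(m)]


if __name__ == "__main__":
    fleet = [
        HostMetrics("web-1", 35.0, 40.0, True, True),
        HostMetrics("web-2", 97.0, 50.0, True, True),
        HostMetrics("web-3", 20.0, 30.0, True, False),
    ]
    print(probe(fleet))  # ['web-2', 'web-3']
```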
Destruction: Write programs that randomly kill your servers and services, so that you periodically confirm your system keeps working when things fail. Don’t wait for servers or services to fail naturally on the assumption that your backup servers will take over in a disaster. Better to fail them yourself occasionally and check that your recovery mechanism still works. Netflix does this.
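The core of such a program is small. The sketch below is not Netflix's tooling; it only illustrates the idea of picking random victims while always leaving at least one server alive. `choose_victims` and `terminate` are hypothetical names, and `terminate` is a placeholder for whatever API actually stops an instance in your environment.

```python
import random


def choose_victims(servers: list[str], fraction: float, rng: random.Random) -> list[str]:
    """Pick a random subset of servers to kill, always leaving at least one alive."""
    if not servers:
        return []
    count = int(len(servers) * fraction)
    count = max(0, min(count, len(servers) - 1))
    return rng.sample(servers, count)


def terminate(server: str) -> None:
    # Hypothetical placeholder: call your cloud provider's or orchestrator's API here.
    print(f"terminating {server}")


if __name__ == "__main__":
    # Seed the generator (e.g. random.Random(7)) to replay a specific experiment.
    rng = random.Random()
    for victim in choose_victims(["web-1", "web-2", "web-3", "web-4"], 0.25, rng):
        terminate(victim)
```

Passing the random generator in explicitly makes an experiment reproducible, which helps when a kill exposes a bug you need to chase.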
The overall complexity of a system is not just the sum of the complexities of its independent components. The interactions between components add complexity of their own, so the total is usually much greater. So, keep things simple.
You can write programs that check your systems for wasted resources and reclaim or resize them when needed. Paying for something you are not using is foolishness.
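A minimal sketch of such a check, assuming you already export average CPU utilization and hourly cost per instance. The 5% threshold, the `Instance` type and the function names are hypothetical, and a real check would usually look at memory, network and disk as well before stopping anything.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    avg_cpu_percent: float  # average over an observation window, e.g. two weeks
    hourly_cost: float


def find_idle(instances: list[Instance], cpu_threshold: float = 5.0) -> list[Instance]:
    """Instances whose average CPU stays below the threshold are candidates to stop or downsize."""
    return [i for i in instances if i.avg_cpu_percent < cpu_threshold]


def monthly_waste(idle: list[Instance], hours_per_month: float = 730.0) -> float:
    """Rough cost of keeping the idle instances running for a month (730 h = 8760 h / 12)."""
    return sum(i.hourly_cost for i in idle) * hours_per_month
```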
Stop guessing capacity. Design your system to scale up and down automatically based on load, and enable auto-scaling where it makes sense. One common approach is to bake the code and keep it in a known place (like a storage bucket or a repo). When auto-scaling creates a new instance, the image pulls the code from that bucket on boot, and the machine is then added to service.
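The scaling decision itself can be a simple proportional rule. The sketch below follows the same idea the Kubernetes Horizontal Pod Autoscaler documents (desired = ceil(current × current metric / target metric)), clamped to a minimum and maximum fleet size. The function name, the use of CPU as the metric and the default bounds are assumptions for illustration.

```python
import math


def desired_instances(current: int, current_cpu: float, target_cpu: float,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Proportional target tracking: size the fleet so average CPU moves toward target_cpu."""
    if current <= 0:
        # Nothing running yet, so there is no load signal; start at the floor.
        return min_instances
    desired = math.ceil(current * current_cpu / target_cpu)
    # Clamp to the allowed fleet size.
    return max(min_instances, min(max_instances, desired))
```

For example, 4 instances running at 90% CPU against a 60% target give `desired_instances(4, 90.0, 60.0) == 6`, while a quiet fleet shrinks only as far as `min_instances`.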
Back up your data regularly. OK, you know this. Even more importantly, test your backups regularly. Often the backups exist but fail to restore when you need them. Bad format, bad media or corruption = loss.
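Testing a backup means actually restoring it. This sketch assumes the data is a directory of files: it zips the directory, restores the archive into a scratch directory and compares SHA-256 digests of every file with the source. `backup`, `verify_restore` and `digest_tree` are hypothetical helpers; for a database you would restore into a scratch instance and run queries against it instead.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map every file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(source: Path, archive: Path) -> None:
    """Write every file under source into a compressed zip archive."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(source.rglob("*")):
            if p.is_file():
                zf.write(p, arcname=p.relative_to(source).as_posix())


def verify_restore(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it with the source."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(scratch)
        except (zipfile.BadZipFile, OSError):
            # An archive that cannot be read is a failed backup, not a crash.
            return False
        return digest_tree(Path(scratch)) == digest_tree(source)
```

Comparing against the live source only works if nothing changed since the backup ran; in practice you would record the digests at backup time and compare the restore against those.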
Your recovery process should be part of the regular processes you follow, not treated as a special exercise.
Define and establish a support and escalation path for handling things when a system goes wrong.
Serverless architectures remove the need for you to run and maintain servers for traditional compute activities. (For example, an Amazon S3 bucket can host a static website, so you may not need a web server at all. Options like this often remove much of the operational burden.)
Relying too heavily on applications/services from a single vendor is a risk: you get tied up. The number of versions of your application may end up depending on how often your cloud provider changes its services and forces you to rework your code. Think twice before you decide to get locked in. The point is to make an informed decision. You might not want your whole business/org to depend completely on one vendor in order to function. Cloud adoption is certainly going to grow, not shrink.
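One common way to keep lock-in a conscious choice is to put vendor services behind a narrow interface of your own, so that only one adapter knows about the vendor SDK. In this sketch the `BlobStore` protocol, `InMemoryStore` and `save_invoice` are hypothetical; a vendor-specific class implementing the same two methods would be swapped in at startup.

```python
from typing import Protocol


class BlobStore(Protocol):
    """The only storage operations the application is allowed to use."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Local/test backend. A cloud-specific backend would implement the same two methods."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_invoice(store: BlobStore, invoice_id: str, pdf: bytes) -> str:
    # Business code depends on the BlobStore interface, not on a vendor SDK.
    key = f"invoices/{invoice_id}.pdf"
    store.put(key, pdf)
    return key
```

The trade-off is that a narrow interface hides vendor-specific features, which is exactly the kind of informed decision the tip asks for.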
Port, firewall, and database misconfigurations are the usual causes of security breaches. Write a scanner and periodically scan for potential problems. Lock everything down and then open only what you need (never the other way around).
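A very small scanner along these lines, assuming a default-deny policy where every open port must appear on an allowlist. It only checks TCP ports on one host using the standard `socket` module; `open_ports` and `unexpected_ports` are hypothetical names, and a real scanner would also inspect firewall rules and database settings.

```python
import socket
from collections.abc import Iterable


def open_ports(host: str, ports: Iterable[int], timeout: float = 0.2) -> set[int]:
    """Return the ports on host that accept a TCP connection."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if s.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def unexpected_ports(host: str, ports: Iterable[int], allowed: set[int]) -> set[int]:
    """Default deny: anything open that is not explicitly allowed is a finding."""
    return open_ports(host, ports) - allowed
```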
Secure your system at all levels & layers, not just at the edges. Use encryption wherever data is sensitive. Multiple layers of defense are always better than one huge monolithic door.
Decide what you keep in-house. Moving everything to the cloud may not be the best way in terms of cost, security or feasibility.
Stress test your systems often. Your system might work very well initially, but over time many things get added and the chances of breakage increase.
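A bare-bones load generator could look like this. `handle_request` is a hypothetical stand-in for a call to your service; the sketch fires calls from a thread pool and reports latency percentiles, which are worth comparing from run to run as the system grows. Dedicated load-testing tools do this far more thoroughly.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(i: int) -> int:
    # Hypothetical stand-in for a call to your service (e.g. an HTTP request).
    time.sleep(0.001)
    return i


def timed_call(i: int) -> float:
    """Return how long one request took, in seconds."""
    start = time.perf_counter()
    handle_request(i)
    return time.perf_counter() - start


def stress(requests: int, concurrency: int) -> dict[str, float]:
    """Fire `requests` calls with `concurrency` workers and summarize latency in ms."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(requests)))
    cuts = statistics.quantiles(latencies, n=100)  # 99 cut points: p1..p99
    return {
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "p99_ms": cuts[98] * 1000,
        "max_ms": latencies[-1] * 1000,
    }
```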
Log and audit all actions, from both a programming and an operations point of view. Storing data is less expensive today, and traces help a lot when things go wrong.
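Structured audit records are easier to search than free text. This sketch, using only the standard `logging` and `json` modules, writes one JSON line per action. The `audit` function and its field names are hypothetical, and in production the records would go to durable, append-only storage rather than the console.

```python
import json
import logging
import time

audit_log = logging.getLogger("audit")


def audit(actor: str, action: str, target: str, **details: object) -> str:
    """Emit one structured audit record and return it as a JSON string."""
    record = json.dumps(
        {
            "ts": time.time(),
            "actor": actor,
            "action": action,
            "target": target,
            "details": details,
        },
        sort_keys=True,
        default=str,  # fall back to str() for values json cannot encode
    )
    audit_log.info(record)
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    # Hypothetical example: an automated deployment recording what it changed.
    audit("deploy-bot", "deploy", "web-tier", version="1.4.2")
```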