If anyone hears the words "software engineer" or "programmer", they will probably picture a person who codes a lot and spends most of their time writing code. But actually, this is not true. Software engineers, especially senior ones, spend a minor part of their time coding. They spend the majority of their time researching and designing solutions to ambiguous business problems that cannot be coded directly like trivial ones, writing documents, mentoring junior developers, and comparing different architectures to decide which one is more suitable for the project at hand.
As a student with little work experience, I focused in my freshman year on two main things: doing side projects (writing code and learning technologies) and solving algorithmic problems. That is a good strategy, but if you want to grow as a software engineer, you have to add more tools to your belt.
I noticed that to write good code, you have to care about different topics like compilers, architectures, design patterns, etc., and start connecting the dots. For example, it might be counter-intuitive for a computer science student or software developer to see the correlation between learning a low-level concept such as compilers and writing more robust, cleaner code. But if we dig deeper, we will find that if we understand, for example, how lexers and parsers work in a compiler, we will be more confident and faster at debugging, because we already know how the compiler reads our code, parses it into a parse tree, and what kind of errors are displayed when a certain operation goes wrong. Similarly, learning about topics like scalability and architecture pushes us to write cleaner, better code once we see the big picture.

Beginners (including me) focus on technologies and languages and do not pay attention to important concepts like data structures and algorithms, databases, software engineering practices, compilers, theory of computation, and the list goes on. To put it simply, these topics are important for you as a beginner in order to make sense of the bigger picture. So this is a reminder, for me and everybody else, to focus on the important things first; everything else will come naturally. Also, a disclaimer: this is a personal opinion, so feel free to ignore it if you want.
Now let's talk about an interesting topic that a lot of software engineers, especially in big companies, have to deal with: scalability.
Scalability can be defined as how a software system copes with an increase in its load, and how it handles that growth in a healthy way that minimizes downtime and errors.
Scalability might or might not be relevant depending on your goals. A start-up in the funding stage should not worry about how its MVP can scale to handle five million users, for example; it should focus on more important things like a friendly user interface, features that serve the customer's needs, easy navigation, etc.
Big tech companies like Facebook, Twitter, Reddit, and YouTube, on the other hand, do care about scale, and for them it is a challenging task to serve millions of users without degrading the user experience.
We measure scalability using metrics known as load parameters. Load parameters are how you stress test, or assess, how your software copes with an increase in load. For example, a load parameter can be the number of requests per second, the number of reads and writes in a database, the hit ratio on a cache, or the amount of data users upload. All of these are valid load parameters; in short, a load parameter is a unit of measurement for how software performs as its load grows.
We have two options when it comes to scaling.
Scaling out means increasing the number of machines, or instances, on which the software runs. Instead of having one powerful computer, we have multiple machines and distribute the load (traffic, requests) across them, typically using a load balancer to allocate requests dynamically.
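As a minimal illustration of the load balancer's job, here is a round-robin sketch in Python; the server addresses and class name are hypothetical, and real load balancers (NGINX, HAProxy, cloud ELBs) are far more sophisticated:

```python
from itertools import cycle

# Hypothetical pool of server addresses; purely illustrative.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

class RoundRobinBalancer:
    """Hands each incoming request to the next server in rotation."""
    def __init__(self, servers):
        self._pool = cycle(servers)

    def route(self, request):
        # Pick the next server in the rotation for this request.
        return next(self._pool)

balancer = RoundRobinBalancer(servers)
assignments = [balancer.route(f"request-{i}") for i in range(6)]
print(assignments)  # six requests spread evenly: each server gets two
```

Round-robin is the simplest policy; real balancers also weigh server health, capacity, and current connections.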
Scaling up means increasing the computational power and storage of the machine: instead of adding more machines, we upgrade the machine's specifications.
There is no universally optimal option; we cannot argue that scaling up is always best. It depends on the problem, the solution, and many factors I am not fully aware of, but we can say that a mix of both options is a valid approach used by different companies.
I would love to end this article with a case study, taken from Designing Data-Intensive Applications by Martin Kleppmann. In 2012, Twitter saw roughly 300k requests per second from users viewing their home timeline and around 4.6k requests per second from users posting tweets. The latter number is not that big; posting is a straightforward operation, and we simply write those tweets to the database. The tricky part is reading the feed and displaying it on the user's timeline. In the beginning, Twitter adopted a simple approach: when a user requests their timeline, we look up the accounts they follow, fetch those accounts' latest tweets, and sort them by time. This can be expressed as a SQL join across the users, follows, and tweets tables.
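As a runnable sketch, here is that timeline query executed against an in-memory SQLite database; the table and column names (users, follows, tweets) follow the book's example schema and are assumptions, not Twitter's real schema:

```python
import sqlite3

# Toy in-memory database with a handful of users, follow edges, and tweets.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, screen_name TEXT);
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    CREATE TABLE tweets  (sender_id INTEGER, text TEXT, timestamp INTEGER);
    INSERT INTO users   VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO follows VALUES (1, 2), (1, 3);
    INSERT INTO tweets  VALUES (2, 'hello from bob', 100),
                               (3, 'hi from carol', 200),
                               (1, 'alice talking', 300);
""")

# The timeline query: find everyone the current user follows,
# pull their tweets, and order them newest first.
current_user = 1
rows = db.execute("""
    SELECT tweets.text, users.screen_name
    FROM tweets
    JOIN users   ON tweets.sender_id    = users.id
    JOIN follows ON follows.followee_id = users.id
    WHERE follows.follower_id = ?
    ORDER BY tweets.timestamp DESC
""", (current_user,)).fetchall()

print(rows)  # carol's newer tweet first, then bob's; alice's own tweet excluded
```

The cost of this approach is that every timeline read pays for the joins, which becomes expensive at 300k reads per second.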
The other option is caching: attach a timeline cache to every user, and whenever one of the accounts they follow posts a tweet, it is added to that user's cache.
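This fan-out-on-write idea can be sketched in a few lines of Python; the data structures and names here are illustrative assumptions, not Twitter's actual design:

```python
from collections import defaultdict

# Toy fan-out on write: posting a tweet pushes it into the cached
# timeline of every follower, so reads become simple lookups.
followers = {"bob": ["alice", "carol"]}   # who follows each author
timeline_cache = defaultdict(list)        # per-user cached timeline

def post_tweet(author, text):
    # Write the new tweet into every follower's cached timeline.
    for follower in followers.get(author, []):
        timeline_cache[follower].insert(0, (author, text))

def read_timeline(user):
    # Reading is now a cheap cache lookup: no joins required.
    return timeline_cache[user]

post_tweet("bob", "hello world")
print(read_timeline("alice"))  # [('bob', 'hello world')]
```

Notice how the work has moved from read time to write time, which pays off when reads vastly outnumber writes, as they do for Twitter.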
Comparing the two approaches, the caching approach wins on read complexity, because we no longer have to join the users and follows tables; all we have to do is look up the user's cache and read it. But caching becomes challenging when the number of followers grows substantially. For a celebrity with 30,000,000 followers, every tweet requires 30,000,000 cache writes, which is infeasible and inefficient; querying, in that case, is more favorable. So Twitter adopted a hybrid approach: caching is used for users with a normal number of followers, while tweets from celebrities with millions of followers are handled with the query approach.
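That hybrid decision can be sketched as a simple dispatch on follower count; the threshold value here is an arbitrary assumption for illustration, not Twitter's real cutoff:

```python
# Hybrid dispatch: fan-out on write for ordinary accounts, fall back to
# query-on-read for celebrity accounts. Threshold is an assumed value.
CELEBRITY_THRESHOLD = 1_000_000

def delivery_strategy(follower_count):
    """Decide how a new tweet should reach followers' timelines."""
    if follower_count >= CELEBRITY_THRESHOLD:
        # Too many caches to update: merge this account's tweets
        # into followers' timelines at read time instead.
        return "query-on-read"
    # Cheap to push into a normal-sized follower list at write time.
    return "fan-out-on-write"

print(delivery_strategy(30_000_000))  # query-on-read
print(delivery_strategy(850))         # fan-out-on-write
```

Under this scheme a timeline read merges two sources: the user's cache, plus a live query for any celebrities they follow.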