Abstract
A large number of IT organisations today are monodisciplinary. This detracts from their ability to provide well-crafted products and to use best practices from other disciplines. Wheels get reinvented, and messy workarounds are what keep the IT organisation trudging along.
There are historical reasons that IT organisations have ended up this way: departmental fiefdoms, communication issues and bureaucratic red tape.
The proposed solution is to bring in experts from other disciplines and set up a framework built on competency, simplicity and transparency, so that all the expertise is integrated and high-quality products are produced.
Inspired by the philosophy of John Ruskin and the Guild of St. George.
1. Introduction
My name is Jonathan. I have been working for 11 years, trying to improve the performance of systems that use databases. Through that experience (and by observing leading people in my industry), I have developed a knack for viewing everything as a system and then identifying bottlenecks within that system.
Since the middle of last year, I have been applying this knack to human systems at work. I have also intensively studied some concepts from psychology, philosophy, political theory, social systems, economics and business strategy.
After noticing some shortcomings that increasingly frustrated me at work, and in the spirit of 'don't just complain, try to fix it', I have come up with a system of organising work in IT organisations that I have given a lot of thought to.
In this post (or white paper), I plan to explain some shortcomings of our current way of working in IT and a possible improvement to those systems.
2. In the Beginning
IT organisations, or the IT department within organisations, typically used to look like the diagram above. You would have Developers, QA, Database Administrators, System Administrators and Network Administrators. Some companies still have this same structure with slightly different divisions.
Over time, problems with this structure emerged. The main one, I would say, is that the objectives of the different teams diverged from those of the overall company towards the priorities of each team. In other words, the teams became fiefdoms or tribes and started warring with each other.
Not physically warring with each other, of course. It was more a matter of:
- Territorial protectionism: "This falls into our areas and we will decide whether to do it or not"
- Resource allocation: "Team X needs us to do Y. It will take a lot of work and I can't be bothered with it now. I'll just tell them to write me a ticket and I'll put it in the backlog for a while"
- Communication process creep: "I know that the ticket was sent 2 months ago, but I have not received the detailed documentation of what to do, nor do I have written authorisation from manager X and head of Y"
If you look at the above chart as a hierarchy or a social system, it would look like Feudalism.
2.1 Story: The Consultant
A Java consultant once joined a company with a similar feudalistic structure on a 6-month contract. He asked the DBA team to give him an Oracle dev database so that he could develop what was asked of him. He wrote up a ticket and waited. After a while of not getting the database, he continued with other things and tried to compensate with what he had available. There was some back and forth between the heads of the departments, and he did mention the lack of a dev database in meetings.
However, the contract finished at the end of the 6 months and he left the company. One month later, he received an email telling him that his Oracle dev database was ready to use.
3. Rise of the Developers
Around the beginning of the first dot-com boom, small start-ups became quite popular. In those start-ups, developers were expected to set up the entire system; today we call them full-stack developers. As those companies succeeded and grew, some chose not to split off responsibilities along the lines of the feudal model, but instead decided to add more multi-skilled developers.
This produced the following and arguably the current model for small to mid-sized companies:
Now what you have is what I call a developer-centric IT company and if I were to pick a hierarchical structure for it, I would say Monarchy.
There are two phenomena that I can see that got us here: job compression and automation.
3.1 Job Compression
Job compression means that a company restructures its processes to have fewer stages, which reduces the waiting time between stages.
The example above shows a mortgage approval process. There are 4 stages. Each stage is a person with different expertise and different authority. Between stages, the 'work request' sits in the next person's inbox until they can get to it. The combined processing time and queuing time is 18 days.
Job compression would give 1 person enough authority and expertise to make a decision on the approval process.
You have now reduced the time it takes to approve a mortgage from 18 days to 7 days. Note that this was largely accomplished by reducing the overall queue time.
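The arithmetic behind this can be sketched in a few lines of Python. The per-stage figures below are hypothetical, since the diagram's actual breakdown is not reproduced here; they are chosen only so that the totals match the 18 and 7 days mentioned above.

```python
# Hypothetical per-stage figures in days; only the totals (18 and 7) come from the text.
# Each stage is a (queue_time, processing_time) pair.
four_stage_process = [(3, 1), (4, 1), (4, 1), (3, 1)]

# After job compression, one person does all the processing,
# so the work request waits in only one inbox.
compressed_process = [(3, 4)]


def lead_time(stages):
    """Total elapsed days: time spent queuing plus time spent being worked on."""
    return sum(queue + work for queue, work in stages)


def queue_time(stages):
    """Days the request spends waiting in an inbox."""
    return sum(queue for queue, _ in stages)


before = lead_time(four_stage_process)
after = lead_time(compressed_process)
saved_in_queue = queue_time(four_stage_process) - queue_time(compressed_process)

print(f"before: {before} days, after: {after} days")
print(f"days saved by shorter queues: {saved_in_queue} of {before - after}")
```

In this made-up breakdown the processing effort stays the same, and the whole saving comes from the inboxes that no longer exist.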
3.2 Automation
As developers had to take care of more areas of expertise, they applied developer philosophies to the problem and, in this case, turned to automation. This brought about innovations like Puppet, Chef and Ansible, alongside earlier SysAdmin innovations like virtualisation and, later, cloud computing.
You can now, using code, boot up a container of a web server with all the files, scripts and images, and run a slew of black box tests against it to see if it fully works.
Accordingly, developers now take on several roles in the IT organisation:
- Development
- Business Analysis
- Quality Assurance
- Database Administration
- System Administration (now DevOps)
- Security
- Data Engineering
However, it is difficult to hold all that information inside one's head and developers are using these automations as a crutch to progress with their original work. For example, you can download a few Puppet modules and install as well as begin monitoring a new high availability database. Unfortunately, you have now lost the expertise (in the company) of what is going on under the hood and how to fix issues when they occur.
Very few innovations have been made in the areas outside the realm of pure developing as there are fewer experts in companies to make those innovations.
For example, while we have automated processes for storing and managing database schema changes, we have not had any innovations with deploying dev/test/staging databases that contain actual data to test against. Nor can we use existing automated systems for managing schema changes when our production databases become too big.
There is a general 'uneasy' feeling when needing to make changes to systems we don't fully understand. This negates the 'safe to fail' environments which we use today to make innovations. We also tend to take 'philosophies' that work in one area and apply them to another. This is sometimes helpful, but other times detrimental.
3.3 Story: API vs Batch Process
I was involved in a data batching process that required roughly 200 million items to be processed through an existing API. Had that process gone through the usual way, it would have taken 64 days, with the usual risk of a crash along the way.
The idea to improve this process was to add more web servers and parallelise the work across as many threads as possible. This is a common philosophy that developers have picked up due to the limits on CPU core speeds. As core speeds have not improved in 7 years, the only option to improve performance would be to split the work across a number of threads.
I identified that the API spent the majority of its time making database calls and that, ultimately, the bottleneck would be hard disk IO and certain mutexes.
I recommended offloading part of the work to the database. This involved loading 200 million items to a temporary table in the database that took 7.5 minutes, using a single thread. The rest of the work still needed to go through the API and took 8 hours to complete. Had the whole process been applied against the database in an efficient manner, I would assume it would take up to 45 mins.
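As a rough check of the figures in this story, the speed-ups can be worked out directly. This is only back-of-the-envelope arithmetic on the numbers quoted above; the 45-minute figure is the assumption stated in the text, not a measurement.

```python
MINUTES_PER_DAY = 24 * 60

items = 200_000_000
api_only_minutes = 64 * MINUTES_PER_DAY  # original estimate: 64 days through the API
bulk_load_minutes = 7.5                  # loading the temporary table, single thread
remaining_api_minutes = 8 * 60           # the rest of the work, still through the API
hybrid_minutes = bulk_load_minutes + remaining_api_minutes
database_only_minutes = 45               # assumed in the text, never measured

# Items per second achieved by the single-threaded bulk load.
bulk_load_rate = items / (bulk_load_minutes * 60)

# How many times faster each approach is than the API-only estimate.
hybrid_speedup = api_only_minutes / hybrid_minutes
projected_speedup = api_only_minutes / database_only_minutes

print(f"bulk load: about {bulk_load_rate:,.0f} items per second")
print(f"hybrid approach: about {hybrid_speedup:.0f}x faster than API only")
print(f"database only (assumed): about {projected_speedup:.0f}x faster")
```

On these figures, the hybrid approach is roughly 189 times faster than the 64-day estimate, and the assumed database-only run would be roughly 2,000 times faster.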
3.4 Story: Spread Out vs Push Down
A company had a batch process that took around 2 hours and had a detrimental effect on the website during that time. I configured the database to handle such loads better and brought the time down to 30 minutes using 6 application servers. I then rewrote the batch process to be more 'database friendly' (pushing work down to the database), which reduced the time to 3 minutes on 1 application server.
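To illustrate what 'pushing work down' can mean, here is a minimal sketch using Python's built-in sqlite3 module. The table, columns and 10% discount rule are hypothetical and are not taken from the batch process in the story; the point is only the contrast between one statement per row and one set-based statement.

```python
import sqlite3


def build_db(n_rows=1000):
    """Create an in-memory database with a hypothetical items table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, discounted REAL)"
    )
    conn.executemany(
        "INSERT INTO items (id, price) VALUES (?, ?)",
        [(i, float(i)) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


def spread_out(conn):
    """Application-side loop: fetch every row, compute in Python, write back one by one."""
    rows = conn.execute("SELECT id, price FROM items").fetchall()
    for item_id, price in rows:
        conn.execute(
            "UPDATE items SET discounted = ? WHERE id = ?", (price * 0.9, item_id)
        )
    conn.commit()


def push_down(conn):
    """Set-based version: a single statement, the database does the work."""
    conn.execute("UPDATE items SET discounted = price * 0.9")
    conn.commit()


def total_discounted(conn):
    return conn.execute("SELECT SUM(discounted) FROM items").fetchone()[0]


loop_db, set_db = build_db(), build_db()
spread_out(loop_db)
push_down(set_db)
print(total_discounted(loop_db), total_discounted(set_db))
```

Both versions produce the same result. Against a real networked database, the loop pays a round trip per row, which is what tempts people to add more application servers, while the set-based statement pays for one.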
4. Competency, Simplicity and Transparency - Pattern
So far, we have seen a feudalistic hierarchy with warring fiefdoms fighting over company resources. We then gave all the resources to one entity, a monarchy, but lost expertise and reduced innovation in certain areas.
How can we leverage more advanced governing systems like democracy and capitalism?
How can we move to an organisational environment where more individualism is valued and where people are able to thrive and do better work?
4.1 Competency
Skill is the unified force of experience, intellect and passion in their operation.
- John Ruskin
One element of Capitalism is accepting Pareto’s principle about how expertise is distributed across a population within any one hierarchy. Instead of going against it (socialism), Capitalism is designed to create new hierarchies, that is, more areas of expertise, so that more people can be at the top of different hierarchies.
This lends towards the idea of craftsmanship as well.
What could happen in the future is that IT companies structure their teams as competency-based hierarchies, meaning areas of specific expertise and philosophies that are exclusive to one particular domain, thus maximising results for the whole IT company.
Another benefit from expertise and craftsmanship can be found in economics. Economies of Scope is a term from the world of business. You have probably heard of Economies of Scale, where you have a few products and use bigger factories and bigger machines to pump out the same product in large quantities, which lowers the cost per unit.
For example, you can have a factory that makes 3 types of sandwiches. You purchase bigger machines and improve your processes as much as possible to make those 3 sandwiches as fast as possible and remove all possible waste.
Economies of Scope, on the other hand, is a system where you try to produce different and varied products at a cheaper price. For example, take Subway. You can go into one and have a wide variety of sandwiches made at a slightly higher price than you would pay for a prepackaged sandwich in a shop.
The idea with Economies of Scope is to break down the process of creating new products into sub-processes that have a very defined scope and then set up communication systems to co-ordinate between those defined processes as well as have some synergy between them.
4.2 Simplicity
A complex system is difficult to work with. It is also difficult to work in a mess. Now complexity doesn't exactly equal a mess, but both of them are not an ordered and organised system. So (complexity or mess) is Chaos and not Order, in this context.
Art is not a study of positive reality, it is the seeking for ideal truth.
- John Ruskin
Once your system is simple - not a mess, not complicated and not complex - it has a 'clean' and 'this just looks right' feeling to it. This might be called the aesthetics of simplicity.
Similar to 'clean code' and 'clean architecture' this philosophy of aesthetics has an innate feeling in it that something is beautiful and right.
The Tea Room
I would like to include diagrams in this aesthetic. Systems diagrams, network diagrams, database diagrams, business logic/rules diagrams - these need to be included in the art of 'clean and simple'.
When those objectives are reached, the systems, network, databases and business logic/rules may also be clean and simple - to understand, use, operate and make changes to. Please give it a try and see if it instinctively makes sense to you.
4.3 Transparency
To see clearly is poetry, prophecy and religion all in one.
- John Ruskin
Transparency is ultimately the best way to prevent fiefdoms from occurring. A fiefdom usually silos information and presents it to other parts of the company in a way that benefits itself.
For example, let's say an unethical manager would like a talented individual to stay in their division. That manager can simply not promote that individual and even give negative reviews to keep them where they are.
If, however, HR had access to objective metrics about all the employees, they could see that that person produced good work and has been in their position for some years. They could promote that person before they move to another company.
Some metrics that can support Transparency:
- Time until first 100 lines of code (gitprime.com)
- Complexity rating of class (PMD)
- 95th percentile API response time
- Average time for SEV2 tickets resolution
- Orders per week
- Website feature usage (clicks) per week
- Usefulness of App feature - survey
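As one illustration of how such a metric could be computed, here is a minimal sketch of a 95th percentile response time using the nearest-rank method. The sample response times are made up, and other percentile definitions (for example, interpolating ones) would give slightly different values.

```python
def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile (pct is an integer from 1 to 100), nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Ceiling of pct * n / 100, done in integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical API response times in milliseconds: mostly fast, two slow calls.
response_times_ms = [100] * 18 + [250, 2000]

p95 = percentile_nearest_rank(response_times_ms, 95)
mean = sum(response_times_ms) / len(response_times_ms)

print(f"p95: {p95} ms, mean: {mean} ms")
```

In this sample the mean is 202.5 ms even though 18 of the 20 calls took 100 ms, because a single 2000 ms call dominates it. The 95th percentile (250 ms here) says something more direct: 95% of calls were at least this fast.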
5. New Roles
This framework redefines an old role, Managers, and adds a new role that I felt should be included, which I call the Technical Business Analyst. Both are very important for the framework, so I will explain them now.
5.1 Technical Business Analyst
Business Analysts seem to be something that only large companies have and there has been some huge innovation in documenting and expressing business knowledge in the last 5 years. We all need to start using this skill set to explain and diagram requirements and business knowledge, no matter the company size.
Business Process Model and Notation (BPMN) 2.0 and Decision Model and Notation (DMN) could well be the next innovation in bridging the dialogue between business and IT.
5.1.1 Story: Requirements Diagram
I was trying out decision tables as a way to document requirements. I talked with the Product Manager and asked her to give it a try. She took a ticket that a developer had quoted as taking 5-8 days to implement. She went over the requirements and built a decision table in Excel. She then showed it to the original developer, who said: "If this is all that is required, then it should take 1-2 days to implement".
5.1.2 Story: Pyramid of Doom
I was working on a way to document technical processes. I went over some code and found an if-then-else "pyramid of doom" in it. I then tried to put the conditions from the code into a decision table. After I was finished, I showed it to the original developer, and he instantly understood it and made a correction to the table. I then told the business analysts in the company, who were extremely impressed that the developer had understood it so quickly. Apparently, they had had difficulties communicating business requirements to him before.
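To make the idea concrete, here is a small sketch of the same exercise in Python. The discount rules, thresholds and names are hypothetical and are not the code from the story; the sketch only shows how nested conditions can be flattened into a table that is read top to bottom, where the first matching row wins (similar in spirit to a "first" hit policy in DMN).

```python
def discount_nested(customer_type, order_total):
    """Hypothetical 'pyramid of doom': the rules are buried in nested branches."""
    if customer_type == "business":
        if order_total >= 1000:
            return 0.15
        else:
            return 0.10
    else:
        if customer_type == "private":
            if order_total >= 1000:
                return 0.05
            else:
                return 0.0
        else:
            raise ValueError(f"unknown customer type: {customer_type}")


# The same rules as a decision table: one row per scenario.
# Columns: customer type, minimum order total, discount.
DISCOUNT_TABLE = [
    ("business", 1000, 0.15),
    ("business", 0, 0.10),
    ("private", 1000, 0.05),
    ("private", 0, 0.0),
]


def discount_from_table(customer_type, order_total):
    """Return the discount of the first row whose conditions match."""
    for row_type, minimum_total, discount in DISCOUNT_TABLE:
        if customer_type == row_type and order_total >= minimum_total:
            return discount
    raise ValueError(f"no rule for {customer_type!r} with total {order_total}")


print(discount_nested("business", 1500), discount_from_table("business", 1500))
```

The table can be read, questioned and corrected by someone who does not read code, while the lookup function stays the same when a row changes.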
In the old way, BPMN 1.0, mapping a process would look something like this:
I am sure everyone has run into something like this glued to a wall in an office. It's not very clear what is going on.
What happens in BPMN 2.0 and DMN is as follows:
Decision Table - Discount Decision
And then, the process mapping is simplified:
BPMN 2.0 - Notice the small square/hash icon in the discount decision
- The business logic is captured in a way that is easy for the business user to understand (notice, it's in Excel)
- That same decision table is understood by the developer
- The process mapping is now easy to understand, which in turn makes more parts of the system easier to understand.
We've gone over the business side, but we can go a bit further and apply this same process mapping to the technical side:
DMN for a Technical Process
DMN for a Technical Process
Technical Business Analysts (TBAs) should be the ones to go over both the business side and the technical side and create both of these types of diagrams and tables. This should achieve several things:
- Provide a counter-balance and due diligence to new business requirements: "I understand you would like this new feature. Could you please explain to me in detail what it is that you need?"
- Reduce the time groups of developers spend next to whiteboards.
- Reduce risk by using decision tables to notice scenarios that were not considered: "We have Active for CustomerStatus, but I don't see a scenario where the OrderStatus is suspended."
- Reduce the meetings between developers and business users.
- Reduce the scope that developers need to work on and increase focus on a specific task.
- Create a system of business and technical documentation.
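The risk-reduction point in the list above, noticing scenarios that were not considered, lends itself to a small check. Here is a minimal sketch; the status values, outcomes and rule set are hypothetical, loosely modelled on the CustomerStatus and OrderStatus example.

```python
from itertools import product

# Hypothetical condition columns of a decision table and their allowed values.
CUSTOMER_STATUSES = ["Active", "Suspended"]
ORDER_STATUSES = ["Open", "Suspended", "Closed"]

# Hypothetical rules: (CustomerStatus, OrderStatus) -> outcome.
RULES = {
    ("Active", "Open"): "process order",
    ("Active", "Closed"): "archive order",
    ("Suspended", "Open"): "hold order",
    ("Suspended", "Suspended"): "hold order",
    ("Suspended", "Closed"): "archive order",
}


def missing_scenarios(rules, *dimensions):
    """List every combination of condition values that no rule covers."""
    return [combo for combo in product(*dimensions) if combo not in rules]


gaps = missing_scenarios(RULES, CUSTOMER_STATUSES, ORDER_STATUSES)
print(gaps)
```

Here the check reports the one combination nobody wrote a rule for: an Active customer with a Suspended order. This is the kind of question a TBA would take back to the business before development starts.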
TBAs should spend time going over the backlog of tickets. This should increase the velocity of the team if the tickets are very well defined.
When a new ticket is taken on by the team, a developer and a QA engineer should pick up the same ticket: the QA engineer should start writing functional tests based on the scenarios in the decision table, and the developer should write the code and run it against those tests.
This role should cover the following points from 'Boehm's Top 10 Software Defect Reduction list':
- Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase
- Current software projects spend about 40 to 50 percent of their effort on avoidable rework.
- About 80 percent of avoidable rework comes from 20 percent of the defects
In addition, this role should also prevent or at least greatly reduce cancelled projects or priority changes. I understand that these are extremely demoralising for developers.
Let us finish up by going over the framework values with this role:
- Competency: This is a new role for most small-to-medium companies. It should streamline the development process by adding an expert into the right area and reducing the scope of work for other people in the company.
- Simplification: Having easy to understand diagrams and documentation simplifies development work. TBAs should also identify parts of the system that could be simplified (value stream mapping) and suggest very specific and narrow work for technical debt.
- Transparency: TBAs should make the whole system easy to understand for both IT and the business users outside of it.
5.2 Managers
I would like to start off by saying that managers do not equal team leaders. In developer-centric companies, there are very few managers and mainly team leaders: developers who have been promoted to lead other developers.
It is no secret that people do not like managers who have no idea about their technical role. In addition, there was a study that determined that 65% of managers actually produced negative value for the company. On the other hand, good managers produce huge value for the company (Pareto Principle), and that is not something we should write off.
Currently, with the lack of managers in IT companies, there is a reliance on hiring someone who 'is the right fit', which basically outsources the need to manage to the individual. If they don't work well, then there is something wrong with them.
In the context of a capitalist democracy, what role would managers play?
Well, in a Democracy, there is a need for Law-makers to make systems for people to interact in a helpful way to society. There is also a need for Courts for dispute resolution.
Managers should think of systems inside the company that promote honesty, tolerance and freedom of speech. Managers should also resolve disputes in the company and look for workplace complications before they turn into full-blown tribal warfare. Bear in mind that this framework encourages experts, and experts usually have opinions.
Following the values of the framework, let's go over what a manager should do:
- Competency: The manager should be competent enough at coming up with social systems that are effective for that specific company culture. The idea is that the cogs turn smoothly.
- Simplification: The manager should set out rules in those systems, but set out very few rules and then enforce them. With regards to communication, less is more. The manager should make sure that a group can handle things within its own expertise and scope, and try to reduce communication dependencies.
- Transparency: The manager should implement metrics gathering, both to know how the IT company is performing and to be transparent to stakeholders outside IT and build trust with them.
6. Applying the Pattern
Let's take three measures of the output of a system to see how these philosophies could work: Speed, Control and Quality.
6.1 Speed
- Competency: If we have experts, then we can make the best choices to build the products instead of trying out many choices until we reach the right one.
- Simplification: If we simplify the system as much as we can, we can both integrate new systems faster as well as produce easy to use systems. In a lot of ways, simplifying equals business agility as it helps you change the business faster to meet the needs of the marketplace.
- Transparency: If we have metrics that show us where the bottlenecks are in the system, we can make those systems as fast as possible.
6.2 Control
- Competency: If we have a high degree of competency for a defined scope and area, then we have a high degree of control over the system.
- Simplification: If the system is simplified, it is easy to use it.
- Transparency: If the movement of work is transparent, we can monitor the time it takes to exchange communication and complete work in the system. Another way of looking at it is that we can see when one cog is moving slower and slowing the whole system down. Ultimately, this is where a manager would need to step in.
6.3 Quality
- Competency: If we have craftsmen, the cogs they produce are of high quality.
- Simplification: If the products we deliver have been simplified, it provides an easy to use product for the customer (perceived quality).
- Transparency: If we have metrics to see how popular the new product is and how it is used, we can improve the quality of that product. Ultimately, this will need direction from 'the business' and would require interaction with Technical Business Analysts (TBAs in the diagram).
Quality is never an accident. It is always the result of intelligent effort.
- John Ruskin
7. FAQs
- Is this system a replacement for Agile?
- No, it's completely complementary to it and would probably better serve the principle of having 'multi-disciplinary teams'.
- How do you prioritise or expedite work in this system?
- That would be up to the manager. Technically, if you would like the option of expediting, you would need to leave some spare capacity in the teams.
- What if there is not enough skill in house?
- If you don't have the skills you need in the company, then consider bringing in an outside consultant - even if it's for a few days. You will not gain new innovations, but you will gain from other companies' experience.
- What would happen if there isn't enough work to justify a new field?
- It could be very possible to let one person in the company have a dual-role and still have time to try and innovate in this new field.
- How can I split up an area of expertise without it leading to a huge overhead of communication?
- That would really depend on you and your needs. You need to find a balance of 'less is more' with regards to communication, but also have enough work concentrated in front of an expert for them to recognise patterns and generate innovation.