Friday, October 19, 2018

Thursday, May 24, 2018

Overview of the Scrum Master role

These notes are from a training i recently took regarding being a Scrum Master. The role,  processes, and definitions are already clearly outlined and documented in a lot of places so I won't go over these; rather, I want to complement these standard resources with  the below notes I took from direct people's experiences and peppered with examples during my company training, that hopefully will be helpful and serve as a complement to the official Scrum master training and slides.

Role of a Scrum master

The Scrum master transforms individuals of a team into high performing, value-delivering teams, guiding teams to achievable plans. How? By removing impediments, championing quality, and detecting questionable commitments.
The goal of the team is to reach a transparent team velocity to enable predictability.
In practice, the Scrum master usually serves as the central point of contact within the team; for example some engineers reported they were usually too shy to talk to a different team regarding technical dependencies, and thus the Scrum master will usually take that on, for example.
However the Scrum master will not be a people manager, but only serve the team. He/She is also an active listener, and sometimes needs to convince team members on the path forward, without necessarily just push the Scrum process.

Communication

How to perform the above? By way of a good communication. The Scrum master, as well as the other people on the team, should utilize focused and active listening with other members of the team, while keeping an awareness of the whole environment. This stems from the fact that people usually tend to only focus on how to answer while a person is speaking, rather than being really attentive.
How does this work when a person is working remotely, like it's often the case? It should be mandated that they turn their video sharing camera on.


Solving a problem

The Scrum master should use open-ended questions when trying to look for a solution. "Why" questions should be avoided, as they place people in the past, on the defensive. Rather, the solution should be be looked upon the forward path. As such, powerful questioning should be used to send people in the direction of discovery. I have seen this line of questioning emphasized in other communication trainings as well as a way to defuse conflicts; i.e in managers-employees meetings, the manager/leader should support and help the subordinate by using powerful questions to explore and guide possibilities and resolutions of a problem, rather than use a more authoritative approach.
To this effect, there is a clear delineation between a mentor and a coach: A mentor is an area expert in his field; a coach, in turn, will help drive answers.


Role delineation

The Scrum master (SM) is most of the time an Engineering manager in practice. The Product Owner (PO) is usually a Product Manager or Technical program manager, and the team is often comprised of 3-4 engineers in an optimal situation. Sometimes, roles are combined, but that leads to dysfunctional teams most of the time! The PO's drive is to get as many features as possible in the product as she owns the backlog of prioritized stories, bugs, and technical debt,  whereas the SM is capacity-aware and is focused on project and time management of the stories.

In big companies, the roles delineation is usually compounded by the fact that team members are shared across different teams: i.e. UX, QA, Security teams are working across multiple ongoing projects across the organisation. Some of these resources are either directly embedded 100% of the time in a project/team for some amount of time, or just act as coaches to the team part of the time. 

Meetings/Ceremonies

There is a common complaint (including from me) that Scrum takes too much time in terms of meeting time. These meetings and ceremonies are, to recap:

-Sprint planning: to commit to the fixed scope of work. This is where the team negotiates the work to be done in the upcoming sprint, according to its capacity. Some teams have problems with this, and end up carrying over previous tasks and stories from last sprints, as they don't understand their team capacity (sometimes due to constant changes in the team).
It is really interesting to measure and compare what was initially agreed upon and committed at the beginning of a Sprint versus what was delivered at the end; to this effect, our tools take a snapshot of the initial commitment.
This meeting should be taking the total of 1 hour.

-Sprint standup: run daily, helps set the context for the coming day's work, with the standard 3 questions. The general rule of thumb is to try to be effective in running the meeting by not chit-chatting, moving bigger conversations to the "parking lot", and getting everybody's voice heard. Even though my experience is that these meetings are usually long, it can be effectively done in 15 minutes. Some teams sometimes skip this on 'no meeting week day', or do this over Slack, which is fine.
Some finer points about enabling the conversation: this should be among team members, not in a 1-1 Scrum master-team member engagement fashion. Unfortunately there is a tendency to do the latter when the Engineering manager acts as the Scrum master, and the meeting becomes a status report meeting.

-Sprint review: at the end of the sprint, the team has delivered a potentially shippable product increment. During this meeting, the Scrum team shows what they accomplished during the sprint.  The demo is not and should not be a marketing demo, it can be Typically this takes the form of a demo of the new features, usually to the key stakeholders.

-Backlog grooming and refinement meeting: this meeting is to look ahead to the next sprint and plan accordingly. Our coach said that not doing this meeting is hazardous. This should happen roughly in the middle of the ongoing sprint, to ensure the next sprint planning is on track. Every User story that is older than 6 months should be gotten rid of.

- Retrospective: this should not be a place where people complain! Instead, constructive criticism should be used to further improve the process in an incremental way, without the team's control.  This meeting should generate insights. An implementation of this is to take Slack polls on the team, in the interest of time. Retro time should also make use of the Root Cause Analysis process to understand how something went wrong and its remedy. An example of this is: This public monument is very dirty, how to take care of this situation? Why is it dirty? Pigeon poop caused it, why ? -> they come at night when no one is there, why ? No one is there; remedy: change the light schedule on the monument, which was not an obvious change to make in order to fix the situation, and is an example of not jumping to a solution immediately.


Anyway, to conclude on retrospectives: Scrum is centered about a feedback communication loop, and this is such a meeting: to refine the process. 

To close, a note about being meeting-heavy: our coach emphasized that when scrum is done well, 80% of a team member's time should be hands-on coding, and the rest of the time only in meetings. Again, Scrum emphasizes communication over process.

User stories

These are the basic units of scope for a Scrum project. It should describe the business value, and talk about Who/What/Why. The template of a User story should be: "As a <persona/type of user>, I want to <goal> so that <business value>." Of note, it is important that the organization has a fixed set list of the personas available to be consistent across the company's product offerings.

The definition of a Ready status and Done status of a User story (and its dependent items, i.e. task, sprint, etc) depends on the team; but they are usually some common threads about being clear, reviewed, testable, etc. The Acceptance criteria's will be used to determine if the story is fully implemented.
The points estimation given to evaluate the story's complexity should not be precise, but rather to give a rough idea about what the work will entail. Best practices underline stories being compared to one another and in relationship with past work in order to be pointed and measured effectively as baselines, rather than as stand alone. Playing poker, i.e. everyone on the team giving an estimate at the same time, ensures no anchoring onto the HiPPO on the team, which becomes a non democratic decision..

A story should not be given necessarily to the domain expert in residence. As good practice, Agile underlines knowledge spreading across all team members; hence the estimation should be pointed for any generic engineer taking on the task. A corollary to this is that pointing should be performed in terms of size, not time, as the work could be done by different people and precisely timing the task will generally be wrong in the first place; allowing the sizing to be coarser (T-shirt size, Fibonacci scale) will make the process actually smoother.
"Mind the product" by D. Pink is a good reference book on Product management.

Spikes

Spikes are special unit of work cases where "we don't know what we don't know". They are scheduled within the Sprint as discovery tasks, that usually take 1-2 days. The outcome is not necessarily a POC, but more of a quick learning output, which should leads into the creation of a new User story in the next sprint.

Releases

A lot of teams in our organisation draw confidence lines (optimistic|pessimistic) on the sprint release, and speak with probabilities, and let the stakeholders make the decisions on the release status.

Conclusion

The Scrum process is important as it allows to collectively and collaboratively move towards a common goal. The ultimate goal is continuous delivery, with every checkin being potentially shippable. So, rather than have a hard model of say 3 releases a year across the organisation where everyone has to align, move to a more flexible model of adaptive planning that marries well with the current architectures of today, such as micro-services.





Wednesday, May 16, 2018

Quick overview of Kubernetes


These are the notes that i took while learning more about Kubernetes.

High-level overview

Most software today runs multiple processes, and is written across distributed systems, and keeping track of all this is a challenge.  At a high level, Kubernetes helps runs your software and processes on a cluster of computers, and runs them as one entity. Kubernetes manages the processes and ensures they stay running.
Kubernetes (K8s) is inspired by Google's Borg system, and originated there. Google had already built this infrastructure, and released K8s as an open-source project.

Containers

K8s runs Docker as the primary container format (among others, less popular ones). The container gives the developer a hermetically sealed container, i.e. a box for our processes. The context for these processes is always the same, which allows the package/container to be run on different machines and always give the same result.  K8s' role is to keep track of these processes, ensures they stay up, and helps them find each other.
K8s can be run on different environments: in any major cloud provider (GCP, Azure, AWS), but also on premise or in a hybrid environment, with consistency, as K8s is open source software. So in theory there is no vendor lock-in, and K8s workloads can be moved (gradually or not) from one provider to another.

Setup

The way to set up K8s is done in a declarative way, in a config file (i.e. version of the software to run, # of instances, desired state, etc). Dial or knob for the number of processes can be changed for scaling purposes, and is just a matter of changing the config file.

Schedulers

 Scheduling in a distributed environment is running copies of an instance consistently. Scheduling ensures loading the service on a machine that's not too busy, which as a metaphor equates to playing a multi-dimensional game of Tetris with resources: this create oddly shaped combinations of disk / CPU / disk requirements that are the nodes you run your software on, and that are set up for maximum efficiency.

Rolling updates can be managed this way: the developer can roll out a new definition of her software for a container, and slowly add the new definition/version of her app, while dialing down the older version to slowly replace the previous version. So atomic upgrades are possible, as well as rollbacks, in a seamless way.
Interestingly, prior existing data infrastructures like Hadoop that tried to do it all, including managing the infrastructure now go through Docker ,  running a dockerized application on Apache Hadoop YARN.

Service discovery

However K8s is more than a scheduler as it also performs service discovery. K8s intelligently routes to services, which are often tagged (i.e. #backend, #frontend, etc) services that you can target, which is a very powerful concept.
An example of this is a load balancer which is also managed by K8s. Usually static names for the different parts of your system are given, and thus can be handled easily.

Storage

From its inception, Docker encouraged the design of stateless services.
Persistence and statefulness are an afterthought in the world of containers. This design works in favor of workload scalability and portability.
It is one of the reasons why containers are fueling cloud-native architectures, microservices, and web-scale deployments.
So, given that either the host can abruptly terminate, or the container itself can fail, the state needs to be stored usually somewhere else via a networked volume independent of the host or the container.

A pod is the logical unit of Deployment in Kubernetes. 
A K8s volume is attached to the pod that encloses it. Data in the volume is preserved across container restarts. If the pod dies, the volume is gone. The K8s volume is a directory with some data that is accessible to all the containers of a pod.

Google Container Engine


GCE is the hosted version of K8s managed by Google.  Thus K8s is being upgraded, and the cluster handled for you, for example. What you get out of a hosted environment is:
- Dynamic creation and removal of machines
- APIs for controlling the cluster and the network.

Autoscaling is a major feature of GCP (and is also offered in other providers).