This is a set of knowledge for leading engineers in an organization, as evidence-driven as possible. It can never encompass everything, but it can aspire to be a useful place to start.
- Constructive feedback and contributions are always welcome.
- As a Leader
- Growth
- Focus
- Prioritization
- Coaching
- 1on1s
- Feedback
- Training
- Motivation
- Performance
- Hiring
- Interviewing
- Combating Bias and Diversity Strategies
- On who to hire
- Teams and Meetings
- How to stay on top of technical developments?
- How to evaluate performance?
- Technical Debt
- Sharing Progress with 3rd Parties
- Self service tools and infrastructure
- Managing work cadence and deliverables
- Distributed teams
- Books
- Blogs
A leader in an organization should seek to emulate the following:
- Seeks to serve those whom they manage and places the needs of the team before their own.
- Practices 'Extreme Ownership'. A leader is always responsible and must own any mistake or shortcoming of the team. i.e. "There are no bad teams; just bad leaders"
- Creates a 'psychologically safe' environment for the team members to succeed. Safe environments are the basis of high performance per Google's ReWork study, Maslow's hierarchy of needs, and other similar studies.
- Works to create an atmosphere of humility, respect and trust.
- Scales themselves and sets up a culture of autonomy.
A leader should encourage the following:
- Growth mindset - Challenge themselves and others with assignments that stretch their capabilities
- Mastery - Aim for a cadence of academic output with the goal of producing small tangible items of value over time. For example, one or more blog posts, open source and conference contributions, Stack Overflow questions, and other such efforts on a regular cadence, such as quarterly.
A key to being able to grow is being able to focus. Studies and some personal experience suggest that engineers need blocks of at least 4 hours of uninterrupted time to achieve flow. Other literature, such as Deep Work, highlights the benefits that many successful knowledge workers have achieved through periods of intense focus. This is also famously described, albeit from a different perspective, by the blog post 'Maker's schedule, Manager's schedule', which highlights that a single meeting can ruin the afternoon or morning for a maker, whereas filling another slot costs a manager little.
- Protect engineers' time, seeking to give them uninterrupted stretches that are as long as possible
- Perform daily standups over Slack/chat when possible
- Timebox meetings
- Batch meetings so that they can occur together
- Allow engineers at least 24 hours to respond to any email or instant message
- Encourage those nearby to take calls in office rooms rather than the desk in the open office space
I have two kinds of problems, the urgent and the important. The urgent are not important, and the important are never urgent.
- US President Dwight D. Eisenhower. 1954
The job of a leader is to force yourself (and lead the team) to work on important things, rather than urgent things.
- Delegate - Many urgent things can be delegated back to other leaders in the organization
- Schedule dedicated time - Block out time to work on important tasks such as strategy, career paths, and higher level collaboration
- Find a tracking system that works - A method of tracking goals, from low-level Pomodoro-style focus sessions to high-level goals such as OKRs or GSM
Determining what is important is a function of gathering inputs from your key stakeholders (typically team, management and customers) and then laying the information out in a systematic manner such as an excel spreadsheet by effort and impact.
- Multiple viewpoints, such as team brainstorms, are advantageous to determining goals
- Exercises that tease out priority, such as asking 'what happens if we don't do xyz', are helpful
- Laddering up to company initiatives or OKRs is advantageous as well
- Impact is not effort and the world is not fair. Make sure you can deliver and measure impact whenever possible, and be diligent about staying on course and reviewing goals
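Laying out candidate work by effort and impact, as described above, can be sketched in a few lines of code instead of a spreadsheet. This is only a minimal illustration: the `Initiative` class, the 1 to 5 scoring scale, the impact-to-effort ratio, and the sample backlog are all assumptions made for the example, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """A candidate piece of work (hypothetical structure for illustration)."""
    name: str
    impact: int  # 1 (low) to 5 (high), estimated with stakeholders
    effort: int  # 1 (low) to 5 (high), estimated by the team; must be > 0


def prioritize(initiatives):
    """Rank by impact-to-effort ratio, highest first; ties go to higher impact."""
    return sorted(
        initiatives,
        key=lambda item: (item.impact / item.effort, item.impact),
        reverse=True,
    )


# Hypothetical backlog gathered from team, management and customers.
backlog = [
    Initiative("Rewrite billing service", impact=4, effort=5),
    Initiative("Add CI test stage", impact=4, effort=2),
    Initiative("Rename internal variables", impact=1, effort=1),
]

for item in prioritize(backlog):
    print(f"{item.name}: impact={item.impact}, effort={item.effort}")
```

A ratio is only one possible ordering. The scores themselves are judgment calls, so the ranking is a conversation starter with stakeholders rather than a final answer.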
- The goal of the 1on1 is to create a great company.
- 1on1s are primarily for the employee, and secondarily for the manager.
- It is a free form meeting for any issues, concerns, ideas and frustrations not easily expressed in more formal channels.
Example 1on1 questions
- How would you improve the org?
- What's not fun about working at the company?
- Who is doing really great? Why?
- What is the biggest opportunity that the company is missing out on?
Frequency
- Starting weekly or biweekly is common with the caveat of adjusting as needed
Agendas
- Agendas help keep the meeting on track and set expectations
- importance of one-on-ones - Good overview of 1on1s
- HelpScout blog on 1on1s - Another excellent perspective on the subject
- Lighthouse Doc Template - Very comprehensive tool for 1on1s
Feedback is a gift. It is difficult to give, especially with quality. It is an 'unnatural act'. Feedback should be received and given:
- With humility, we all have different perspectives and will often give different feedback in response to the same situation.
- With positive intent on someone's behalf
- With understanding, as even the best of us can have difficulty not taking occasional feedback personally
Sometimes it helps to ask the person whether they wish to have the feedback, to illustrate that it is for their growth.
Andy Grove's High Output Management states that there are only two ways for a manager to improve the output of an employee: motivation and training. Training is a high return activity that should be led by the managers of an organization. a16z shares how this is a great method to set expectations within an organization.
Developers are primarily driven by intrinsic motivation: feeling ownership of their products and working to make them succeed. Intrinsic motivation is driven by:
- Autonomy - Allowing developers to drive their work and influence the direction and goals of the product
- Mastery - This is huge, opportunity to improve existing skills and learn new ones. The video game industry is driven in no small part by our inborn desire for progression and mastery. Experts can do amazing things if allowed to grow their knowledge one small victory at a time.
- Purpose - Working on products that have significance and seeing the effects of your work. As a dramatic example, Elon Musk's companies have been able to harness this with their employees: consider the incredible amount of work that went into the Tesla vehicles (move away from fossil fuels) compared with the much larger traditional automakers, and the progress of SpaceX (occupy Mars) compared with the traditional space industry.
Performance is attributed to a mix of individual and situation. As a hobby farmer, I enjoy using a plant analogy. Different plants need different circumstances: some do better with lots of water while others need to dry out, for example. Some plants, such as pine trees, are highly resilient and can grow in almost any condition, including the middle of a swamp, but don't produce much in the way of fruit. Peach trees, however, require a very specific mix of light, soil, water and protection from pests but can produce a bumper harvest of fruit in the right conditions.
Performance is often a tradeoff. An engineer may be 10x more productive than their peers as long as they work in a specific codebase and language but may be very inflexible with tasks that fall outside their preference. Other engineers are good at learning any new language and deciphering ambiguity, but don't excel in any one area. Even in nature, specialists thrive when the environment is stable and generalists when there is more upheaval.
Thought should be given to whether the circumstance fits the individual and whether they are set up for success. Sometimes people are in a good situation but still miss expectations. Sometimes it's because they are not working hard enough, and unfortunately other times performance will not improve despite hard work because they are a poor fit for the role. In this case keeping the low performer on the team doesn't do them any favors and likely keeps them from a better fit elsewhere.
- Ideally, as much trust as possible is built before feedback on poor performance is given (or received :)
- Feedback should be given regularly and often. For example, do not allow someone to find out they have been missing expectations for the past six months when you could have shared it after one week
- Feedback should be direct; don't beat around the bush
- Make sure that any communication has actually been heard
Coaching low performers often requires temporary micromanagement, and a lot of trust and respect from both sides.
- Set a specific time frame with specific measurable goals so that there is opportunity for small successes
- Meet weekly and set up explicit expectations around each milestone
- The low performer will likely improve or find another opportunity
Like the peach tree or the race car, top performers typically need extra care to sustain themselves. Sometimes it involves understanding the tradeoffs or making sure they are consistently challenged with incremental assignments.
When hiring, aim to be transparent and to make offers in line with data. Expect a funnel whose closing rates resemble those of a typical sales funnel.
It is important to be humble in regards to interviewing. Good candidates may fail the interview and bad candidates may game your system. Nevertheless, it is important to be disciplined and to keep improving the interview practice based on feedback. Inspired by Steve Yegge's post, I have found that asking questions that cover multiple areas of computer science allows stronger computer science candidates to succeed, and that these candidates are generally more flexible and capable than those selected by asking trivia about a single technology. I try to cover:
- Whiteboard coding questions in the style of LeetCode
- General computer science: Big-O complexity of algorithms and data structures
- Linux questions (ls, grep, etc). Linux and bash are important and are growing faster than ever based on the Python developer report and other sources.
- System design question (How would you design an online web app that does x..). These questions tend to favor those with practical experience or a coding hobby.
- Behavioral: general ability to communicate, and a bucket for soft skills
- Misc. In my experience, some candidates have skills and capabilities that generally, but not always, correlate with high performance:
- Open source contributions
- Interesting side projects
- University or prior work pedigree
- Math minors or other scientific degrees
- Interest in multiple programming languages/tools
References
Bias is real and a team is well served by actively combating it; otherwise you will likely end up with a team that lacks diversity and thus performs worse. Some steps that help reduce bias:
- Use automation whenever possible. For example, using a tech screen for new candidates instead of a resume screen is a good way to get a broader qualified applicant pool and helps those who have the skills but not the pedigree.
- Have a diverse hiring committee, which contributes to hiring diverse candidates
- Use the same diverse committee to review job posts as well to make them neutral
- Keep hiring managers blind to each other's feedback until everyone has contributed their own feedback
- Have a standard scoring system based on weighted evaluations, for example the five categories explained in this post, and ask interviewers to explain their scores for candidates.
- When possible, share the resume without identifying information
- Sourcing candidates from Universities and recruiting are higher yield strategies for getting a diverse candidate pool
- One strategy I found interesting was to email the candidates the category of, and information on, the questions I would ask. Even so, I found that only about 1 in 5 candidates noticeably used the information to prepare for the phone screen (across roughly 100 candidates for one job posting at one company). I hoped that this gave candidates who wanted the job more the option to prepare, as the majority of candidates seemed to ignore the information.
I have found that one should aim to hire for strengths rather than lack of weakness. In my experience, nearly all high-performing individuals have severe weaknesses, and yet they can do remarkable things, such as actually being a '10x engineer'. However, this is harder than it seems, as committees tend to hire for lack of weakness because it's a more conservative strategy. That is a way to get good but rarely great candidates, as nearly all the competitors will be hiring for lack of weakness as well.
Communication can be a big challenge as a team grows in size, as the cost of communication grows quadratically (Brooks's Law). The number of links between n team members can be modeled as (n*(n-1))/2, which grows on the order of n^2. A team of 5 people has 10 links between the members whereas a team of 14 has 91. Experiments have shown larger teams can be less efficient. Finally, the US Military has conducted extensive research on the topic and has found the ideal team size to be five members. Two recommendations can be drawn from this:
- Amazon, Spotify and others have demonstrated the power of small teams (<10 people) connected by APIs.
- Meetings should be as small as possible and invite only those deemed absolutely necessary. Jeff Bezos (allegedly) will not even show up if the meeting has more than 10 people. For the rest of us, an 8-18-1800 rule can be a good guideline.
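The link formula quoted above is easy to check with a short script. This is a minimal sketch; the function name `communication_links` is made up for illustration.

```python
def communication_links(team_size: int) -> int:
    """Number of distinct pairwise links in a team: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Show how quickly the number of links grows with team size.
for size in (2, 5, 10, 14):
    print(f"{size} people -> {communication_links(size)} links")
```

For the two team sizes mentioned in the text, this gives 10 links for 5 people and 91 links for 14 people.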
Even the sharpest knife can get dull over time.
- Read the top engineering blogs such as Facebook engineering, Stripe, Twitter and others
- Stay up to date with data-driven research on industry trends. The annual State of DevOps report, the Python developer report, Stack Overflow trends, and the KP Internet Trends report are just a few of the excellent resources available.
- Contribute to open source, which keeps one sharp, as open source contributors will often hold your work to a high standard, which isn't always the case within an organization for various reasons.
- Push yourself to achieve externally validated industry benchmarks. For example, the deployment frequency and service uptime benchmarked in the annual DevOps report are good targets that are generally well accepted.
Software Engineer performance is difficult to measure objectively.
- The number one evaluation is hitting goals, which for software engineering usually includes shipping on time. Beyond this as with most industries one can use a combination of quality and throughput.
- Performance and impact are weighed against the individual's level, experience, situation and capabilities
Data from the annual DevOps report, which has surveyed over 30,000 developers over the past five years, has found that the best measures are a combination of productivity and quality.
Productivity/Throughput
- Deployment frequency
- Lead time for changes
Quality
- Time to restore service
- Change failure rate - The percentage of deployments in which something goes wrong
They have found these measures sufficient to segment performance between Elite, High, Medium and Low performers. This can be a useful data point when measuring team performance. Some teams may be really productive but have poor quality. Other teams may excel at both, and others still may do poorly at both.
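The four measures listed above can be computed from a simple log of deployments. The sketch below is an illustration under stated assumptions: the `Deployment` record, its field names, and the use of medians are choices made for this example and are not taken from the report itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only when a failure was fixed


def delivery_metrics(deployments, period_days):
    """Summarize throughput and quality for a list of deployments."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": (
            len(failures) / len(deployments) if deployments else None
        ),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample data: four deployments over two days, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
next_day = start + timedelta(days=1)
deployments = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4)),
    Deployment(next_day, next_day + timedelta(hours=6)),
    Deployment(
        next_day,
        next_day + timedelta(hours=8),
        failed=True,
        restored_at=next_day + timedelta(hours=8, minutes=30),
    ),
]

metrics = delivery_metrics(deployments, period_days=2)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Tracking these numbers per team over time, rather than comparing individuals, keeps the focus on the system of work.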
Five key dynamics that set successful teams apart from other teams at Google
- Psychological safety: Can we take risks on this team without feeling insecure or embarrassed?
- Dependability: Can we count on each other to do high quality work on time?
- Structure & clarity: Are goals, roles, and execution plans on our team clear?
- Meaning of work: Are we working on something that is personally important for each of us?
- Impact of work: Do we fundamentally believe that the work we’re doing matters?
Individual engineers can be evaluated on both productivity and quality as well.
- Productivity is easier to measure than quality and, I would argue, less important. An agile course or a scrum board can help you estimate tasks and track their execution over time, thus reasonably measuring productivity. Non-technical managers with agile experience can do a good job in this area.
- Quality is more difficult. There are some things to look for as data points that contribute to a weighted sense.
- Achieving agreed-upon goals is a useful signal and probably the closest proxy to passion and productivity
- 360 feedback, aka asking peers about an individual in their respective 1:1s. Teams who work with someone on a daily basis are difficult to fool and will have an excellent perspective from which to evaluate performance.
- Unit tests - Automated testing is a key indicator of quality. If an engineer fixes a bug and doesn't also create a test to ensure the bug stays fixed, then how does one know the engineer didn't cause two other bugs when they fixed the first bug? Without tests, the engineer could theoretically work forever, causing bugs every time they fix a bug and then refixing the same bugs. Tests let the engineer know when previously working code has been broken by new changes.
- Comments - Documentation that shares how the code works
- Elegance/brevity - This is more subjective, but typically less code to accomplish a task is better than more. Simpler is better than complex. If a solution is simple and short, it is typically better than a solution that is big and complex. Complexity of course cannot be avoided entirely, but every inch of ground given up to it should be fought for. Instead of counting lines of code produced, perhaps think of a solution in terms of how many lines it cost.
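The unit test point above, that a bug fix should come with a test that keeps the bug fixed, can be illustrated with a small regression test. The `average` function and its empty-list bug are hypothetical, invented purely for this sketch.

```python
import unittest


def average(values):
    """Return the mean of values.

    Hypothetical history: an earlier version divided by len(values)
    unconditionally and crashed on an empty list. The guard below is the fix.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_does_not_crash(self):
        # Pins the bug fix: this input used to raise ZeroDivisionError.
        self.assertEqual(average([]), 0.0)

    def test_regular_input_unchanged(self):
        # Guards against the fix breaking previously working behavior.
        self.assertEqual(average([2, 4]), 3.0)


# Run the tests in-process and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The first test documents the original failure, and the second catches the case where a fix quietly changes behavior that used to work.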
A common trap for a non-technical manager is to overly reward engineers who are productive at the expense of quality. This results in some common anti-patterns, such as 'rockstar' engineers who create untested, difficult-to-maintain spaghetti code but are increasingly rewarded by the organization. Other 'average' engineers are often tasked with maintaining the service afterwards and can be blamed for delays in this code as they finish implementing the shortcuts of the 'rockstar'.
Technical debt is a killer of productivity and often difficult to describe to key decision makers. It eludes simple bar charts and other measurements but clearly impacts productivity. It can be described in many ways, including as a measurement of complexity, or simply as lacking tests or falling short of some other agreed-upon minimum standard.
Technical investment work can be framed as improvements, fixes and Upgrade/Updates to Security, Performance, Throughput and Stability. When possible, develop a strategy for iterating and measuring your way to improvement.
There are two main sources of complexity
- Dependencies
- Obscurity
Three symptoms of complexity
- Change amplification - Simple changes requiring updates in many places
- Cognitive load - How much a developer needs to know to complete a task
- Unknown unknowns - It's not obvious which code must be modified to complete a task
- At Google there was a joke that there were always two systems in regards to infrastructure: one that is deprecated and another that doesn't work.
- Additionally, Google encourages occasional rewrites. This is a bold stance that flies in the face of the conventional wisdom of never rewriting software. As a single anecdote, consider Outlook vs Gmail. I have had way more experiences with bugs with one of those than the other.
Legacy code is simply code without tests. ― Michael C. Feathers, Working Effectively with Legacy Code
Few topics are as divisive and controversial among engineers as that of rewriting services. Rewriting is often seen as a sort of engineer obsession: the belief that engineers will always opt to write something new rather than work on a legacy codebase.
Service rewrites are possible and can return value, but are the exception and not the norm. A rewrite requires tremendous buy-in from stakeholders over a long period of time, which makes it incredibly risky. Furthermore, if the team maintaining the legacy code is still adding features, then the additional problem of overtaking the legacy team's velocity comes into play as well. Finally, entrenched interests in the status quo will also oppose any rewrite attempt and can point at every delay as validation of their stance. Improvements must usually come in tiny pieces or not at all. Even harnessing a crisis may not help in the long run, as the crisis will pass and the rewrite will be questioned.
Arguments for the rewrite are similar to those of any proposed disruptive innovation. Since organizations are notoriously bad at disrupting themselves, such innovations must usually arrive from outside the organization.
Ultimately, technical debt should be viewed through the lens of how permanent the code is. Upgrades to code that lives indefinitely are a different matter than upgrades to a prototype that will be thrown away next week.
The foundation of continuous deployment and integration is minimizing the size of changes and therefore surprises. If upgrading is painful, it should be done more often to reduce the amount of change at any given time. This of course assumes the benefit tradeoff is positive (i.e. long-lived software with a positive expected value of upgrading).
Kanban Trello boards are an excellent pattern to share engineering progress with third parties. These third parties can include actual clients who want to know status of features or other members within the organization such as executives.
Notable examples:
Documentation in wikis tends to become old and out of date quickly. The best documentation lives in GitHub as markdown next to the source code. This documentation already has an excellent CMS (git) to manage it, and its constant presence next to the code encourages updates. Documentation and training are an incredible enabler of high performance teams, and getting a culture to change to one that continually updates documentation (and automation) is one of those factors that separate great development teams from merely good development teams.
Enabling engineers with self service tools and infrastructure (cloud) is an enormous benefit. Non-developers often don't think through the waste that can be triggered by making engineers submit a form and receive a resource asynchronously instead of through instant self service. It is quite literally days and weeks in the former case versus seconds in the latter. Compound this with human nature: the engineer is not going to drop everything and use the resource once it is granted days later. Once a few of these form-submitting processes add up (after all, no single raindrop feels responsible for the flood), it's quite understandable how unproductive large enterprise engineering teams can become compared to similar teams in a startup developing solely in the cloud.
Schrödinger Backups: The condition of any backup is unknown until a restore is attempted. So restore from backups regularly! There is an embarrassing amount of ransom money paid to hackers because the database was encrypted by an attack and the backups did not work or were obsolete.
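A restore drill along these lines can be automated. The sketch below uses SQLite from the Python standard library purely because it is self-contained; the function names, the scratch-file approach, and the row-count check are assumptions for illustration, and a real system would run the equivalent drill with its own database tooling.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def backup_database(source_path: str, backup_path: str) -> None:
    """Copy a live SQLite database into a backup file."""
    with closing(sqlite3.connect(source_path)) as src, \
            closing(sqlite3.connect(backup_path)) as dst:
        src.backup(dst)


def verify_restore(backup_path: str, table: str, min_rows: int) -> bool:
    """Restore the backup into a scratch file and check that it is usable.

    `table` must be a trusted name (it is interpolated into the query).
    """
    try:
        with tempfile.TemporaryDirectory() as scratch:
            restored_path = os.path.join(scratch, "restored.db")
            with closing(sqlite3.connect(backup_path)) as bak, \
                    closing(sqlite3.connect(restored_path)) as restored:
                bak.backup(restored)  # the actual restore step
                integrity = restored.execute(
                    "PRAGMA integrity_check"
                ).fetchone()[0]
                rows = restored.execute(
                    f'SELECT COUNT(*) FROM "{table}"'
                ).fetchone()[0]
        return integrity == "ok" and rows >= min_rows
    except sqlite3.Error:
        # An unreadable or structurally broken backup counts as a failed drill.
        return False
```

Running `verify_restore` on a schedule and alerting when it returns `False` turns an unknown backup state into a known one before an incident forces the question.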
If you don’t know where you’re going, any road will get you there. – Lewis Carroll
- The default state of a team is for every team member to go in a different direction, like a raft going down a raging river with each paddler ignoring their neighbors. It may be even worse than this: the members may be optimizing for speed of individual paddling or having the perfect stroke while the boat itself is heading towards a waterfall.
- Aligning each member in the same direction is very hard work and completely your responsibility
- The first step is to determine whether the team is reaching its business goals. If you don't have goals, defining them is a great place to start
- Three month or six month goals are a good default if you enter a situation with no explicit goals
- A goal is a desired result at a high level
- A signal is how you know you have achieved the result, ideally measurable but typically not
- A metric is a proxy for a signal, something that can be directly measured
This framework helps you be more objective and prevent metrics creep by being explicit about goals. A good metric is a reasonable proxy for a signal and is traceable back to the original goals.
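The traceability idea in the goal, signal and metric framework above can be expressed as a small data model. This is a minimal sketch: the class names, the `untraceable_metrics` helper, and the sample goal are hypothetical and only show how a metric that traces back to no goal can be flagged.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str  # something directly measurable


@dataclass
class Signal:
    description: str  # how you would know the goal is achieved
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    statement: str  # the desired high level result
    signals: List[Signal] = field(default_factory=list)


def untraceable_metrics(goals, tracked_metric_names):
    """Return tracked metrics that do not trace back to any goal."""
    traced = {
        metric.name
        for goal in goals
        for signal in goal.signals
        for metric in signal.metrics
    }
    return sorted(set(tracked_metric_names) - traced)


# Made-up example: one goal, one signal, one metric.
goals = [
    Goal(
        "Customers trust the checkout flow",
        [Signal("Checkout rarely fails", [Metric("checkout_error_rate")])],
    )
]

# Metrics currently on the (hypothetical) team dashboard.
dashboard = ["checkout_error_rate", "lines_of_code"]
print(untraceable_metrics(goals, dashboard))
```

Any metric the helper returns is a candidate for removal, or a prompt to state the goal it is supposed to serve.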
Regarding processes, it is better to start lighter than heavier. You can always add processes as needed, but processes are hard to kill once in place. Every team and every org is different; here are some points to consider.
- Highly motivated and directed engineers can self organize and achieve goals, if possible give them an opportunity to do so. Successful FAANG developer teams and startups are often organized this way.
- For everyone else, agile can be useful if it is well understood and adapted to a specific team. Often a heavy, waterfall-like version is imposed on teams at medium and large enterprises as a result of attempts to measure team productivity and thus ROI. Given the freedom and a small team setting, Agile light or a similar resource is a good place to start in this regard.
- Some people do not work well remotely, and that's OK
- Don't expect the same level of bonding with remote teams as with local ones. However, activities such as 'Fun Hat Friday' do help in this regard.
- Turn on cameras during meetings, this helps with bonding and communicating social cues.
- Use emoji; it helps convey emotion otherwise lost in text.
- Over-communicate, particularly for managers.
- Have a great workspace that you enjoy and invest in it.
- Periodically share articles or media with people on topics discussed in the past, take notes of interests of team members if needed to make this happen
- Much of communication is non-verbal (a commonly cited figure is 70%) and text can be a poor medium; be patient and reflect on miscommunications as an opportunity to level up.
- Get comfortable with quick 10 min zoom/phone calls
- Over-clarify expectations and how people feel about a topic
- Show gratitude to your team!
- Distributed work tends to benefit established engineers with increased focus; however, it makes it much harder for new engineers to gain context
- Assign an onboarding buddy
- Foster an environment for asking questions
- Make investments in onboarding labs and other interactive documentation
- How much do you know about their family? (Spouse, kids, parents, siblings?)
- How much do you know about hobbies, goals or interests?
- How are they feeling overall?
- 15 minute watercoolers to promote discussion on any topic
- 1/week meal meeting to encourage bonding and longer form discussion
- Virtual Holiday parties
- Celebrate occasions such as baby showers
References:
- Deep Work - Great info about productivity and managing distractions
- Culture Code - Useful for understanding how groups work and how to create positive, better work environments
- Software Engineering at Google Whitepaper - Bold engineering principles at Google
- Software Engineering at Google Book - Engineering patterns and antipatterns
- Designing Data-Intensive Applications - Excellent reference on data structures and scaling
- Building Secure and Reliable Systems - Update on building and maintaining modern systems
- Jason Evanish Book List
- randsinrepose - Management blog by an engineering leader
- Sarah Drasner