Accelerate

Nicole Forsgren PhD, Jez Humble, Gene Kim

#books #kindle

Highlights from July 26, 2021

  • Their evidence refutes the bimodal IT notion that you have to choose between speed and stability—instead, speed depends on stability, so good IT practices give you both. (Location 147)
  • Common traps were stepped in—like trying a top-down mandate to adopt Agile, thinking it was one size fits all, not focusing on measurement (or the right things to measure), leadership behavior not changing, and treating the transformation like a program instead of creating a learning organization (never done). (Location 169)
  • The key to successful change is measuring and understanding the right things with a focus on capabilities—not on maturity. (Location 384)
    • Tags: #favorite
  • While maturity models are very popular in the industry, we cannot stress enough that maturity models are not the appropriate tool to use or mindset to have. (Location 385)
  • First, maturity models focus on helping an organization “arrive” at a mature state and then declare themselves done with their journey, whereas technology transformations should follow a continuous improvement paradigm. Alternatively, capability models focus on helping an organization continually improve and progress, realizing that the technological and business landscape is ever-changing. The most innovative companies and highest-performing organizations are always striving to be better and never consider themselves “mature” or “done” with their improvement or transformation journey—and we see this in our research. (Location 387)
  • Second, maturity models are quite often a “lock-step” or linear formula, prescribing a similar set of technologies, tooling, or capabilities for every set of teams and organizations to progress through. Maturity models assume that “Level 1” and “Level 2” look the same across all teams and organizations, but those of us who work in technology know this is not the case. In contrast, capability models are multidimensional and dynamic, allowing different parts of the organization to take a customized approach to improvement, and focus on capabilities that will give them the most benefit based on their current context and their short and long-term goals. (Location 392)
  • Third, capability models focus on key outcomes and how the capabilities, or levers, drive improvement in those outcomes—that is, they are outcome based. This provides technical leadership with clear direction and strategy on high-level goals (with a focus on capabilities to improve key outcomes). It also enables team leaders and individual contributors to set improvement goals related to the capabilities their team is focusing on for the current time period. Most maturity models simply measure the technical proficiency or tooling install base in an organization without tying it to outcomes. (Location 398)
  • Fourth, maturity models define a static level of technological, process, and organizational abilities to achieve. They do not take into account the ever-changing nature of the technology and business landscape. (Location 404)
  • Our research has identified 24 key capabilities that drive improvement in software delivery performance and, in turn, organizational performance. These capabilities are easy to define, measure, and improve. (Location 424)
  • Using velocity as a productivity metric has several flaws. (Location 474)
  • Queue theory in math tells us that as utilization approaches 100%, lead times approach infinity—in other words, once you get to very high levels of utilization, it takes teams exponentially longer to get anything done. (Location 482)
  • Since lead time—a measure of how fast work can be completed—is a productivity metric that doesn’t suffer from the drawbacks of the other metrics we’ve seen, it’s essential that we manage utilization to balance it against lead time in an economically optimal way. (Location 483)
  • A successful measure of performance should have two key characteristics. First, it should focus on a global outcome to ensure teams aren’t pitted against each other. (Location 487)
  • Second, our measure should focus on outcomes not output: it shouldn’t reward people for putting in large amounts of busywork that doesn’t actually help achieve organizational goals. (Location 490)
  • In our search for measures of delivery performance that meet these criteria, we settled on four: delivery lead time, deployment frequency, time to restore service, and change fail rate. (Location 492)
  • Lead time is the time it takes to go from a customer making a request to the request being satisfied. (Location 494)
  • However, in the context of product development, where we aim to satisfy multiple customers in ways they may not anticipate, there are two parts to lead time: the time it takes to design and validate a product or feature, and the time to deliver the feature to customers. (Location 495)
  • We measured product delivery lead time as the time it takes to go from code committed to code successfully running in production, (Location 511)
  • The second metric to consider is batch size. Reducing batch size is another central element of the Lean paradigm—indeed, it was one of the keys to the success of the Toyota production system. (Location 514)
  • we settled on deployment frequency as a proxy for batch size since it is easy to measure and typically has low variability. (Location 518)
  • How quickly can service be restored? We asked respondents how long it generally takes to restore service for the primary application or service they work on when a service incident (e.g., unplanned outage, service impairment) occurs, offering the same options as for lead time (above). (Location 529)
  • Finally, a key metric when making changes to systems is what percentage of changes to production (including, for example, software releases and infrastructure configuration changes) fail. (Location 532)
  • The fact that software delivery performance matters provides a strong argument against outsourcing the development of software that is strategic to your business, and instead bringing this capability into the core of your organization. (Location 626)
  • in DevOps circles that culture is of huge importance. (Location 663)
  • it is possible to influence and improve culture by implementing DevOps practices. (Location 666)
  • At the first level, basic assumptions are formed over time as members of a group or organization make sense of relationships, events, and activities. These interpretations are the least “visible” of the levels—and are the things that we just “know,” and may find difficult to articulate, after we have been long enough in a team. (Location 671)
  • The second level of organizational culture are values, which are more “visible” to group members as these collective values and norms can be discussed and even debated by those who are aware of them. Values provide a lens through which group members view and interpret the relationships, events, and activities around them. Values also influence group interactions and activities by establishing social norms, which shape the actions of group members and provide contextual rules (Bansal 2003). These are quite often the “culture” we think of when we talk about the culture of a team and an organization. (Location 674)
  • The third level of organizational culture is the most visible and can be observed in artifacts. These artifacts can include written mission statements or creeds, technology, formal procedures, or even heroes and rituals (Pettigrew 1979). (Location 678)
  • Westrum’s further insight was that the organizational culture predicts the way information flows through an organization. Westrum provides three characteristics of good information: It provides answers to the questions that the receiver needs answered. It is timely. It is presented in such a way that it can be effectively used by the receiver. (Location 689)
  • An additional insight from Westrum was that this definition of organizational culture predicts performance outcomes. We keyed in on this in particular, because we hear so often that culture is important in DevOps, and we were interested in understanding if culture could predict software delivery performance. (Location 695)
  • Westrum’s theory posits that organizations with better information flow function more effectively. (Location 752)
  • First, a good culture requires trust and cooperation between people across the organization, so it reflects the level of collaboration and trust inside the organization. (Location 754)
  • Second, better organizational culture can indicate higher quality decision-making. (Location 756)
  • Finally, teams with these cultural norms are likely to do a better job with their people, since problems are more rapidly discovered and addressed. (Location 758)
  • For modern organizations that hope to thrive in the face of increasingly rapid technological and economic change, both resilience and the ability to innovate through responding to this change are essential. Our research into the application of Westrum’s theory to technology shows that these two characteristics are connected. (Location 765)
  • Many Agile adoptions have treated technical practices as secondary compared to the management and team practices that some Agile frameworks emphasize. Our research shows that technical practices play a vital role in achieving these outcomes. (Location 820)
  • In continuous delivery, we invest in building a culture supported by tools and people where we can detect any issues quickly, so that they can be fixed straight away when they are cheap to detect and resolve. (Location 830)
  • A key goal of continuous delivery is changing the economics of the software delivery process so the cost of pushing out individual changes is very low. (Location 835)
  • One important strategy to reduce the cost of pushing out changes is to take repetitive work that takes a long time, such as regression testing and software deployments, and invest in simplifying and automating this work. (Location 837)
  • The most important characteristic of high-performing teams is that they are never satisfied: they always strive to get better. High performers make improvement part of everybody’s daily work. (Location 840)
  • in reality these are all system-level outcomes, and they can only be achieved by close collaboration between everyone involved in the software delivery process. (Location 843)
  • It should be possible to provision our environments and build, test, and deploy our software in a fully automated fashion purely from information stored in version control. Any change to environments or the software that runs on them should be applied using an automated process from version control. (Location 847)
  • Following our principle of working in small batches and building quality in, high-performing teams keep branches short-lived (less than one day’s work) and integrate them into trunk/master frequently. Each change triggers a build process that includes running unit tests. If any part of this process fails, developers fix it immediately. (Location 852)
  • Automated unit and acceptance tests should be run against every commit to version control to give developers fast feedback on their changes. Developers should be able to run all automated tests on their workstations in order to triage and fix defects. Testers should be performing exploratory testing continuously against the latest builds to come out of CI. No one should be saying they are “done” with any work until all relevant automated tests have been written and are passing. (Location 856)
  • Even better, our research found that improvements in CD brought payoffs in the way that work felt. This means that investments in technology are also investments in people, and these investments will make our technology process more sustainable (Figure 4.3). Thus, CD helps us achieve one of the twelve principles of the Agile Manifesto: “Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely” (Beck et al. 2001). (Location 908)
  • Vanguard Method, emphasizes the importance of reducing what he calls failure demand (demand for work caused by the failure to do the right thing the first time) by improving the quality of service we provide. This is one of the key goals of continuous delivery, with its focus on working in small batches with continuous in-process testing. (Location 936)
    • Note: The Vanguard Method uses the iterative Check-Plan-Do model to improve organizational capability. Change is initiated by observing the present organization from a systems-thinking perspective (“Check”): define the purpose of the system from the customer’s perspective.
  • What was most interesting was that keeping system and application configuration in version control was more highly correlated with software delivery performance than keeping application code in version control. (Location 946)
    • Note: Is this an opportunity for configuration languages such as Dhall? Is it better to keep configuration in version control than in something like a centralized configuration server? It would definitely be easy to store Java .properties files this way.
  • it’s worth investing ongoing effort into a suite that is reliable. One way to achieve this is to put automated tests that are not reliable in a separate quarantine suite that is run independently. (Location 953)
    • Note: Test reliability is paramount. Flaky tests should be removed or moved to a test suite that is not run in CI.
  • Developers primarily create and maintain acceptance tests, and they can easily reproduce and fix them on their development workstations. It’s interesting to note that having automated tests primarily created and maintained either by QA or an outsourced party is not correlated with IT performance. (Location 956)
  • First, the code becomes more testable when developers write tests. (Location 959)
  • Second, when developers are responsible for the automated tests, they care more about them and will invest more effort into maintaining and fixing them. (Location 960)
  • Every commit should trigger a build of the software and running a set of fast, automated tests. Developers should get feedback from a more comprehensive suite of acceptance and performance tests every day. Furthermore, current builds should be available to testers for exploratory testing. (Location 965)
  • successful teams had adequate test data to run their fully automated test suites and could acquire test data for running automated tests on demand. In addition, test data was not a limit on the automated tests they could run. (Location 968)
  • TRUNK-BASED DEVELOPMENT (Location 970)
    • Tags: #blue
  • Our research also found that developing off trunk/master rather than on long-lived feature branches was correlated with higher delivery performance. Teams that did well had fewer than three active branches at any time, their branches had very short lifetimes (less than a day) before being merged into trunk and never had “code freeze” or stabilization periods. It’s worth re-emphasizing that these results are independent of team size, organization size, or industry. (Location 971)
  • We have heard, for example, that branching strategies are effective if development teams don’t maintain branches for too long—and we agree that working on short-lived branches that are merged into trunk at least daily is consistent with commonly accepted continuous integration practices. (Location 977)
  • Anecdotally, and based on our own experience, we hypothesize that this is because having multiple long-lived branches discourages both refactoring and intrateam communication. (Location 982)
  • High-performing teams were more likely to incorporate information security into the delivery process. Their infosec personnel provided feedback at every step of the software delivery lifecycle, from design through demos to helping with test automation. However, they did so in a way that did not slow down the development process, integrating security concerns into the daily work of teams. In fact, integrating these security practices contributed to software delivery performance. (Location 986)
  • Continuous delivery improves both delivery performance and quality, and also helps improve culture and reduce burnout and deployment pain. (Location 991)
  • Thus, a critical obstacle to implementing continuous delivery is enterprise and application architecture. (Location 995)
  • We found that high performance is possible with all kinds of systems, provided that systems—and the teams that build and maintain them—are loosely coupled. (Location 1017)
  • It’s possible to achieve these characteristics even with packaged software and “legacy” mainframe systems—and, conversely, employing the latest whizzy microservices architecture deployed on containers is no guarantee of higher performance if you ignore these characteristics. (Location 1036)
  • We can do most of our testing without requiring an integrated environment. (Location 1046)
  • We can and do deploy or release our application independently of other applications/services it depends on. (Location 1047)
  • In other words, architecture and teams are loosely coupled. To enable this, we must also ensure delivery teams are cross-functional, with all the skills necessary to design, develop, test, deploy, and operate the system on the same team. (Location 1060)
  • Our research lends support to what is sometimes called the “inverse Conway Maneuver,”2 which states that organizations should evolve their team and organizational structure to achieve the desired architecture. The goal is for your architecture to support the ability of teams to get their work done—from design through to deployment—without requiring high-bandwidth communication between teams. (Location 1064)
  • Architectural approaches that enable this strategy include the use of bounded contexts and APIs as a way to decouple large domains into smaller, more loosely coupled units, and the use of test doubles and virtualization as a way to test services or components in isolation. (Location 1067)
  • To measure productivity, we calculated the following metric from our data: number of deploys per day per developer. (Location 1080)
  • By focusing on the factors that predict high delivery performance—a goal-oriented generative culture, a modular architecture, engineering practices that enable continuous delivery, and effective leadership—we can scale deployments per developer per day linearly or better with the number of developers. This allows our business to move faster as we add more people, not slow down, as is more typically the case. (Location 1088)
  • ALLOW TEAMS TO CHOOSE THEIR OWN TOOLS (Location 1092)
    • Tags: #blue
  • There is no contradiction here. When the tools provided actually make life easier for the engineers who use them, they will adopt them of their own free will. This is a much better approach than forcing them to use tools that have been chosen for the convenience of other stakeholders. A focus on usability and customer satisfaction is as important when choosing or building tools for internal customers as it is when building products for external customers, and allowing your engineers to choose whether or not to use them ensures that we keep ourselves honest in this respect. (Location 1109)
  • Discussions around architecture often focus on tools and technologies. Should the organization adopt microservices or serverless architectures? Should they use Kubernetes or Mesos? Which CI server, language, or framework should they standardize on? Our research shows that these are wrong questions to focus on. (Location 1115)
  • Our research shows that building security into software development not only improves delivery performance but also improves security quality. Organizations with high delivery performance spend significantly less time remediating security issues. (Location 1140)
  • Limiting work in progress (WIP), and using these limits to drive process improvement and increase throughput (Location 1205)
    • Note: We must always strive to limit WIP. Who is the WIP gatekeeper?
  • Creating and maintaining visual displays showing key quality and productivity metrics and the current status of work (including defects), making these visual displays available to both engineers and leaders, and aligning these metrics with operational goals (Location 1205)
    • Note: Physical kanban boards for the win!
  • Using data from application performance and infrastructure monitoring tools to make business decisions on a daily basis (Location 1207)
  • What is most interesting is that WIP limits on their own do not strongly predict delivery performance. It’s only when they’re combined with the use of visual displays and have a feedback loop from production monitoring tools back to delivery teams or the business that we see a strong effect. When teams use these tools together, we see a much stronger positive effect on software delivery performance. (Location 1211)
    • Note: Interesting indeed. This of course makes perfect sense when I look back on the years of Jira and other such software that hides this information. It’s intriguing that this information also needs to be coupled, or rather throupled, with the production metrics to be most effective at increasing productivity.
  • if their WIP limits make obstacles to higher flow visible, and if teams remove these obstacles through process improvement, leading to improved throughput. WIP limits are no good if they don’t lead to improvements that increase flow. (Location 1216)
    • Note: How can a WIP limit lead to process improvement?
  • The central concepts here are the types of information being displayed, how broadly it is being shared, and how easy it is to access. Visibility, and the high-quality communication it enables, are key. (Location 1221)
  • We found that approval only for high-risk changes was not correlated with software delivery performance. Teams that reported no approval process or used peer review achieved higher software delivery performance. (Location 1236)
    • Note: This feels obvious to me
  • We found that external approvals were negatively correlated with lead time, deployment frequency, and restore time, and had no correlation with change fail rate. In short, approval by an external body (such as a manager or CAB) simply doesn’t work to increase the stability of production systems, measured by the time to restore service and change fail rate. However, it certainly slows things down. It is, in fact, worse than having no change approval process at all. (Location 1239)
    • Note: Yes, yes, yes! So much yes: lengthy change-management processes suck ass!
  • Our recommendation based on these results is to use a lightweight change approval process based on peer review, such as pair programming or intrateam code review, combined with a deployment pipeline to detect and reject bad changes. This process can be used for all kinds of changes, including code, infrastructure, and database changes. (Location 1243)
  • The fear and anxiety that engineers and technical staff feel when they push code into production can tell us a lot about a team’s software delivery performance. We call this deployment pain, and it is important to measure because it highlights the friction and disconnect that exist between the activities used to develop and test software and the work done to maintain and keep software operational. (Location 1339)
  • We found that where code deployments are most painful, you’ll find the poorest software delivery performance, organizational performance, and culture. (Location 1346)
  • Our research shows that improving key technical capabilities reduces deployment pain: teams that implement comprehensive test and deployment automation; use continuous integration, including trunk-based development; shift left on security; effectively manage test data; use loosely coupled architectures; can work independently; and use version control of everything required to reproduce production environments decrease their deployment pain. (Location 1366)
  • Statistical analysis also revealed a high correlation between deployment pain and key outcomes: the more painful code deployments are, the poorer the IT performance, organizational performance, and organizational culture. (Location 1372)
  • In particular, be aware that if deployments have to be performed outside of normal business hours, that’s a sign of architectural problems that should be addressed. It’s entirely possible—given sufficient investment—to build complex, large-scale distributed systems which allow for fully automated deployments with zero downtime. (Location 1376)
  • Second, the probability of a failed deployment rises substantially when manual changes must be made to production environments as part of the deployment process. Manual changes can easily lead to errors caused by typing, copy/paste mistakes, or poor or out-of-date documentation. Furthermore, environments whose configuration is managed manually often deviate substantially from each other (a problem known as “configuration drift”), leading to significant amounts of work at deploy time as operators debug to understand configuration differences, potentially making further manual changes that add to the problem. (Location 1384)
  • Research shows that stressful jobs can be as bad for physical health as secondhand smoke (Location 1403)
  • Burnout can be prevented or reversed, and DevOps can help. Organizations can fix the conditions that lead to burnout by fostering a supportive work environment, by ensuring work is meaningful, and ensuring employees understand how their own work ties to strategic objectives. (Location 1409)
  • Our research found that employees in high-performing organizations were 2.2 times more likely to recommend their organization as a great place to work, and other studies have also shown that this is correlated with better business outcomes (Azzarello et al. 2012). (Location 1500)
  • People are an organization’s greatest asset—yet so often they’re treated like expendable resources. When leaders invest in their people and enable them to do their best work, employees identify more strongly with the organization and are willing to go the extra mile to help it be successful. In return, organizations get higher levels of performance and productivity, which lead to better outcomes for the business. (Location 1534)
  • Once again, this creates a virtuous circle of value creation in the business where investments in technology and process that make the work better for our people are essential for delivering value for our customers and the business. (Location 1557)
  • Our analysis is clear: in today’s fast-moving and competitive world, the best thing you can do for your products, your company, and your people is institute a culture of experimentation and learning, and invest in the technical and management capabilities that enable it. As Chapter 3 shows, a healthy organizational culture contributes to hiring and retention, and the best, most innovative companies are capitalizing on this. (Location 1568)
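The queueing claim at Location 482 can be checked against the textbook M/M/1 queue. This is a sketch under my own assumption of that model (one server, random arrivals and service times); the book does not name a specific model. With service time S = 1/mu and utilization rho = lambda/mu, the mean time an item spends in the system is W = S / (1 - rho), so lead time grows without bound as utilization approaches 100%.

```python
"""Sketch: mean lead time vs. utilization in an M/M/1 queue (assumed model)."""


def mm1_mean_time_in_system(utilization: float, service_time: float = 1.0) -> float:
    """Return the mean time in system, W = S / (1 - rho), for an M/M/1 queue.

    `service_time` (S = 1/mu) is the average time to process one item once work
    starts; `utilization` (rho = lambda/mu) must be in [0, 1) for a stable queue.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        w = mm1_mean_time_in_system(rho)
        print(f"utilization {rho:.0%}: mean lead time = {w:.0f}x the service time")
```

In this model, 50% utilization gives a mean lead time of 2x the service time, 90% gives 10x, and 99% gives 100x: the last few points of utilization are the expensive ones.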
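The four delivery measures at Locations 492 to 532 (lead time, deployment frequency, time to restore service, change fail rate) can be summarized from deployment and incident records. A minimal sketch, assuming a hypothetical record format (`Deployment`, `Incident`, `four_key_metrics`) and using the median as the summary statistic; the book's researchers collected these measures through survey questions with answer ranges, not from logs like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    commit_time: datetime   # when the change was committed
    deploy_time: datetime   # when it was running successfully in production
    failed: bool            # did this change fail in production?


@dataclass
class Incident:
    start: datetime         # when the service impairment began
    restored: datetime      # when service was restored


def four_key_metrics(deploys: list[Deployment], incidents: list[Incident], days: int) -> dict:
    """Summarize delivery performance over a window of `days` days (hypothetical helper)."""
    lead_times = [d.deploy_time - d.commit_time for d in deploys]
    restore_times = [i.restored - i.start for i in incidents]
    return {
        # median time from code committed to code running in production
        "lead_time": median(lead_times) if lead_times else None,
        # deployments per day over the window (the book's proxy for batch size)
        "deploys_per_day": len(deploys) / days,
        # median time to restore service after an incident
        "time_to_restore": median(restore_times) if restore_times else None,
        # fraction of production changes that failed
        "change_fail_rate": sum(d.failed for d in deploys) / len(deploys) if deploys else None,
    }


if __name__ == "__main__":
    t0 = datetime(2021, 7, 26, 9, 0)
    deploys = [
        Deployment(t0, t0 + timedelta(hours=2), failed=False),
        Deployment(t0, t0 + timedelta(hours=4), failed=True),
        Deployment(t0, t0 + timedelta(hours=6), failed=False),
    ]
    incidents = [Incident(t0, t0 + timedelta(minutes=30))]
    print(four_key_metrics(deploys, incidents, days=3))
```

The median is used instead of the mean so that one unusually slow change or long outage does not dominate the summary; that choice is mine, not the book's.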
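The quarantine suite at Location 953 needs a working definition of "not reliable". This sketch assumes one: a test is flaky if it both passed and failed on the same commit across repeated CI runs. The `(test_name, commit_sha, passed)` result format and both helper names are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(results: list[tuple[str, str, bool]]) -> set[str]:
    """Return tests that both passed and failed on the same commit.

    `results` holds (test_name, commit_sha, passed) tuples from repeated CI runs;
    a test whose outcome changes with no code change is treated as flaky.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test, commit, passed in results:
        outcomes[(test, commit)].add(passed)
    return {test for (test, _), seen in outcomes.items() if len(seen) == 2}


def split_suites(all_tests: list[str], flaky: set[str]) -> tuple[list[str], list[str]]:
    """Partition tests into the main CI suite and a separately run quarantine suite."""
    main = [t for t in all_tests if t not in flaky]
    quarantine = [t for t in all_tests if t in flaky]
    return main, quarantine


if __name__ == "__main__":
    runs = [
        ("test_login", "abc123", True),
        ("test_login", "abc123", False),     # same commit, different outcome: flaky
        ("test_checkout", "abc123", False),
        ("test_checkout", "abc123", False),  # fails every time: a real defect
    ]
    flaky = find_flaky_tests(runs)
    print(split_suites(["test_login", "test_checkout"], flaky))
```

A test that fails consistently stays in the main suite on purpose: that is a genuine defect signal, not flakiness, and quarantining it would hide a real problem.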
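The branching pattern at Location 971 (fewer than three active branches, each merged into trunk in less than a day) can be turned into a simple check. A sketch with a hypothetical `trunk_based_health` helper; collecting branch creation times from your version-control host is out of scope and assumed to happen elsewhere.

```python
from datetime import datetime, timedelta


def trunk_based_health(branch_created: dict[str, datetime], now: datetime,
                       max_branches: int = 3,
                       max_age: timedelta = timedelta(days=1)) -> list[str]:
    """Return warnings when branching drifts from the pattern in the highlight.

    `branch_created` maps each active (unmerged) branch, excluding trunk, to its
    creation time. Defaults mirror the book: fewer than three active branches,
    each living less than a day before being merged into trunk.
    """
    warnings = []
    if len(branch_created) >= max_branches:
        warnings.append(f"{len(branch_created)} active branches (aim for fewer than {max_branches})")
    for name, created in sorted(branch_created.items()):
        age = now - created
        if age >= max_age:
            warnings.append(f"{name} is {age} old; merge it into trunk")
    return warnings


if __name__ == "__main__":
    now = datetime(2021, 7, 26, 12, 0)
    branches = {"fix-typo": now - timedelta(hours=2), "big-rewrite": now - timedelta(days=5)}
    for warning in trunk_based_health(branches, now):
        print(warning)
```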
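The productivity measure at Location 1080 is a simple ratio, and the claim at Location 1088 is that it should hold steady or rise as developers are added. A deliberately naive sketch: the `(developers, deploys, days)` samples and both function names are hypothetical, and real data would call for a regression rather than this strict step-by-step comparison.

```python
def deploys_per_dev_per_day(deploys: int, developers: int, days: int) -> float:
    """Productivity measure from the highlight: deploys per day per developer."""
    if developers <= 0 or days <= 0:
        raise ValueError("developers and days must be positive")
    return deploys / days / developers


def scales_linearly_or_better(samples: list[tuple[int, int, int]]) -> bool:
    """Check whether deploys per developer per day holds up as headcount grows.

    `samples` holds (developers, deploys, days) observations. Linear (or better)
    scaling means the per-developer rate never falls as the number of
    developers rises.
    """
    rates = [deploys_per_dev_per_day(dep, devs, days)
             for devs, dep, days in sorted(samples)]  # sorted by team size
    return all(later >= earlier for earlier, later in zip(rates, rates[1:]))


if __name__ == "__main__":
    history = [(10, 30, 3), (20, 90, 3), (40, 200, 3)]
    print(scales_linearly_or_better(history))
```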
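Location 1216 says WIP limits only help when they make obstacles to flow visible. A toy kanban board shows the mechanism, assuming hypothetical `Column` and `pull` names: when a downstream stage is at its limit, the pull is refused and the full stage is named, so the team sees where work is stuck instead of starting something new.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Column:
    """One stage of a hypothetical kanban board with a WIP limit."""
    name: str
    wip_limit: int
    items: list[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.items) < self.wip_limit


def pull(board: list[Column], item: str, src: int, dst: int) -> Optional[str]:
    """Move `item` from board[src] to board[dst] if the WIP limit allows.

    Returns None on success, or the name of the full column so the blockage
    becomes visible instead of silently piling up more work in progress.
    """
    target = board[dst]
    if not target.has_capacity():
        return target.name
    board[src].items.remove(item)
    target.items.append(item)
    return None


if __name__ == "__main__":
    board = [Column("dev", 3, ["a", "b"]), Column("review", 1), Column("done", 100)]
    print(pull(board, "a", 0, 1))  # moves "a" into review
    print(pull(board, "b", 0, 1))  # review is full, so the pull is refused
```

Refusing the pull is the part that can drive process improvement: a stage that keeps blocking is the obstacle the team should swarm on or redesign, which is the feedback loop the highlight describes.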