A mass media project with AWS and CodePipeline. A fintech that uses Azure and Jenkins. A retail company with GCP and CircleCI. All of these projects (times 100) happen at Wizeline simultaneously, and with so much variety comes a challenge: how can we ensure DevOps-centered services for all our clients?
We’ve come up with best practices that help us deliver a cohesive, quality outcome regardless of the differences between projects. Here they are, in super-simplified form: a nine-step guide to ensuring DevOps practices in a project or product team with minimal overhead.
TL;DR: create a process map, understand your high-level architecture, use a public cloud provider, use containers, create a CI/CD and continuous testing automation, set up good monitoring, document and share the knowledge, make a plan.
1. Create a Value Stream Map
First, make sure to have a current map of all your activities. This is called a Process Map. Then, take it one step further and convert it into a tool taken directly from Lean, a Value Stream Map.
A Value Stream Map is based on one idea: to improve something, it must first be measured.
Therefore, you must understand what the value of your project is. Value is any quality of your work (service or product) that the customer is willing to pay for. Value is generated when a step in your process transforms its input, moving it toward the final output of the process.
- Map all the activities, from feature requests to code in production
- Add current times and expected times for each of the activities
- Add the owner or approver for each of the activities
- Classify each activity as plan, code, build, test, release, deploy, operate, or monitor
Once you have this, you can eliminate, reduce, or automate the activities that do not contribute to your value.
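As a rough illustration of what the map lets you measure, here is a minimal Python sketch. The activities, owners, and hours are invented for the example, and the `adds_value` flag is an assumption about how a team might mark each step; a real Value Stream Map would use your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    stage: str             # plan, code, build, test, release, deploy, operate, or monitor
    owner: str             # who is responsible or approves
    current_hours: float   # how long the activity takes today
    expected_hours: float  # how long it should take
    adds_value: bool       # does this step move the input toward the final output?


# Hypothetical activities, for illustration only.
activities = [
    Activity("Refine feature request", "plan", "Product owner", 8, 4, True),
    Activity("Implement feature", "code", "Developer", 16, 16, True),
    Activity("Wait for manual approval", "release", "Release manager", 24, 1, False),
    Activity("Deploy to production", "deploy", "DevOps engineer", 4, 0.5, True),
]


def lead_time(items):
    """Total hours from feature request to code in production."""
    return sum(a.current_hours for a in items)


def value_added_ratio(items):
    """Share of the lead time spent on activities that add value."""
    value_hours = sum(a.current_hours for a in items if a.adds_value)
    return value_hours / lead_time(items)


def improvement_candidates(items):
    """Activities sorted by the gap between current and expected time, largest first."""
    return sorted(items, key=lambda a: a.current_hours - a.expected_hours, reverse=True)


print(f"Lead time: {lead_time(activities)} h")
print(f"Value-added ratio: {value_added_ratio(activities):.0%}")
for a in improvement_candidates(activities):
    print(f"{a.name} ({a.stage}, {a.owner}): gap {a.current_hours - a.expected_hours} h")
```

The activities at the top of the list are the first candidates to eliminate, reduce, or automate.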
Example of a simple process map
2. Understand your High-Level Architecture
Create a simple Service Map to understand how your internal services are communicating with other services.
Example of a simple Service Map
Then, create an infrastructure architectural diagram of the whole solution to understand your limits, technologies, and possible bottlenecks.
Infrastructure Architecture Diagram
Pro Tip: The [c4model](https://c4model.com/) is excellent for creating these diagrams. Start simple and then add more detail.
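A Service Map can also live next to the code as plain data. The sketch below assumes a made-up set of services and shows one question such a map can answer: which services are affected if a given dependency has problems.

```python
# Hypothetical service map: each service lists the services it calls.
service_map = {
    "web": ["api"],
    "api": ["auth", "orders"],
    "orders": ["database", "payments"],
    "auth": ["database"],
    "payments": [],
    "database": [],
}


def dependents_of(target, graph):
    """Return every service that directly or transitively calls `target`."""
    found = set()
    changed = True
    while changed:
        changed = False
        for service, calls in graph.items():
            if service in found:
                continue
            # A service is affected if it calls the target or any affected service.
            if target in calls or found.intersection(calls):
                found.add(service)
                changed = True
    return found


print(sorted(dependents_of("database", service_map)))
# ['api', 'auth', 'orders', 'web']
```

A service with many dependents, like the database here, is a likely bottleneck and worth highlighting in the architecture diagram.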
3. Use your Preferred Cloud Provider
To create modern applications, we recommend using a public cloud. Its managed services can scale and provide a reliable infrastructure, letting you focus on your code without managing infrastructure at a deep level.
4. Select your Cloud Runtime: Containers
Container technology is a great tool for achieving consistent, reproducible environments that are convenient to deploy and secure.
- Start by dockerizing your service for your local development environment.
- Create a docker-compose manifest to run all services locally. This has great benefits: you can easily run databases or other services without installing them on your machine, and you avoid conflicting versions. Every application runs isolated in its own container.
- Dockerize the test execution and CI environment for additional consistency and reproducibility.
There are other runtimes, but containers are our preferred option.
Pro Tip: Make sure to follow the 12 factors model for your application.
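One of the twelve factors is storing configuration in the environment, which is what lets the same container image run unchanged in every environment. A minimal sketch, assuming hypothetical variable names (`DATABASE_URL`, `LOG_LEVEL`, `PORT`) and made-up local defaults:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    port: int


def load_settings(env=os.environ):
    """Build settings from environment variables, with defaults for local development."""
    return Settings(
        database_url=env.get("DATABASE_URL", "postgresql://localhost:5432/app"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        port=int(env.get("PORT", "8080")),
    )


# Passing a plain dict stands in for the variables a container runtime would set.
local = load_settings({})
production = load_settings({"PORT": "9000", "LOG_LEVEL": "WARNING"})
print(local.port, production.port)
```

Accepting the environment as a parameter keeps the function easy to test without touching the real process environment.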
5. Set up the Cloud Environments and Infrastructure as code
- For your infrastructure as code (IaC) scripts, you can create either a dedicated repository or a directory within your service repository. We recommend separating the platform side from the application-specific side. The platform side is the infrastructure that some or all of your services share, such as a Kubernetes cluster, a large RDS database, or a centralized configuration service.
- Automate the IaC provisioning. Its security benefit: no credentials are shared or created because it runs on an automated build system. Its productivity benefit: you can trigger the job from a website without setting up a local environment. Some of our favorite tools for these jobs are Atlantis, Terraform Cloud, and GitHub Actions.
- Create the Terraform module for the environment. As with any code, we want to reduce duplication and increase cohesion and modularization. By creating the environment module, we can easily replicate it for our development, test, and production environments.
- Instantiate each of the required environments. Now we can use the previously created module to create our cloud environments.
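Terraform modules are written in Terraform's own configuration language, so the following is only an analogy in Python: one parameterized definition of an environment, instantiated once per environment. The resource names, sizes, and node counts are invented for the example.

```python
def environment(name, node_count, db_size):
    """Hypothetical 'environment module': one definition, many instances."""
    return {
        "cluster": {"name": f"{name}-cluster", "nodes": node_count},
        "database": {"name": f"{name}-db", "size": db_size},
        "tags": {"environment": name, "managed_by": "iac"},
    }


# Each environment differs only in the values passed to the module.
environments = {
    "development": environment("development", node_count=1, db_size="small"),
    "test": environment("test", node_count=1, db_size="small"),
    "production": environment("production", node_count=3, db_size="large"),
}

for name, resources in environments.items():
    print(name, resources["cluster"]["nodes"], resources["database"]["size"])
```

Because every environment comes from the same definition, a change to the module reaches development, test, and production in the same shape.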
6. Create the CI/CD
Continuous Integration and Continuous Delivery/Deployment are at the core of the DevOps practices, connecting many of the software development life cycle activities.
- Create the CI pipeline for each of the services. CI makes sure we don’t introduce bad code or failing test cases to our main branch.
- Create a CD pipeline for each of the services. The main job of a CD pipeline is to automate the application deployment process. But it doesn’t stop there; proper tests and multiple environments are just as important for a healthy CD pipeline.
- Add useful and meaningful notifications to the CI/CD.
- Add and extend tests to the CI/CD pipeline to increase the confidence of every change.
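Real pipelines are defined in the configuration format of your CI tool, but the core behavior is small enough to sketch in Python: run the steps in order, stop at the first failure, and send a meaningful notification either way. The step names and the failing test below are hypothetical.

```python
def run_pipeline(steps, notify):
    """Run steps in order; stop at the first failure and report the outcome."""
    for name, step in steps:
        try:
            step()
        except Exception as error:
            notify(f"FAILED at '{name}': {error}")
            return False
    notify(f"SUCCESS: {len(steps)} steps passed")
    return True


# Hypothetical steps standing in for real lint, test, and build commands.
def lint():
    pass


def unit_tests():
    raise AssertionError("1 test failed")


def build_image():
    pass


messages = []
ok = run_pipeline(
    [("lint", lint), ("unit tests", unit_tests), ("build image", build_image)],
    notify=messages.append,  # in practice this could post to a chat channel
)
print(ok, messages)
```

Here the build step never runs because the tests fail first, which is how CI keeps failing changes out of the main branch.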
7. Implement Continuous Testing
- Dockerize the execution of the UI, integration, and sanity tests. We get the same benefits as when dockerizing the application: we can run tests in different environments or by automation without worrying about preparing the environment and dependencies.
- Parameterize the test execution. Tests should be treated as an application and follow similar principles. By parameterizing our tests, we can run them against different environments and configurations.
- Add test data and reports to a data store like S3. Using the parameterization in the previous step and the decoupled data, we can use different data sets. Store the reports to generate metrics like coverage, number of tests, and test execution time.
- Create a job to execute the tests on-demand for any environment and integrate it with existing CI/CD.
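Parameterizing the test execution can be as simple as resolving the target from an environment variable. A minimal sketch, assuming a hypothetical `TEST_ENV` variable and placeholder URLs:

```python
import os

# Hypothetical base URLs, one per environment.
BASE_URLS = {
    "development": "https://dev.example.com",
    "staging": "https://staging.example.com",
    "production": "https://www.example.com",
}


def target_url(path, env=os.environ):
    """Resolve the URL a test should hit from the TEST_ENV variable."""
    name = env.get("TEST_ENV", "development")
    if name not in BASE_URLS:
        raise ValueError(f"Unknown TEST_ENV: {name}")
    return BASE_URLS[name] + path


# The same test code can point at any environment.
print(target_url("/health", {"TEST_ENV": "staging"}))
# https://staging.example.com/health
```

An on-demand job then only needs to set `TEST_ENV` before running the same dockerized test suite.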
8. Set up Monitoring
Monitoring is at the base of the site reliability pyramid of needs. Whenever the application fails or has issues, the development team must be able to debug and troubleshoot it themselves, and proper monitoring is what makes that possible.
- Add log aggregation to all the services in the environments.
- Add other kinds of monitoring (tracing, metrics, security scanners, APM) as needed.
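Log aggregation works best when every service writes structured logs to standard output and leaves collection to the platform. The sketch below uses Python's standard `logging` module; the field names and the `orders` service name are assumptions for the example, not a required schema.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object per line, easy for aggregators to parse."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        })


def get_logger(service_name):
    """Return a logger that writes JSON lines to standard output."""
    logger = logging.getLogger(service_name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


logger = get_logger("orders")  # "orders" is a hypothetical service name
logger.info("order created")
```

Using the same fields in every service makes it possible to search and filter across services once the logs are aggregated.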
9. Share the Knowledge
A system that is hidden from everybody is not useful. Proper documentation and knowledge-sharing are as important as the tests or the CI/CD.
- Everybody should know how to deploy using the automated setup.
- Everybody should know how to run and debug the CI using the automated setup.
- Everybody should know how to run the integration and UI tests using the automated setup.
- Everybody should know how to access the logs and monitoring systems.
- Document and communicate the value map, architecture, and infrastructure to all project members, along with new decisions.
- Create playbooks on how to troubleshoot production issues and runbooks for common operations tasks.
- Document and share all lessons learned.
- Keep a standard interface whenever possible. For example, use makefiles so that every service is built with “make build.”
- Keep your tasks decoupled from the CI tool. This makes them easier to develop, test, debug, and run locally.
- Use managed services to minimize maintenance costs, especially on projects where a dedicated SRE or operations engineer is not expected.
- Start simple. Run one test case, deploy to one environment, draw some circles and lines on a whiteboard, iterate, and incrementally improve from there.
- It’s okay if the first infrastructure iterations are created manually using the console. This makes exploration faster. We can then code our findings.
- Git is your friend. Version control all your code, infrastructure, configuration, and documentation.
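The first two tips go together: if every task has a standard name and lives in the repository, the CI job only has to call it. A makefile is the usual choice; the sketch below shows the same idea in Python, with hypothetical commands and an image name (`my-service`) made up for the example.

```python
import subprocess
import sys

# Hypothetical commands; replace them with the project's real build and test commands.
TASKS = {
    "build": ["docker", "build", "-t", "my-service", "."],
    "test": [sys.executable, "-m", "pytest"],
}


def run_task(name, runner=subprocess.run):
    """Run a named task the same way locally and in CI."""
    if name not in TASKS:
        raise ValueError(f"Unknown task '{name}'. Available: {', '.join(sorted(TASKS))}")
    # check=True makes a failing command raise, so the CI job fails too.
    return runner(TASKS[name], check=True)
```

A developer and the CI tool would both call `run_task("build")`, so switching CI tools means changing one line of pipeline configuration rather than rewriting the build.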
Now that you know this, make an implementation plan for your team. It doesn’t need to be complex or use fancy tools. A simple list in a spreadsheet will do. Identify the tasks still needed to cover the steps in this guide. Assign effort and an owner to each activity for proper accountability, and estimate when everything will be complete.
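If the team prefers to keep the plan in version control, the same simple list fits in a few lines of Python that export a spreadsheet-friendly CSV. The tasks, owners, and effort figures here are placeholders.

```python
import csv
import io

# Hypothetical plan entries: one row per missing task.
plan = [
    {"task": "Create the Value Stream Map", "owner": "Ana", "effort_days": 2},
    {"task": "Dockerize the service", "owner": "Luis", "effort_days": 3},
    {"task": "Create the CI pipeline", "owner": "Ana", "effort_days": 4},
]


def effort_by_owner(tasks):
    """Total effort assigned to each owner, useful for spotting overload."""
    totals = {}
    for task in tasks:
        totals[task["owner"]] = totals.get(task["owner"], 0) + task["effort_days"]
    return totals


def to_csv(tasks):
    """Render the plan as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["task", "owner", "effort_days"])
    writer.writeheader()
    writer.writerows(tasks)
    return buffer.getvalue()


print(effort_by_owner(plan))
print(to_csv(plan))
```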
There is an overwhelming number of DevOps tools and articles out there. Don’t let them distract you; no tool will do these steps for you. Just keep in mind: focus on value and reduce waste. The rest will follow!