DevOps Pipelines: Are My Stages Robust?

Posted by Darin Pantley on December 16, 2017

What does a robust pipeline stage look like?

A typical pipeline has several stages (e.g. build, deploy). Some stages may work flawlessly, others may fail frequently, and most pipelines have a mix of both. Ideally, every stage is fully automated and easy to maintain. Although each stage does something different, all robust stages share certain features.

  • Primary functionality
    • Does your stage have a clear purpose, and does it fulfill that purpose well?
  • Documentation
    • Will future maintainers know what this stage does and why it was designed that way?
    • Are issues resolved faster because your documentation is helpful and reliable?
  • Alerts
    • Who can help diagnose a problem?
    • Who can help fix the problem?
    • Do they prefer to receive emails, instant messages, text messages, phone calls, or sensory cues like flashing lights and audible sirens?
    • Are you recording these events in an error tracking system?
  • Logs
    • Can you tell what your stage is doing as it runs?
    • Can you tell what artifacts were generated or manipulated by the stage?
  • Metrics
    • Which statistics would…
      • help guide future development?
      • help make maintenance easier?
      • reduce the time needed to diagnose problems?
      • be useful for triggering alerts?
      • be interesting to review in a report?
  • Reports
    • Are you making good use of the data you’re recording?
    • Do you have questions about your app that you can answer using an automatically generated report based on data collected in this stage?
    • Examples:
      • Who is contributing to this stage?
      • How much do the bugs caused by this stage cost?
      • How much memory does this stage require?
      • How long does this stage take to complete?
      • How many failures require manual intervention versus automatic resolution?
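
To make a few of these features concrete, here is a minimal Python sketch of a wrapper that gives any stage basic logs, a duration metric, and an alert on failure. The names run_stage, StageResult, and the alert callback are hypothetical and chosen only for illustration. In a real pipeline the alert would go to whichever channel your team prefers, and the result would be sent to your metrics or error tracking system.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


@dataclass
class StageResult:
    """Hypothetical record of one stage run, usable for metrics and reports."""
    name: str
    succeeded: bool
    duration_seconds: float
    artifacts: List[str] = field(default_factory=list)
    error: Optional[str] = None


def run_stage(
    name: str,
    action: Callable[[], Iterable[str]],
    alert: Callable[[str], None],
) -> StageResult:
    """Run one stage: log what it does, time it, and alert if it fails.

    `action` is assumed to return the names of the artifacts it produced.
    """
    log.info("stage %s: starting", name)
    started = time.monotonic()
    try:
        artifacts = list(action())
    except Exception as exc:
        duration = time.monotonic() - started
        log.exception("stage %s: failed", name)
        alert(f"stage {name} failed: {exc}")
        return StageResult(name, False, duration, error=str(exc))
    duration = time.monotonic() - started
    log.info("stage %s: finished in %.2fs, artifacts=%s", name, duration, artifacts)
    return StageResult(name, True, duration, artifacts)
```

Collecting the returned StageResult objects over time would give you the raw data for reports such as how long a stage takes or how often it fails.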

Analyzing the features within every stage

Adding the above features to each of your pipeline stages is one thing. Doing it well is another. Poorly implemented features will just increase your maintenance overhead and cause more problems than they solve.

For any given feature of a pipeline stage, you can analyze whether it’s implemented well. For example, you might take a look at your build stage’s documentation and ask, “Is it fully automated?” Or you might examine the logs you generate while performing static analysis and ask, “Where are these stored? For how long?”

Ask yourself the following questions about every feature in all of your stages…

General questions

  • Who maintains this?
    • Is the maintenance load shared among multiple people?
    • What happens if one of the maintainers stops working on the project?
  • What is it? What is its purpose?
  • Does it have a good architecture?
    • How will you quantify its quality over time?
  • How does it handle authentication?
    • Are any secrets stored in the source code?
  • Are there any security concerns?
  • What happens when it inevitably fails?
  • Does it need to scale horizontally, vertically, or both?
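
As one illustration of the secrets question, a stage can read its credentials from the environment at run time instead of keeping them in source code. This is only a sketch: require_secret, MissingSecretError, and the DEPLOY_TOKEN variable name are hypothetical, and many teams would use a dedicated secrets manager rather than plain environment variables.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not available to the stage."""


def require_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the secret called `name`, or fail loudly if it is unset or empty.

    Failing early keeps a stage from running halfway with bad credentials.
    The error message names the variable but never includes its value.
    """
    value = environ.get(name, "")
    if not value:
        raise MissingSecretError(f"required secret {name} is not set")
    return value
```

A deploy stage might call require_secret("DEPLOY_TOKEN") as its first step, so a missing credential shows up as a clear error rather than a confusing failure later on.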

Automation & Reproducibility

  • When is it generated?
  • Is it fully automated?
    • You’ll thank yourself later. Every manual step you automate runs the same way each time and no longer depends on someone remembering to do it, and those benefits add up across an entire organization.
  • Is it possible to recreate identical copies of old versions?
    • Reproducibility is very important! It reduces the likelihood of external dependencies causing issues for you and your code. It also allows you to recreate artifacts at will.
  • What are the upstream dependencies?
    • Building the same source code twice should pull in the same library versions both times. You don’t want your dependencies to change from one build to the next.
  • What are the downstream dependencies?
    • When you refactor your code and change externally facing APIs, it will impact anything that depends on your app.
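
One way to check the upstream dependency question automatically is to flag any dependency that isn't pinned to an exact version. The sketch below assumes a pip-style requirements file where a pinned entry looks like name==version. The function name unpinned_requirements is hypothetical, and the parsing is deliberately simplified (for example, it treats everything after a # as a comment and skips option lines that start with a dash).

```python
from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return the requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and option lines such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

A build stage could run a check like this and fail, or at least warn, whenever the returned list is not empty.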

Artifacts & Retention Periods

  • Where is the output stored?
  • Why is the output useful?
  • How long does the output persist?
  • Is it possible to obtain old versions of the output?
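
A retention period is easiest to reason about once it's written down as code. Here is a small sketch that, given each artifact's creation time and a retention period, lists the artifacts that have outlived it. The function name expired_artifacts and the idea of passing in a dictionary of creation times are assumptions made for illustration; a real stage would get this information from wherever its output is stored.

```python
from datetime import datetime, timedelta
from typing import List, Mapping


def expired_artifacts(
    created_at_by_name: Mapping[str, datetime],
    retention: timedelta,
    now: datetime,
) -> List[str]:
    """Return the names of artifacts older than the retention period, sorted."""
    cutoff = now - retention
    return sorted(
        name
        for name, created_at in created_at_by_name.items()
        if created_at < cutoff
    )
```

Passing now in as an argument, rather than reading the clock inside the function, keeps the check reproducible and easy to test.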