Analyzing data
I have found that – as an adult – I have increasingly become more and more responsible for myself. However, this responsibility and the person that I am right now have come from a series of events in my past. Mistakes, successes, and everything in between have defined me and my approach to life.
A lot of my approaches to life also happen to be approaches to DevOps – that’s just how that panned out. Through the life (and DevOps) lessons that I have learned, I’ve found two things: you must live for the person you are right now, and that person is defined by your past and yet is not the same person as in the past.
Your workload follows a similar pattern. It is based on your history but cannot completely be considered the same as it was previously. The code has probably changed, the infrastructure is different, and even the personnel that implement it most likely have changed. However, with all that being said, there are still lessons to be learned from the past. Important lessons.
Earlier in this book (Chapter 1, Introducing DevOps Principles, to be exact), I emphasized the need for monitoring and logging, and I said that these were great tools for event handling and maintaining the historical performance of your workload. Now, we will begin exploring how that historical performance can give us insights that we can use to our advantage.
We will look at a couple of analysis techniques: analysis of live data and analysis of historical data. Each presents challenges. Each can act as a template to solve a fairly common DevOps problem.
Analysis of live data
Live or streaming data is data that a system is processing right now – the data it is currently receiving or returning. A system lives off of the data that it absorbs and generates (input -> process -> output). The system is shaped by this data and sometimes, in the case of critical systems, it needs to be molded by this data. To make tactical decisions based on recent data, you need to collect live data and generate immediate insights from it.
Most clouds and monitoring systems come with default ways to store and analyze live data. For the most part, they are quite effective: they can store data and generate insights on it up to a point. Sometimes, however, a custom approach is necessary, and this is where Python excels – not because of speed, but because of the convenience of its pre-built libraries for analysis. Even with only its default libraries, Python can perform data analysis and conversion on practically any kind of data that a system gives out.
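To make that concrete, here is a minimal sketch – using only the standard library, with a simulated reading source standing in for a real feed (the value range and window size are arbitrary) – of turning a live stream of values into an immediate insight:

import random
import statistics
from collections import deque

def live_readings():
    # Simulated live feed; in practice this might be a socket, a log tail, or an API
    while True:
        yield random.uniform(50, 150)

# Keep only the most recent readings: input -> process -> output
window = deque(maxlen=100)

for reading in live_readings():
    window.append(reading)
    if len(window) == window.maxlen:
        # An "immediate insight": summary statistics over the latest window
        print(f"mean={statistics.mean(window):.2f} stdev={statistics.stdev(window):.2f}")
        break  # a real system would keep looping and act on the result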
So, let’s look at an example where we use Python’s built-in marshal library to decode a byte string:
import marshal

''' function to decode bytes '''
def decode_bytes(data):
    ''' Load binary data into readable data with marshal '''
    result = marshal.loads(data)
    ''' Return raw data '''
    return result
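As a quick usage sketch (the payload here is invented for illustration), you can pair decode_bytes() with marshal.dumps() to see the round trip:

# Hypothetical round trip: serialize a dictionary, then decode it again
payload = marshal.dumps({"status": "ok", "latency_ms": 42})
print(payload)                # the raw byte string
print(decode_bytes(payload))  # {'status': 'ok', 'latency_ms': 42}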
Byte strings are often used in network communication and cryptography (both of which usually involve live data), and handling them in other languages may require adding libraries and possibly creating custom data types. There's no such need with Python.
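For example (the length-prefixed message format below is made up purely for illustration), the built-in bytes type and int.from_bytes() are enough to pull a value out of a raw network-style payload:

# Hypothetical message: a 2-byte big-endian length prefix followed by UTF-8 text
raw = (11).to_bytes(2, "big") + b"disk_ok 512"

length = int.from_bytes(raw[:2], "big")       # read the length prefix
message = raw[2:2 + length].decode("utf-8")   # decode the payload, no extra libraries
print(message)  # disk_ok 512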
But this kind of analysis usually covers smaller amounts of recent data, sometimes as recent as the last millisecond. For a truly historical analysis, you need millions of rows of data, and you need something that can analyze all of it. Here, Python excels even more.