Tools from the Cloud Native Landscape that you want to use in production

Shivanshu Raj Shrivastava
Mar 14, 2022

If you’ve researched cloud native applications and technologies, you’ve probably come across the CNCF cloud native landscape. Unsurprisingly, the sheer scale of it can be overwhelming. So many categories and so many technologies. How do you make sense of it?

As with anything else, if you break it down and analyze it one piece at a time, you’ll find it’s not that complex and makes a lot of sense. In fact, the map is neatly organized by functionality and, once you understand what each category represents, navigating it becomes a lot easier.

In this guide, we’ll break this mammoth landscape down and provide a high-level overview of its layers, columns, and categories.

Provisioning

Provisioning is the first layer in the cloud native landscape. It encompasses tools that are used to create and harden the foundation on which cloud native apps are built. You’ll find tools to automatically configure, create, and manage the infrastructure, as well as for scanning, signing, and storing container images. The layer also extends to security with tools that enable policy setting and enforcement, embedded authentication and authorization, and the handling of secrets distribution. That’s a mouthful, so let’s discuss one category at a time.

Service Mesh

What it is

Service meshes manage traffic (i.e. communication) between services. They enable platform teams to add reliability, observability, and security features uniformly across all services running within a cluster without requiring any code changes.

Along with Kubernetes, service meshes have become some of the most critical infrastructure components of the cloud native stack.

Problem it addresses

In a cloud native world, we are dealing with multiple services all needing to communicate. This means a lot more traffic is going back and forth on an inherently unreliable and often slow network. To address this new set of challenges, engineers must implement additional functionality. Prior to the service mesh, that functionality had to be encoded into every single application. This custom code often became a source of technical debt and provided new avenues for failures or vulnerabilities.
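To make the "functionality encoded into every single application" point concrete, here is a minimal sketch of the kind of retry logic a service might have carried before a mesh took it over. The names `TransientNetworkError`, `call_with_retries`, and `flaky_inventory_service` are hypothetical and exist only for this illustration; a real mesh applies comparable policies in a proxy, outside the application code.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a timeout or connection reset."""


def call_with_retries(request, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call request() and retry transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientNetworkError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: a fake downstream service that fails twice before answering.
calls = {"count": 0}


def flaky_inventory_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientNetworkError("connection reset")
    return "42 items in stock"


# The sleep is replaced with a no-op so the example runs instantly.
result = call_with_retries(flaky_inventory_service, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

Every service that talks over the network would need its own copy of a helper like this, in every language the team uses, which is where the technical debt described above tends to come from.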

Examples: Istio, Linkerd, Kuma, Meshery

Observability

Observability is a system characteristic describing the degree to which a system can be understood from its external outputs. Measured by CPU time, memory, disk space, latency, errors, etc., computer systems can be more or less observable. Analysis is an activity in which you look at this observable data and make sense of it.

To ensure there is no service disruption, you’ll need to observe and analyze every aspect of your application so every anomaly gets detected and rectified right away. This is what this category is all about. It runs across and observes all layers, which is why it’s on the side and not embedded in a specific layer.

Tools in this category are broken down into logging, monitoring, tracing, and chaos engineering. Please note that the category name is somewhat misleading — although chaos engineering is listed here, consider it a reliability tool rather than an observability or analysis tool.

Monitoring

What it is

Monitoring refers to instrumenting an app to collect, aggregate, and analyze logs and metrics to improve our understanding of its behavior. While logs describe specific events, metrics are a measurement of a system at a given point in time. They are two different things, but both are necessary to get the full picture of your system’s health. Monitoring includes everything from watching disk space, CPU usage, and memory consumption on individual nodes to doing detailed synthetic transactions to see if a system or application is responding correctly and in a timely manner. There are a number of different approaches to monitoring systems and applications.
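The difference between a log and a metric is easier to see side by side. The sketch below uses only the Python standard library; the logger name `checkout`, the metric name `disk_used_ratio`, and the function `sample_disk_metric` are assumptions made up for this example, not part of any monitoring tool.

```python
import logging
import shutil
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("checkout")


def sample_disk_metric(path="."):
    """Return one metric sample: a measurement taken at a point in time."""
    usage = shutil.disk_usage(path)
    ratio = usage.used / usage.total if usage.total else 0.0
    return {
        "name": "disk_used_ratio",
        "value": ratio,
        "timestamp": time.time(),
    }


# A log line describes one specific event that happened.
log.info("order 1234 paid")

# A metric is a number you sample again and again and then aggregate.
sample = sample_disk_metric()
print(sample["name"], round(sample["value"], 3))
```

In practice a monitoring system would scrape or receive samples like this at a regular interval and store them as a time series.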

Problem it addresses

When running an application or platform, you want it to accomplish a specific task as designed and ensure it’s only accessed by authorized users. Monitoring lets you know whether it is working correctly, securely, and cost-effectively, whether it is accessed only by authorized users, and how it is doing on any other characteristic you may be tracking.

How it helps

Good monitoring allows operators to respond quickly, and even automatically, when an incident arises. It provides insights into the current health of a system and watches for changes. Monitoring tracks everything from application health to user behavior and is an essential part of effectively running applications.
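Responding "even automatically" usually comes down to evaluating a rule against recent measurements. Here is a minimal sketch of such a rule, assuming a list of error-rate samples ordered oldest to newest; the function `should_alert` and the sample values are invented for illustration and are far simpler than the rule languages real monitoring tools provide.

```python
def should_alert(samples, threshold, min_consecutive):
    """Return True when the newest min_consecutive samples all exceed threshold."""
    if len(samples) < min_consecutive:
        return False  # not enough data yet to decide
    recent = samples[-min_consecutive:]
    return all(value > threshold for value in recent)


# Error rate per minute, oldest first (made-up numbers).
error_rates = [0.01, 0.02, 0.09, 0.12, 0.15]

if should_alert(error_rates, threshold=0.05, min_consecutive=3):
    # A real system might page the on-call engineer or trigger a rollback here.
    print("error rate above 5% for 3 consecutive samples")
```

Requiring several consecutive breaches rather than a single one is a common way to avoid alerting on a momentary spike.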

Examples: Prometheus, Cortex, OpenMetrics, Thanos

Logging

What it is

Applications emit a steady stream of log messages describing what they are doing at any given time. These log messages capture various events happening in the system such as failed or successful actions, audit information, or health events. Logging tools collect, store, and analyze these messages to track error reports and related data. Along with metrics and tracing, logging is one of the pillars of observability.
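As a rough illustration of the emit, collect, and analyze flow, the sketch below writes log messages as one JSON object per line and then filters them. It uses only Python's standard `logging` module; the `JsonFormatter` class, the `payments` logger name, and the in-memory stream standing in for a log collector are all assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


# The in-memory stream stands in for stdout or a file that a collector tails.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("payments")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output out of the root logger

# Emit: the application describes what it is doing.
logger.info("payment accepted for order %s", 1234)
logger.error("card declined for order %s", 1235)

# Collect and analyze: parse the lines back and pick out the failures.
events = [json.loads(line) for line in stream.getvalue().splitlines()]
errors = [event for event in events if event["level"] == "ERROR"]
print(errors)
```

Structured output like this is one option among several; it tends to make the later storage and analysis steps simpler because tools do not have to parse free-form text.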

Problem it addresses

Collecting, storing, and analyzing logs is a crucial part of building a modern platform, and logging tools perform one or all of those tasks. Some tools handle every aspect from collection to analysis while others focus on a single task like collection. All logging tools aim at helping organizations gain control over their log messages.

How it helps

When collecting, storing, and analyzing application log messages, you’ll understand what an application was communicating at any given time. But as logs only represent messages that applications or platforms deliberately emit, they don’t necessarily pinpoint the root cause of a given issue. That being said, collecting and retaining log messages over time is an extremely powerful capability and will help teams diagnose issues and meet regulatory and compliance requirements.

Examples: Fluentd, Loggly

Tracing

What it is

In a microservices world, services are constantly communicating with each other over the network. Tracing, a specialized use of logging, allows you to trace the path of a request as it moves through a distributed system.

Problem it addresses

Understanding how a microservice application behaves at any given point in time is an extremely challenging task. While many tools provide deep insights into service behavior, it can be difficult to tie an action of an individual service to the broader understanding of how the entire app behaves.

How it helps

Tracing solves this problem by adding a unique identifier to messages sent by the application. That unique identifier allows you to follow (or trace) individual transactions as they move through your system. You can use this information to see the health of your application as well as debug problematic microservices or activities.
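The unique-identifier idea can be sketched in a few lines. In the example below, two toy "services" are plain Python functions, the header name `x-trace-id` is a hypothetical choice, and the `recorded` list stands in for a tracing backend. Real tracing libraries carry more context than a single ID (for instance, the W3C Trace Context standard defines a `traceparent` header), so treat this only as an illustration of the principle.

```python
import uuid

TRACE_HEADER = "x-trace-id"  # hypothetical header name for this sketch
recorded = []  # stand-in for a tracing backend: (trace_id, service) pairs


def record(service, headers):
    recorded.append((headers[TRACE_HEADER], service))


def inventory_service(headers):
    record("inventory", headers)
    return "in stock"


def checkout_service(headers):
    # Reuse the caller's trace ID, or start a new trace at the edge.
    headers = dict(headers)
    headers.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    record("checkout", headers)
    inventory_service(headers)  # pass the same ID to the downstream call
    return headers[TRACE_HEADER]


# Two independent requests enter the system.
trace_a = checkout_service({})
trace_b = checkout_service({})

# Follow one request through every service it touched.
path_a = [service for trace_id, service in recorded if trace_id == trace_a]
print(path_a)
```

Because every hop records the same ID, the events from the two requests can be separated even though they are interleaved in the backend.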

Examples: Jaeger, OpenTelemetry, OpenTracing

See you soon in another post covering more cloud native technologies and how to use the right tools at the right time in production :)

Shivanshu Raj Shrivastava

GSoCer | Technical Content Writer | Developer | Open Source | Electronics Enthusiast | Specialization in ML | Freelance Technical Writer