Prometheus — Titan in Systems and Service Monitoring

In today’s IT world, every user expects systems and services to be highly available, fast, and efficient. Yet systems crash unexpectedly, users place unusual load on them, and services slow down. In a microservices architecture there may be hundreds of services and thousands of processes to watch. That calls for effective monitoring of the environment: collecting statistics so that we can act proactively or report when needed. Traditional monitoring systems have struggled to keep up with this demand, so there is a need for a robust and effective monitoring system.

Prometheus remains a titan among open-source systems monitoring and alerting toolkits.

Before we delve into the details, let us look briefly at the history of Prometheus. SoundCloud started building it in 2012 because the monitoring tools of the time, such as StatsD and Graphite, were not adequate for the company’s growing needs. Prometheus has been an open-source project on GitHub (under the Apache 2.0 license) since 2012, and it has steadily matured to solve real-world, production-grade monitoring problems. Well-known users include Docker, DigitalOcean, Amadeus, Boxever, CoreOS, Grafana Labs, SoundCloud, and many more.

Prometheus collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
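The "evaluate a rule, then alert if the condition holds" step can be pictured with a small sketch. This is not Prometheus code: `ThresholdRule`, its fields, and the sample values are hypothetical names invented for illustration, and the logic is a simplified stand-in for an alerting rule that must stay true for a while before it fires.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThresholdRule:
    """Hypothetical rule: alert when value > threshold for at least for_seconds."""
    threshold: float
    for_seconds: float
    _active_since: Optional[float] = None  # time the condition first became true

    def evaluate(self, timestamp: float, value: float) -> str:
        if value <= self.threshold:
            # Condition is false: forget any pending state.
            self._active_since = None
            return "inactive"
        if self._active_since is None:
            self._active_since = timestamp
        if timestamp - self._active_since >= self.for_seconds:
            return "firing"
        return "pending"


def run(samples: List[Tuple[float, float]], rule: ThresholdRule) -> List[str]:
    """Evaluate the rule once per collected (timestamp, value) sample."""
    return [rule.evaluate(ts, value) for ts, value in samples]


# Assumed data: an error ratio collected every 15 seconds.
samples = [(0, 0.01), (15, 0.20), (30, 0.25), (45, 0.30), (60, 0.02)]
rule = ThresholdRule(threshold=0.05, for_seconds=30)
states = run(samples, rule)
print(states)  # ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

The waiting period is the design point worth noticing: a single bad sample moves the rule to "pending" only, which keeps short spikes from paging anyone.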

Prometheus is designed to be reliable: each server is standalone and does not depend on network storage or other remote services. The Prometheus system consists of multiple components, some of which are optional.

  • Prometheus server — Stores metrics as time series data. Its retrieval workers use an HTTP pull model to fetch metrics from external sources and store them in the database, and its HTTP server provides a simple web interface for queries.

  • Client libraries — There are several official and unofficial client libraries to add instrumentation to the application code

  • Pushgateway — Allows ephemeral and batch jobs to expose their metrics to Prometheus.

  • Exporters — Export existing metrics from third-party systems as Prometheus metrics, for example HAProxy, StatsD, and Graphite.

  • Alertmanager — Handles alerts sent by the Prometheus server. Receiver integrations include email, webhooks, Slack, HipChat, Pushover, PagerDuty, and OpsGenie.

  • Data visualization and export — Grafana or other API consumers can be used to visualize the collected data.

  • Other tools

Detailed Prometheus Architecture

Brian Brazil’s company, Robust Perception, provides support and consulting services around Prometheus and monitoring.

You may have noticed a key difference between Prometheus and other monitoring systems such as Amazon CloudWatch and New Relic. Prometheus uses a pull mechanism: clients expose simple HTTP endpoints that the retrieval workers scrape. For cases where you must push, Prometheus offers the Pushgateway.
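To make the pull model concrete, here is a minimal sketch of the client side using only the Python standard library. In practice an official client library would do this work; the metric name `demo_http_requests_total`, the port, and the helper names are assumptions made up for this example. The handler serves the plain-text exposition format on `/metrics`, which is the path a Prometheus server scrapes by default.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter state, keyed by HTTP method.
REQUESTS_TOTAL = {"GET": 0}


def render_metrics(counts):
    """Render the counts in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Total HTTP requests handled.",
        "# TYPE demo_http_requests_total counter",
    ]
    for method, count in sorted(counts.items()):
        lines.append(f'demo_http_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        REQUESTS_TOTAL["GET"] += 1  # count every GET, including scrapes
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; a Prometheus server configured with this host and port as a target would then fetch `/metrics` at each scrape interval. The application never needs to know where the monitoring server lives, which is the practical benefit of pulling.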

Prometheus is a Cloud Native Computing Foundation graduated project. Most Prometheus components are written in Go; some are written in Java, Python, or Ruby.

We will not go through installation and setup here. Instead, let us look at some best practices for using Prometheus.

Prometheus star history

  • Instrument every component (including libraries). For resources such as queues, CPUs, and disks, use the USE (Utilization, Saturation, Errors) method; for endpoints, use the RED (Rate, Errors, Duration) method.

  • Metrics and Labels — Use unit suffixes with base units (seconds, bytes, meters rather than milliseconds, megabytes, kilometers), and give accumulating counts a total suffix. Either the sum() or the avg() over all dimensions of a given metric should be meaningful. See Metric and Label naming for more detailed conventions.

  • Label Cardinality — Label values have to be well bounded. Don’t use unbounded label values such as public IP addresses or user IDs.

  • Errors, successes, and totals — Expose failures and total requests rather than failures and successes; a failure ratio is then a simple division.

  • Metric types — Prometheus provides four types of metrics: counter, gauge, histogram, and summary. Do not use a counter to expose a value that can decrease; use a gauge instead.

  • Alerting — Read Rob Ewaschuk’s “My Philosophy on Alerting”.

  • Dashboards — Prometheus is not intended for dashboards. Instead, hook Prometheus to a full-fledged dashboarding solution like Grafana to generate dashboards
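Several of the practices above (the total suffix, counters that never decrease, gauges for values that move both ways, and tracking failures alongside total requests) can be sketched together. The `Counter` and `Gauge` classes below are simplified, hypothetical stand-ins written for this example, not the official client library, and all metric names are made up.

```python
class Counter:
    """A value that only goes up; the name must carry the _total suffix."""

    def __init__(self, name):
        if not name.endswith("_total"):
            raise ValueError("accumulating counts should end in _total")
        self.name = name
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters cannot decrease; use a gauge instead")
        self.value += amount


class Gauge:
    """A value that can go up or down, such as a queue length."""

    def __init__(self, name):
        self.name = name
        self.value = 0.0

    def set(self, value):
        self.value = value


# Track failures and total requests, not failures and successes.
requests_total = Counter("demo_requests_total")
failures_total = Counter("demo_request_failures_total")
queue_length = Gauge("demo_queue_length")


def handle(ok):
    """Record one request and, if it failed, one failure."""
    requests_total.inc()
    if not ok:
        failures_total.inc()


for ok in [True, True, False, True]:
    handle(ok)

queue_length.set(3)
queue_length.set(1)  # a gauge may decrease

failure_ratio = failures_total.value / requests_total.value
print(failure_ratio)  # 0.25
```

Because the total is recorded directly, the ratio is one division; with separate success and failure counts the total would have to be rebuilt by addition first.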

Matt T. Proud and Julius Volz created the Prometheus monitoring system and led the project to success at SoundCloud and beyond. The initial release of Prometheus was on 24 November 2012.

Prometheus is a powerful, scalable, lightweight, and easy-to-use monitoring tool that is indispensable for anyone running cloud-based and container-based orchestration platforms. It has a vibrant developer community that has contributed to the growth of the toolset. Prometheus has a simple yet powerful data model and query language that help in analyzing how your applications and systems are performing. Stay tuned to what is happening in Prometheus land!

Thank you for reading! If you have enjoyed it, please comment on it!


I am a seasoned engineering leader with extensive experience building enterprise cloud and mobile platforms while knitting together high-performing engineering teams. I inspire engineers to get the trains to run on time. I’m passionate about travel, collaboration, and shaping new engineers through my activities as a blogger, speaker, and course author. I have been sharing my thoughts on life, engineering, and productivity 🚀
