Further Reading

Install Loki with Docker

Tinkering with Loki, Promtail, Grafana, Prometheus, Nginx and Dnsmasq

Using Grafana to visualise syslog files with Loki | by The Mightywomble Tech Blog | Medium

Promtail example extracting data from json log

Pipelines | Grafana Labs

How To Install Docker and Docker-Compose On Raspberry Pi - DEV Community

opentelemetry-go/main.go at main · open-telemetry/opentelemetry-go

Centralized Container Logging with Fluent Bit | AWS Open Source Blog

Azure Monitor agent overview - Azure Monitor | Microsoft Docs

Collecting metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch agent - Amazon CloudWatch

Google Cloud operations suite agents | Cloud Monitoring

How to Log a Log: Application Logging Best Practices | Logz.io

ECR Public Gallery - ho11y

Monitorama BAL 2019 - Bryan Liles - Traces, metrics, endpoints - The what, why, and how on Vimeo

Why Tracing Might Replace (Almost) All Logging | by Ben Sigelman | LightstepHQ | Medium

Instrumentation | Prometheus

Guidelines for developers on how to implement new metrics · Issue #18 · cncf/tag-observability

community/instrumentation.md at master · kubernetes/community

Collector | OpenTelemetry

aws-observability/aws-otel-collector: AWS Distro for OpenTelemetry Collector

grafana/grafana: The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

promlabs/promlens-public: Public repository for PromLens documentation, issues, bugs, and feature requests

bomb-squad-containing-the-cardinality-explosion.pdf

Revisiting PromCon 2018 Panel On Prometheus Long-Term Storage - DEV Community

joe-elliott/tracing-example

On The State Of Continuous Profiling | by Michael Hausenblas | Dec, 2021 | Medium

o11yfest - Sampling in Distributed Tracing - Speaker Deck

tracing-example/docker-compose.yaml at master · joe-elliott/tracing-example

tempo/docker-compose.yaml at main · grafana/tempo

tempo/example/docker-compose at main · grafana/tempo

tarekziade/salvo: Like Boom, but based on Molotov

Intro to API Load Testing: The k6 Guide

go-profiler-notes/guide at main · DataDog/go-profiler-notes

opensearch-project/data-prepper: Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.

Q6sQ | explain.depesz.com

8NW3 | explain.depesz.com

Home | OpenLineage

Backstage Software Catalog and Developer Platform · An open platform for building developer portals

The origin of Service Level Objectives | Last9 SRE Platform

Saving on AWS Lambda Amazon CloudWatch Logs costs - DEV Community

How to monitor Docker metrics using Prometheus & Grafana? | by Dhruvin Soni | Geek Culture | Mar, 2022 | Medium

Cloud Native Observability Microsurvey: Prometheus leads the way, but hurdles remain to understanding the health of systems | Cloud Native Computing Foundation

Server Monitoring & Logs - MariaDB Knowledge Base

Logging in Action

How to Set up Log Forwarding in a Kubernetes Cluster Using Fluent Bit - DEV Community

Top 6 Database Performance Metrics to Monitor in Enterprise Applications | Blog | AppDynamics

What Is Log Aggregation: 101 Guide to Best Tools & Practices - Sematext

CrunchyData/pgmonitor: PostgreSQL Monitoring, Metrics Collection and Alerting Resources from Crunchy Data

Monitor Kubernetes API Server Audit in EKS | by Nick Gibbon | Pareture | Mar, 2022 | Medium

Observability Best Practices when running FastAPI in a Lambda - DEV Community

Metrics For Your Web Application's Dashboards

Thoughtworks Misses the Mark - Re: Serverless vs. Kubernetes | by Charles Chen | Medium

Who is the winner - Comparing Vector, Fluent Bit, Fluentd performance | by Ajay Gupta | IBM Cloud | Medium

keyval-dev/opentelemetry-go-instrumentation: OpenTelemetry auto-instrumentation for Go applications

Querying logs just got easier in Cloud Logging | Google Cloud Blog

Key Metrics for PostgreSQL Monitoring | Datadog

Observability, Part 1: Intro & Postgres-Exporter | by Tyler Owen | ITNEXT

Query Planning

Monitoring Elasticsearch with Sematext - DEV Community

How to monitor Redis with Prometheus - Sysdig

Top DynamoDB Performance Metrics | Datadog

Getting Started with OpenTelemetry for Java - The New Stack

RFC 3530 - Network File System (NFS) version 4 Protocol

Google - Site Reliability Engineering

opentelemetry-collector-contrib/receiver/hostmetricsreceiver at main · open-telemetry/opentelemetry-collector-contrib

oteps/0119-standard-system-metrics.md at main · open-telemetry/oteps

Survey Review: Key Challenges of Scaling Observability with Cloud Workloads | Logz.io

Collecting DORA Metrics from GitHub, AWS ECR and AWS ECS | by Furkan Taşbaşı | Picus Security Engineering | Medium

Improving Code Design With OpenTelemetry - A Practical Guide | by Roni Dover | May, 2022 | Better Programming

loggie-io/loggie: A lightweight, cloud-native data transfer agent and aggregator

The Importance of Structured Logging In AWS (and Anywhere Else) | by Connor Butch | May, 2022 | Medium

Observability: What to instrument? - Chris Armstrong

fluent-bit-docs/aws-credentials.md at 43c4fe134611da471e706b0edb2f9acd7cdfdbc3 · fluent/fluent-bit-docs

Amazon CloudWatch - Fluent Bit: Official Manual

Docker - OpenSearch documentation

Agent Configuration | OpenTelemetry

Auto-Instrumenting Node.js JavaScript Apps with OpenTelemetry | Logz.io

Capturing logs at scale with Fluent Bit and Amazon EKS | Containers

opentelemetry-collector/README.md at main · open-telemetry/opentelemetry-collector

What is observability | Ubuntu

End-to-end web performance monitoring with Prometheus S01E01 | by Nicolas STEFANI | Aug, 2022 | Medium

Overview of Grafana Alerting and Message Templating for Slack | by Tanmay Bhat | Aug, 2022 | FAUN Publication

Promscale: the observability backend powered by SQL | Timescale

Compare Row vs Column Oriented Databases

Fundamentals of Data Observability

influxdata/influxdb: Scalable datastore for metrics, events, and real-time analytics

OpenTSDB/opentsdb: A scalable, distributed Time Series Database.

Designing Data-Intensive Applications (DDIA) - an O'Reilly book by Martin Kleppmann (The Wild Boar Book)

VictoriaMetrics/VictoriaMetrics: VictoriaMetrics: fast, cost-effective monitoring solution and time series database

jaegertracing/jaeger: CNCF Jaeger, a Distributed Tracing Platform

OpenZipkin · A distributed tracing system

Observability for Distributed Services | Honeycomb

The cloud-native reliability platform | Lightstep

Monitor, Debug and Improve Your Entire Stack

Cloud Log Management, Monitoring, SIEM Tools | Sumo Logic

How to Supercharge Automatic Tracing with Kubernetes Primitives - Instana

instana/instana-autotrace-webhook

Clymene-project

Observability Platform Built for Modern R&D Teams | Aspecto

Logz.io: Cloud Observability for Engineers

Lumigo - Serverless Monitoring and Troubleshooting Platform

High-cardinality TSDB benchmarks: VictoriaMetrics vs TimescaleDB vs InfluxDB | by Aliaksandr Valialkin | Medium

Take back control of observability - Chronosphere

Monitor Azure services and applications using Grafana - Azure Monitor | Microsoft Docs

Get started with managed collection | Operations Suite | Google Cloud

Thanos - Highly available Prometheus setup with long term storage capabilities

Cortex

Grafana Mimir | Grafana Labs

FOSDEM 2020 - Querying millions to billions of metrics with M3DB's inverted index

AWS X-Ray - Distributed Tracing System

Distributed tracing in Azure Application Insights - Azure Monitor | Microsoft Docs

Go and OpenTelemetry | Cloud Trace | Google Cloud

Cloud Data Warehouse - Amazon Redshift - Amazon Web Services

Presto | Distributed SQL Query Engine for Big Data

Fast Open-Source OLAP DBMS - ClickHouse

A Data Architecture Built for the Cloud | Snowflake Data Cloud

2 What is Zabbix

nightingale/README_EN.md at main · ccfos/nightingale

Nagios - Network, Server and Log Monitoring Software

Icinga » Monitor your entire Infrastructure with Icinga

Netdata: Monitoring and troubleshooting transformed - Netdata

What is Netdata? | Learn Netdata

About - Icinga 2

zabbix/zabbix: Real-time monitoring of IT components and services, such as networks, servers, VMs, applications and the cloud.

Icinga/icinga2: The core of our monitoring platform with a powerful configuration language and REST API.

bluestreak01/questdb: High Performance Time Series Database

What Is Time-Series Data, and Why Do I Need a Time-Series Database?

apache/skywalking: APM, Application Performance Monitoring System

CNCF Cloud Native Interactive Landscape

Overview | Elastic Common Schema (ECS) Reference [8.3] | Elastic

LogQL | Grafana Loki documentation

RFC 5424 - The Syslog Protocol

google/pprof: pprof is a tool for visualization and analysis of profiling data

SOC 2 Compliance Requirements | Secureframe

Flux

Altinity/clickhouse-operator: The Altinity Operator for ClickHouse creates, configures and manages ClickHouse clusters running on Kubernetes

ClickHouse Networking, Part 1 - Altinity | The Real Time Data Company

opentelemetry-collector-contrib/README.md at main · open-telemetry/opentelemetry-collector-contrib

Integrating Compression and Execution in Column-Oriented Database Systems

Inside Capacitor, BigQuery's next-generation columnar storage format | Google Cloud Blog

MergeTree | ClickHouse Docs

The Data Processing Holy Grail? Row vs. Columnar Databases | Jscrambler Blog

C-Store: A Column-oriented DBMS

FB Gorilla paper 2015

Sample vs Metrics vs Cardinality | Last9 SRE Platform

clickhouse-operator/quick_start.md at master · Altinity/clickhouse-operator

oteps/0212-profiling-vision.md at 81bb5cc1299a844e5f1c883935048657eec11f46 · open-telemetry/oteps

ClickCat - A ClickHouse Visual Interface | by wangqinghuan | Aug, 2022 | Medium

jaeger/plugin/storage/grpc at main · jaegertracing/jaeger

jaegertracing/jaeger-clickhouse: Jaeger ClickHouse storage plugin implementation

opentelemetry-collector/README.md at main · open-telemetry/opentelemetry-collector

Grafana Tempo | Grafana Labs

metabase/metabase: The simplest, fastest way to get business intelligence and analytics to everyone in your company

Welcome | Superset

Business Intelligence and Analytics Software

Amazon QuickSight - Business Intelligence Service - Amazon Web Services

Data Visualization | Microsoft Power BI

What is Amazon QuickSight? - Amazon QuickSight

K8s Infra Metrics | SigNoz

Instana - Enterprise Observability and APM for Cloud-Native Applications

Cloud Monitoring as a Service | Datadog

Data sources | Grafana documentation

Grafana Plugins - extend and customize your Grafana | Grafana Labs

grafana/terraform-provider-grafana: Terraform Grafana provider

Feature Deep Dive: OpenSearch Dashboards Notebooks · OpenSearch

1.0 is released! · OpenSearch

OpenSearch Tutorial: Getting Started with Install and Configuration | Logz.io

Elasticsearch in Action, Second Edition

Kibana Tutorial: Getting Started | Logz.io

Quick start | Kibana Guide [8.4] | Elastic

About Dashboards - OpenSearch documentation

Kibana: Explore, Visualize, Discover Data | Elastic

Adding and managing databases

Google Data Studio Overview

Amazon QuickSight Connection examples - Amazon QuickSight

Data sources for Power BI - Power BI | Microsoft Docs

Data Management | Discover, understand, connect, and trust your data

OpenZipkin's 5 year anniversary · openzipkin/openzipkin.github.io Wiki

zipkin/docker/examples at master · openzipkin/zipkin

Rookout | Painless Cloud-Native Debugging

Calculating the differential cost of code changes - Amazon Science

Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid, and Pinot | by Roman Leventov | Medium

Introduction - Apache Pinot Docs

Observability in Practice | Kamon

Apache SkyWalking

Application Performance Monitor and Distributed Tracing with Apache SkyWalking in Datahub | by Liangjun Jiang | Medium

liangjun-jiang/distributed-tracing-in-datahub-with-skywalking: A step by step guide to integrate Apache Skywalking with LinkedIn's Datahub for application performance and monitoring purpose

Launching support for ClickHouse as storage backend for SigNoz | SigNoz

Monitoring Tools. What are Monitoring Tools? | by Arsal Ur Rehman | Sep, 2022 | Medium

eBay/flow-telemetry: Adding observability to feat...

Apache Druid: overview, running in Kubernetes and monitoring with Prometheus | by Arseny Zinchenko (setevoy) | Sep, 2022 | ITNEXT

From Critical User Journey to SLO/SLIs | by Adam Roberts | Sep, 2022 | Medium

Synthetic Monitoring

Uptime and synthetic monitoring | Observability Guide [master] | Elastic

Synthetic monitoring using AWS Canary & Cloudwatch - DEV Community

Docker Compose Prometheus AlertManager Grafana - Carlos Aguni Personal Blog

Alerting rules | Prometheus

PromCon 2018: Life of an Alert - YouTube

Proposal: Supporting Real User Monitoring Events in OpenTelemetry · Issue #169 · open-telemetry/oteps

Thanos - Highly available Prometheus setup with long term storage capabilities

Configuring Notification using Cortex Alertmanager | Cortex

Distributed Tracing in 2025: What the future holds - keyval

All Things Clock, Time and Order in Distributed Systems: Physical Time in Depth | by Kousik Nath | Geek Culture | Medium

Observability with OpenTelemetry Part 5 - Propagation and Baggage | Thomas Stringer

Cloud-Native Observability with OpenTelemetry | Packt

OSA Con 2022: Signal Correlation, the Ho11y Grail | Developer Conference - YouTube

gProfiler - Granulate

Shift-Left: A Developer's Pipe(line) Dream? - DZone

What Is eBPF?

Proposal: Adding profiling as a support event type · Issue #139 · open-telemetry/oteps

Continuous Profiler | Datadog

prodfiler-documentation | Documentation for Prodfiler, the distributed lightweight continuous whole-system profiler

Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers - Google Research

Profiling and optimization | Dynatrace Docs

Thread profiler tool | New Relic Documentation

Real-time profiling for Java using JFR metrics | New Relic Documentation

Grafana Phlare documentation

dotTrace Profiler: .NET Profiling Experience Like No Other by JetBrains

BPF binaries: BTF, CO-RE, and the future of BPF perf tools

BPF CO-RE (Compile Once - Run Everywhere)

polarsignals/frostdb: ❄️ Coolest database around Embeddable column database written in Go.

Time-series compression algorithms, explained

Protocol Buffers Documentation

Nobl9 SLO Software and Tools to Manage Service Level Objectives, Error Budgeting, and Monitoring

SLO Development Lifecycle | SLODLC

community/slos.md at master · kubernetes/community

SLO-Based Observability For All Kubernetes Cluster Components - Matthias Loibl & Nadine Vehling - YouTube

SLA vs SLO vs SLI: Similarities and Differences - DZone

Google - Site Reliability Engineering

The first Service Level Objective Conference for Site Reliability Engineers

SLI Analyzer | Nobl9 Documentation

SLOs | Honeycomb

Concepts in service monitoring | Operations Suite | Google Cloud

lindb/lindb: LinDB is a scalable, high performance, high availability distributed time series database.

The RED Method: A New Approach to Monitoring Microservices - The New Stack

Implementing Service Level Objectives

Service Level Objectives

Get started with New Relic service levels | New Relic Documentation

SLI/SLO Monitoring | Splunk

Service-level objectives | Dynatrace Docs

Kubernetes deployment | OpenTelemetry

open-telemetry/opentelemetry-demo: This repository contains the OpenTelemetry Astronomy Shop, a microservice-based distributed system intended to illustrate the implementation of OpenTelemetry in a near real-world environment.

Demo Screenshots | OpenTelemetry

opentelemetry-demo/otelcol-config.yml at main · open-telemetry/opentelemetry-demo

bufbuild/buf: A new way of working with Protocol Buffers.

spluxx/Protoman: Postman for protobuf APIs

Feature Request : Support for google protobuf · Issue #2801 · postmanlabs/postman-app-support

Protocol Buffer Compiler Installation | gRPC

kind

Deploy on Kubernetes

K3s

SLOconf: SLO Math - by Steve McGhee - YouTube

[2303.13402] Return on Investment Driven Observability

Observability: a data engineering challenge - Michael Hausenblas - YouTube

Correlating Signals Efficiently in Modern Observability | @bwplotka

Prometheus data source | Grafana documentation

Distributed tracing and telemetry correlation in Azure Application Insights - Azure Monitor | Microsoft Learn

Correlate log entries | Cloud Logging | Google Cloud

Announcing Correlations: The fastest way to investigate issues | Lightstep blog

Metric Correlations

Correlate Request Logs With Traces Automatically | Datadog

Correlating distributed traces of large scale systems | Dynatrace Engineering

APM Tools Comparison | Honeycomb

Configure correlation logic with decisions | New Relic Documentation

correlate - Splunk Documentation

Custom Correlation - appdynamics

Cloud Scale Correlation and Investigation with Cloud SIEM | Sumo Logic

Logz.io Docs | Correlate logs and traces