Sunday, January 27, 2019

The current state of Graphite and its ecosystem

I opened an issue on the Graphite project to discuss the project's future and found that my (obviously opinionated) review of the current state of the ecosystem had grown quite big, so I decided to publish it separately, below.

1. Original Graphite https://graphiteapp.org
Language: Python
Storage: Whisper (Python)
Clustering capabilities: medium
Plus points:
* The current standard implementation.
* Still widely deployed, packaged by many distributions.
* Still working great for small to medium installations.
* Graphite-web is still the most complete implementation of the Graphite render protocol; most third-party storage implementations still use it as their render engine.
Minus points:
* Whisper storage: no compression, 12 bytes per point, very IO-intensive
* Python: vertically scalable only by spawning more instances, which makes scaling the relay and carbon components quite hard.
* The current clustering protocol of graphite-web is much better than the one in Graphite 0.9.x, but it still does not work very well for big and/or volatile clusters.
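
To make the "12 bytes per point, no compression" figure concrete, here is a rough sketch that estimates the size of a single Whisper file from a retention schema. The struct layouts follow the Whisper file format (file header, one info record per archive, then fixed-size points); the helper names and the example schema are my own assumptions, purely for illustration.

```python
import struct

# Sizes derived from the struct layouts of the Whisper file format.
METADATA_SIZE = struct.calcsize("!2LfL")    # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")  # offset, seconds per point, number of points
POINT_SIZE = struct.calcsize("!Ld")         # 4-byte timestamp + 8-byte double value

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec):
    """Convert a duration such as '10s' or '30d' to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def whisper_file_size(retentions):
    """Return the file size in bytes for a 'precision:duration,...' schema."""
    archives = [part.split(":") for part in retentions.split(",")]
    points = sum(to_seconds(duration) // to_seconds(precision)
                 for precision, duration in archives)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives) + POINT_SIZE * points


if __name__ == "__main__":
    # Hypothetical schema: 10-second points for 1 day, 1-minute points for 30 days.
    print(whisper_file_size("10s:1d,1m:30d"))  # 622120
```

With this hypothetical schema every metric takes roughly 622 KB on disk, whether or not it ever receives data, because Whisper allocates all archives up front.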

2. Go-Graphite stack https://github.com/go-graphite/
Go-graphite is an effort to consolidate the Golang re-implementations of various Graphite components that were developed by Booking.com and other companies.
Language: Go / C
Storage: Whisper (Go)
Clustering capabilities: strong
Plus points:
* Go produces a single binary per component, which is easy to deploy and scales vertically
* The new clustering protocol ("carbonserver") works much better in big clusters (Booking.com probably has the biggest Graphite cluster in the world, based on this setup)
Minus points:
* Scattered components and development.
* The project has no Golang-implemented relay yet; users have to rely on third-party relays, e.g. carbon-c-relay or carbon-relay-ng.
* The project has no storage component of its own and uses lomik's go-carbon, which currently has "carbonserver" built in.
* Carbonapi (a graphite-web reimplementation) is not fully compatible with graphite-web and is currently split into two separate forks: a community fork and a Booking.com fork.

3. Clickhouse stack
Clickhouse is an analytic database that was open-sourced by Yandex. During its internal development it was used as Graphite storage, so it has good built-in implementations of some Graphite parts (such as aggregation). Yandex also open-sourced its internal Java-based implementation of the Graphite-compatible render part, named "Graphouse", but lomik's Golang reimplementations of the components, carbon-clickhouse and graphite-clickhouse, are currently much more popular. Please note that this stack contains no rendering components and relies on Graphite-web or carbonapi for the actual rendering.
Language: Go
Storage: Clickhouse (C++)
Clustering capabilities: strong
Plus points:
* Very good storage: low IO requirements, good compression (typically 2-4 bytes per point)
* Can be used in small, medium, and large installations: the storage is scalable (although there is no re-sharding, so extending a cluster is a bit like moving Whisper files), and the other components are stateless Go binaries
Minus points:
* Depends on Clickhouse's Graphite support, which is not the main purpose of Clickhouse, so in theory it could be removed or left undeveloped in future versions (but for now it is still there)
* Users need to experiment with different storage schemas
* Extending a big Clickhouse cluster can currently be painful (probably less painful than with Whisper; I just mean it may not be as smooth as with, e.g., a Cassandra cluster).

"Yuuge" (Trump-voice) projects
There are currently two projects that were developed from the start for big and very big Graphite installations: "Metrictank" and "Biggraphite".

4. Metrictank https://github.com/grafana/metrictank
Developed by Grafana Labs to support the Grafana Cloud and WorldPing projects. A multitenant project aimed at big installations. I'm currently implementing a Metrictank (MT) cluster at my job, so I'll describe it in a separate article.
Language: Go
Storage: Cassandra (Java) / BigTable (Google-cloud)
Clustering capabilities: strong
Plus points:
* Designed for scalability: all components are scalable, Kafka is used as the bus for metric transport and clustering, and a SWIM cluster is used for the cache nodes
* Uses a strong caching layer to offload permanent storage, keeping N hours of data in a RAM cache for compression/deduplication.
* Re-implements some render functions in Golang, with a proper fallback to Graphite-web
* Designed to run in containers (e.g. in Kubernetes)
* Good compression ratio for storage (also around 2-4 bytes per point)
Minus points:
* Cache nodes are quite RAM-hungry and can go OOM, especially during cluster start, so they require a big memory overhead. Cache storage is quite inefficient compared to permanent storage: 20-30 bytes per point (which is quite logical, as the cache should be fast rather than compact)
* Quite a complex system; you need to experiment with different deployment/setup strategies (well, that's probably true for every big and heavily loaded storage)
* Not really useful in small installations (better to pick go-carbon or Clickhouse stack)
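
As a back-of-the-envelope illustration of why the cache nodes are RAM-hungry, the sketch below multiplies the 20-30 bytes per point quoted above by the number of points kept in memory. The function name and the example workload are assumptions for illustration only; real memory usage also includes index data and runtime overhead, which are not modelled here.

```python
def cache_ram_bytes(series, interval_s, hours_in_ram, bytes_per_point):
    """Estimate RAM needed to keep `hours_in_ram` hours of points in the cache."""
    points_per_series = hours_in_ram * 3600 // interval_s
    return series * points_per_series * bytes_per_point


if __name__ == "__main__":
    # Hypothetical workload: 1 million series, one point every 10 seconds,
    # 6 hours kept in RAM, using the 20-30 bytes per point range from above.
    low = cache_ram_bytes(1_000_000, 10, 6, 20)
    high = cache_ram_bytes(1_000_000, 10, 6, 30)
    print(f"{low / 1e9:.1f}-{high / 1e9:.1f} GB")  # 43.2-64.8 GB
```

Even under these modest assumptions the cache tier needs tens of gigabytes before any headroom for startup spikes is added.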

5. Biggraphite https://github.com/criteo/biggraphite
Designed by Criteo for its own Graphite installation. It uses Cassandra to extend storage but reuses the other components of the Python stack.
Language: Python
Storage: Cassandra (Java) / Elasticsearch (Java)
Clustering capabilities: strong
Plus points:
* Scalable solution (you still need to scale the Python carbon instances, though)
Minus points:
* Big storage overhead (16-24 bytes per point)
* Not really useful in small installations (better to pick go-carbon or Clickhouse stack)
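
To put the per-point figures quoted throughout this review side by side, here is a small sketch that converts them into raw data volume per day. The numbers in the table are the ones from this article, not measurements of mine, and the workload is hypothetical. The estimate covers raw points only and ignores rollups, replication, indexes, and the fact that Whisper preallocates its files.

```python
# Bytes per point (low, high) as quoted in this review.
BYTES_PER_POINT = {
    "Whisper": (12, 12),
    "Clickhouse": (2, 4),
    "Metrictank (permanent storage)": (2, 4),
    "Biggraphite": (16, 24),
}


def daily_storage_bytes(series, interval_s):
    """Return {backend: (low_bytes, high_bytes)} of raw points written per day."""
    points_per_day = series * (86400 // interval_s)
    return {name: (low * points_per_day, high * points_per_day)
            for name, (low, high) in BYTES_PER_POINT.items()}


if __name__ == "__main__":
    # Hypothetical workload: 100,000 series, one point every 10 seconds.
    for name, (low, high) in daily_storage_bytes(100_000, 10).items():
        print(f"{name}: {low / 1e9:.1f}-{high / 1e9:.1f} GB/day")
```

For this hypothetical workload the spread is roughly an order of magnitude: under 2 GB per day at the low end for Clickhouse or Metrictank versus up to about 20 GB per day for Biggraphite.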

So, as I have mentioned many times before, IMO Graphite is currently not just a single project but rather a whole ecosystem of projects, developed at different times by different developers for different purposes. Not all of these projects are compatible with every feature of the original project, but a user can (and should) pick one implementation or another based on their own use case, requirements, and implementation.

I'm planning to write separately about the Metrictank and Clickhouse stacks soon.