How to run Loki at scale on Kubernetes (Loki Community Call January 2025)
Happy New Year from the Loki Engineering team. To kick off 2025, Nicole and Jay will be joined by Poyzan Taneli from the Loki Engineering team to discuss how to run Loki at scale on Kubernetes. If you are currently running Loki in microservices mode, or preparing to do so, join us as we discuss best practices for scaling its components to meet the demands of production use cases.
- How many Ingesters vs Queriers do I need?
- What is the minimum cost of running Loki as a production cluster?
- What is the average node size for running Loki?
- How do I monitor my Loki deployment?
TIMESTAMPS:
00:00:00 Introductions
00:03:33 What are the complexities in scaling Loki?
00:08:43 Factors to keep in mind when scaling Loki
00:17:20 Scaling ingest (write path)
00:22:00 Promtail deprecation in favour of Alloy
00:23:52 Scaling ingesters, distributors, compactors
00:29:21 Q: How can we find out the source of a heavy query?
00:30:30 Q: How can we move from SSD to distributed mode?
00:35:01 Scaling queriers
00:35:04 Q: Should we manually right-size clusters?
00:43:00 Q: How can we improve query performance?
00:45:14 Loki Sizing Guide
00:51:07 Q: How can we reduce SlowDown errors from S3 when Loki writes to it?
00:58:36 Q: Have you tried to run Loki on ARM at scale?
00:59:20 Q: How do you tune the chunks cache?
01:01:51 Q: How can we monitor tenant usage and label cardinality?
01:05:09 Q: How can we minimise the cost of cross-AZ traffic for Loki?
Community Calls are monthly meetings that are open to everyone interested in the development of Loki. They are an opportunity for software engineers working on Loki to discuss new features as well as for open-source users of Loki to ask questions. To participate in the next Community Call, subscribe to the calendar here: https://gra.fan/lokicccal
HELPFUL LINKS:
Loki Community Call Agenda: https://gra.fan/lokicc
Loki GitHub repo: https://gra.fan/lokirepo
Loki docs: https://gra.fan/lokidocs
Submit to the GrafanaCON CFP or join us: https://gra.fan/con
(docs) Loki Sizing guidelines: https://gra.fan/lokisizing
(docs) Query best practices: https://gra.fan/lokiquerybp
(blog) How we scaled Grafana Cloud Logs’ memcached cluster to 50TB and improved reliability: https://gra.fan/lokimemcached
☁️ Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more. We also have plans for every use case. Sign up: https://grafana.com/get/
❓ Have a question that isn't related to this video? Check out the Official Grafana Community Forums and ask your question or find your answer: https://community.grafana.com/
👍 If you found this video useful, be sure to give it a thumbs up and subscribe to our channel for more helpful Grafana videos.
📱 Follow us for the latest and greatest on all things Grafana and our other OSS projects.
X: https://twitter.com/grafana
LinkedIn: https://www.linkedin.com/company/grafana-labs/mycompany
Facebook: https://www.facebook.com/grafana
#Grafana #Observability #loki #k8s #kubernetes #logs #deployment