Centreon - Open Source Network, Systems and Application monitoring solution
Nagios - The Industry Standard in IT Infrastructure Monitoring
etsy/statsd
The OpenNMS Project
Ganglia Monitoring System

Monitoring at Spotify: The Story So Far | Labs
This is the first in a two-part series about Monitoring at Spotify. In this part, I’ll be discussing our history, the challenges we faced, and how they were approached.
Operational monitoring at Spotify started its life as a combination of two systems: Zabbix and a homegrown RRD-backed graphing system named “sitemon”, which used Munin for collection. Zabbix was owned by our SRE team, while sitemon was run by our backend infrastructure team. Back then, we were small enough that solving things yourself was commonplace. In late 2013, we were starting to put more emphasis on self service and distributed operational responsibility. We tried to bandage up what we could: our Chief Architect hacked together an in-memory sitemon replacement that could hold roughly one month's worth of metrics under the current load.
Alerting as a service
Alerting was the first problem we took a stab at. We considered developing Zabbix further. We built a library on top of Riemann called Lyceum.
Graphing Data Hierarchies

RRDtool - About RRDtool
What RRDtool does
RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, perl, python, ruby, lua or tcl applications.

Munin

Monitoring at Spotify: Introducing Heroic | Labs
This is the second part in a series about Monitoring at Spotify. In the previous post I discussed our history of operational monitoring. In this part I’ll be presenting Heroic, our scalable time series database which is now free software.
Heroic is our in-house time series database. We built it to address the challenges we were facing with near real-time data collection and presentation at scale. At the core are two key pieces of technology: Cassandra and Elasticsearch. We are aware Elasticsearch has a bad reputation for data safety, so we guard against total failures by having the ability to completely rebuild the index rapidly from our data pipeline or Cassandra. A key feature of Heroic is global federation. Every host in our infrastructure is running ffwd, which is an agent responsible for receiving and forwarding metrics. This setup allows us to rapidly experiment with our service topology. In the backend everything is stored exactly as it was provided to the agent.
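
The per-host agent idea in the excerpt above (receive metrics locally, forward them on, store them exactly as provided) can be illustrated with a small sketch. This is not ffwd's actual interface or wire format; the names Metric, ForwardingAgent, and receive are hypothetical and chosen only for illustration, and the "backend" here is just a Python list standing in for real storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric shape; the real agent's data model may differ."""
    key: str
    value: float
    timestamp: int
    tags: Dict[str, str] = field(default_factory=dict)


class ForwardingAgent:
    """Hypothetical per-host agent: accepts metrics and forwards them unchanged."""

    def __init__(self, sinks: List[Callable[[Metric], None]]) -> None:
        # Each sink is any callable that accepts a Metric (for example, a
        # function that ships it to a pipeline). The set of sinks is assumed.
        self.sinks = sinks

    def receive(self, metric: Metric) -> None:
        # Forward the metric as-is, with no rewriting or aggregation, so the
        # backend ends up with exactly what the agent was given.
        for sink in self.sinks:
            sink(metric)


# Usage: a list stands in for the backend.
backend: List[Metric] = []
agent = ForwardingAgent([backend.append])
agent.receive(Metric("api.latency", 12.5, 1700000000, {"host": "web1"}))
```

Keeping the agent a pass-through means changes to where metrics are sent only require changing the list of sinks, not the services that emit the metrics.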