Monitoring Hadoop Clusters using free tools

Posted by admin on November 25, 2011

Monitoring a Hadoop cluster properly can be a lot of work. Luckily, there are plenty of free tools with good documentation to get you started.
Here is a short list of what I use.

Performance Monitoring

Cluster performance: Ganglia (see the Ganglia webpage)
Graphing of individual servers: Munin (see the Munin webpage)
Graphing of Nagios checks: PNP4Nagios, which graphs every Nagios check that supports performance output
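To make "performance output" concrete: a Nagios plugin prints its status text, then a pipe character, then one or more items of the form 'label'=value[unit];warn;crit;min;max. The following is a minimal sketch of building such a line. The function names, labels and thresholds are made up for illustration and are not taken from any of the scripts mentioned in this post.

```python
def _num(x):
    """Render a number for perfdata; None becomes an empty field."""
    if x is None:
        return ""
    if isinstance(x, float) and x.is_integer():
        return str(int(x))
    return str(x)


def format_perfdata(label, value, unit="", warn=None, crit=None,
                    minimum=None, maximum=None):
    """Build one perfdata item: 'label'=value[unit];warn;crit;min;max."""
    fields = [_num(value) + unit, _num(warn), _num(crit),
              _num(minimum), _num(maximum)]
    # Trailing empty fields can be left out.
    while fields and fields[-1] == "":
        fields.pop()
    return "'" + label + "'=" + ";".join(fields)


def plugin_output(status_text, perf_items):
    """Join the human-readable status and the perfdata with a pipe."""
    return status_text + " | " + " ".join(perf_items)


# Hypothetical values, only to show the shape of the output line.
line = plugin_output(
    "HDFS OK - 72.5% used",
    [format_perfdata("dfs_used", 72.5, "%", 80, 90, 0, 100),
     format_perfdata("dead_nodes", 0, crit=1)],
)
print(line)
```

Everything after the pipe is what a grapher such as PNP4Nagios picks up; the text before it is what shows up in the Nagios web interface.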

Hardware and operating systems

Nagios, with the default OS plugins.

I prefer check_by_ssh to NRPE: it is easy to set up and works out of the box.
Make sure you get hardware plugins for your hardware, plus the generic checks for disk, CPU, memory, etc.
Use check_process to check as many as possible of the processes that you know should be running.
Use check_tcp to check every port that should be open.
Use check_ntp to make sure the clocks in your cluster are in sync.
See my separate page for more info about plugins.
If your cluster is large, have a look at my Large scale implementation page.
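All of the checks above follow the same Nagios plugin contract: print one line of status text and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). As an illustration of that contract, here is a rough sketch of a port check in the spirit of check_tcp. It is not the real check_tcp plugin, and the function names and the default timeout are assumptions of mine.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp_port(host, port, timeout=5.0):
    """Return (exit_code, message) depending on whether the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} is accepting connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution failures.
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"


def main(argv):
    """Hypothetical entry point: argv is [host, port]."""
    if len(argv) != 2 or not argv[1].isdigit():
        print("TCP UNKNOWN - usage: check_port.py <host> <port>")
        return UNKNOWN
    code, message = check_tcp_port(argv[0], int(argv[1]))
    print(message)
    return code
```

To use it as a plugin, end the file with `import sys` and `sys.exit(main(sys.argv[1:]))`, so that Nagios sees the exit code.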

Hadoop itself

Nagios plugins exist for at least some of the Hadoop/HDFS components.

Check HDFS. I wrote a quick-and-dirty Perl script that parses the output of the namenode's web management page; it can be found here. The check reports free/used DFS space in the cluster, and whether any nodes are dead or any blocks are missing or under-replicated. It also outputs performance data for nice graphs with PNP4Nagios.
Check tasktracker status. I wrote another Perl script, yet another HTML-parsing script, that checks the tasktrackers via the tasktrackers' admin web page. It can be found here, and it checks the status and the number of active machines. It supports performance output.
I also use a little script that runs an fsck of / on HDFS. It is really simple, and can be found here. It can, for example, be run every 15, 30 or 60 minutes, depending on how you want to trade extra load against more frequent checking 🙂
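To give an idea of the decision logic in a check like the HDFS one, here is a small sketch that turns already-extracted numbers into a Nagios state. It is not my Perl script: the function name, the 80/90 percent thresholds and the choice of which condition is a warning and which is critical are all assumptions for illustration, and the HTML parsing is left out entirely.

```python
# Nagios plugin exit codes used here.
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate_hdfs(used_pct, dead_nodes, missing_blocks, under_replicated,
                  warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, message); the worst individual finding wins."""
    problems = []  # list of (severity, description)

    if used_pct >= crit_pct:
        problems.append((CRITICAL, f"DFS {used_pct:.1f}% used"))
    elif used_pct >= warn_pct:
        problems.append((WARNING, f"DFS {used_pct:.1f}% used"))

    # Assumed policy: dead nodes and missing blocks are critical,
    # under-replicated blocks are only a warning.
    if dead_nodes > 0:
        problems.append((CRITICAL, f"{dead_nodes} dead node(s)"))
    if missing_blocks > 0:
        problems.append((CRITICAL, f"{missing_blocks} missing block(s)"))
    if under_replicated > 0:
        problems.append((WARNING, f"{under_replicated} under-replicated block(s)"))

    if not problems:
        return OK, f"HDFS OK - DFS {used_pct:.1f}% used"

    state = max(severity for severity, _ in problems)
    details = ", ".join(text for _, text in problems)
    return state, f"HDFS {STATE_NAMES[state]} - {details}"


code, message = evaluate_hdfs(85.0, 0, 0, 3)
print(code, message)
```

On a large cluster you may well want a dead node to be a warning rather than critical; adjust the mapping to whatever should actually wake you up at night.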

Missing anything? Let me know in a comment 🙂


Runde Consult AS
