
Nagios hadoop hdfs check

  • This is a Nagios check for the status of HDFS in a Hadoop cluster.
  • Existing checks did not give me what I needed, so I hacked this little script together. It is not pretty, but it works for me. YMMV.
  • The check uses the Hadoop HDFS admin web page, normally found at http://hdfs-namenode:50070/dfshealth.jsp, and simply parses the output of that page with regexps.
  • The new version does not require links. Download here: check_hadoop0.4
  • Rename to .pl and make executable
  • Requires the Nagios Perl modules, utils.pm and a few standard Perl modules
  • Tested with Hadoop 0.20.2
  • Gives performance data for unreplicated blocks, data in HDFS, nodes OK/dead and the number of files/directories/blocks in HDFS.
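
The regexp approach described above can be sketched roughly as follows. This is an illustration in Python, not the plugin's actual Perl code. The sample page, the pattern names and the helper functions are all assumptions made up for the example; the real dfshealth.jsp markup differs between Hadoop versions, so the patterns would need adjusting.

```python
import re

# Hypothetical, simplified stand-in for the dfshealth.jsp page.
SAMPLE_PAGE = """
<b>1234 files and directories, 5678 blocks = 6912 total.</b>
<tr><td>DFS Used%</td><td>:</td><td>42.5 %</td></tr>
<tr><td>Live Nodes</td><td>:</td><td>8</td></tr>
<tr><td>Dead Nodes</td><td>:</td><td>1</td></tr>
<tr><td>Number of Under-Replicated Blocks</td><td>:</td><td>17</td></tr>
"""

GAP = r"(?:<[^>]+>|[\s:])*"   # skip HTML tags, whitespace and the ':' column
NUM = r"(\d+(?:\.\d+)?)"      # integer or decimal number, captured

# One pattern per metric, each with exactly one capture group.
PATTERNS = {
    "dfs_used_pct": r"DFS Used%" + GAP + NUM,
    "live_nodes": r"Live Nodes" + GAP + NUM,
    "dead_nodes": r"Dead Nodes" + GAP + NUM,
    "under_replicated": r"Under-Replicated Blocks" + GAP + NUM,
    "files_and_dirs": NUM + r" files and directories",
    "blocks": NUM + r" blocks",
}


def parse_hdfs_status(html):
    """Extract every metric from the page text, or raise if one is missing."""
    metrics = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, html)
        if match is None:
            raise ValueError(f"could not find {name} in page")
        text = match.group(1)
        metrics[name] = float(text) if "." in text else int(text)
    return metrics


def format_perfdata(metrics):
    """Render metrics as space-separated label=value pairs for Nagios."""
    return " ".join(f"{name}={value}" for name, value in metrics.items())


if __name__ == "__main__":
    status = parse_hdfs_status(SAMPLE_PAGE)
    print("HDFS status | " + format_perfdata(status))
```

In a real check the page text would come from the namenode, for example via urllib.request.urlopen, instead of the embedded sample.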

check_hadoop_hdfs v. 0.4
Copyright (c) 2011 Jon Ottar Runde, jru@rundeconsult.no
See http://www.rundeconsult.no/?p=38 for updated versions and documentation
Usage: -w <warn> -c <crit> -x <Unreplicated blocks warn> -u <Unreplicated blocks crit> -H <Hostname> -p <Port> [-v version] [-h help]

Checks several Hadoop hdfs-parameters
-H (--Host)
-p (--Port)
-w (--warning)   = warning limit for DFS Usage
-c (--critical)  = critical limit for DFS Usage  (w < c)
-x (--unreplicatedwarn) = Warning limit for Unreplicated blocks
-u (--unreplicatedcritical) = Error limit for Unreplicated blocks
-h (--help)
-v (--version)
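
How the four limits might combine into one Nagios state can be sketched like this. Again this is Python rather than the plugin's Perl, and the details are assumptions: the function names are hypothetical, a value equal to a limit is treated as exceeding it, and x < u is assumed by analogy with the documented w < c. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def level(value, warn, crit):
    """State for one metric; assumes reaching a limit already trips it."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def evaluate(dfs_used_pct, under_replicated, w, c, x, u):
    """Combine both metrics; the worse of the two states wins."""
    # w < c is documented in the help text; x < u is an assumption.
    if not (w < c and x < u):
        return UNKNOWN
    return max(level(dfs_used_pct, w, c), level(under_replicated, x, u))


if __name__ == "__main__":
    import sys
    # Example values only: 7% DFS usage and 20 under-replicated blocks,
    # checked with -w 5 -c 10 -x 100 -u 1000.
    sys.exit(evaluate(7.0, 20, w=5, c=10, x=100, u=1000))
```

Taking the maximum works because the numeric exit codes for OK, WARNING and CRITICAL are ordered by severity.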

Example Nagios-config

define service{
    use                    generic_service
    service_description    Hadoop_Extended Check
    check_command          check_hadoop_extended
    host_name              namenode.company.com
}

define command{
    command_name    check_hadoop_extended
    command_line    $USER1$/check_hadoop.pl -H namenode.company.com -p 50070 -w 5 -c 10 -x 100 -u 1000
}