[rrd-users] rrdcached problem complex RRD options on Solaris
Peter Jenkins
peter.jenkins at csc.fi
Mon Jul 19 15:16:20 CEST 2010
All,
I've found an issue with rrdcached on Solaris 10. Simple graphs are created fine
through the cache, but more complex ones with multiple source RRDs fail.
I'm using rrdtool 1.4.4 on Solaris 10. I first noticed the issue through Ganglia,
but I've now reproduced it on the command line.
In short, without the cache it works:
$ /opt/CSCrrdtool/bin/rrdtool graph /tmp/foo.png \
    --start '-3600' --end N --width 1024 --height 600 \
    --title 'host Load last hour' --lower-limit 0 \
    --vertical-label 'Load/Procs' --rigid \
    DEF:'load_one'='/opt/rrd/ganglia/Management/host/load_one.rrd':'sum':AVERAGE \
    DEF:'proc_run'='/opt/rrd/ganglia/Management/host/proc_run.rrd':'sum':AVERAGE \
    DEF:'cpu_num'='/opt/rrd/ganglia/Management/host/cpu_num.rrd':'sum':AVERAGE \
    AREA:'load_one'#CCCCCC:'1-min Load' \
    LINE2:'cpu_num'#FF0000:'CPUs' \
    LINE2:'proc_run'#0000FF:'Running Processes'
1121x673
With the cache it fails:
$ /opt/CSCrrdtool/bin/rrdtool graph /tmp/foo.png \
    --daemon unix:/tmp/rrdcached.socket \
    --start '-3600' --end N --width 1024 --height 600 \
    --title 'host Load last hour' --lower-limit 0 \
    --vertical-label 'Load/Procs' --rigid \
    DEF:'load_one'='/opt/rrd/ganglia/Management/host/load_one.rrd':'sum':AVERAGE \
    DEF:'proc_run'='/opt/rrd/ganglia/Management/host/proc_run.rrd':'sum':AVERAGE \
    DEF:'cpu_num'='/opt/rrd/ganglia/Management/host/cpu_num.rrd':'sum':AVERAGE \
    AREA:'load_one'#CCCCCC:'1-min Load' \
    LINE2:'cpu_num'#FF0000:'CPUs' \
    LINE2:'proc_run'#0000FF:'Running Processes'
ERROR: rrdc_flush (/opt/rrd/ganglia/Management/host/proc_run.rrd) failed with status -1.
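The status -1 here is rrdc_flush() giving up on the daemon's reply. The flush
exchange itself is just a line-based text protocol on the unix socket, so it is
easy to poke at by hand. Below is a minimal sketch of that exchange in Python;
a socketpair and a fake responder stand in for unix:/tmp/rrdcached.socket and
the daemon, so the snippet is self-contained rather than a test against a real
rrdcached:

```python
import socket

# A socketpair stands in for the unix socket; against a real daemon you
# would connect() to the socket path instead.
client, daemon = socket.socketpair()

def fake_daemon(conn):
    # Respond the way the dtruss output shows rrdcached answering a flush.
    cmd = conn.recv(8192).decode()
    path = cmd.split(None, 1)[1].strip()
    conn.sendall(("0 Successfully flushed %s.\n" % path).encode())

client.sendall(b"flush /opt/rrd/ganglia/Management/host/proc_run.rrd\n")
fake_daemon(daemon)

# The first token of the status line is an integer: >= 0 means success
# (the number of payload lines to follow), < 0 is an error, such as the
# "-1 Unknown command" seen in the trace below.
status_line = client.recv(8192).decode()
status = int(status_line.split(None, 1)[0])
print(status_line.strip())
print("flush ok" if status >= 0 else "flush failed")
```

A reply whose leading integer is negative is exactly what makes the client side
report a failed flush.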
This works fine under Linux; see this ganglia-general thread:
http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg05775.html
Startup command:
# /opt/CSCrrdtool/bin/rrdcached -p /opt/rrd/rrdcache/rrdcached.pid \
    -l /tmp/rrdcached.socket -g
starting up
listening for connections
<no other output>
# ps -ef | grep rrdcached
rrdtool 1401 1 0 15:02:22 ? 0:06
/opt/CSCrrdtool/bin/rrdcached -p /opt/rrd/rrdcache/rrdcached.pid -l /tmp/rrdcac
# ./dtruss -a -p 1401
<snip>
1401/1: 6241 2024 140 accept(0x4, 0xFFBFFA98, 0xFFBFFA94) = 6 0
1401/1: 6259 31 2 lwp_kill(0x12, 0x0, 0xFE2E3200) = -1 Err#3
1401/1: 6327 84 55 lwp_create(0xFFBFF7F8, 0xC0, 0xFFBFF7F4) = 19 0
1401/1: 6350 39 9 lwp_continue(0x13, 0x1, 0xFE2E3200) = 0 0
1401/19: 48 1885 1 setcontext(0x3, 0xFE2E3288, 0x0) = 0 0
1401/19: 63 31 5 schedctl(0xFE5374C0, 0x0, 0x0) = -12607376 0
1401/19: 143 150043 64 pollsys(0xFDD7BF70, 0x1, 0xFDD7BF00) = 1 0
1401/19: 171 47 17 read(0x6, "flush /opt/rrd/ganglia/Management/shango/load_one.rrd\n\004\020\0", 0x2000) = 54 0
1401/19: 178 26 1 gtime() = 1279543205 0
1401/5: 6354 1884 5 gtime() = 1279543205 0
1401/5: 6416 67 39 open("/opt/rrd/ganglia/Management/shango/load_one.rrd\0", 0x2, 0x1B6) = 7 0
1401/5: 6428 31 6 fstat(0x7, 0xFE07BCF0, 0x0) = 0 0
1401/5: 6482 74 49 mmap(0x0, 0x2F70, 0x3) = -17891328 0
1401/5: 6493 31 5 memcntl(0xFEEF0000, 0x2F70, 0x4) = 0 0
1401/5: 6510 31 14 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 6517 18 2 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 6580 28 10 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 6593 25 8 memcntl(0xFEEF0000, 0x230, 0x4) = 0 0
1401/5: 6606 24 8 memcntl(0xFEEF0000, 0x8, 0x4) = 0 0
1401/5: 6634 43 17 fcntl(0x7, 0x6, 0xFE07BD60) = 0 0
1401/5: 7389 26 5 memcntl(0xFEEF0000, 0x2F70, 0x1) = 0 0
1401/5: 7488 120 94 munmap(0xFEEF0000, 0x2F70) = 0 0
1401/19: 270 19699 38 lwp_park(0x0, 0x0, 0x5) = 0 0
1401/19: 327 57 30 write(0x6, "0 Successfully flushed /opt/rrd/ganglia/Management/shango/load_one.rrd.\n\0", 0x48) = 72 0
1401/5: 7773 15636 276 close(0x7) = 0 0
1401/5: 7809 37 11 lwp_park(0x1, 0x13, 0x0) = 0 0
1401/19: 374 898 33 pollsys(0xFDD7BF70, 0x1, 0xFDD7BF00) = 1 0
1401/19: 394 29 12 read(0x6, "0 Successfully flushed /opt/rrd/ganglia/Management/shango/load_one.rrd.\nflush /opt/rrd/ganglia/Management/shango/proc_run.rrd\na\320\0", 0x2000) = 126 0
1401/19: 400 17 0 gtime() = 1279543205 0
1401/19: 452 46 27 write(0x6, "-1 Unknown command: 0\n\0", 0x16) = 22 0
1401/5: 7855 1616 21 lwp_park(0x0, 0x0, 0x0) = 0 0
1401/5: 7868 16 0 gtime() = 1279543205 0
1401/5: 7916 47 31 open("/opt/rrd/ganglia/Management/shango/proc_run.rrd\0", 0x2, 0x1B6) = 7 0
1401/5: 7927 21 5 fstat(0x7, 0xFE07BCF0, 0x0) = 0 0
1401/5: 7967 51 36 mmap(0x0, 0x2F70, 0x3) = -17891328 0
1401/5: 7975 18 2 memcntl(0xFEEF0000, 0x2F70, 0x4) = 0 0
1401/5: 7989 26 10 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 7995 17 2 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 8049 26 9 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 8063 24 8 memcntl(0xFEEF0000, 0x230, 0x4) = 0 0
1401/5: 8075 23 8 memcntl(0xFEEF0000, 0x8, 0x4) = 0 0
1401/5: 8093 26 10 fcntl(0x7, 0x6, 0xFE07BD60) = 0 0
1401/5: 8767 21 4 memcntl(0xFEEF0000, 0x2F70, 0x1) = 0 0
1401/5: 8824 69 52 munmap(0xFEEF0000, 0x2F70) = 0 0
1401/5: 8930 117 99 close(0x7) = 0 0
1401/5: 8961 28 10 lwp_park(0x1, 0x13, 0x0) = 0 0
1401/19: 510 2021 24 lwp_park(0x0, 0x0, 0x5) = 0 0
1401/19: 540 25 8 write(0x6, "0 Successfully flushed /opt/rrd/ganglia/Management/shango/proc_run.rrd.\n\0", 0x48) = -1 Err#32
<snip>
This output seems a bit odd. Firstly, I don't understand why lwp_kill is being
called, and having tried to read the code I'm none the wiser. Secondly, the
read() at 1401/19: 394 appears to return the status line the daemon itself had
just written, prepended to the next flush command, and the daemon then rejects
the lot with "-1 Unknown command: 0".
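The coalesced read itself is normal stream-socket behaviour: two separate
writes from the peer can come back in a single read(), so a reader has to
split the buffer on newlines rather than treat one read() as one command. A
self-contained sketch of the situation (hypothetical /tmp paths, a socketpair
standing in for the unix socket):

```python
import socket

# Two queued writes on a stream socket can arrive in a single read() --
# the situation visible in the trace, where a pending status line and
# the next "flush" command come back together in one 0x2000-byte read.
a, b = socket.socketpair()
a.sendall(b"0 Successfully flushed /tmp/load_one.rrd.\n")
a.sendall(b"flush /tmp/proc_run.rrd\n")

buf = b.recv(0x2000)  # one read() returns both records
records = buf.split(b"\n")[:-1]
for r in records:
    print(r.decode())

# A parser that treated the whole buffer as one command would see
# "0 Successfully flushed ..." first and answer "-1 Unknown command: 0".
```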
I used the errinfo DTrace script from the DTraceToolkit:
# ./errinfo -n rrdcached
EXEC SYSCALL ERR DESC
rrdcached lwp_kill 3 No such process
When calling rrdtool from Ganglia via the cache, I get some extra messages in
the rrdcached -g output:
send_response: could not write status message
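For what it's worth, the Err#32 on the final write in the trace is EPIPE
(errno 32 on Solaris as on Linux): the peer had already closed its end of the
socket when the daemon tried to send the status line, which is presumably what
"send_response: could not write status message" is reporting. A minimal,
self-contained reproduction of that errno (socketpair again standing in for
the unix socket, hypothetical path):

```python
import errno
import socket

# Reproduce the Err#32 from the trace: writing a status line after the
# peer has closed its end of a stream socket fails with EPIPE.
daemon_end, client_end = socket.socketpair()
client_end.close()  # the client goes away before the response is sent

err = None
try:
    daemon_end.sendall(b"0 Successfully flushed /tmp/proc_run.rrd.\n")
except OSError as e:  # raised as BrokenPipeError in Python 3
    err = e.errno
    print("write failed: errno %d (%s)" % (err, errno.errorcode[err]))
```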
Has anyone else seen this?
Thanks in advance,
Peter.