[rrd-users] rrdcached problem with complex RRD options on Solaris

Peter Jenkins peter.jenkins at csc.fi
Mon Jul 19 15:16:20 CEST 2010


All,

I've found an issue with rrdcached on Solaris 10. Simple graphs are created fine 
from the cache, but the more complex ones with multiple source RRDs fail.

I'm using rrdtool version 1.4.4 on Solaris 10. I found the issue through Ganglia, 
but I've now reproduced it on the command line.

In short, without cache it works:

$ /opt/CSCrrdtool/bin/rrdtool graph /tmp/foo.png \
    --start '-3600' --end N --width 1024 --height 600 \
    --title 'host Load last hour' --lower-limit 0 \
    --vertical-label 'Load/Procs' --rigid \
    DEF:'load_one'='/opt/rrd/ganglia/Management/host/load_one.rrd':'sum':AVERAGE \
    DEF:'proc_run'='/opt/rrd/ganglia/Management/host/proc_run.rrd':'sum':AVERAGE \
    DEF:'cpu_num'='/opt/rrd/ganglia/Management/host/cpu_num.rrd':'sum':AVERAGE \
    AREA:'load_one'#CCCCCC:'1-min Load' LINE2:'cpu_num'#FF0000:'CPUs' \
    LINE2:'proc_run'#0000FF:'Running Processes'
1121x673

With cache it fails:

$ /opt/CSCrrdtool/bin/rrdtool graph /tmp/foo.png \
    --daemon unix:/tmp/rrdcached.socket \
    --start '-3600' --end N --width 1024 --height 600 \
    --title 'host Load last hour' --lower-limit 0 \
    --vertical-label 'Load/Procs' --rigid \
    DEF:'load_one'='/opt/rrd/ganglia/Management/host/load_one.rrd':'sum':AVERAGE \
    DEF:'proc_run'='/opt/rrd/ganglia/Management/host/proc_run.rrd':'sum':AVERAGE \
    DEF:'cpu_num'='/opt/rrd/ganglia/Management/host/cpu_num.rrd':'sum':AVERAGE \
    AREA:'load_one'#CCCCCC:'1-min Load' LINE2:'cpu_num'#FF0000:'CPUs' \
    LINE2:'proc_run'#0000FF:'Running Processes'
ERROR: rrdc_flush (/opt/rrd/ganglia/Management/host/proc_run.rrd) failed with status -1.
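
To narrow down whether the failure is tied to "rrdtool graph" or to the flush step itself, the same three files can be flushed through the daemon directly. The following is only a sketch, assuming a POSIX shell, the "flushcached" subcommand of rrdtool 1.4 and the paths from this message. The function name flush_all and the RRDTOOL, SOCKET and RRD_DIR variables are made up for illustration. It makes one rrdtool call (and therefore one connection) per file, so it may not trigger the same error as a graph with several DEFs.

```shell
# Sketch: flush each RRD through rrdcached on its own and count failures.
# The defaults below are the paths used in this message; override as needed.
RRDTOOL=${RRDTOOL:-/opt/CSCrrdtool/bin/rrdtool}
SOCKET=${SOCKET:-unix:/tmp/rrdcached.socket}
RRD_DIR=${RRD_DIR:-/opt/rrd/ganglia/Management/host}

# flush_all NAME...
# Runs "rrdtool flushcached" once per metric name and prints one result
# line per file. The return value is the number of failed flushes.
flush_all() {
    failed=0
    for name in "$@"; do
        if "$RRDTOOL" flushcached --daemon "$SOCKET" "$RRD_DIR/$name.rrd"; then
            echo "ok: $name"
        else
            echo "FAILED: $name"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example call (not executed here):
#   flush_all load_one proc_run cpu_num
```

If every file flushes cleanly this way but the multi-DEF graph still fails, that would point at several flushes sharing one connection rather than at any single file.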

This works fine under Linux; see this ganglia-general thread:

http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg05775.html

Startup command:

# /opt/CSCrrdtool/bin/rrdcached -p /opt/rrd/rrdcache/rrdcached.pid \
    -l /tmp/rrdcached.socket -g
starting uplistening for connections
<no other output>

# ps -ef | grep rrdcached
  rrdtool  1401     1   0 15:02:22 ?           0:06 
/opt/CSCrrdtool/bin/rrdcached -p /opt/rrd/rrdcache/rrdcached.pid -l /tmp/rrdcac

# ./dtruss -a -p 1401
<snip>
   1401/1:      6241    2024    140 accept(0x4, 0xFFBFFA98, 0xFFBFFA94) 
   = 6 0
   1401/1:      6259      31      2 lwp_kill(0x12, 0x0, 0xFE2E3200) 
   = -1 Err#3
   1401/1:      6327      84     55 lwp_create(0xFFBFF7F8, 0xC0, 0xFFBFF7F4) 
           = 19 0
   1401/1:      6350      39      9 lwp_continue(0x13, 0x1, 0xFE2E3200) 
   = 0 0
   1401/19:        48    1885      1 setcontext(0x3, 0xFE2E3288, 0x0) 
   = 0 0
   1401/19:        63      31      5 schedctl(0xFE5374C0, 0x0, 0x0) 
   = -12607376 0
   1401/19:       143  150043     64 pollsys(0xFDD7BF70, 0x1, 0xFDD7BF00) 
           = 1 0
   1401/19:       171      47     17 read(0x6, "flush 
/opt/rrd/ganglia/Management/shango/load_one.rrd\n\004\020\0", 0x2000) 
     = 54 0
   1401/19:       178      26      1 gtime()              = 1279543205 0
   1401/5:      6354    1884      5 gtime()               = 1279543205 0
   1401/5:      6416      67     39 
open("/opt/rrd/ganglia/Management/shango/load_one.rrd\0", 0x2, 0x1B6) 
       = 7 0
   1401/5:      6428      31      6 fstat(0x7, 0xFE07BCF0, 0x0)           = 0 0
   1401/5:      6482      74     49 mmap(0x0, 0x2F70, 0x3)                = 
-17891328 0
   1401/5:      6493      31      5 memcntl(0xFEEF0000, 0x2F70, 0x4) 
   = 0 0
   1401/5:      6510      31     14 memcntl(0xFEEF0000, 0x78, 0x4) 
   = 0 0
   1401/5:      6517      18      2 memcntl(0xFEEF0000, 0x78, 0x4) 
   = 0 0
   1401/5:      6580      28     10 memcntl(0xFEEF0000, 0x78, 0x4) 
   = 0 0
   1401/5:      6593      25      8 memcntl(0xFEEF0000, 0x230, 0x4) 
   = 0 0
   1401/5:      6606      24      8 memcntl(0xFEEF0000, 0x8, 0x4) 
   = 0 0
   1401/5:      6634      43     17 fcntl(0x7, 0x6, 0xFE07BD60)           = 0 0
   1401/5:      7389      26      5 memcntl(0xFEEF0000, 0x2F70, 0x1) 
   = 0 0
   1401/5:      7488     120     94 munmap(0xFEEF0000, 0x2F70)            = 0 0
   1401/19:       270   19699     38 lwp_park(0x0, 0x0, 0x5)              = 0 0
   1401/19:       327      57     30 write(0x6, "0 Successfully flushed 
/opt/rrd/ganglia/Management/shango/load_one.rrd.\n\0", 0x48)              = 72 0
   1401/5:      7773   15636    276 close(0x7)            = 0 0
   1401/5:      7809      37     11 lwp_park(0x1, 0x13, 0x0)              = 0 0
   1401/19:       374     898     33 pollsys(0xFDD7BF70, 0x1, 0xFDD7BF00) 
           = 1 0
   1401/19:       394      29     12 read(0x6, "0 Successfully flushed 
/opt/rrd/ganglia/Management/shango/load_one.rrd.\nflush 
/opt/rrd/ganglia/Management/shango/proc_run.rrd\na\320\0", 0x2000) 
    = 126 0
   1401/19:       400      17      0 gtime()              = 1279543205 0
   1401/19:       452      46     27 write(0x6, "-1 Unknown command: 0\n\0", 
0x16)                = 22 0
   1401/5:      7855    1616     21 lwp_park(0x0, 0x0, 0x0)               = 0 0
   1401/5:      7868      16      0 gtime()               = 1279543205 0
   1401/5:      7916      47     31 
open("/opt/rrd/ganglia/Management/shango/proc_run.rrd\0", 0x2, 0x1B6) 
       = 7 0
   1401/5:      7927      21      5 fstat(0x7, 0xFE07BCF0, 0x0)           = 0 0
   1401/5:      7967      51     36 mmap(0x0, 0x2F70, 0x3)                = 
-17891328 0
   1401/5:      7975      18      2 memcntl(0xFEEF0000, 0x2F70, 0x4) 
   = 0 0
   1401/5:      7989      26     10 memcntl(0xFEEF0000, 0x78, 0x4) 
   = 0 0
   1401/5:      7995      17      2 memcntl(0xFEEF0000, 0x78, 0x4) 
   = 0 0
   1401/5:      8049      26      9 memcntl(0xFEEF0000, 0x78, 0x4) 
   = 0 0
   1401/5:      8063      24      8 memcntl(0xFEEF0000, 0x230, 0x4) 
   = 0 0
   1401/5:      8075      23      8 memcntl(0xFEEF0000, 0x8, 0x4) 
   = 0 0
   1401/5:      8093      26     10 fcntl(0x7, 0x6, 0xFE07BD60)           = 0 0
   1401/5:      8767      21      4 memcntl(0xFEEF0000, 0x2F70, 0x1) 
   = 0 0
   1401/5:      8824      69     52 munmap(0xFEEF0000, 0x2F70)            = 0 0
   1401/5:      8930     117     99 close(0x7)            = 0 0
   1401/5:      8961      28     10 lwp_park(0x1, 0x13, 0x0)              = 0 0
   1401/19:       510    2021     24 lwp_park(0x0, 0x0, 0x5)              = 0 0
   1401/19:       540      25      8 write(0x6, "0 Successfully flushed 
/opt/rrd/ganglia/Management/shango/proc_run.rrd.\n\0", 0x48)              = -1 
Err#32
<snip>

This output seems a bit odd. In particular, I don't understand why lwp_kill is 
being called, and having tried to read the code I'm none the wiser.
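
In the trace, each reply the daemon writes on fd 6 starts with a numeric status code followed by a message ("0 Successfully flushed ...", "-1 Unknown command: 0"), and the -1 matches the status that rrdtool reports. A small sketch for sorting such lines while reading a long trace, assuming a POSIX shell and that a negative code always means an error; classify_response is a name invented here, not part of rrdtool:

```shell
# classify_response LINE
# Looks at the first space-separated word of a daemon reply line and prints
# "ok CODE" for a non-negative status, "error CODE" for a negative one,
# or "malformed" if the line does not start with a number.
classify_response() {
    code=${1%% *}            # everything before the first space
    case $code in
        -[0-9]*) echo "error $code" ;;
        [0-9]*)  echo "ok $code" ;;
        *)       echo "malformed" ;;
    esac
}

# Example calls (reply texts copied from the trace, paths shortened):
#   classify_response '0 Successfully flushed load_one.rrd.'
#   classify_response '-1 Unknown command: 0'
```

The "-1 Unknown command: 0" reply follows a read() whose buffer begins with the text of the previous reply, which suggests the daemon may have tried to interpret its own "0 Successfully flushed" line as a command. This is only a reading of the trace and has not been confirmed against the code.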

I used the errinfo DTrace script from the DTraceToolkit:

# ./errinfo -n rrdcached
             EXEC          SYSCALL  ERR  DESC
        rrdcached         lwp_kill    3  No such process

When calling rrdtool from Ganglia with the cache enabled, I get an extra message 
in the rrdcached -g output:

send_response: could not write status message

Has anyone else seen this?

Thanks in advance,
Peter.


