[rrd-users] Data Mining: Correlation Engine
Martin Sperl
rrdtool at martin.sperl.org
Tue Nov 11 08:36:20 CET 2008
Hi!
Actually I have been talking to Tobi regarding this quite recently, as
this again came up during one of our projects.
An example question there was: what is the maximum number of transactions
we can reach on a specific server hardware (correlating CPU usage and TPS)?
This actually works quite well and we have been able to differentiate
between HW Generations quite easily...
So I have proposed to Tobi to contribute the following over the next few
months:
* creating a graphing facility with rrd to graph not a time series but a
scatter plot of 2 data sources (CDEFs should also be able to act as
data sources!)
* simple VDEF functions to calculate some "simple" correlations (e.g.
linear/polynomial fits) and then use CDEFs to calculate+present this
graph...
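
To illustrate the kind of "simple" fit meant above, here is a minimal
sketch of an ordinary least-squares line through paired samples of two
data sources. It is plain Python and not part of rrdtool; the function
name linear_fit and the sample CPU/TPS values are hypothetical, and
reading the maximum TPS off the fitted line at 100% CPU assumes the
relationship stays linear up to saturation.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples: CPU usage in percent and transactions per second
cpu = [10.0, 20.0, 30.0, 40.0, 50.0]
tps = [105.0, 198.0, 310.0, 402.0, 495.0]

slope, intercept = linear_fit(cpu, tps)
# Extrapolate the fitted line to 100% CPU as a rough capacity estimate
estimated_max_tps = slope * 100.0 + intercept
```

Plotting tps against cpu as points, together with this line, is the
scatter-plot view described in the first bullet.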
I believe based on this one can write an easy framework for correlating
different data and then presenting it. Still IMHO the most important
thing is to have visualization for these to work - Actually my approach
is first to create a website that presents a matrix of correlation
graphs for different datasources. This way we can find out what is
significant visually...
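
As a rough sketch of what could sit behind such a matrix, the following
computes the Pearson correlation coefficient for every pair of data
sources. It assumes the series have already been fetched and aligned to
the same timestamps; the names pearson and correlation_matrix and the
sample series are hypothetical and not part of rrdtool.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # A constant series has no defined correlation
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Map every pair of data source names to its correlation coefficient."""
    names = sorted(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Hypothetical, already time-aligned samples
series = {
    "cpu": [10.0, 20.0, 30.0, 40.0],
    "tps": [100.0, 210.0, 290.0, 405.0],
    "db_latency": [5.0, 4.0, 6.0, 5.0],
}
matrix = correlation_matrix(series)
```

The coefficient only captures linear dependence, so it can help decide
which pairs to look at first but does not replace the scatter plots.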
But for me the next task is adding a least-squares fit engine for
polynomials and sums of sines to rrdtool, so that we can create a
prediction out of the box for the question: "from what we know now,
can we predict the value in 6 months' time". This is actually much more
important for our performance project to start from...
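
A minimal sketch of such a fit, in plain Python rather than inside
rrdtool: a polynomial plus sine/cosine terms is fitted by solving the
normal equations, and the result is evaluated at a future time. All
names here (solve, basis, fit, predict) are hypothetical, and the
periods of the sine terms are assumed to be known in advance (e.g.
daily or weekly); estimating the periods themselves would be a
nonlinear problem that this sketch does not cover.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system, basis functions are dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def basis(t, degree, periods):
    """Basis values at time t: 1, t, ..., t^degree, then sin/cos per period."""
    row = [t ** k for k in range(degree + 1)]
    for p in periods:
        w = 2.0 * math.pi * t / p
        row += [math.sin(w), math.cos(w)]
    return row


def fit(ts, ys, degree, periods):
    """Least-squares coefficients via the normal equations (A^T A) c = A^T y."""
    rows = [basis(t, degree, periods) for t in ts]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(coeffs, t, degree, periods):
    """Evaluate the fitted model at time t."""
    return sum(c * b for c, b in zip(coeffs, basis(t, degree, periods)))


# Hypothetical daily samples: linear growth plus a weekly cycle
ts = [float(d) for d in range(60)]
ys = [5.0 + 0.1 * t + 3.0 * math.sin(2.0 * math.pi * t / 7.0) for t in ts]

coeffs = fit(ts, ys, degree=1, periods=[7.0])
value_in_six_months = predict(coeffs, 180.0, degree=1, periods=[7.0])
```

The normal equations are the simplest approach but become
ill-conditioned for higher polynomial degrees or long time ranges;
rescaling the time axis or using a QR-based solver would be more robust.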
Ciao,
Martin
fcocquyt wrote:
> Hello,
> First off, big thanks to Tobias for creating RRDTool - the basis for a lot
> of great sysadmin'ing ;)
>
> I searched the forums without an answer - has anyone looked at a data mining
> engine for RRDTool data?
> An example application would be computing the correlation of different
> datasources in the set of all datasources (eg cacti installation).
> In his talk today in he outlined the roadmap, with the RRDcached the
> distributed model seems to be on its way - lending towards the (background
> compute) datamining approach...
> To my way of thinking much of the untapped value of RRDtool datasets rests
> in the analysis across the rrd files (eg wow, our online transactions
> (sales) drop off dramatically with our backend DB latency - if we upgrade
> our DB for {fixed cost} we can generate much more revenue).
>
> Anyone else see value in such a data mining engine for RRDTool?
>
> thanks,
> Fletcher
>