[rrd-developers] implementing portable format - change format?

Sfiligoi Igor sfiligoi at lnf.infn.it
Mon Nov 17 19:16:45 CET 2008


Oops, I forgot to attach the Python script.

Igor


Sfiligoi Igor wrote:
> Hi Tobi.
> 
> I created a small python script (attached), and ran it on my laptop
> (Dell Latitude D810, a Pentium M 2GHz class machine).
> 
> For starters, the DB file size stays constant, independent of how many
> updates one makes (tried with 1M):
> 4096 bytes for 100 rows
> 25600 bytes for 2000 rows.
> 
> PS: I created the DB files with
>>>> import test1
>>>> test1.create_db('t8',100)
>>>> test1.create_db('t9',2000)
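>
> (The attachment was scrubbed from the archive, so here is a minimal
> sketch of what create_db plausibly looks like; the table name, schema
> and details are assumptions, not the actual script:)
>
>     import sqlite3
>
>     def create_db(fname, nrows):
>         # one fixed-size round-robin table, fully pre-populated with
>         # NULL rows so that later updates never grow the file
>         con = sqlite3.connect(fname)
>         con.execute("CREATE TABLE rra ("
>                     "idx INTEGER PRIMARY KEY, date INTEGER, value REAL)")
>         con.executemany("INSERT INTO rra VALUES (?, NULL, NULL)",
>                         ((i,) for i in range(nrows)))
>         con.commit()
>         con.close()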
> 
> Running a simple open/update/close loop, I get ~9 updates per second:
>>>> import test1
>>>> test1.benchmark('t8',1000)
> For 1000 loops, 110.665 seconds
>>>> test1.benchmark('t8',1000)
> For 1000 loops, 109.262 seconds
>>>> test1.benchmark('t9',1000)
> For 1000 loops, 121.620 seconds
>>>> test1.benchmark('t9',1000)
> For 1000 loops, 119.479 seconds
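>
> (Continuing the same sketch, a plausible shape for the benchmark loop:
> one connection and one transaction per update; names assumed:)
>
>     import time
>
>     def benchmark(fname, loops):
>         t0 = time.time()
>         for i in range(loops):
>             con = sqlite3.connect(fname)
>             # rows 0..99 exist in both test DBs
>             con.execute("UPDATE rra SET date=?, value=? WHERE idx=?",
>                         (i, float(i), i % 100))
>             con.commit()
>             con.close()
>         print("For %d loops, %.3f seconds" % (loops, time.time() - t0))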
> 
> The overhead seems to be due to the transaction management; updating 1,
> 10 or 100 rows per transaction gives almost the same timing:
>>>> import test1
>>>> test1.benchmark_multi('t8',1000,10)
> For 10000 updates (1000 loops 10 each), 123.724 seconds
>>>> test1.benchmark_multi('t8',1000,10)
> For 10000 updates (1000 loops 10 each), 108.126 seconds
>>>> test1.benchmark_multi('t8',1000,100)
> For 100000 updates (1000 loops 100 each), 111.148 seconds
>>>> test1.benchmark_multi('t8',1000,100)
> For 100000 updates (1000 loops 100 each), 111.108 seconds
>>>> test1.benchmark_multi('t9',1000,100)
> For 100000 updates (1000 loops 100 each), 124.660 seconds
>>>> test1.benchmark_multi('t9',1000,100)
> For 100000 updates (1000 loops 100 each), 124.104 seconds
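>
> (benchmark_multi presumably batches several updates into a single
> transaction, along these lines:)
>
>     def benchmark_multi(fname, loops, per_tran):
>         t0 = time.time()
>         for i in range(loops):
>             con = sqlite3.connect(fname)
>             for j in range(per_tran):
>                 n = i * per_tran + j
>                 con.execute("UPDATE rra SET date=?, value=? WHERE idx=?",
>                             (n, float(n), n % 100))
>             con.commit()  # one commit per batch of per_tran updates
>             con.close()
>         print("For %d updates (%d loops %d each), %.3f seconds"
>               % (loops * per_tran, loops, per_tran, time.time() - t0))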
> 
> Hope these results can help you.
> 
> Cheers,
>   Igor
> 
> PS: I also ran the tests in autocommit mode (which effectively doubles
> the number of transactions), and the run times were much longer, as
> expected:
>>>> import test1
>>>> test1.benchmark_notran('t8',1000)
> For 1000 loops, 210.374 seconds
>>>> test1.benchmark_notran('t8',1000)
> For 1000 loops, 218.828 seconds
>>>> test1.benchmark_notran('t9',1000)
> For 1000 loops, 233.482 seconds
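>
> (For the autocommit variant, the sqlite3 module's isolation_level=None
> puts the connection in autocommit mode, so every statement commits on
> its own; again only a sketch:)
>
>     def benchmark_notran(fname, loops):
>         t0 = time.time()
>         for i in range(loops):
>             con = sqlite3.connect(fname, isolation_level=None)
>             con.execute("UPDATE rra SET date=?, value=? WHERE idx=?",
>                         (i, float(i), i % 100))
>             con.close()
>         print("For %d loops, %.3f seconds" % (loops, time.time() - t0))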
> 
> 
> Tobias Oetiker wrote:
>> Igor,
>>
>> my understanding is that even in the UPDATE case it is not a simple
>> value replacement that happens ... but I would be most interested
>> in your tests of the stability and performance of such a solution
>> ... you can do this with mockup configs without actually rewriting
>> rrdtool.
>>
>> cheers
>> tobi
>>
>> Today Sfiligoi Igor wrote:
>>
>>> Tobias Oetiker wrote:
>>>> Hi Igor,
>>>>
>>>> Yesterday Igor Sfiligoi wrote:
>>>>
>>>>> Hi Kevin and Tobi.
>>>>>
>>>>> Since you are planning to radically change the format, have you
>>>>> considered taking a more "standard" route?
>>>>>
>>>>> Instead of a completely RRDTool-specific format, why not use something
>>>>> other tools could easily read?
>>>>>
>>>>> What about SQLite?
>>>>> It is already portable.
>>>>> And it is a database ;)
>>>>>
>>>>> It should be pretty easy to define a schema that serves RRDTool well.
>>>>>
>>>>> Would it be worth a consideration?
>>>>>
>>>>> Igor
>>>> yes the thought has crossed my mind, the problem with SQL databases
>>>> is that they are not really good at updating round-robin archives;
>>>> due to transactional consistency constraints they will create
>>>> internal fragmentation, which will cause performance to suffer
>>>> dramatically.
>>>>
>>>> BUT it would be interesting to see if we can abstract the
>>>> interface enough that writing a 'plugin' for an sqlite storage
>>>> backend becomes easy ...
>>>>
>>>> cheers
>>>> tobi
>>> What if the table had a structure like this:
>>> index, date, value1, value2, ...
>>>
>>> with index as the table key.
>>> I.e. very similar to what you have now in the RRD file.
>>>
>>> One would populate all the rows with consecutive indexes and
>>> date=NULL,valueX=NULL at creation time.
>>> The updates would be SQL UPDATEs on the existing row, i.e. replacing
>>> date+valueX at a specific index.
>>>
>>> No INSERTs, no DELETEs.
>>> This should keep the database compact and of fixed size.
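>>>
>>> (In sqlite3 terms the whole life cycle would look roughly like this;
>>> table and column names are made up for illustration:)
>>>
>>>     import sqlite3
>>>     con = sqlite3.connect("test.db")
>>>     con.execute("CREATE TABLE rra (idx INTEGER PRIMARY KEY, "
>>>                 "date INTEGER, value1 REAL, value2 REAL)")
>>>     # creation time: INSERT every row once, payloads NULL
>>>     con.executemany("INSERT INTO rra VALUES (?, NULL, NULL, NULL)",
>>>                     ((i,) for i in range(100)))
>>>     # steady state: in-place UPDATEs only, no INSERTs or DELETEs
>>>     con.execute("UPDATE rra SET date=?, value1=?, value2=? WHERE idx=?",
>>>                 (1226944605, 1.0, 2.0, 42))
>>>     con.commit()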
>>>
>>> Do you still see a fragmentation problem under these conditions?
>>>
>>> Cheers,
>>>   Igor
>>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1.py.gz
Type: application/x-gzip
Size: 757 bytes
Desc: not available
URL: http://lists.oetiker.ch/pipermail/rrd-developers/attachments/20081117/70bd65a9/attachment-0001.bin

