[rrd-developers] rrdcached + collectd issues

Benny Baumann BenBE at geshi.org
Tue Oct 13 17:16:04 CEST 2009


On 13.10.2009 08:50, Thorsten von Eicken wrote:
> Benny Baumann wrote:
>> On 12.10.2009 20:33, Thorsten von Eicken wrote:
>>   
>>> Thorsten von Eicken wrote:
>>>     
>>> One further thought, instead of trying to allocate RRDs sequentially,
>>> if there is a way to query/detect where each RRD file is allocated on
>>> disk, then rrdcached could sort the dirty tree nodes by disk location
>>> and write them in that order. I don't know whether Linux (or FreeBSD)
>>> have a way to query disk location or to at least infer it.
>>>
>>> TvE
>>>     
>> Even though Linux and Windows (and I guess most other OSes) allow you
>> to query the "logical" disc position, the physical location may be
>> completely unrelated to it, as modern hard drives may reallocate
>> certain sectors if they feel that one particular sector cannot be
>> read/written properly. Thus trying to infer the physical location of
>> data will not be accurate. To take this even further, not even the
>> volume you write onto has to exist physically, e.g. you might have a
>> RAID or LVM, in which case one logical sector corresponds to more
>> than one physical location, or, as in the case of a RAM disc, to none
>> at all (at least no permanent one).
>>
>> To make the picture complete, there's yet another factor that makes
>> this nothing you want to do in software: modern hard drives usually
>> "plan" their read and write requests automatically already. So when
>> write accesses occur right behind each other, the hard drive will
>> already figure out the best way to write them - unless you enforce
>> synchronous request completion with e.g. O_DIRECT or the like.
>>   
> Ben, I understand what you are saying. The question I have is not
> whether it's always possible and meaningful to do this, the question
> is whether it's possible to arrange this with the right set-up. Also,
> I wouldn't be looking for 100% accuracy; even a rough approximation
> could be a significant improvement.
> TvE
Well, I know what the intention behind this is, but IMHO getting even
the rough location of where the data sits on the disk might turn out to
be a difficult process. The following parameters might have an
influence (I tried to order them from low-level to high-level, YMMV):
- The type of storage used, like magnetic, flash, optical, ...
(Influence: The way data is accessed or modified)
- The encoding used to store the data (Influence: determines whether
two nearby pieces of data can actually be reached together)
- Physical organization of the media (Influence: Where is physical sector 0)
- Sector reallocation (Influence: Remapping of data locations)
- Logical structure of the media (Influence: Where is logical sector 0
and how to map from physical to logical sectors?)
- Virtual Resizing of the drive (e.g. hiding parts of the drive from
view) (Influence: Parts of the drive become invisible to the casual
observer)
- Partitioning of the media for the OS (Influence: Offset for sector
mapping)
- RAID / LVM (Influence: Virtualizing of the storage)
- Volume File System (Influence: Remapping of partition sectors to
logical file system sectors/clusters)
- File allocation (Influence: mapping of a logical file offset to a
volume cluster)

Taking all or at least some of these into account, humbly said, it
doesn't make sense to try guessing the disc location: you not only have
to deal with thousands of combinations of those factors and take on a
lot of additional, complex work, but you will almost certainly guess
wrong most of the time. And guessing wrong will in the best case change
nothing, but in the worst case hurt your performance twice: once for
the cost of guessing at all, and again for guessing wrong.

The better alternative IMHO would be analyzing where things waste
valuable time. I'm not sure about the internals of the write threads,
but usually you have one "task" thread preparing a queue of stuff to
write (or multiple if needed) and several (I'll come to the number in a
second) worker threads to flush things to disc. Now for the number: 2
threads can be better than ten, but needn't be, as this depends on a
few things: 1. the number of CPUs in the system, 2. the drive speed,
3. data buffering.
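Roughly, the layout I have in mind looks like the following sketch -
names like write_job and enqueue_job are made up for the example, not
rrdcached's actual internals:

/* One producer fills a FIFO of prepared write jobs; a small pool of
 * worker threads drains it and does the blocking writes. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

struct write_job {
    int    fd;            /* target RRD file, already open */
    off_t  offset;        /* where in the file to write    */
    void  *buf;           /* prepared data                 */
    size_t len;
    struct write_job *next;
};

static struct write_job *q_head, *q_tail;
static pthread_mutex_t   q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t    q_cond = PTHREAD_COND_INITIALIZER;
static int               q_done;   /* set when shutting down */

/* Producer side: append one prepared job and wake a worker. */
void enqueue_job(struct write_job *job)
{
    job->next = NULL;
    pthread_mutex_lock(&q_lock);
    if (q_tail) q_tail->next = job; else q_head = job;
    q_tail = job;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

/* Worker: pop jobs and flush them until the queue is drained
 * and shutdown was requested. */
void *writer_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (!q_head && !q_done)
            pthread_cond_wait(&q_cond, &q_lock);
        if (!q_head && q_done) {
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        struct write_job *job = q_head;
        q_head = job->next;
        if (!q_head) q_tail = NULL;
        pthread_mutex_unlock(&q_lock);

        if (pwrite(job->fd, job->buf, job->len, job->offset) < 0)
            perror("pwrite");
        free(job->buf);
        free(job);
    }
}

How many of those workers to start with pthread_create is exactly the
thread-count question from above.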

To get a feeling for the thread count: consider compiling a large piece
of software on a multi-core CPU. You won't get the most out of your
hardware with a single make job, that's for sure. Two will be better,
but needn't be the optimum. Although compiling isn't as I/O-bound as
flushing data to disk, there's a point where the processes reading
source from disk take just as long as the actual compilation. That's
why for compiling you usually run not one job per CPU but two or three:
one process per CPU does I/O while another does the CPU work.

So, applied to the I/O-heavy work here: run two threads per CPU (one
prepares the data, the other writes) and encourage the OS to use plenty
of buffer space in RAM, i.e. avoid direct disc I/O with O_DIRECT, as
O_DIRECT forces the OS to wait for the operation to complete. Instead
issue as many asynchronous calls as possible and use e.g. select calls
to find out which succeeded and which failed. That way one I/O thread
can track, say, 16* I/O operations at a time, discard those that
completed successfully, retry those that failed, and leave the kernel
working for you. One way to optimize things here is to collect the
requests for one file on one thread, i.e. if there are 2 requests on
descriptor 7, push them to the same thread if possible instead of
distributing them among threads. But since async requests cannot be run
in parallel on the same file (at least on Windows IIRC) you'd need to
do something like this anyway. It also saves some trouble with
synchronizing your threads' use of that file descriptor and thus
improves performance a bit.
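To make the idea concrete, here is a minimal sketch of the "fire a
batch, then reap completions" scheme using POSIX AIO (<aio.h>). It is
an illustration only, not rrdcached code: flush_batch and its
parameters are made up, the batch size of 16 is the arbitrary number
from above, and since POSIX AIO on regular files isn't directly
select()able on Linux, the sketch waits with aio_suspend() instead:

/* Submit up to BATCH asynchronous writes to one file descriptor
 * (keeping all requests for a given file on the same thread, as
 * suggested above), then reap the completions and report failures;
 * a retry of failed requests would slot in where the error is logged. */
#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BATCH 16   /* arbitrarily chosen, as noted above */

int flush_batch(int fd, struct iovec *chunks, off_t *offsets, int n)
{
    struct aiocb cbs[BATCH];
    const struct aiocb *list[BATCH];

    if (n > BATCH)
        n = BATCH;

    /* Fire all writes without waiting for any of them. */
    for (int i = 0; i < n; i++) {
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf    = chunks[i].iov_base;
        cbs[i].aio_nbytes = chunks[i].iov_len;
        cbs[i].aio_offset = offsets[i];
        if (aio_write(&cbs[i]) < 0) {
            perror("aio_write");
            return -1;
        }
        list[i] = &cbs[i];
    }

    /* Reap: wait until every request has left EINPROGRESS.
     * NULL entries in the list are ignored by aio_suspend(). */
    int pending = n;
    while (pending > 0) {
        if (aio_suspend(list, n, NULL) < 0 && errno != EINTR) {
            perror("aio_suspend");
            return -1;
        }
        pending = 0;
        for (int i = 0; i < n; i++) {
            if (!list[i])
                continue;
            int err = aio_error(&cbs[i]);
            if (err == EINPROGRESS) {
                pending++;
            } else {
                if (err != 0)
                    fprintf(stderr, "write failed: %s\n", strerror(err));
                else
                    (void)aio_return(&cbs[i]);   /* bytes written */
                list[i] = NULL;                  /* done with this one */
            }
        }
    }
    return 0;
}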

I hope this little description helps a bit with finding a suitable solution.

*arbitrarily chosen, no empirical data
