[rrd-developers] Possible race conditions / data loss in rrdcached?
kevin brintnall
kbrint at rufus.net
Sun Jul 12 17:23:19 CEST 2009
On Sun, Jul 12, 2009 at 12:15:54PM +0200, Sebastian Harl wrote:
> Hi,
>
> I've just stumbled across the following (possible) issue. Unfortunately,
> I don't have the time to have a more in-depth look at it, but maybe
> Kevin and / or Florian can comment on that:
Sebastian, your summary is correct.
This unlock allows rrdcached to continue processing other updates as soon
as possible. stat() may block on a system that is heavily IO constrained
(sometimes for multiple seconds). If the cache lock were held during that
time, we'd introduce the exact type of IO-bound update delay that
rrdcached is aiming to avoid.
With sub-second updates, the incoming update rate is even higher; thus,
the potential for a single stat() to delay updates is even higher.
For rrdcached to be beneficial, the arrival rate of new nodes must be
_significantly_ smaller than the arrival rate of updates. Therefore, I
think this is a sensible decision. If there is strong community feedback
to the contrary, we can re-evaluate.
--
kevin brintnall =~ /kbrint at rufus.net/
> In handle_request_update(), the following code is used:
>
> pthread_mutex_lock (&cache_lock);
> ci = g_tree_lookup (cache_tree, file);
>
> if (ci == NULL) /* {{{ */
> {
>
> /* I.e. the file is not yet available in the cache => we need to create
> * a cache item and insert in into the tree. */
>
> [...]
> /* don't hold the lock while we setup; stat(2) might block */
> pthread_mutex_unlock(&cache_lock);
>
> [...] /* Allocate and initialize the cache item. */
>
> pthread_mutex_lock(&cache_lock);
> g_tree_replace (cache_tree, (void *) ci->file, (void *) ci);
>
> Now, in the mean time, another "UPDATE" command could have been issued
> on the same file, which would have created the cache item as well which
> would then be replaced by the "new" one when calling g_tree_replace(),
> losing the data submitted by the other "UPDATE" command.
>
> g_tree_replace() will take care of properly freeing the "old" cache
> entry (using free_cache_item()), however that does not take care of
> flushing the file.
>
> I agree that this describes a very unlikely situation but it is
> semantically wrong and might result in problems that are very hard to
> trace back. Suppose that multiple updates are accumulated into one
> "UPDATE" command before being submitted to rrdcached (for whatever
> reason). Then all data of some UPDATE might be lost. When thinking about
> plans to introduce sub-second resolution some time in the future, this
> might become a real problem.
>
> Am I missing something? Any comments?
>
> Cheers,
> Sebastian
>
> --
> Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/
>
> Those who would give up Essential Liberty to purchase a little Temporary
> Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
>
> _______________________________________________
> rrd-developers mailing list
> rrd-developers at lists.oetiker.ch
> https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers
More information about the rrd-developers
mailing list