[rrd-developers] bus error when disk is full, with mmap & sparse file

Francois-Xavier Bourlet francois-xavier.bourlet at dotcloud.com
Wed Apr 20 22:21:12 CEST 2011


Here's the patch.

It only modifies the mmap part of rrd_open, so it only takes effect when
HAVE_MMAP is defined.

Before the patch, rrd_open opens the file, mmaps it, and then fills it
with a memset (which forces the allocation of the file space).

If the filesystem gets full before or during the memset, you get a
bus error.
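
For readers who want to see the lazy allocation behind this, here is a
minimal sketch of the "seek to end - 1, write 1 byte" step described in
the quoted scenario further down. It is not code from rrd_open; the names
create_sparse and file_usage are made up for illustration, and the
512-byte unit for st_blocks is an assumption that holds on Linux but is
not guaranteed by POSIX.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file of `size` bytes (size >= 1) by
 * seeking to the last byte and writing a single zero. On a filesystem
 * with sparse-file support, only the final block needs to be allocated.
 * Returns 0 on success, -1 on error (errno is set). */
int create_sparse(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 ||
        write(fd, "", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: report the logical size of a file and the number
 * of bytes actually allocated for it on disk. */
int file_usage(const char *path, off_t *logical, off_t *allocated)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *logical = st.st_size;
    /* Assumption: st_blocks counts 512-byte units, as on Linux. */
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

On a sparse-capable filesystem the allocated figure is typically far
smaller than the logical size, which is why the later memset through the
mapping is the first point where the missing space is noticed.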

So this patch fills the file with consecutive write()s before the
mmap. write() errors can be easily caught, so rrd_open now returns an
error if filling the file fails.
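
The idea can be sketched roughly as follows. This is not the attached
patch, only an illustration of the technique under my own assumptions:
the function name fill_with_zeros and the 4096-byte buffer size are
invented for the example.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the code from the attached patch.
 * Writes `size` zero bytes to fd, starting at the current file offset,
 * so that the filesystem has to allocate the space right now.
 * Returns 0 on success, -1 on error with errno set by write(). */
int fill_with_zeros(int fd, size_t size)
{
    char buf[4096];                 /* arbitrary chunk size */
    memset(buf, 0, sizeof buf);

    while (size > 0) {
        size_t chunk = size < sizeof buf ? size : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data: retry */
            return -1;              /* e.g. ENOSPC when the disk is full */
        }
        if (n == 0) {
            errno = EIO;            /* no progress: give up, do not spin */
            return -1;
        }
        size -= (size_t)n;          /* short writes are legal: loop on */
    }
    return 0;
}
```

A caller would check the return value before mapping the file and report
strerror(errno) on failure; with glibc, ENOSPC reads "No space left on
device", which matches the message in the log below.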

Example when starting a collectd (using librrd) on a (too) small tmpfs:

Before the patch:

Initialization complete, entering read-loop.
rrdtool plugin: Adjusting "RandomTimeout" to 0.000 seconds.
./test.sh: line 20:  6885 Bus error               (core dumped) ./build/src/collectd -C collectd.conf -f

After the patch:

Initialization complete, entering read-loop.
rrdtool plugin: Adjusting "RandomTimeout" to 0.000 seconds.
rrdtool plugin: rrd_create_r (/home/bombela/dotcloud/collectd/rrd/bombela-laptop/lxc/lxc_containers.rrd) failed: creating '/home/bombela/dotcloud/collectd/rrd/bombela-laptop/lxc/lxc_containers.rrd': No space left on device

Let me know what you think about it.
Regards,

On Tue, Apr 19, 2011 at 11:32 PM, Francois-Xavier Bourlet
<francois-xavier.bourlet at dotcloud.com> wrote:
> working on it
>
> On Mon, Apr 18, 2011 at 1:48 PM, Tobias Oetiker <tobi at oetiker.ch> wrote:
>> Today Francois-Xavier Bourlet wrote:
>>
>>> When I was speaking about a SIGBUS handler, I was not really thinking
>>> about something to recover from errors, but simply checking whether the
>>> SIGBUS signal has a file descriptor associated with it, and if so
>>> checking the free space to print a hint to users. Like:
>>>
>>> Bus error (your disk seems full; the error could be the result of a
>>> failure to allocate disk space for a file)
>>>
>>> I believe the most robust way is to write zeros before mapping
>>> the file, using plain old write()s, then simply check the write return
>>> code / errno and make rrd_open return an error cleanly.
>>
>> fine with me too ... let's see the patch ...
>>
>> cheers
>> tobi
>>
>>>
>>> On Mon, Apr 18, 2011 at 12:01 AM, Tobias Oetiker <tobi at oetiker.ch> wrote:
>>> > Hi Francois,
>>> >
>>> > Yesterday Francois-Xavier Bourlet wrote:
>>> >
>>> >> Hi Tobi,
>>> >>
>>> >> Yes, it happens at create time.
>>> >>
>>> >> Checking available free space before the creation process would lead
>>> >> to a race condition, because between the time you check the free
>>> >> space and the time you allocate it, other processes/threads can
>>> >> still allocate it.
>>> >
>>> > yes ...
>>> >
>>> >> But it could be used in another way, by setting up a handler for bus
>>> >> errors that checks the free space and prints a little hint message before
>>> >> exiting the application? The advantage would be zero overhead (until
>>> >> you crash... but do you really care at crash time ;) ) and no
>>> >> modification of the current rrd_open function. What do you think?
>>> >
>>> > having a handler for sigbus sounds like a sensible idea ...
>>> >
>>> > http://www.linuxprogrammingblog.com/code-examples/SIGBUS-handling
>>> >
>>> > as for an early statvfs check, this could save time for people who
>>> > try to create unreasonably large rrd files by alerting them before
>>> > gigabytes of 0s have been allocated ...
>>> >
>>> > cheers
>>> > tobi
>>> >
>>> >> On Sun, Apr 17, 2011 at 10:07 PM, Tobias Oetiker <tobi at oetiker.ch> wrote:
>>> >> > Hi Francois,
>>> >> >
>>> >> > Yesterday Francois-Xavier Bourlet wrote:
>>> >> >
>>> >> >> Hello,
>>> >> >>
>>> >> >> On my system rrd_open uses mmap and my system supports sparse files.
>>> >> >> That means that when my disk gets full, rrd_open can die with a bus
>>> >> >> error. Here's the scenario in rrd_open:
>>> >> >>
>>> >> >> Disk really close to full, few kbytes free
>>> >> >> open file -> ok
>>> >> >> seek to end -1 -> ok
>>> >> >> write 1 -> ok
>>> >> >> the system will only write the last chunk of the file; every other
>>> >> >> chunk will be allocated lazily later because of the sparse file feature.
>>> >> >> So we have a file bigger than the free space available on the system.
>>> >> >> The next attempt to write to this file, even without extending its
>>> >> >> size, will fail with a disk full error.
>>> >> >>
>>> >> >> Next, rrd_open maps the file and then memsets the whole file to
>>> >> >> zero... leading to a bus error since the kernel can't write into
>>> >> >> the file because the filesystem is full.
>>> >> >
>>> >> > this happens at create time, right ?
>>> >> >
>>> >> >> In my case I just have to extend the disk space available and it's
>>> >> >> fine. But the problem is you don't have any clue that the bus error
>>> >> >> happens because your disk is full, and I really wasted a lot of time
>>> >> >> before I thought of simply checking the free space...
>>> >> >>
>>> >> >> I don't really know how to fix the code; maybe we can catch SIGBUS
>>> >> >> signals and, when discovering that the error is about a file mapping,
>>> >> >> provide a human-readable message on the terminal/log?
>>> >> >>
>>> >> >> Trying to recover from a bus error on file-mapped memory seems to be
>>> >> >> another challenge...
>>> >> >>
>>> >> >> Or rather than memsetting the file to zero, we could simply write
>>> >> >> zeros to the file before mapping it, so it would be easy to catch
>>> >> >> write errors.
>>> >> >
>>> >> >> Let me know what you think about it; I am available to patch rrd
>>> >> >> with the best proposed solution.
>>> >> >
>>> >> > how about a call to statvfs before starting the whole creation
>>> >> > process ? (for win32 this would probably be GetDiskFreeSpaceEx)
>>> >> >
>>> >> > cheers
>>> >> > tobi
>>> >> >
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> Regards,
>>> >> >>
>>> >> >
>>> >> > --
>>> >> > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
>>> >> > http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900
>>> >> >
> --
> François-Xavier Bourlet
>
-- 
François-Xavier Bourlet
http://www.dotcloud.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rrd_open_bus_error_when_disk_full_fix.diff
Type: application/octet-stream
Size: 1475 bytes
Desc: not available
Url : http://lists.oetiker.ch/pipermail/rrd-developers/attachments/20110420/bae5b4b4/attachment.obj 

