[rrd-developers] bus error when disk is full, with mmap & sparse file
Tobias Oetiker
tobi at oetiker.ch
Thu Apr 21 08:13:09 CEST 2011
Hi Francois,
Yesterday Francois-Xavier Bourlet wrote:
> Here's the patch.
>
> It only modifies the mmap part of rrd_open, so it only applies to builds with HAVE_MMAP.
>
> Before the patch, rrd_open opens the file, mmaps it, and then fills it
> with a memset (which forces the allocation of the file space).
>
> If the filesystem gets full before or during the memset, you get a
> bus error.
>
> So this patch fills up the file with consecutive writes before the
> mmap. write() errors can easily be caught, so rrd_open now returns an
> error when filling the file fails.
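
For readers of the archive, the approach described above can be sketched roughly as follows. This is not the patch itself: `fill_with_zeros` and the 4096-byte buffer size are hypothetical names and choices made for illustration. It only shows how plain write() calls surface a full disk as an ordinary error (typically ENOSPC) instead of a SIGBUS later on.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the actual rrd_open code): write `size` zero
 * bytes to fd, starting at the current file offset, so the filesystem has
 * to allocate the blocks now. Returns 0 on success, -1 with errno set. */
static int fill_with_zeros(int fd, size_t size)
{
    char buf[4096];

    memset(buf, 0, sizeof buf);
    while (size > 0) {
        size_t chunk = size < sizeof buf ? size : sizeof buf;
        ssize_t n = write(fd, buf, chunk);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* e.g. ENOSPC when the disk is full */
        }
        if (n == 0) {           /* defensive: avoid looping forever */
            errno = ENOSPC;
            return -1;
        }
        size -= (size_t)n;      /* short writes are fine, keep going */
    }
    return 0;
}
```

A caller would run this right after creating the file and before mmap, and report strerror(errno) on failure.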
>
> Example when starting a collectd (using librrd) on a (too) small tmpfs:
>
> before the patch:
>
> Initialization complete, entering read-loop.
> rrdtool plugin: Adjusting "RandomTimeout" to 0.000 seconds.
> ./test.sh: line 20: 6885 Bus error (core dumped)
> ./build/src/collectd -C collectd.conf -f
>
> After the patch:
>
> Initialization complete, entering read-loop.
> rrdtool plugin: Adjusting "RandomTimeout" to 0.000 seconds.
> rrdtool plugin: rrd_create_r
> (/home/bombela/dotcloud/collectd/rrd/bombela-laptop/lxc/lxc_containers.rrd)
> failed: creating
> '/home/bombela/dotcloud/collectd/rrd/bombela-laptop/lxc/lxc_containers.rrd':
> No space left on device
>
> let me know what you think about it,
> Regards,
looks fine ... thanks
tobi
>
> On Tue, Apr 19, 2011 at 11:32 PM, Francois-Xavier Bourlet
> <francois-xavier.bourlet at dotcloud.com> wrote:
> > working on it
> >
> > On Mon, Apr 18, 2011 at 1:48 PM, Tobias Oetiker <tobi at oetiker.ch> wrote:
> >> Today Francois-Xavier Bourlet wrote:
> >>
> >>> When I was speaking about a SIGBUS handler, I was not really thinking
> >>> about something to recover from errors, but simply to check whether the
> >>> SIGBUS signal has a file descriptor associated with it, and then check
> >>> the free space to print a hint to users. Like:
> >>>
> >>> Bus error (your disk seems full, the error could be the result of a
> >>> failure to allocate disk space for a file)
> >>>
> >>> I believe the most robust way is to write zeros before mapping
> >>> the file, using plain old write()s, and then simply check the write
> >>> return code / errno and make rrd_open return an error cleanly.
> >>
> >> fine with me too ... let's see the patch ...
> >>
> >> cheers
> >> tobi
> >>
> >>>
> >>> On Mon, Apr 18, 2011 at 12:01 AM, Tobias Oetiker <tobi at oetiker.ch> wrote:
> >>> > Hi Francois,
> >>> >
> >>> > Yesterday Francois-Xavier Bourlet wrote:
> >>> >
> >>> >> Hi Tobi,
> >>> >>
> >>> >> Yes, it happens at create time.
> >>> >>
> >>> >> Checking the available free space before the creation process would
> >>> >> lead to a race condition, because between the time you check the free
> >>> >> space and the time you allocate it, other processes/threads can still
> >>> >> allocate it.
> >>> >
> >>> > yes ...
> >>> >
> >>> >> But it could be used in another way, by setting up a handler for bus
> >>> >> errors that checks the free space and prints a little hint message
> >>> >> before exiting the application? The advantage would be zero overhead
> >>> >> (until you crash... but do you really care at crash time ;) ) and no
> >>> >> modification of the current rrd_open function. What do you think?
> >>> >
> >>> > having a handler for sigbus sounds like a sensible idea ...
> >>> >
> >>> > http://www.linuxprogrammingblog.com/code-examples/SIGBUS-handling
> >>> >
> >>> > as for an early statvfs, this could save time for people who try to
> >>> > create unreasonably large rrd files by alerting them before
> >>> > gigabytes of 0s have been allocated ...
> >>> >
> >>> > cheers
> >>> > tobi
> >>> >
> >>> >> On Sun, Apr 17, 2011 at 10:07 PM, Tobias Oetiker <tobi at oetiker.ch> wrote:
> >>> >> > Hi Francois,
> >>> >> >
> >>> >> > Yesterday Francois-Xavier Bourlet wrote:
> >>> >> >
> >>> >> >> Hello,
> >>> >> >>
> >>> >> >> On my system rrd_open uses mmap and my system supports sparse files.
> >>> >> >> That means that when my disk gets full, rrd_open can die with a bus
> >>> >> >> error. Here's the scenario in rrd_open:
> >>> >> >>
> >>> >> >> Disk really close to full, few kbytes free
> >>> >> >> open file -> ok
> >>> >> >> seek to end -1 -> ok
> >>> >> >> write 1 -> ok
> >>> >> >> The system will only write the last chunk of the file; all the others
> >>> >> >> will be allocated lazily later because of the sparse file feature.
> >>> >> >> So we have a file bigger than the free space available on the system.
> >>> >> >> The next attempt to write to this file, even without extending its
> >>> >> >> size, will fail with a disk full error.
> >>> >> >>
> >>> >> >> Next, rrd_open maps the file and then
> >>> >> >> memsets the whole file to zero... leading to a bus error since the
> >>> >> >> kernel can't write into the file because the filesystem is full.
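
The seek-and-write step in the scenario above can be reproduced with a short sketch like this one. The helper names `create_by_seek` and `allocated_bytes` are hypothetical and this is not the rrd_open code. On a filesystem with sparse file support, the apparent size is `size` while the allocated byte count stays far smaller.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file of `size` bytes (size >= 1) by seeking
 * to the last byte and writing a single zero byte, as in the scenario.
 * Returns an open file descriptor, or -1 on error. */
static int create_by_seek(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 ||
        write(fd, "", 1) != 1) {    /* "" supplies one NUL byte */
        close(fd);
        return -1;
    }
    return fd;
}

/* Bytes actually allocated on disk, or -1 on error. st_blocks counts
 * 512-byte units on Linux; POSIX leaves the unit to the implementation. */
static long long allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

Comparing `allocated_bytes(fd)` with the `st_size` reported by fstat shows the gap that the later memset on the mapping has to fill, which is where the bus error comes from when the disk is full.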
> >>> >> >
> >>> >> > this happens at create time, right ?
> >>> >> >
> >>> >> >> In my case I just have to extend the available disk space and it's
> >>> >> >> fine. But the problem is that you don't have any clue that the bus
> >>> >> >> error happens because your disk is full, and I really wasted a lot of
> >>> >> >> time before I thought of simply checking the free space...
> >>> >> >>
> >>> >> >> I don't really know how to fix the code; maybe we can catch SIGBUS
> >>> >> >> signals and, when discovering that the error is about a file mapping,
> >>> >> >> provide a human-readable message on the terminal/log?
> >>> >> >>
> >>> >> >> Trying to recover from a bus error on file-mapped memory seems to be
> >>> >> >> another challenge...
> >>> >> >>
> >>> >> >> Or rather than memsetting the file to zero, we could simply write
> >>> >> >> zeros to the file before mapping it, so that it would be easy to
> >>> >> >> catch a write error.
> >>> >> >
> >>> >> >> Let me know what you think about it, I am available to patch rrd
> >>> >> >> with the best proposed solution.
> >>> >> >
> >>> >> > how about a call to statvfs before starting the whole creation
> >>> >> > process ? (for win32 this would probably be GetDiskFreeSpaceEx)
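
A minimal sketch of such a pre-check with POSIX statvfs. The helper name `has_free_space` is hypothetical. As noted elsewhere in the thread, the result is only advisory, since another process can consume the space between the check and the allocation.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: returns 1 if the filesystem holding `path` reports
 * at least `needed` bytes available to unprivileged users, 0 if it does
 * not, and -1 if statvfs fails (errno is set by statvfs). */
static int has_free_space(const char *path, unsigned long long needed)
{
    struct statvfs vfs;
    unsigned long long avail;

    if (statvfs(path, &vfs) != 0)
        return -1;

    /* f_bavail is counted in units of f_frsize (the fragment size). */
    avail = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    return avail >= needed;
}
```

Used before creation, this would let a tool refuse an unreasonably large file up front, while the actual allocation still needs its own error handling.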
> >>> >> >
> >>> >> > cheers
> >>> >> > tobi
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> >>
> >>> >> >> Regards,
> >>> >> >>
> >>> >> >
> >>> >> > --
> >>> >> > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
> >>> >> > http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900
> >>> >> >
> >
> >
> >
> > --
> > François-Xavier Bourlet
> >
>
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900