[rrd-developers] bus error when disk is full, with mmap & sparse file

Francois-Xavier Bourlet francois-xavier.bourlet at dotcloud.com
Wed Apr 20 08:32:40 CEST 2011


working on it
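
For reference, the approach agreed on in the quoted thread below (write zeros with plain write() calls before mapping, and check the return code) could look roughly like this. It is only a minimal sketch and not the actual rrd_open patch: the helper name prealloc_zero and the 4096-byte buffer size are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>      /* open() flags, used in the usage note */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of rrdtool): fill the first `size` bytes of
 * `fd` with zeros using plain write() calls, so that every block is really
 * allocated before the file is mmap()ed.
 * Returns 0 on success, -1 on failure with errno set (e.g. ENOSPC). */
static int prealloc_zero(int fd, off_t size)
{
    char buf[4096];
    memset(buf, 0, sizeof buf);

    if (lseek(fd, 0, SEEK_SET) == (off_t) -1)
        return -1;

    while (size > 0) {
        size_t chunk = size < (off_t) sizeof buf ? (size_t) size : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* ENOSPC, EIO, ...: report to the caller */
        }
        if (n == 0) {           /* no progress: give up instead of spinning */
            errno = EIO;
            return -1;
        }
        size -= n;              /* short writes are legal: loop for the rest */
    }
    return 0;
}
```

A caller would open the file, call prealloc_zero(fd, wanted_size), and only mmap() the file if it returns 0. On a full disk the failure then shows up as an ordinary errno (typically ENOSPC) instead of a SIGBUS during the later memset.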

On Mon, Apr 18, 2011 at 1:48 PM, Tobias Oetiker <tobi at oetiker.ch> wrote:
> Today Francois-Xavier Bourlet wrote:
>
>> When I mentioned a SIGBUS handler, I was not really thinking about
>> recovering from the error, but simply about checking whether the SIGBUS
>> signal has a file descriptor associated with it, and if so checking the
>> free space in order to print a hint to users. Like:
>>
>> Bus error (your disk seems full; the error could be the result of a
>> failure to allocate disk space for a file)
>>
>> I believe the most robust way is to write zeros before mapping
>> the file, using plain old write()s, then simply check the write return
>> code / errno and make rrd_open return an error cleanly.
>
> fine with me too ... let's see the patch ...
>
> cheers
> tobi
>
>>
>> On Mon, Apr 18, 2011 at 12:01 AM, Tobias Oetiker <tobi at oetiker.ch> wrote:
>> > Hi Francois,
>> >
>> > Yesterday Francois-Xavier Bourlet wrote:
>> >
>> >> Hi Tobi,
>> >>
>> >> Yes, it happens at create time.
>> >>
>> >> Checking the available free space before the creation process would
>> >> lead to a race condition, because between the time you check the free
>> >> space and the time you allocate it, other processes/threads can still
>> >> allocate it.
>> >
>> > yes ...
>> >
>> >> But it could be used in another way, by setting up a handler for bus
>> >> errors that checks the free space and prints a little hint message
>> >> before exiting the application. The advantage would be zero overhead
>> >> (until you crash... but do you really care at crash time ;) ) and no
>> >> modification of the current rrd_open function. What do you think?
>> >
>> > having a handler for sigbus sounds like a sensible idea ...
>> >
>> > http://www.linuxprogrammingblog.com/code-examples/SIGBUS-handling
>> >
>> > as for an early statvfs check, this could save time for people who
>> > try to create unreasonably large rrd files by alerting them before
>> > gigabytes of 0s have been allocated ...
>> >
>> > cheers
>> > tobi
>> >
>> >> On Sun, Apr 17, 2011 at 10:07 PM, Tobias Oetiker <tobi at oetiker.ch> wrote:
>> >> > Hi Francois,
>> >> >
>> >> > Yesterday Francois-Xavier Bourlet wrote:
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >> On my system rrd_open uses mmap, and my system supports sparse files.
>> >> >> That means that when my disk gets full, rrd_open can hit a bus error.
>> >> >> Here's the scenario in rrd_open:
>> >> >>
>> >> >> Disk very close to full, only a few kbytes free
>> >> >> open file -> ok
>> >> >> seek to end -1 -> ok
>> >> >> write 1 -> ok
>> >> >> The system only writes the last chunk of the file; all the others
>> >> >> will be allocated lazily later because of the sparse file feature.
>> >> >> So we have a file bigger than the free space available on the system.
>> >> >> The next attempt to write to this file, even without extending its
>> >> >> size, will fail with a disk full error.
>> >> >>
>> >> >> Next, rrd_open maps the file and then memsets the whole file to
>> >> >> zero... leading to a bus error, since the kernel can't write into
>> >> >> the file because the filesystem is full.
>> >> >
>> >> > this happens at create time, right ?
>> >> >
>> >> >> In my case I just had to extend the available disk space and it was
>> >> >> fine. But the problem is that you don't have any clue that the bus
>> >> >> error happens because your disk is full, and I wasted a lot of time
>> >> >> before I thought of simply checking the free space...
>> >> >>
>> >> >> I don't really know how to fix the code. Maybe we can catch SIGBUS
>> >> >> signals and, on discovering that the error is about a file mapping,
>> >> >> provide a human-readable message on the terminal/log?
>> >> >>
>> >> >> Trying to recover from a bus error on file-mapped memory seems to be
>> >> >> another challenge...
>> >> >>
>> >> >> Or, rather than memsetting the file to zero, we could simply write
>> >> >> zeros to the file before mapping it, so that it would be easy to
>> >> >> catch write errors.
>> >> >
>> >> >> Let me know what you think about it. I am available to patch rrd
>> >> >> with the best proposed solution.
>> >> >
>> >> > how about a call to statvfs before starting the whole creation
>> >> > process ? (for win32 this would probably be GetDiskFreeSpaceEx)
>> >> >
>> >> > cheers
>> >> > tobi
>> >> >
>> >> >
>> >> >
>> >> >>
>> >> >> Regards,
>> >> >>
>> >> >
>> >> > --
>> >> > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
>> >> > http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900
>> >> >
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>>
>
>
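
The early statvfs check suggested in the quoted thread could be sketched as below. This is an illustration only: the helper name has_free_space is hypothetical, and as noted in the thread the result is advisory, because another process can consume the space between the check and the allocation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper (not part of rrdtool): returns 1 if the filesystem
 * holding `path` reports at least `needed` bytes available to unprivileged
 * users, 0 if it does not, and -1 if statvfs() itself fails.
 * Advisory only: the answer can be stale by the time space is allocated. */
static int has_free_space(const char *path, uint64_t needed)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;

    /* f_bavail is counted in units of f_frsize bytes. */
    uint64_t avail = (uint64_t) vfs.f_bavail * (uint64_t) vfs.f_frsize;
    return avail >= needed ? 1 : 0;
}
```

Since the rrd file does not exist yet at create time, `path` would be the directory the file is about to be created in. A result of 0 allows an early, readable error before any gigabytes of zeros are written.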



-- 
François-Xavier Bourlet
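
The hint-only SIGBUS handler discussed in the quoted thread could be sketched as follows. This is a minimal sketch under stated assumptions: the names sigbus_hint, install_sigbus_hint and sigbus_seen are hypothetical, and the handler prints a fixed message without checking free space, because statvfs() is not on the POSIX list of async-signal-safe functions while write() is.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; only here so a caller can tell that it ran. */
static volatile sig_atomic_t sigbus_seen = 0;

/* Hypothetical handler: print a hint using only async-signal-safe calls. */
static void sigbus_hint(int signo)
{
    static const char msg[] =
        "Bus error (your disk may be full: the kernel could not allocate "
        "space for a memory-mapped file)\n";

    (void) signo;
    sigbus_seen = 1;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done about a failed write here */
    }
}

/* Install the handler. SA_RESETHAND restores the default action as soon as
 * the handler is entered, so when the handler returns after a real fault,
 * the faulting access is retried and the process dies with the usual
 * SIGBUS. Returns 0 on success, -1 on failure. */
static int install_sigbus_hint(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigbus_hint;
    sa.sa_flags = SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The application would call install_sigbus_hint() once at startup. The process still terminates on a real bus error, so this does not attempt recovery; it only makes the cause easier to guess.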


