[rrd-developers] mmap considerations
Bernhard Fischer
rep.dot.nop at gmail.com
Wed Jun 13 11:59:23 CEST 2007
On Wed, Jun 13, 2007 at 11:03:45AM +0200, Tobias Oetiker wrote:
>Hi Bernhard,
>
>the time bit is treacherous ... I HAVE to do other stuff ... alas
>... but I sneak out every now and then ...
sounds familiar..
>the reason rrd_create is 'critical' is not performance as such, but
>cache pollution ... by creating just a few rrd files cache gets
>thrown out of whack quite badly .. I have been running my
>performance.pl script which creates a bunch of rrd files and then
>updates them ... I found that the memory system takes quite a long
>time to figure which data it should keep and which it should drop
>... a single rrd_create can have quite a long lasting negative
>effect ...
ok. I will switch rrd_create to the new accessors, sounds good?
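
For reference, the "fdatasync and dontneed" step that rrd_create is
currently missing in the mmap build could look roughly like the sketch
below. This is only an illustration: create_and_drop() is a made-up helper,
not a function in the rrdtool tree, and it assumes posix_fadvise() is
available on the platform.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of rrdtool: write len bytes to a new file,
 * force them to disk, then hint that the pages will not be needed again.
 * Returns 0 on success, -1 if the file could not be written or synced. */
int create_and_drop(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* DONTNEED only discards clean pages, so flush the data first. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* Offset 0 with length 0 covers the whole file. The call is only a
     * hint, so its result is deliberately ignored here. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return 0;
}
```

The order matters: without the fdatasync() the freshly written pages are
still dirty and the kernel cannot drop them.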
>
>I assume the kernel prefers keeping a whole file in memory over
>just a few scattered blocks in many files ... (which makes sense in
>general, just not for rrdtool)
>
>the test case shows that the mmap code goes to about 20k updates
>per second when stuff is in cache while 1.2.24dev goes to 12k
>updates per second ...
Yes, this is about the same figure that i was seeing. rrdtool-1.3 is
about 80% faster for e.g. updates than rrdtool-1.2. Worst-case speedup
was around 30%. I consider this a feature :)
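
To make clear what kind of access pattern we are talking about, here is a
rough sketch of an update done through a shared mapping. It is not the
actual accessor code: mmap_store_double() is a hypothetical name, and the
choice of MADV_RANDOM and MS_ASYNC is my assumption for the illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of rrdtool: store one double at byte
 * offset off of an existing file through a shared mapping.
 * Returns 0 on success, -1 on failure. */
int mmap_store_double(const char *path, off_t off, double value)
{
    assert(path != NULL && off >= 0);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < off + (off_t)sizeof value) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    /* Assumption: an update touches a few scattered pages, so ask the
     * kernel not to read ahead. Only a hint, result ignored. */
    (void)madvise(map, len, MADV_RANDOM);

    memcpy((char *)map + off, &value, sizeof value);

    /* Schedule write-back of the dirty page without blocking. */
    int rc = msync(map, len, MS_ASYNC);
    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}
```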
>
>I have not quite figured out the logic behind it all ...
>
>if you care to play as well, I have attached my testing code ...
I'll have a look if time permits (perhaps on sunday, we'll see).
>regarding the graphing part, the only bit where it accesses the
>filesystem directly is when writing out the png/pdf/svg/... file
>the rest of the interaction goes through rrd_fetch ...
>
>not sure about dropping the png file ... this only makes sense for
>people who create many pngs and not on demand ...
I assume that the created picture will soon be delivered through a
network-pipe to a browser on the client-side, at least usually. Thus
keeping it fully in cache should be beneficial, i think.
>
>dropping rrd_fetch (except the hot blocks) does make sense in my opinion
>since it stops cache pollution ... should be configurable for cases
>where the user knows that he is going to re-read the same rrd file
>twice in a row ...
I will look at the call-graph of rrd_graph. I suppose a command-line
option (--keep, -k or the like) to keep the whole RRD in hot cache
can be implemented for rrd_fetch() and rrd_graph(). Sounds ok?
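
Something along these lines is what I have in mind, as a sketch only.
release_after_read() is a hypothetical helper, "keep" stands for the
proposed option, and treating the first hot_bytes of the file as the hot
region is an assumption on my part:

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of rrdtool. After a fetch, drop the cached
 * pages of the file behind fd, except for an assumed hot region at the
 * start. keep != 0 models the proposed --keep/-k option.
 * Returns 0 on success or an errno value, like posix_fadvise() itself. */
int release_after_read(int fd, off_t hot_bytes, int keep)
{
    assert(hot_bytes >= 0);

    if (keep)
        return 0;   /* the user expects to re-read: leave the cache alone */

    /* Length 0 means "from hot_bytes to the end of the file". */
    return posix_fadvise(fd, hot_bytes, 0, POSIX_FADV_DONTNEED);
}
```

Note that posix_fadvise() reports errors through its return value and does
not set errno.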
cheers,
Bernhard
--
>
>
>Today Bernhard Fischer wrote:
>
>> On Tue, Jun 12, 2007 at 11:20:39PM +0200, Tobias Oetiker wrote:
>> >Hi Bernhard,
>> [I'm changing the subject a little bit; Please feel free to move this
>> discussion to the list, if you prefer. Discussing this a little bit in
>> private mode is of course fine with me. Nice that you seem to have a
>> little bit time to work again on rrdtool :)]
>> >
>> >looking at the new code, I find that configure does not check for
>> >posix_fadvise if mmap code is active ...
>> >
>> >the effect is that in rrd_create no fdatasync and dontneed happens,
>> >which makes rrd_create faster but also evicts all previously cached
>> >data quite effectively ...
>> >
>> >is there a reason to not check for posix_fadvise if mmapping is
>> >active ?
>>
>> My way of thinking is that for mmap we generally do not want to fadvise
>> but madvise, where possible. It is correct that rrd_create in its
>> current incarnation should use fadvise (since it doesn't [yet] use
>> mmap).
>>
>> I suggest we do one of these:
>> 1) rewrite rrd_create to not use filps (FILE*) but FD/mmap based I/O.
>>    Up until now i didn't implement this since in my POV rrd_create is
>>    not really performance critical.
>>    The advantage is that potentially this would use less memory than the
>>    current implementation.
>>    Open the new file with O_CREAT (see rrd_resize as an example of how to
>>    rrd_open() a new file in the new accessor impl).
>>    This is IMHO the preferred way to go but a bit more work than the
>>    alternative below.
>> 2) leave rrd_create alone and check for fadvise also for the mmap case.
>>    Change all HAVE_POSIX_FADVISE blocks in sources which provide the new
>>    accessor methods so that fadvise is not called there. The net effect
>>    is that
>>    a) functions not yet updated (create, graph come to mind) use fadvise
>>    b) updated functions (update, resize, etc) do *not* call fadvise but
>>       use only madvise.
>>
>> thoughts?
>> PS: while i think that rrd_create is not too performance critical,
>> rrd_graph certainly is since it is potentially called very often (for
>> obvious reasons, i.e. users :). So, while switching rrd_create to the
>> new accessor functions would be nice to have, updating rrd_graph is
>> overall more beneficial, i assume.
>>
>> cheers,
>> Bernhard
>>
>>
>
>--
>Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
>http://it.oetiker.ch tobi at oetiker.ch ++41 62 213 9902
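>#define _GNU_SOURCE 1
>#include <stdio.h>
>#include <unistd.h>
>#include <fcntl.h>
>int main(int argc, char *argv[]) {
>    int fd;
>    if (argc < 2) {
>        fprintf(stderr, "usage: %s FILE\n", argv[0]);
>        return 1;
>    }
>    fd = open(argv[1], O_RDONLY);
>    if (fd < 0) {
>        perror(argv[1]);
>        return 1;
>    }
>    /* offset 0, length 0: drop cached pages for the whole file */
>    posix_fadvise64(fd, 0, 0, POSIX_FADV_DONTNEED);
>    close(fd);
>    return 0;
>}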
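>#include <stdio.h>
>#include <stdlib.h>
>#include <fcntl.h>
>#include <sys/types.h>
>#include <sys/stat.h>
>#include <unistd.h>
>#include <sys/mman.h>
>
>/* print the indices of the pages of a file that are in the page cache */
>int main(int argc, char *argv[]) {
>    int fd;
>    struct stat file_stat;
>    void *file_mmap;
>    unsigned char *mincore_vec;
>    size_t page_size = getpagesize();
>    size_t page_count;
>    size_t page_index;
>    if (argc < 2) {
>        fprintf(stderr, "usage: %s FILE\n", argv[0]);
>        return 1;
>    }
>    fd = open(argv[1], O_RDONLY);
>    if (fd < 0 || fstat(fd, &file_stat) != 0) {
>        perror(argv[1]);
>        return 1;
>    }
>    /* one vector entry per page, rounding the file size up */
>    page_count = (file_stat.st_size + page_size - 1) / page_size;
>    file_mmap = mmap((void *)0, file_stat.st_size, PROT_NONE, MAP_SHARED, fd, 0);
>    mincore_vec = calloc(1, page_count);
>    if (file_mmap == MAP_FAILED || mincore_vec == NULL
>        || mincore(file_mmap, file_stat.st_size, mincore_vec) != 0) {
>        perror("mmap/mincore");
>        return 1;
>    }
>    printf("Cached Blocks of %s: ", argv[1]);
>    /* '<' and not '<=': the vector has exactly page_count entries */
>    for (page_index = 0; page_index < page_count; page_index++) {
>        if (mincore_vec[page_index] & 1) {
>            printf("%lu ", (unsigned long)page_index);
>        }
>    }
>    printf("\n");
>    free(mincore_vec);
>    munmap(file_mmap, file_stat.st_size);
>    close(fd);
>    return 0;
>}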
>#! /usr/bin/perl
>#
># $Id:$
>#
># Created By Tobi Oetiker <tobi at oetiker.ch>
># Date 2006-10-27
>#
>#makes program work AFTER install
>
>use lib qw( ../bindings/perl-shared/blib/lib ../bindings/perl-shared/blib/arch );
>
>print <<NOTE;
>
>RRDtool Performance Tester
>--------------------------
>Running on $RRDs::VERSION;
>
>RRDtool update performance is ultimately disk-bound. Since very little data
>does actually get written to disk in a single update, the performance
>is highly dependent on the cache situation in your machine.
>
>This test tries to cater for this. It works like this:
>
>1) Create RRD file tree
>
>2) Update RRD files several times in a row.
>
>NOTE
>
>use strict;
>use Time::HiRes qw(time);
>use RRDs;
>use IO::File;
>use Time::HiRes qw( usleep );
>
>sub create($$){
> my $file = shift;
> my $time = shift;
> my $start = time; #since we loaded HiRes
> RRDs::create ( $file.".rrd", "-b$time", qw(
> -s300
> DS:in:GAUGE:400:U:U
> DS:out:GAUGE:400:U:U
> RRA:AVERAGE:0.5:1:600
> RRA:AVERAGE:0.5:6:600
> RRA:MAX:0.5:6:600
> RRA:AVERAGE:0.5:24:600
> RRA:MAX:0.5:24:600
> RRA:AVERAGE:0.5:144:600
> RRA:MAX:0.5:144:600
> ));
> my $total = time - $start;
> my $error = RRDs::error;
> die $error if $error;
> return $total;
>}
>
>sub update($$){
> my $file = shift;
> my $time = shift;
> my $in = rand(1000);
> my $out = rand(1000);
> my $start = time;
> my $ret = RRDs::updatev($file.".rrd", $time.":$in:$out");
> my $total = time - $start;
> my $error = RRDs::error;
> die $error if $error;
> return $total;
>}
>
>sub tune($){
> my $file = shift;
> my $start = time;
> RRDs::tune ($file.".rrd", "-a","in:U","-a","out:U","-d","in:GAUGE","-d","out:GAUGE");
> my $total = time - $start;
> my $error = RRDs::error;
> die $error if $error;
> return $total;
>}
>
>sub infofetch($){
> my $file = shift;
> my $start = time;
> my $info = RRDs::info ($file.".rrd");
> my $error = RRDs::error;
> die $error if $error;
> my $lasttime = $info->{last_update} - $info->{last_update} % $info->{step};
> my $fetch = RRDs::fetch ($file.".rrd",'AVERAGE','-s',$lasttime-1,'-e',$lasttime);
> my $total = time - $start;
> $error = RRDs::error;
> die $error if $error;
> return $total;
>}
>
>sub stddev ($$$){ #http://en.wikipedia.org/wiki/Standard_deviation
> my $sum = shift;
> my $squaresum = shift;
> my $count = shift;
> return sqrt( 1 / $count * ( $squaresum - $sum*$sum / $count ))
>}
>
>sub makerrds($$$$){
> my $count = shift;
> my $total = shift;
> my $list = shift;
> my $time = shift;
> my @files;
> my $now = int(time);
> for (1..$count){
> my $id = sprintf ("%07d",$total);
> $id =~ s/^(.)(.)(.)(.)(.)//;
> push @$list, "$1/$2/$3/$4/$5/$id";
> -d "$1" or mkdir "$1";
> -d "$1/$2" or mkdir "$1/$2";
> -d "$1/$2/$3" or mkdir "$1/$2/$3";
> -d "$1/$2/$3/$4" or mkdir "$1/$2/$3/$4";
> -d "$1/$2/$3/$4/$5" or mkdir "$1/$2/$3/$4/$5";
> push @files, $list->[$total];
> create $list->[$total++],$time-2;
> if ($now < int(time)){
> $now = int(time);
> print STDERR $count - $_," rrds to go. \r";
> }
> }
> return $count;
>}
>
>sub main (){
> mkdir "db-$$" or die $!;
> chdir "db-$$";
>
> my $step = 10000; # number of rrds to create for every round
>
> my @path;
> my $time=int(time);
>
> my $tracksize = 0;
> my $uppntr = 0;
>
>
> my %squaresum = ( cr => 0, up => 0 );
> my %sum = ( cr => 0, up => 0 );
> my %count =( cr => 0, up => 0 );
>
> my $printtime = time;
> my %step;
> for (qw(1 6 24 144)){
> $step{$_} = int($time / 300 / $_);
> }
>
> for (0..2) {
> # enhance the track
> $time += 300;
> $tracksize += makerrds $step,$tracksize,\@path,$time;
> # run benchmark
>
> for (0..50){
> $time += 300;
> my $count = 0;
> my $sum = 0;
> my $squaresum = 0;
> my $prefix = "";
> for (qw(1 6 24 144)){
> if (int($time / 300 / $_) > $step{$_}) {
> $prefix .= "$_ ";
> $step{$_} = int($time / 300 / $_);
> }
> else {
> $prefix .= (" " x length("$_")) . " ";
> }
> }
> my $now = int(time);
> for (my $i = 0; $i<$tracksize;$i ++){
> my $ntime = int(time);
> if ($now < $ntime){
> printf STDERR "$prefix %7d \r",$i;
> $now = $ntime;
> }
> my $elapsed = update($path[$i],$time);
> $sum += $elapsed;
> $squaresum += $elapsed**2;
> $count++;
> };
> my $ups = $count/$sum;
> my $sdv = stddev($sum,$squaresum,$count);
> printf STDERR "$prefix %7d %6.0f Up/s (%6.5f sdv)\n",$count,$ups,$sdv;
> }
> print STDERR "\n";
> }
>}
>
>main;