[rrd-developers] implementing portable format

Thu Oct 30 23:53:07 CET 2008

Hi Kevin,

Today kevin brintnall wrote:

>
> CHOICE OF ON-DISK ENCODING:
>
> * estimate user base, choose most common architecture for native format
>      - probably i386?

I would use the amd64/linux format, since this will also cover the
64-bit alignment issue you mention below.

> * choose a specific byte-string for RRD portable NAN, INF.  Conversion
>   routines will have to test specifically for this and convert between
>   "RRD NAN" and "native NAN".

I would just pick the byte-strings that amd64 uses natively.
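
A minimal sketch of such a NAN test in the conversion routines. The
names rrd_double_to_bits(), rrd_bits_to_double() and
RRD_PORTABLE_NAN_BITS are made up for illustration, and the exact bit
pattern is an assumption (the positive quiet NaN of IEEE 754), not a
decision. It also assumes the native double is a 64-bit IEEE 754
value, so INF needs no special case:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: the portable NaN is the positive quiet NaN pattern.
 * The exact value is an illustrative choice, not an agreed format. */
#define RRD_PORTABLE_NAN_BITS UINT64_C(0x7FF8000000000000)

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch expects 64-bit doubles");

/* Native double -> 64-bit pattern; every native NaN maps to the
 * single portable NaN pattern. */
static uint64_t rrd_double_to_bits(double v)
{
    uint64_t bits;

    if (isnan(v))
        return RRD_PORTABLE_NAN_BITS;
    memcpy(&bits, &v, sizeof bits);
    return bits;
}

/* 64-bit pattern -> native double; the portable NaN pattern becomes
 * whatever NaN the platform's NAN macro yields. */
static double rrd_bits_to_double(uint64_t bits)
{
    double v;

    if (bits == RRD_PORTABLE_NAN_BITS)
        return NAN;
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

Byte order is deliberately left out here; this only shows the
"RRD NAN" <-> "native NAN" mapping.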

> ALIGNMENT:
>
> * Use multiples of 64-bits for all header values.  The wasted space won't
>   amount to much, and it will work on platforms that align to either 32-
>   or 64-bits.
>
> * To avoid changing the front of stat_head, we can start with this...
>
>    struct stat_head {
>      char cookie[4];
>      char version[12];  // was version[5]
>      ...
>    }
>
>   Then, subsequent values can start at a 64-bit-aligned value.
>   strcmp(version) will work either way - the new files will just have more
>   '\0' at the end.

neat
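
To make the alignment claim checkable, something like the following
could go into a test. This is only a sketch: first_value is a
placeholder for whatever follows version[] in the real header, and
version_matches() is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: "first_value" stands in for the fields that follow
 * version[] in the real header; it is not the actual layout. */
struct stat_head_sketch {
    char     cookie[4];
    char     version[12];   /* was version[5] */
    uint64_t first_value;   /* hypothetical first 64-bit field */
};

/* cookie[4] + version[12] = 16 bytes, so the next field starts on a
 * 64-bit boundary without compiler-inserted padding. */
static_assert(offsetof(struct stat_head_sketch, first_value) == 16,
              "first value must start at byte 16");
static_assert(sizeof(struct stat_head_sketch) % 8 == 0,
              "header size must be a multiple of 64 bits");

/* strcmp() stops at the first '\0', so it does not care whether the
 * field is 5 or 12 bytes wide. */
static int version_matches(const char *version_field, const char *wanted)
{
    return strcmp(version_field, wanted) == 0;
}
```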

> -----------------------------------------------------------------
>
> SUPPORT FOR OLD VERSIONS:
>
> * Create a new stat_head.version = "0005" for the new portable RRD
>
> * How fully should we support older RRD files?  Should we handle full
>   read/write on V<0005 files?  Doing so complicates the code path, and may
>   introduce new bugs.  Compelling users to upgrade is not pleasant, but
>   perhaps acceptable across major rev?

I would like to see full support (read/write, not necessarily
create). The motivation is as follows:

* thinking about the work dan is doing, I think it would be an
  overall good thing to find a sensible design for rrd_open which
  decouples the internal and on-disk representations of rrd data
  to a large extent.

* adoption will be much quicker if we do not force people to re-do
  all their rrd files and maybe even destroy them in the process.

> -----------------------------------------------------------------
>
> PORTABLE VS. NATIVE FORMAT:
>
> * Since we pass rrd_t around so many places, it's better if we have to
>   handle only a single type of struct in code.  When we rrd_open() an
>   older file, create a V0005 struct.  Keep the previous stat_head.version
>   so we can tell how to handle the file (i.e. whether we have to convert
>   values to/from native).

agreed

> * If we keep the in-memory rrd_t.* in portable format, we have to convert
>   it in many places; some conversions are likely to get missed.  Instead,
>   we should convert it to native format in rrd_open.  Other code remains
>   largely unchanged.

>   * the IN-FILE rrd_t header will be in portable format
>   * the IN-MEMORY rrd_t header will be in machine-native format
>   * therefore, we can't use the mmap()'ed version directly; we'll have
>     to copy+convert it
>   * in the reverse direction, we'll have to convert it back to portable
>     format and memcpy() on top of the mmap version.

agreed, though I don't think this will be a problem. it will even
help to make the code more stable. at the moment, it is rather
strange that a read operation hands back a variable which is a
two-way connection to the data on disk. this was even the cause of
a bug in rrd_resize.
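
Roughly, the copy+convert pattern could look like this. It is a
sketch under assumptions: struct hdr_native with its two counters,
hdr_load() and hdr_store() are illustrative names, the byte buffer
stands in for the mmap()'ed file, and the portable format is taken
to be little-endian as on amd64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory header; the two fields are illustrative. */
struct hdr_native {
    uint64_t ds_cnt;
    uint64_t rra_cnt;
};

/* Read a 64-bit little-endian value, independent of host byte order. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Write a 64-bit value as little-endian bytes. */
static void store_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Copy+convert: the caller gets a private native copy, not a pointer
 * into the mapped file. */
static void hdr_load(const unsigned char *mapped, struct hdr_native *out)
{
    assert(mapped != NULL && out != NULL);
    out->ds_cnt  = load_le64(mapped);
    out->rra_cnt = load_le64(mapped + 8);
}

/* Convert back and overwrite the mapped bytes; nothing reaches the
 * file until this is called. */
static void hdr_store(unsigned char *mapped, const struct hdr_native *in)
{
    assert(mapped != NULL && in != NULL);
    store_le64(mapped, in->ds_cnt);
    store_le64(mapped + 8, in->rra_cnt);
}
```

With this, changing the struct after hdr_load() no longer touches
the file by accident; the write-back is an explicit step.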

> -----------------------------------------------------------------
>
> FORMAT CONVERSION:
>
> * as shown on the Wiki, we can determine the native encoding at build time
>   with a "union".
>
>   - we can check the byte values for a value whose encoding is well-known
>   - i.e. "5.44760372201160503468005645008891e-270" for doubles
>   - This encodes to bytes[8] = {1,2,3,4,5,6,7,8} on i386.
>   - This encodes to bytes[8] = {8,7,6,5,4,3,2,1} on SPARC.
>
>   ==> We can use this to generate a .c/.h file that contains the
>       native-to-portable and portable-to-native conversion macros.

yep

>   ==> We can determine byte encoding at build time.  We won't need a
>       catalog of architectures and associated conversion macros.

this would be cool ... although I think we need a solid set of
tests to make sure an automatically generated converter actually
does what it is supposed to do. after all, it would be a big
disaster if rrdtool on a certain platform just generated
non-portable 'portable' files ...
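
As a starting point, the union probe and a self-test might be
sketched like this. All names are hypothetical, and the check runs
at run time here for simplicity; the same probe could just as well
run in a configure step to generate the macros. It handles only the
two byte orders listed above and reports anything else as unknown.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(double) == 8, "sketch expects 64-bit doubles");

enum rrd_double_order {
    RRD_ORDER_UNKNOWN,
    RRD_ORDER_LITTLE,   /* bytes 1..8, as on i386 */
    RRD_ORDER_BIG       /* bytes 8..1, as on SPARC */
};

/* The probe value quoted above. */
static const double RRD_PROBE = 5.44760372201160503468005645008891e-270;

/* Look at the probe's bytes through a union to classify the host. */
static enum rrd_double_order rrd_detect_double_order(void)
{
    static const unsigned char little[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const unsigned char big[8]    = {8, 7, 6, 5, 4, 3, 2, 1};
    union { double d; unsigned char b[8]; } u;

    u.d = RRD_PROBE;
    if (memcmp(u.b, little, 8) == 0)
        return RRD_ORDER_LITTLE;
    if (memcmp(u.b, big, 8) == 0)
        return RRD_ORDER_BIG;
    return RRD_ORDER_UNKNOWN;
}

/* Encode v in the i386 byte order; returns -1 on an unknown host. */
static int rrd_double_to_portable(double v, unsigned char out[8])
{
    union { double d; unsigned char b[8]; } u;

    u.d = v;
    switch (rrd_detect_double_order()) {
    case RRD_ORDER_LITTLE:
        memcpy(out, u.b, 8);
        return 0;
    case RRD_ORDER_BIG:
        for (int i = 0; i < 8; i++)
            out[i] = u.b[7 - i];
        return 0;
    default:
        return -1;
    }
}

/* Self-test: the probe must encode to bytes 1..8 on every platform. */
static int rrd_portable_selftest(void)
{
    static const unsigned char expect[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned char out[8];

    return rrd_double_to_portable(RRD_PROBE, out) == 0
        && memcmp(out, expect, 8) == 0;
}
```

A real test set would want more values than the single probe
(NAN, INF, negative numbers) and a reference file created on
another architecture.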

> * I'm thinking something analogous to the ntohs()/htons() functions that
>   can be used to convert each data type from native-format to
>   portable-format.  i.e. htorrd_d(double), htorrd_i(int64_t) or something
>   similar.

this would then be used in rrd_open ...

> * create utility functions to simplify...  optionally they can determine
>   whether conversion is necessary based on stat_head.version.
>      - read and convert-to-native (double/int64_t)
>      - convert-to-portable and write (double/int64_t)

this could be macros too
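
A sketch of what such helpers might look like as plain functions.
rrd_is_portable(), rrd_read_double() and rrd_write_double() are
made-up names; the rule "version >= 0005 means portable" and the
little-endian portable layout are assumptions taken from the
discussion above, and doubles are assumed to share the byte order
of 64-bit integers on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch expects 64-bit doubles");

/* ASSUMPTION: files with version "0005" or later use the portable
 * encoding; older files hold native values. */
static int rrd_is_portable(const char *version)
{
    return atoi(version) >= 5;
}

/* Read and convert-to-native. */
static double rrd_read_double(const unsigned char *p, const char *version)
{
    double v;

    if (rrd_is_portable(version)) {
        uint64_t bits = 0;
        for (int i = 7; i >= 0; i--)    /* little-endian on disk */
            bits = (bits << 8) | p[i];
        memcpy(&v, &bits, sizeof v);
    } else {
        memcpy(&v, p, sizeof v);        /* old file: native bytes */
    }
    return v;
}

/* Convert-to-portable and write. */
static void rrd_write_double(unsigned char *p, const char *version, double v)
{
    if (rrd_is_portable(version)) {
        uint64_t bits;
        memcpy(&bits, &v, sizeof bits);
        for (int i = 0; i < 8; i++) {
            p[i] = (unsigned char)(bits & 0xFF);
            bits >>= 8;
        }
    } else {
        memcpy(p, &v, sizeof v);
    }
}
```

The int64_t variants would follow the same shape without the
memcpy() into a double.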

when thinking about the new format, it might also be worth spending
some thought on the shortcomings of the present format. I think it
might be possible to add the following features quite easily:

* unlimited string length for labels
* ability to store 'user data' in the rrd header and along with each datasource

By unlimited I mean that the length is a parameter which gets set at
rrd creation time.

Also the sizes of the scratch space in the PDP and CDP areas could be
parametrized, which would help with future enhancements of the data
format.

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900