[rrd-developers] implementing portable format

Thu Oct 30 19:03:06 CET 2008

I have some more ideas on the implementation...  I tried to list the
categories in increasing order of difficulty.  I'm sure I'm missing a few
gotchas, but these strike me as the major categories that need work.

Looking for feedback on these...  Let me know if I'm off-base.

-----------------------------------------------------------------

CHOICE OF ON-DISK ENCODING:

* estimate user base, then use the native encoding of the most common
  architecture as the on-disk format
     - probably i386?

* choose a specific byte-string for RRD portable NAN, INF.  Conversion
  routines will have to test specifically for this and convert between
  "RRD NAN" and "native NAN".
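
A rough sketch of that NAN/INF test, for discussion.  The byte strings
below are only an assumption (the usual IEEE 754 quiet-NaN and infinity
patterns in little-endian order); the function and constant names are
made up for illustration and nothing has been decided about them.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Hypothetical on-disk byte strings (little-endian IEEE 754 patterns). */
static const unsigned char RRD_NAN_BYTES[8]  = {0, 0, 0, 0, 0, 0, 0xf8, 0x7f};
static const unsigned char RRD_PINF_BYTES[8] = {0, 0, 0, 0, 0, 0, 0xf0, 0x7f};
static const unsigned char RRD_NINF_BYTES[8] = {0, 0, 0, 0, 0, 0, 0xf0, 0xff};

/* Returns 1 and sets *out if buf holds a portable special value, else 0. */
static int rrd_special_to_native(const unsigned char buf[8], double *out)
{
    assert(buf != NULL && out != NULL);
    if (memcmp(buf, RRD_NAN_BYTES, 8) == 0)  { *out = NAN;       return 1; }
    if (memcmp(buf, RRD_PINF_BYTES, 8) == 0) { *out = INFINITY;  return 1; }
    if (memcmp(buf, RRD_NINF_BYTES, 8) == 0) { *out = -INFINITY; return 1; }
    return 0;
}

/* Returns 1 and fills buf if v is NAN or +/-INF, else 0 (buf untouched). */
static int rrd_special_to_portable(double v, unsigned char buf[8])
{
    assert(buf != NULL);
    if (isnan(v)) {
        memcpy(buf, RRD_NAN_BYTES, 8);
        return 1;
    }
    if (isinf(v)) {
        memcpy(buf, v > 0 ? RRD_PINF_BYTES : RRD_NINF_BYTES, 8);
        return 1;
    }
    return 0;
}
```

The ordinary conversion routine would call these first and fall through
to the normal byte-order conversion when they return 0.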

-----------------------------------------------------------------

ALIGNMENT:

* Use multiples of 64 bits for all header values.  The wasted space won't
  amount to much, and it will work on platforms that align to either 32
  or 64 bits.

* To avoid changing the front of stat_head, we can start with this...

   struct stat_head {
     char cookie[4];
     char version[12];  // was version[5]
     ...
   };

  cookie[4] + version[12] is 16 bytes, so subsequent values can start at
  a 64-bit-aligned offset.  strcmp() on version will work either way - the
  new files will just have more '\0' at the end.
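
A quick way to sanity-check the layout, as a sketch only.  The struct name
and the next_value field are placeholders I made up for whatever 64-bit
value ends up following version[]; the real stat_head has more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative layout only: next_value stands in for whatever follows. */
struct stat_head_sketch {
    char   cookie[4];
    char   version[12];   /* was version[5] */
    double next_value;    /* hypothetical first 64-bit header value */
};

/* Returns 1 if the field after version[] starts on a 64-bit boundary. */
static int stat_head_is_aligned(void)
{
    return offsetof(struct stat_head_sketch, next_value) % 8 == 0;
}

/* Returns 1 if a padded on-disk version field matches the expected string. */
static int version_matches(const char version[12], const char *expected)
{
    assert(version[11] == '\0');   /* field must be NUL-terminated */
    /* strcmp() stops at the first '\0', so trailing padding is ignored. */
    return strcmp(version, expected) == 0;
}
```

With 4 + 12 = 16 bytes in front, the compiler has no reason to insert
padding before the next 64-bit value, whether it aligns doubles to 4 or
8 bytes.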

-----------------------------------------------------------------

SUPPORT FOR OLD VERSIONS:

* Create a new stat_head.version = "0005" for the new portable RRD

* How fully should we support older RRD files?  Should we handle full
  read/write on V<0005 files?  Doing so complicates the code path, and may
  introduce new bugs.  Compelling users to upgrade is not pleasant, but
  perhaps acceptable across a major revision?

-----------------------------------------------------------------

PORTABLE VS. NATIVE FORMAT:

* Since we pass rrd_t around so many places, it's better if we have to
  handle only a single type of struct in code.  When we rrd_open() an
  older file, create a V0005 struct.  Keep the previous stat_head.version
  so we can tell how to handle the file (i.e. whether we have to convert
  values to/from native).

* If we keep the in-memory rrd_t.* in portable format, we have to convert
  it in many places; some conversions are likely to get missed.  Instead,
  we should convert it to native format in rrd_open.  Other code remains
  largely unchanged.

  * the IN-FILE rrd_t header will be in portable format
  * the IN-MEMORY rrd_t header will be in machine-native format
  * therefore, we can't use the mmap()'ed version directly; we'll have
    to copy+convert it
  * in the reverse direction, we'll have to convert it back to portable
    format and memcpy() on top of the mmap version.
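
To make the copy+convert round trip concrete, here is a minimal sketch.
It assumes a little-endian portable format and a two-field header; the
struct and function names are hypothetical, and a plain byte buffer
stands in for the mmap()'ed region.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native header; the real rrd_t has many more fields. */
struct hdr_native {
    uint64_t ds_cnt;
    uint64_t rra_cnt;
};

#define HDR_PORTABLE_SIZE 16   /* two 64-bit fields, little-endian (assumed) */

/* Load a little-endian 64-bit value regardless of host byte order. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Store a 64-bit value in little-endian order regardless of host order. */
static void store_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Open path: copy the mapped portable header into a native struct. */
static void hdr_from_mapped(struct hdr_native *dst, const unsigned char *mapped)
{
    assert(dst != NULL && mapped != NULL);
    dst->ds_cnt  = load_le64(mapped);
    dst->rra_cnt = load_le64(mapped + 8);
}

/* Flush path: convert to portable form, then memcpy() over the mapping. */
static void hdr_to_mapped(unsigned char *mapped, const struct hdr_native *src)
{
    unsigned char tmp[HDR_PORTABLE_SIZE];

    assert(mapped != NULL && src != NULL);
    store_le64(tmp, src->ds_cnt);
    store_le64(tmp + 8, src->rra_cnt);
    memcpy(mapped, tmp, sizeof tmp);
}
```

Everything between the two calls works on struct hdr_native only, which
is the point: the rest of the code never sees portable bytes.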

-----------------------------------------------------------------

FORMAT CONVERSION:

* as shown on the Wiki, we can determine the native encoding at build time
  with a "union".

  - we can check the byte values for a value whose encoding is well-known
  - e.g. "5.44760372201160503468005645008891e-270" for doubles
  - This encodes to bytes[8] = {1,2,3,4,5,6,7,8} on i386.
  - This encodes to bytes[8] = {8,7,6,5,4,3,2,1} on SPARC.

  ==> We can use this to generate a .c/.h file that contains the
      native-to-portable and portable-to-native conversion macros.

  ==> We can determine byte encoding at build time.  We won't need a
      catalog of architectures and associated conversion macros.
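
Something like the following could be the core of the probe that the
build step compiles and runs.  This is a sketch: the enum and function
names are invented here, and it only recognizes the two byte orders
listed above.

```c
#include <assert.h>
#include <string.h>

/* Byte orders this sketch can recognize for an 8-byte double. */
enum rrd_double_order { RRD_ORDER_UNKNOWN, RRD_ORDER_LITTLE, RRD_ORDER_BIG };

/* Inspect the bytes of the well-known test value through a union. */
static enum rrd_double_order rrd_detect_double_order(void)
{
    static const unsigned char le[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const unsigned char be[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    union {
        double        d;
        unsigned char bytes[sizeof(double)];
    } probe;

    if (sizeof(double) != 8)
        return RRD_ORDER_UNKNOWN;

    probe.d = 5.44760372201160503468005645008891e-270;
    if (memcmp(probe.bytes, le, 8) == 0)
        return RRD_ORDER_LITTLE;    /* e.g. i386 */
    if (memcmp(probe.bytes, be, 8) == 0)
        return RRD_ORDER_BIG;       /* e.g. SPARC */
    return RRD_ORDER_UNKNOWN;       /* some other layout: needs a closer look */
}
```

A small generator program could print the result as a #define into the
generated header, and the build could fail loudly on RRD_ORDER_UNKNOWN.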

* I'm thinking of something analogous to the ntohs()/htons() functions that
  can be used to convert each data type from native format to portable
  format, e.g. htorrd_d(double), htorrd_i(int64_t) or something similar.
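
Here is one possible shape for these, as a sketch under assumptions:
the portable format is little-endian, doubles and integers share a byte
order on the host, and the rrdtoh_* names are my invention for the
reverse direction.  Unlike htons(), these write to a byte buffer instead
of returning a value, so that a byte-swapped double never passes through
a floating-point register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the build-time result: 1 if the host is big-endian. */
static int rrd_host_is_big_endian(void)
{
    const uint64_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Reverse 8 bytes in place. */
static void rrd_reverse8(unsigned char b[8])
{
    for (int i = 0; i < 4; i++) {
        unsigned char t = b[i];
        b[i] = b[7 - i];
        b[7 - i] = t;
    }
}

/* Host int64_t -> 8 portable bytes. */
static void htorrd_i(int64_t v, unsigned char out[8])
{
    memcpy(out, &v, 8);
    if (rrd_host_is_big_endian())
        rrd_reverse8(out);
}

/* 8 portable bytes -> host int64_t. */
static int64_t rrdtoh_i(const unsigned char in[8])
{
    unsigned char tmp[8];
    int64_t v;

    memcpy(tmp, in, 8);
    if (rrd_host_is_big_endian())
        rrd_reverse8(tmp);
    memcpy(&v, tmp, 8);
    return v;
}

/* Host double -> 8 portable bytes. */
static void htorrd_d(double v, unsigned char out[8])
{
    assert(sizeof(double) == 8);
    memcpy(out, &v, 8);
    if (rrd_host_is_big_endian())
        rrd_reverse8(out);
}

/* 8 portable bytes -> host double. */
static double rrdtoh_d(const unsigned char in[8])
{
    unsigned char tmp[8];
    double v;

    assert(sizeof(double) == 8);
    memcpy(tmp, in, 8);
    if (rrd_host_is_big_endian())
        rrd_reverse8(tmp);
    memcpy(&v, tmp, 8);
    return v;
}
```

In a generated header the runtime check would collapse to a constant, so
on the most common architecture the conversions reduce to a memcpy().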

* create utility functions to simplify...  optionally they can determine
  whether conversion is necessary based on stat_head.version.
     - read and convert-to-native (double/int64_t)
     - convert-to-portable and write (double/int64_t)
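
A sketch of the version-aware variant for doubles, working on a memory
buffer such as the mmap()'ed file.  The names and the rule "0005 and
later means portable" are assumptions for illustration, as is the
little-endian portable encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical rule: versions "0005" and later store portable values.
 * Zero-padded four-digit version strings compare correctly with strcmp(). */
static int rrd_version_is_portable(const char *version)
{
    return strcmp(version, "0005") >= 0;
}

/* Read one double from src, converting only when the file is portable. */
static double rrd_get_double(const unsigned char *src, const char *version)
{
    uint64_t bits = 0;
    double v;

    assert(sizeof(double) == 8);
    if (!rrd_version_is_portable(version)) {
        memcpy(&v, src, 8);             /* old file: already native */
        return v;
    }
    for (int i = 7; i >= 0; i--)        /* assemble little-endian bytes */
        bits = (bits << 8) | src[i];
    memcpy(&v, &bits, 8);
    return v;
}

/* Write one double to dst, converting only when the file is portable. */
static void rrd_put_double(unsigned char *dst, const char *version, double v)
{
    uint64_t bits;

    assert(sizeof(double) == 8);
    if (!rrd_version_is_portable(version)) {
        memcpy(dst, &v, 8);             /* old file: keep native bytes */
        return;
    }
    memcpy(&bits, &v, 8);
    for (int i = 0; i < 8; i++) {       /* emit little-endian bytes */
        dst[i] = (unsigned char)(bits & 0xff);
        bits >>= 8;
    }
}
```

Callers would pass the stat_head.version that was kept at rrd_open()
time, so one code path serves both old and new files.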

-----------------------------------------------------------------

Am I missing anything?

-- 
 kevin brintnall =~ /kbrint at rufus.net/