[rrd-developers] UTF-8 and pango ... again

Sebastian Harl sh at tokkee.org
Mon Aug 10 11:53:55 CEST 2009


Hi Tobi,

On Sat, Aug 08, 2009 at 04:01:33PM +0200, Tobias Oetiker wrote:
> Today Sebastian Harl wrote:
> > On Sat, Aug 08, 2009 at 02:11:37PM +0200, Tobias Oetiker wrote:
> > > Today Sebastian Harl wrote:
> > > > On Sat, Aug 08, 2009 at 11:33:56AM +0200, Tobias Oetiker wrote:
> > > > > Jul 21 Sebastian Harl wrote:
> > > > > > The following problem has been reported to me:
> > > > > >
> > > > > > When creating graphs covering one year and using something like
> > > > > > LANG=en_US.UTF-8 and LC_TIME="de_DE" in the environment, pango reports
> > > > > > the following warning:
> > > > > >
> > > > > > Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
> > > > >
> > > > > what happens when you set
> > > > >
> > > > > LC_TIME=de_DE.UTF-8
> > > >
> > > > Well, then it works fine - strftime() then generates a UTF-8 encoded
> > > > string which will work no matter which locale has been set.
> > >
> > > in that case one solution might be:
> > >
> > >  * if LC_TIME is set before calling the formatting for the x-axis legend
> > >    - save LANG
> > >    - set LANG=LC_TIME
> > >
> > > what do you think ?
> >
> > That won't reliably work either. E.g. if both LANG and LC_TIME use
> > different non-UTF-8 locales then strings passed on the command line
> > (axis description, labels, etc.) might not be handled correctly (by
> > pango). We'd then have to reset LANG right after formatting the x-axis
> > tick labels and before handling any other text ?
> 
> exactly ...  see my idea above ...

Yeah - I wasn't sure if you had that in mind, since you did not mention
resetting LANG, so to make sure we won't miss that, I mentioned it
anyway. What other environment variables might be relevant? LC_ALL comes
to my mind ...
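
For reference, POSIX resolves each locale category in a fixed order:
LC_ALL if it is set and non-empty, then the category's own variable
(here LC_TIME), then LANG, and otherwise an implementation-defined
default. A minimal Python sketch of that lookup follows; the function
name effective_locale and the "C" fallback are illustrative assumptions,
not anything taken from RRDtool:

```python
def effective_locale(category, env):
    """Return the locale name that applies to `category` (e.g. "LC_TIME").

    `env` is a mapping of environment variables, e.g. os.environ.
    """
    # Precedence: LC_ALL overrides everything, LANG is the last resort.
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var, "")
        if value:  # unset and empty variables are treated the same way
            return value
    return "C"  # assumed default when nothing is set
```

With the environment from the original report (LANG=en_US.UTF-8,
LC_TIME=de_DE) this yields de_DE for LC_TIME but en_US.UTF-8 for
LC_CTYPE, which is the mismatch discussed in this thread. Setting LC_ALL
would override both.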

> > This sounds like a rather bad hack to me :-/ This way we'd solve this
> > one specific problem but the main problem (we need some reliable way to
> > make sure _all_ strings will be UTF-8-encoded before being passed to
> > pango) does not get solved and we might stumble across it again and
> > again. I'd much prefer to have this solved in a more generic way, if
> > possible. Maybe we should talk to the pango-people about this?
> 
> I don't think there is anything to be solved here (in general). The
> canonical way this is being dealt with on unix AFAIK is that
> applications assume that text files are encoded in whatever the
> current setting of LANG tells them ...

Hrm ... thinking about that again, I guess you're right.

> some applications apply heuristics to determine the character set of
> files but I don't think that is a good solution for a
> noninteractive application like rrdtool ...

Ack. Also, most strings are passed on the command line rather than being
read from some file. Applying heuristics to (rather short) strings is
probably much more error prone than determining the encoding of some
file and, I assume, trying to handle that correctly is rather a pain in
the behind.
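
To illustrate why guessing is fragile for short strings, here is a small
Python sketch that tries a handful of candidate encodings and reports
every one that decodes without error. The helper name
plausible_decodings and the candidate list are made up for this example:

```python
def plausible_decodings(raw, candidates=("utf-8", "iso8859-1", "iso8859-15")):
    """Map each candidate encoding that can decode `raw` to the decoded text."""
    result = {}
    for codec in candidates:
        try:
            result[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            # Not valid in this encoding; this is the only case a
            # heuristic can rule out with certainty.
            pass
    return result
```

For the bytes b"10 \xa4" UTF-8 is ruled out, but ISO-8859-1 and
ISO-8859-15 both decode cleanly and give different characters (a generic
currency sign versus a euro sign). Nothing in such a short string tells
the two apart.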

> maybe some hints in the rrdtool graph documentation would be all
> that is needed. After all, the problem you ran into is easily fixed
> by setting the LC_TIME variable properly ...

I agree that documenting the need to have a properly set up environment
would be a good thing (e.g. if rrdtool is called from a [shell-]script,
that script has to be encoded according to the locale settings when
running the script). However, I think it's valid to use different
settings for LC_TIME than for e.g. LANG, so imho that should be handled
in RRDtool.
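
One way to handle this could look roughly as follows: determine the
codeset implied by LC_TIME and convert the strftime() output to UTF-8
before it reaches pango. This is only a sketch in Python, not RRDtool
code. codeset_of and to_utf8 are hypothetical names, and the codeset is
guessed from Python's built-in locale alias table (locale.normalize),
which may not match what the C library on a given system actually uses:

```python
import locale

def codeset_of(locale_name):
    """Guess the codeset of a locale name such as "de_DE" or "de_DE.UTF-8"."""
    # normalize() consults a static alias table; it does not require the
    # locale to be installed, and it is only an approximation.
    normalized = locale.normalize(locale_name)
    if "." not in normalized:
        return "ascii"  # assumed fallback, e.g. for "C" or "POSIX"
    # Drop the language part and any "@modifier" suffix.
    return normalized.split(".", 1)[1].split("@", 1)[0]

def to_utf8(raw, locale_name):
    """Re-encode bytes produced under `locale_name` as UTF-8."""
    return raw.decode(codeset_of(locale_name)).encode("utf-8")
```

For example, to_utf8(b"M\xe4r", "de_DE") turns the single-byte "ä" into
its two-byte UTF-8 form, while input produced under de_DE.UTF-8 passes
through unchanged. The same conversion would not be needed for strings
from the command line, which follow LANG / LC_CTYPE rather than LC_TIME.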

Cheers,
Sebastian

-- 
Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety.         -- Benjamin Franklin
