[rrd-developers] UTF-8 and pango ... again

Tobias Oetiker tobi at oetiker.ch
Sat Aug 8 16:01:33 CEST 2009


Today Sebastian Harl wrote:

> Hi Tobi,
>
> On Sat, Aug 08, 2009 at 02:11:37PM +0200, Tobias Oetiker wrote:
> > Today Sebastian Harl wrote:
> > > On Sat, Aug 08, 2009 at 11:33:56AM +0200, Tobias Oetiker wrote:
> > > > Jul 21 Sebastian Harl wrote:
> > > > > The following problem has been reported to me:
> > > > >
> > > > > When creating graphs covering one year and using something like
> > > > > LANG=en_US.UTF-8 and LC_TIME="de_DE" in the environment, pango reports
> > > > > the following warning:
> > > > >
> > > > > Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
> > > >
> > > > what happens when you set
> > > >
> > > > LC_TIME=de_DE.UTF-8
> > >
> > > Well, then it works fine - strftime() then generates a UTF-8 encoded
> > > string which will work no matter which locale has been set.
> >
> > in that case one solution might be:
> >
> >  * if LC_TIME is set before calling the formatting for the x-axis legend
> >    - save LANG
> >    - set LANG=LC_TIME
> >
> > what do you think ?
>
> That won't reliably work either. E.g. if both, LANG and LC_TIME, use
> different non-UTF-8 locales then strings passed on the command line
> (axis description, labels, etc.) might not be handled correctly (by
> pango). We'd then have to reset LANG right after formatting the x-axis
> tick labels and before handling any other text ?

exactly ...  see my idea above ...
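
For illustration only, the save / switch / restore idea could be sketched
roughly like this. It is written in Python rather than rrdtool's C, the
helper name ctype_follows_time is hypothetical, and it assumes that the
character-set handling follows LC_CTYPE (which is what LANG provides when
LC_CTYPE and LC_ALL are unset). It is not taken from the rrdtool sources.

```python
import locale
from contextlib import contextmanager


@contextmanager
def ctype_follows_time():
    """Temporarily make LC_CTYPE match LC_TIME, then restore it.

    Hypothetical helper: a sketch of 'save LANG, set LANG=LC_TIME,
    format the x-axis labels, reset LANG'.
    """
    # Calling setlocale() without a second argument only queries.
    saved_ctype = locale.setlocale(locale.LC_CTYPE)
    time_locale = locale.setlocale(locale.LC_TIME)
    try:
        # Switch the character-set category to whatever LC_TIME uses.
        locale.setlocale(locale.LC_CTYPE, time_locale)
        yield time_locale
    finally:
        # Always restore, so later text (titles, legends given on the
        # command line) is handled with the original setting.
        locale.setlocale(locale.LC_CTYPE, saved_ctype)
```

The x-axis tick labels would be formatted inside the "with" block; all
other strings are handled after it, with the original setting back in place.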

>
> This sounds like a rather bad hack to me :-/ This way we'd solve this
> one specific problem but the main problem (we need some reliable way to
> make sure _all_ strings will be UTF-8-encoded before being passed to
> pango) does not get solved and we might stumble across it again and
> again. I'd much prefer to have this solved in a more generic way, if
> possible. Maybe we should talk to the pango-people about this?

I don't think there is anything to be solved here in general. The
canonical way this is dealt with on Unix, AFAIK, is that
applications assume that text files are encoded in whatever the
current setting of LANG tells them ...

some applications apply heuristics to determine the character set of
files, but I don't think that is a good solution for a
non-interactive application like rrdtool ...

maybe some hints in the rrdtool graph documentation would be all
that is needed. After all, the problem you ran into is easily fixed
by setting the LC_TIME variable properly ...
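
To make the failure concrete, here is a small sketch (Python, helper
names hypothetical) of why the same label is accepted in one encoding and
rejected in the other. The sample label "Mär" is an assumption; the exact
month abbreviation depends on the locale data installed.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes, codeset: str) -> bytes:
    """Re-encode bytes from a known codeset into UTF-8."""
    return raw.decode(codeset).encode("utf-8")


# What a Latin-1 time locale would hand back for a label with an umlaut:
latin1_label = "Mär".encode("iso-8859-1")   # b'M\xe4r'
# What a UTF-8 time locale (e.g. de_DE.UTF-8) would hand back:
utf8_label = "Mär".encode("utf-8")          # b'M\xc3\xa4r'
```

Here is_valid_utf8(latin1_label) is False, which corresponds to the case
pango warns about, while is_valid_utf8(utf8_label) is True. Converting
with to_utf8(latin1_label, "iso-8859-1") only helps if the codeset of the
bytes is actually known, which is why matching LC_TIME to LANG is the
simpler fix.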

cheers
tobi


-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900


