Thursday, 7 November 2013

UniString

Death of UniString

It's always been a source of frustration for me that LibreOffice and its predecessors had two different string class families: a "new" set in sal and an "old" set in tools, where "new" means more than 13 years old. Each set had one string class for 8-bit characters and one for 16-bit UTF-16. The old classes are limited to 64k characters, while the new ones use a 32-bit length.
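To make the 64k limit concrete, here is a minimal sketch of what a 16-bit length field does to a long piece of text. The names OldLen, NewLen, kOldMaxLen, clampToOldLen and charsLost are hypothetical stand-ins for illustration, not the actual LibreOffice types, and the sketch assumes the old length was an unsigned 16-bit integer and the new one a signed 32-bit integer.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for illustration only, not LibreOffice types.
using OldLen = std::uint16_t;  // assumed 16-bit length of the old string classes
using NewLen = std::int32_t;   // assumed 32-bit length of the new string classes

// Largest length a 16-bit field can represent: 65535.
constexpr std::uint32_t kOldMaxLen = std::numeric_limits<OldLen>::max();

// Clamp a requested character count to what a 16-bit length can hold.
constexpr OldLen clampToOldLen(std::uint32_t requested) {
    return requested > kOldMaxLen ? static_cast<OldLen>(kOldMaxLen)
                                  : static_cast<OldLen>(requested);
}

// Number of characters that would be dropped by the clamp above.
constexpr std::uint32_t charsLost(std::uint32_t requested) {
    return requested - clampToOldLen(requested);
}

// Small self-check: short text survives, long text is cut at 65535.
inline void demo() {
    assert(charsLost(100) == 0);
    assert(clampToOldLen(70000) == 65535);
    assert(charsLost(70000) == 4465);
    // The same count fits comfortably in a 32-bit length.
    assert(70000 <= std::numeric_limits<NewLen>::max());
}
```

With a 32-bit length there is no need for this kind of clamp at any realistic paragraph size, which is why having the old classes around was more than a cosmetic problem.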

So one of the oldest easy hacks we had on launching LibreOffice was "Removal/Replacement of the String/UniString with OUString once and for all". We managed quickly enough to remove the old 8-bit "ByteString" class, but the UTF-16 UniString class lingered on.


Now, finally, after painstakingly chipping away at it one method at a time, and incrementally clearing the enemy string out of one file, one directory, one module after another, UniString is gone. I think this commit is the one that removes the last stray UniString usage from LibreOffice.

While a load of people worked on this, Noel Grandin put in an awesome effort, converting a staggering amount of code to finish the job.

Now we just need to:
a) update our wiki pages to root out all mentions of UniString
b) audit and remove the uses of the 64k STRING_MAXLEN limit define, and lift that length limitation in places such as the maximum paragraph size allowed when importing .doc and .html files