
Tikiwiki-devel (mailman list mirror)


Database encoding, what is the right one.



My thinking was that general is fine for English-only sites, but multilingual sites should use the unicode collation. Is this not the case?

Interesting about the mb4 part, Brendan. I'll try that next time for the emoticons and Greek maths symbols, I think. There's a wish open on dev about this, isn't there? We should add a workaround or comment to that if this works.

Thanks,

jonny


> On 19 May 2017, at 08:05, Brendan Ferguson <drsassafras@gmail.com> wrote:
>
> Yes. We need to move in that direction.
>
> utf8mb4_unicode_ci is the right direction. It also fixes a few bugs (such as Tiki dying when emoticons are used on wiki pages).
>
> There are a couple of factors to consider here: 1. It will change our index sizes. 2. It will change the amount of space required to store data in CHAR (but not VARCHAR) columns.
>
> Right now we use a 3-byte character encoding. When we go to 4, CHAR columns (and indexes, which are never variable byte-sized) will grow. There are currently a few indexes (probably not so well configured) that will go over the size limit when upgrading. These need to be addressed.
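To make the index-size point concrete, here is a rough sketch of the arithmetic. It assumes the 767-byte index key prefix limit that applies to InnoDB tables using the older COMPACT or REDUNDANT row formats; other engines and row formats have different limits. The function names and the column lengths are hypothetical and are not taken from the Tiki schema.

```python
# Assumption: 767-byte limit per index key part (InnoDB, COMPACT/REDUNDANT row formats).
INDEX_KEY_LIMIT_BYTES = 767

# Worst-case bytes MySQL reserves per character for each character set.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def index_key_bytes(char_length, charset):
    """Worst-case bytes needed to index a column of char_length characters."""
    return char_length * BYTES_PER_CHAR[charset]


def max_indexable_chars(charset, limit=INDEX_KEY_LIMIT_BYTES):
    """Longest column (in characters) that can be fully indexed under the limit."""
    return limit // BYTES_PER_CHAR[charset]


def fits_in_index(char_length, charset, limit=INDEX_KEY_LIMIT_BYTES):
    """True if a full-column index stays within the assumed limit."""
    return index_key_bytes(char_length, charset) <= limit


if __name__ == "__main__":
    # Hypothetical column lengths, chosen only to show where the limit bites.
    for length in (191, 255):
        for charset in ("utf8", "utf8mb4"):
            status = "ok" if fits_in_index(length, charset) else "too large"
            print(f"VARCHAR({length}) {charset}: "
                  f"{index_key_bytes(length, charset)} bytes -> {status}")
```

Under that assumption, a fully indexed 255-character column fits in utf8 but not in utf8mb4, so such indexes would need a shorter column or a prefix index.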
>
> It will also bump the minimum MySQL version we support. It's not such a big deal, because the required version is fairly old now, but we will need to program in a check for it.
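A minimal sketch of what such a check might look like, assuming the server version is available as a string (for example, from `SELECT VERSION()`). utf8mb4 was introduced in MySQL 5.5.3; the function names here are hypothetical and this is not Tiki's installer code.

```python
import re

# utf8mb4 became available in MySQL 5.5.3.
UTF8MB4_MIN_VERSION = (5, 5, 3)


def parse_version(version_string):
    """Extract (major, minor, patch) from a string such as '5.7.18-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version_string):
    """True if the reported server version is at least 5.5.3."""
    return parse_version(version_string) >= UTF8MB4_MIN_VERSION


if __name__ == "__main__":
    # Hypothetical version strings for illustration.
    for version in ("5.1.73-log", "5.5.3", "5.7.18-0ubuntu0.16.04.1"):
        print(version, supports_utf8mb4(version))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically (where "5.10" would sort before "5.5").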
>
> Ideally, we would implement a multi-character-encoding setup, where ASCII is used for anything that only requires 1-byte character encoding, utf8 is used for anything that needs to be multilingual, and utf8mb4 is used where emoticons might be used. There is a slight performance decrease each time a byte is added... I wonder if it's worth the headache though?
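As an illustration of the three tiers described above, this sketch reports the narrowest MySQL character set that could hold a given piece of text, based on how many bytes each character needs in UTF-8. The helper names are hypothetical; it only shows where the boundaries fall and is not a proposal for how Tiki would pick column encodings.

```python
def max_utf8_bytes(text):
    """Largest number of UTF-8 bytes needed by any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=1)


def narrowest_charset(text):
    """Narrowest of ascii / utf8 (max 3 bytes) / utf8mb4 (max 4 bytes) for text."""
    widest = max_utf8_bytes(text)
    if widest == 1:
        return "ascii"
    if widest <= 3:
        return "utf8"
    return "utf8mb4"


if __name__ == "__main__":
    samples = {
        "plain identifier": "tiki_pages",
        "Greek letters": "\u03b1\u03b2\u03b3",      # 2 bytes each
        "maths operator": "\u2211",                  # 3 bytes
        "emoticon": "\U0001F600",                    # 4 bytes
        "mathematical italic alpha": "\U0001D6FC",   # 4 bytes
    }
    for label, text in samples.items():
        print(f"{label}: {narrowest_charset(text)}")
```

Note that ordinary Greek letters fit in 3-byte utf8, while emoticons and the mathematical alphanumeric symbols sit outside the Basic Multilingual Plane and need utf8mb4.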
>
> Going from _general_ to _unicode_ also seems like a reasonable step. It does slow things down a bit; benchmarks look like anywhere between 5-10% slower sorting. I don't think it makes ANY difference in English, but in many other languages, sorting makes much more sense. Perhaps this could be an option?
>
> my 2 cents.
>
> Brendan
>
>
>
>> On May 19, 2017, at 2:36 AM, Bernard Sfez <me@bsfez.com> wrote:
>>
>> Hello,
>>
>> When I install a database or set up a server, I always set it to utf8_general, mainly to use Tiki.
>>
>> I read this article and wonder if it is not time to change this: http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
>>
>> Thoughts?
>>
>> Bernard Sfez | bsfez.com
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot_
>> TikiWiki-devel mailing list
>> TikiWiki-devel at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/tikiwiki-devel
>


