Tikiwiki-devel (mailman list mirror)


Database encoding, what is the right one.

>
> Out of curiosity, what currently happens when a 4-byte character is stored in a utf8 database table? Is it stripped out? Does the INSERT fail? Is it corrupted? Is it mangled into some &#xxxxx; syntax?
> If it's the last case (mangled into some &#xxxxx; syntax), we still have extra conversion work. Otherwise we are fine.

Right now the page fails with a white page. It's not handled at all, and it kills the database insert. Basically it just fails and looks like Tiki is dead. So it's going to be fairly easy to upgrade, because we don't have to meddle with the error handling… there is none.
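
For reference, a minimal Python sketch of how text that needs utf8mb4 could be detected before it reaches the database. It relies only on the fact that UTF-8 encodes code points above U+FFFF in 4 bytes, which a 3-byte utf8 column cannot hold. The function names are hypothetical and are not part of Tiki.

```python
def utf8_width(ch):
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


def four_byte_chars(text):
    """Return the characters in text that need 4 bytes in UTF-8.

    These are the code points above U+FFFF (emoji, some CJK extensions),
    which a 3-byte utf8 column cannot store.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    sample = "Tiki \u20ac \U0001F600"  # euro sign (3 bytes) and an emoji (4 bytes)
    for ch in sample:
        print(repr(ch), utf8_width(ch))
    print(four_byte_chars(sample))
```

A check like this could be used either to reject the input with a readable error or to decide which content actually requires the wider encoding.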

>
> Cheers,
> Jyhem
>
> On Fri, May 19, 2017 at 3:37 PM, Brendan Ferguson <drsassafras@gmail.com> wrote:
>>
>> I wonder if there is an encoding change when converting from utf8 to utf8mb4, like there was from latin1 to utf8, or if utf8mb4 stores utf8 characters in the same way and utf8 simply leaves out the 4-byte ones? We'd need to verify with languages that use 4-byte characters, or find the answer in a doc…
>
> No encoding change, it's exactly the same (aside from the capacity for 4-byte characters). Characters of 3 bytes and fewer are encoded identically :-) so that ought to make it a little simpler.
>
>>
>> Also, is there a concern about encoding compatibility with Elasticsearch? They say there is a small performance hit with "unicode" sorting, but our performance-critical servers already offload a lot of sorting to Elasticsearch, don't they?
>>
>> Cheers,
>> J-M
>>
>> On Fri, May 19, 2017 at 9:05 AM, Brendan Ferguson <drsassafras@gmail.com> wrote:
>> Yes. We need to move in that direction
>>
>> utf8mb4_unicode_ci is the right direction. It also fixes a few bugs (such as Tiki dying when emoticons are used on wiki pages).
>>
>> There are a couple of factors to consider here: 1. It will change our index sizes. 2. It will change the amount of space required to store data in CHAR (but not VARCHAR) columns.
>>
>> Right now we use a 3-byte character encoding. When we go to 4 bytes, CHAR columns (and indexes, which are never variable in byte size) will grow. There are currently a few indexes (probably not so well configured) that will exceed the size limit when upgrading. These need to be addressed.
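
The index concern above comes down to arithmetic: an indexed column reserves the maximum number of bytes per character, so the same 255-character key grows from 765 to 1020 bytes. A rough Python sketch follows. The 767-byte key limit is an assumption that matches older InnoDB row formats (COMPACT and REDUNDANT); the limit on a given server depends on its version and row format. The names are illustrative only.

```python
# Assumed per-column index key limit in bytes (older InnoDB row formats).
ASSUMED_KEY_LIMIT = 767

# Maximum bytes reserved per character for each character set.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def index_key_bytes(length_chars, charset):
    """Worst-case bytes an indexed column of length_chars characters needs."""
    return length_chars * BYTES_PER_CHAR[charset]


def max_indexable_chars(charset, limit=ASSUMED_KEY_LIMIT):
    """Longest column (in characters) whose full index key fits the limit."""
    return limit // BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for charset in ("utf8", "utf8mb4"):
        size = index_key_bytes(255, charset)
        fits = size <= ASSUMED_KEY_LIMIT
        print(charset, size, fits, max_indexable_chars(charset))
```

Under that assumed limit, a fully indexed 255-character column fits with utf8 but not with utf8mb4, where the longest fully indexable column is 191 characters. Affected indexes would need a shorter column or a prefix index.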
>>
>> It will also bump the minimum MySQL version we support. It's not such a big deal, because that version is fairly old now, but we will need to program in a check for it.
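
A version check of the kind mentioned above could look roughly like this Python sketch. utf8mb4 was introduced in MySQL 5.5.3; the version string is assumed to come from something like SELECT VERSION(), and the function names are hypothetical, not Tiki's actual installer code.

```python
import re

# utf8mb4 was introduced in MySQL 5.5.3.
MIN_UTF8MB4_VERSION = (5, 5, 3)


def parse_version(version_string):
    """Extract (major, minor, patch) from a string like '5.6.35-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version_string):
    """True if the reported server version is at least 5.5.3."""
    return parse_version(version_string) >= MIN_UTF8MB4_VERSION


if __name__ == "__main__":
    for reported in ("5.1.73-log", "5.5.3", "5.6.35-log"):
        print(reported, supports_utf8mb4(reported))
```

Comparing tuples of integers avoids the usual string-comparison trap where "5.10" sorts before "5.5".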
>>
>> Ideally, we would implement a multi-character-encoding setup, where ASCII is used for anything that only requires 1-byte character encoding, utf8 is used for anything that needs to be multilingual, and utf8mb4 is used where emoticons might be used. There is a slight performance decrease with each extra byte used… I wonder if it's worth the headache though?
>>
>> Going from _general_ to _unicode_ also seems like a reasonable step. It does slow things down a bit; benchmarks look like anywhere between 5-10% slower sorting. I don't think it makes ANY difference in English, but in many other languages, sorting makes much more sense. Perhaps this could be an option?
>>
>> my 2 cents.
>>
>> Brendan
>>
>>
>>
>>> On May 19, 2017, at 2:36 AM, Bernard Sfez <me@bsfez.com> wrote:
>>>
>>> Hello,
>>>
>>> When I install a database or set up a server, I always set it to utf8_general, mainly to use Tiki.
>>>
>>> I read this article and wonder if it is not time to change this: http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
>>>
>>> Thoughts ?
>>>
>>> Bernard Sfez | bsfez.com <https://bsfez.com/>
>>> ------------------------------------------------------------------------------
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> ___
>>> TikiWiki-devel mailing list
>>> TikiWiki-devel at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/tikiwiki-devel
