
Tikiwiki-devel (mailman list mirror)


Use utf8mb4 as default character-set for DB

posts: 3183 United Kingdom


Well, thinking about it what could be the danger for new installs? Presumably utf8mb4 works as well as old UTF-8 in normal (2 or 3 byte) circumstances, no?

The only danger would be adding 4 byte chars to Tikis when we’re not sure which encoding they’re using, because for non-utf8mb4 ones we get a fatal error.

So switching to this
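Here is a minimal sketch of the check implied above, in Python. The helper names `needs_utf8mb4` and `first_4byte_char` are hypothetical. A character takes 4 bytes in UTF-8 exactly when its code point is above U+FFFF, and those are the characters that a table using MySQL's 3-byte `utf8` (utf8mb3) character set cannot hold. In strict SQL mode, such an insert typically fails with an "Incorrect string value" error.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if `text` contains a code point above U+FFFF.

    Such characters (emoji, for instance) take 4 bytes in UTF-8, which a
    column using MySQL's 3-byte `utf8` (utf8mb3) character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def first_4byte_char(text: str):
    """Return (index, character) of the first 4-byte character, or None."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return i, ch
    return None


if __name__ == "__main__":
    for sample in ["Hello", "Café", "日本語", "Tiki 🎉"]:
        print(repr(sample), needs_utf8mb4(sample), first_4byte_char(sample))
```

An application could run a check like this before writing to a table whose character set is unknown. It could then reject or escape the offending characters rather than hit a database error.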

> On 13 Nov 2017, at 20:32, Cloutier, Philippe (DGARI-Consultant) <Philippe.Cloutier.externe@mern-mffp.gouv.qc.ca> wrote:
>
> Hi Alexander,
>
>> -----Original Message-----
>> From: Alexander Mette mailto:mail@amette.eu
>> Sent: 13 November 2017 15:08
>> To: tikiwiki-devel at lists.sourceforge.net
>> Subject: Re: Tiki-devel Use utf8mb4 as default character-set for DB
>>
>> Yes, even though this looks rather trivial, it still is encoding
>> related. I too fear that it is too late for 18. Would have been a good
>> change for an LTS though. :-/
>>
>> Would doing it only for fresh installs be an option!? I don’t know if
>> there’s a rule or anything, but I think that the DB of a certain
>> Tiki-version should always look the same no matter if it was a fresh
>> install or an upgrade, no?
>
>
> In some respects, but not on that specific point. The conversion from ISO-8859 to UTF-8 was offered “optionally” and I think installs which never converted are still supported, many years later.
>
>>
>> regards
>> amette
>>
>> On Tue, 7 Nov 2017 12:54:14 +0100
>> luciash <luci@tiki.org> wrote:
>>
>>> Yes, I remember that! :-p That is why I only support the idea of doing
>>> it for fresh installs, not for upgrades, please!
>>>
>>> luci
>>>
>>>
>>> On 7.11.2017 12:50, Jonny Bradley wrote:
>>>> However, i suggest we leave this until after Tiki 18 - i think LTS
>>>> is not a good time to experiment with character encoding in any
>>>> way, some of us remember the great charset encoding disaster of
>>>> Tiki 5!:)
>>>>
>>>> jb
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Check out the vibrant tech community on one of the world’s most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> ___
>>> TikiWiki-devel mailing list
>>> TikiWiki-devel at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/tikiwiki-devel
>>
>>
>>
>> --
>> https://amette.eu
>> mail at amette.eu

posts: 3183 United Kingdom


Oops, wrong button! :-)

I was just going to save this as draft and deal with other stuff... so while i’m here, i was going to say: If/when we do (optionally) adopt and support 4 byte emoticons and any other 4 byte characters, people upgrading from 18 to 19 (say) won’t have to convert their databases, so maybe that’s a good thing?

Then i remembered Brendan’s point about some languages then taking up even more space, worsening the problem of, for instance, Japanese page names not fitting into the current field length...

Hmm - i think i’m still currently -1 on this for 18, it’s too close now and i’m scared of text encoding! :-)

jonny



> On 14 Nov 2017, at 10:12, Jonny Bradley <jonny@tiki.org> wrote:
>
>
> Well, thinking about it what could be the danger for new installs? Presumably utf8mb4 works as well as old UTF-8 in normal (2 or 3 byte) circumstances, no?
>
> The only danger would be adding 4 byte chars to Tikis when we’re not sure which encoding they’re using, because for non-utf8mb4 ones we get a fatal error.
>
> So switching to this
>


posts: 757 Canada

>
> Then i remembered Brendan’s point about some languages then taking up even more space, worsening the problem of, for instance, Japanese page names not fitting into the current field length...

Right, but ASCII characters only take up 1 byte. It’s only emoji and the like that are stored in 4 bytes... except for indexes. We won’t truncate fields by converting, but it will mess with indexes, and some truncation will presumably happen there. Right now every index takes 3 bytes to store any character. After the change, every index will need 4 bytes to store any character.

We could increase the size of each index by an additional 1/3, so the behavior won’t change, but many of our indexes are poorly chosen. Well-chosen indexes should be increased. Absurdly long indexes might cause errors by exceeding the maximum index length, so they all need to be checked as well.
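Brendan's index arithmetic can be made concrete with a short sketch. It rests on a few assumptions. MySQL sizes an index key using the character set's maximum width per character: 3 bytes for utf8mb3 and 4 for utf8mb4. The InnoDB key prefix limit is assumed to be 767 bytes for the COMPACT/REDUNDANT row formats, and 3072 bytes for DYNAMIC/COMPRESSED in recent MySQL versions. The helper names and column lengths are illustrative only and are not taken from Tiki's schema.

```python
# Maximum bytes MySQL reserves per character when sizing an index key.
BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}

# Assumed InnoDB index key prefix limits, in bytes.
LIMIT_COMPACT = 767    # COMPACT / REDUNDANT row formats
LIMIT_DYNAMIC = 3072   # DYNAMIC / COMPRESSED row formats


def key_bytes(chars: int, charset: str) -> int:
    """Bytes an index over `chars` characters reserves for the charset."""
    return chars * BYTES_PER_CHAR[charset]


def max_indexable_chars(limit_bytes: int, charset: str) -> int:
    """Longest character prefix that fits within the index byte limit."""
    return limit_bytes // BYTES_PER_CHAR[charset]


def fits(chars: int, charset: str, limit_bytes: int = LIMIT_COMPACT) -> bool:
    """True if an index over `chars` characters stays within the limit."""
    return key_bytes(chars, charset) <= limit_bytes


if __name__ == "__main__":
    for cs in ("utf8mb3", "utf8mb4"):
        print(cs, "VARCHAR(255) key:", key_bytes(255, cs), "bytes;",
              "fits in 767:", fits(255, cs),
              "max chars:", max_indexable_chars(LIMIT_COMPACT, cs))
```

The 1/3 growth is visible directly. The same 255-character key grows from 765 to 1020 bytes, which exceeds the assumed 767-byte limit. Under that limit, at most 191 characters can be indexed with utf8mb4.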

So yeah, I would say 19 as well. There is no encoding conversion: every character in UTF8 is stored in exactly the same way in utf8mb4. It’s likely going to be a much smoother transition, but it’s going to affect a lot of data in diverse ways! A small unseen issue could cause massive disruption across a large number of Tikis... But here’s hoping it all goes smoothly. I’m hoping that any issues caused will be less troublesome than the fatal error caused by emoji that we currently have. But still.

Brendan

>
> Hmm - i think i’m still currently -1 on this for 18, it’s too close now and i’m scared of text encoding! :-)
>
> jonny
>

I also vote for 19.

Cheers,
Jyhem

On Tue, Nov 14, 2017 at 3:06 PM, Brendan Ferguson <drsassafras@gmail.com> wrote:


posts: 6976 Israel

Hi,

I’m really flying over this discussion but just a note that Hebrew is 4bytes (last time I checked :-))


> On 15 Nov 2017, at 04:35 , Jean-Marc Libs <jeanmarc.libs@gmail.com> wrote:
>
> I also vote for 19.
>
> Cheers,
> Jyhem

posts: 757 Canada

https://en.wikipedia.org/wiki/Unicode_block

Everything in green is 4-byte.

Brendan
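The block chart follows UTF-8's length rule (RFC 3629): only code points above U+FFFF, outside the Basic Multilingual Plane, need 4 bytes. Below is a small Python sketch of that rule, cross-checked against the built-in encoder. The helper name `utf8_len` and the sample characters are illustrative. By this rule, Hebrew letters (U+0590 to U+05FF block) take 2 bytes and most Japanese text takes 3.

```python
def utf8_len(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point (RFC 3629 ranges)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point < 0x80:
        return 1  # ASCII
    if code_point < 0x800:
        return 2  # e.g. Latin accents, Greek, Cyrillic, Hebrew, Arabic
    if code_point < 0x10000:
        return 3  # rest of the BMP, e.g. Hiragana, Katakana, most CJK
    return 4      # supplementary planes: emoji, rarer CJK, historic scripts


if __name__ == "__main__":
    samples = {
        "A": "ASCII",
        "é": "Latin-1 Supplement",
        "א": "Hebrew (alef, U+05D0)",
        "あ": "Hiragana",
        "😀": "Emoticons",
    }
    for ch, block in samples.items():
        n = utf8_len(ord(ch))
        # Cross-check against Python's own UTF-8 encoder.
        assert n == len(ch.encode("utf-8"))
        print(f"U+{ord(ch):04X} {block}: {n} bytes")
```

Surrogate code points (U+D800 to U+DFFF) are not valid characters on their own, and Python refuses to encode them. A full cross-check therefore skips that range.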



> On Nov 16, 2017, at 2:37 AM, Bernard Sfez <me@bsfez.com> wrote:
>
> Hi,
>
> I’m really flying over this discussion but just a note that Hebrew is 4bytes (last time I checked :-))