Development of Tiki's international possibilities

Any relevant info about improving Tiki features should be moved to http://dev.tiki.org/i18n

Status/RoadMap

Where are we?
- improvement

Even though it is possible to translate some strings containing a variable (example: {tr}Hello {$username}{/tr} with, in the language.php file, "Hello {\$username}"=>"Bonjour {\$username}"), not all configurations are possible.
For instance, {tr}Hello {$title} {$username}{/tr} cannot be handled this way ($title needs a translation, $username does not).

One idea to solve this is to add parameters to the tr function.
- Example "$title{/tr}}Hello %t {$username}{/tr}" and a language file "Hello Mr {\$username}"
Other ideas?
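
The parameter idea can be sketched roughly as follows. This is a Python illustration of the lookup logic only, not Tiki's PHP or Smarty code; the function name `tr`, the dictionary `LANG_FR`, its French entries and the exact `%t` substitution rule are all assumptions made for the example.

```python
# Hypothetical French dictionary; keys are the English source strings.
LANG_FR = {
    "Hello %t {$username}": "Bonjour %t {$username}",
    "Mr": "M.",
}


def tr(text, lang, t=None):
    """Translate `text`; if `t` is given, translate it too and put it in place of %t."""
    translated = lang.get(text, text)  # fall back to the source string
    if t is not None:
        translated = translated.replace("%t", lang.get(t, t))
    return translated


greeting = tr("Hello %t {$username}", LANG_FR, t="Mr")
print(greeting)
```

The {$username} placeholder is left untouched, so only the title goes through the dictionary.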

Some languages need UTF-16.
- Which languages would that be? As far as I know, a language cannot need a specific encoding such as UTF-8 or UTF-16. All you have to decide is how you want to represent your (Unicode) characters, and UTF-8 is capable of representing all characters found in Unicode. See CharacterEncodingTrouble for more on this. --gimpster
- Do we set a global parameter for the encoding?
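
gimpster's point can be checked with a small round-trip test. This is a Python sketch, not Tiki code, and the sample strings are arbitrary choices for the example.

```python
# Sample strings in several scripts; any Unicode text would do.
samples = ["Bonjour", "Привет", "こんにちは", "مرحبا"]

for s in samples:
    utf8 = s.encode("utf-8")
    utf16 = s.encode("utf-16")
    # Both encodings decode back to exactly the same characters.
    assert utf8.decode("utf-8") == s
    assert utf16.decode("utf-16") == s
    print(s, len(utf8), "bytes as UTF-8,", len(utf16), "bytes as UTF-16")
```

The encodings differ in byte size per script, not in which characters they can represent.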

Calendar
- Internationalize Calendar module
- The first day of the week in French is Monday
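
A per-language first day of the week could be sketched like this, using Python's standard calendar module for illustration. The `FIRST_WEEKDAY` table is hypothetical; only the French value comes from the text above, and the English value assumes the US convention.

```python
import calendar

# Hypothetical per-language setting (0 = Monday ... 6 = Sunday).
FIRST_WEEKDAY = {
    "fr": calendar.MONDAY,
    "en": calendar.SUNDAY,  # assumption: US convention
}


def weekday_order(lang):
    """Return weekday numbers in display order for the given language."""
    first = FIRST_WEEKDAY.get(lang, calendar.MONDAY)
    return list(calendar.Calendar(firstweekday=first).iterweekdays())


print(weekday_order("fr"))
print(weekday_order("en"))
```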

RFEs

WikiWords generation for non-anglophone characters
Cut language.php's size to lower memory usage; see the bottom of the page for more.

Competition and standards

List of other products with similar/interesting/related features.

Here I would like to see some "editorial" content. How do our features compare to others?

CVS Doc section

This is where new features being developed and only in CVS are documented. When the CVS becomes RC/official release, the info in the CVS docs is transferred to update the official docs (FeatureXDoc).

Discussion/participation

language.php's memory usage

mose said: language.php is so heavy that it is one reason we need a huge amount of memory for Tiki. (sylvie: 228KB... is it really big compared to the 5MB? But optimization is everywhere.) Maybe that file can be regenerated by the admin with only the selected options translated, into a language_light.php. That would imply declaring somewhere the list of tpl files for each feature, but imho it is not such a huge amount of work once that first step is done.
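
A minimal sketch of the language_light idea, written in Python for illustration rather than as Tiki code. The feature-to-template and template-to-string tables are hypothetical; in practice they would have to be declared or extracted somewhere, as mose notes.

```python
# Hypothetical declarations: templates per feature, strings per template.
FEATURE_TEMPLATES = {
    "wiki": ["tiki-editpage.tpl"],
    "calendar": ["tiki-calendar.tpl"],
}
TEMPLATE_STRINGS = {
    "tiki-editpage.tpl": ["Edit", "Save"],
    "tiki-calendar.tpl": ["Calendar", "Save"],
}
# Stand-in for the full language.php array.
FULL_LANG = {"Edit": "Éditer", "Save": "Sauvegarder", "Calendar": "Calendrier"}


def build_light(full_lang, enabled_features):
    """Keep only the translations used by templates of the enabled features."""
    needed = set()
    for feature in enabled_features:
        for tpl in FEATURE_TEMPLATES.get(feature, []):
            needed.update(TEMPLATE_STRINGS.get(tpl, []))
    return {k: v for k, v in full_lang.items() if k in needed}


light = build_light(FULL_LANG, ["wiki"])
print(light)
```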

Chealer9 adds :
I was not aware of this issue. Here's an idea this gave me. I'm highly unsure about it and would like your comments. I am not proposing myself for this; it's just a brain game.
Well, I had more ideas since, I'll write it in steps so it's easier to follow.

Replace the English strings in tpls with numbers specifying which line in language.php holds the translation. This slightly cuts the tpls' size but reduces language.php's size by half (for memory). It would take nearly as long as writing a new core, though, and it means that string changes or additions would take long.
Adding both a line number and the English equivalent keeps tpls nearly as easy to read as before. But changing translations would still take long, so instead we can proceed just as we do now and add the line numbers by periodically running a script similar to get-strings.
Actually, the English corrections can be made directly in the template. A script could run periodically to check whether all the strings inside {tr}s still match their associated number and, if not, update the English language.php. It would probably be more efficient to rely on something watching CVS commits to run the script on the file immediately.
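
The numbering step could be sketched as below. This is a Python illustration under assumptions: the function name `number_strings`, the sample templates and the list-based translation storage are all made up for the example, and this simplified version gives identical strings the same number.

```python
import re

# Matches the text between {tr} and {/tr} in a template.
TR_BLOCK = re.compile(r"\{tr\}(.*?)\{/tr\}", re.S)


def number_strings(templates):
    """Assign an index to each distinct English string found in {tr}...{/tr} blocks."""
    index = {}
    for source in templates.values():
        for text in TR_BLOCK.findall(source):
            index.setdefault(text, len(index))
    return index


templates = {
    "a.tpl": "<h1>{tr}Edit{/tr}</h1>{tr}Save{/tr}",
    "b.tpl": "{tr}Save{/tr} {tr}Cancel{/tr}",
}
index = number_strings(templates)

# Translations stored by position, so the English keys need not be loaded at run time.
french = ["Éditer", "Sauvegarder", "Annuler"]
print(french[index["Cancel"]])
```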

Here are associated advantages :

I guess we could make spotting the context (discussed higher in trackers) pretty easy by keeping the string number unique even for identical strings.
Actually this can push mose's idea of language_light to its limit, since Smarty could first find all the line numbers it has to translate and then read only those. I think it also saves the admin from compiling files and us from associating each file with a feature. This would mean nearly nonexistent memory usage for translation.
We could currently save about 1.5 MB from the uncompressed Tiki size by removing the English strings. I guess this would make fixing the English in tpls less annoying for translators too.
I don't know much about the email notifications' linguistic problem, but I guess the change would make it much easier to solve.

Assuming someone codes this 🙄 it would leave three problems :

Development is a bit complicated. That's yet another thing you have to understand.
Also those two spam issues
- line numbers in tpls
- for each CVS commit changing tpl strings, a bot would commit another version with line numbers.

I realize this is not really a Tiki only thing, maybe others already have something like that...Smarty? At least, if not then coding it could probably be contributed to Smarty. Also, I totally forgot about PHP's tra() while writing this and I just hope it can still work with that. This can really contain errors, feel free to correct me.

Update : I attached a discussion I had at #smarty with other devs. One talked about gettext, I wonder if we can use it optionally. It was supposed to be useful for all those advantages except context.

Update 2 : Sylvie came with a link to Smarty gettext today on IRC without much feedback, but it sounds good 😊

Update3 : sylvie: In 1.9 I took away a couple of include(language.php). Now (I think) the include is done only when it is really necessary. If the tpl files are precompiled, the include(language.php) is only done by the tra calls.
=> to optimize the perf:

transform tra into tr: can be done in some cases
include a language_tra.php in the tra function that includes only the tra strings (around 700 strings on a total of 4100) - that can be interesting for memory also
develop a more efficient access to the strings (that is what gettext is doing)

The creation of the language_tra.php can be done when you compile all the tpls.

Remove bloat from language.php

Observation

As we see on i18n status, many translations are incomplete. I have in mind two patches submitted to SourceForge, one proposing Ukrainian with 30% completion, the other proposing Korean with 10% completion. I hesitate to include those, since Tiki's default lang folder already eats about 4 MB, and those languages are not even complete.

Proposed improvement

Why not add something to the release script that would remove untouched strings from (incomplete) language.php-s? This would reduce not only the package's size but also the memory consumption of those languages. This sounds quite easy to code, and it would make including partial translations much more reasonable to consider.
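
The release-script step might look like the sketch below, in Python for illustration only. It assumes that an "untouched" string is one whose translation is empty or identical to its English key; the real language.php files may mark untranslated strings differently, and the sample Ukrainian entries are invented.

```python
def strip_untranslated(lang):
    """Keep only entries whose translation is non-empty and differs from the English key."""
    return {key: value for key, value in lang.items() if value and value != key}


# Stand-in for a partially translated language.php array.
partial_uk = {
    "Edit": "Редагувати",
    "Save": "Save",   # untouched: same as the English key
    "Cancel": "",     # untouched: empty
}
slim = strip_untranslated(partial_uk)
print(len(partial_uk), "->", len(slim), "entries")
```

Lookups for removed strings would then fall back to the English source text.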

Chealer9

What about the question that Isam Bayazidi posed about RTL languages, ones that are read right-to-left, and the benefit of right justification? He wanted a bi-lingual site, i.e. Arabic and English. — UserPagejcwinnie 2004-May-17 -

A partial solution is described in i18n status, in the right-to-left language paragraph. - sylvie

tr tokens proposal

The following mechanism (or something very similar) is used in most popular operating systems - it could be added to Tiki with minimal impact on the Tiki environment.

Syntax

default text{/tr}

Either token or default text must be supplied. Supplying both is acceptable.

New message string format:

<pre>
<token>    ::= <tokchar> { <tokchar> }*
<tokchar>  ::= <char> | <wildcard>
<char>     ::= any character > code 31 (' ') except ',', ')', ':', '?' or '/'
<wildcard> ::= '?' (matches any character)
<value>    ::= { <anychar> | '{$variable}' | '{{' }*
</pre>

Spaces may only appear in <value>

Alternative tokens

Alternative tokens are separated by '/'.

Wildcards

The '?' character in a token in the file matches any character in the same position in the token supplied to be matched.

Case significance

Case is significant.
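
The matching rules for alternatives, wildcards and case can be sketched together as below. This is a Python illustration; the function name `token_matches` is an assumption, and it also assumes that a file token and a supplied token must have the same length to match, which follows from '?' matching one character in the same position.

```python
def token_matches(file_token, supplied):
    """True if `supplied` matches one of the '/'-separated alternatives in `file_token`."""
    for alternative in file_token.split("/"):
        if len(alternative) != len(supplied):
            continue
        # '?' in the file token matches any character; comparison is case-sensitive.
        if all(p == "?" or p == c for p, c in zip(alternative, supplied)):
            return True
    return False


print(token_matches("button.edit/button.modify", "button.modify"))
print(token_matches("button.e?it", "button.exit"))
print(token_matches("button.edit", "Button.edit"))
```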

Unmatchable tokens

The actions to find a match for a token are:

search the appropriate language file using token as the key
search the appropriate language file using default text (if supplied) as the key
use the default text (if supplied) as the translated text
generate an error

Note: there is no attempt to check the default language file. It is expected that the default text will always be supplied.
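
The lookup order above can be sketched as follows, in Python for illustration. The names `translate` and `TokenNotFound` are assumptions, wildcards and alternatives are left out to keep it short, and the error text follows the message given in the Errors section.

```python
class TokenNotFound(LookupError):
    """Raised when neither the token nor a default text can be used."""


def translate(lang, lang_code, token=None, default=None):
    # 1. search the language file using the token as the key
    if token is not None and token in lang:
        return lang[token]
    # 2. search the language file using the default text as the key
    if default is not None and default in lang:
        return lang[default]
    # 3. use the default text itself as the translated text
    if default is not None:
        return default
    # 4. generate an error
    raise TokenNotFound(f"Message token {token} not found for language {lang_code}")


fr = {"button.edit": "Éditer", "Save": "Sauvegarder"}
print(translate(fr, "fr", token="button.edit", default="Edit"))
print(translate(fr, "fr", token="button.save", default="Save"))
print(translate(fr, "fr", token="button.undo", default="Undo"))
```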

Namespaces

Initially the extended features of tr would only be used for text labels and button text. The token always begins with:

token	description
button.	for button text
label.	for label text

Examples: button.edit, label.user_flip_modules

Supplying a default string

Whenever you use Users can Shade Modules:{/tr}

Errors

Message token {$token} not found for language {$lang}

get_strings.php

get_strings.php would need to be extended to understand the new syntax and do the right thing.

Is there a better way to represent the messages file as an associative array, while keeping the flexibility of many:1 tokens:strings relationships?
How do we make this work with database-driven translations?

I don't like the notion of token at all. Thinking that with tokens you can change the English text without reviewing the other translations is wrong. If the English changes, the others have to change. I am not thinking of typo fixes: those are easy to do with sed, and can also be very easy to do in PHP (if the language files are writable). sylvie

I understand your point, but I disagree. Firstly, if a button is renamed from 'remove' to 'delete' there is no need to update the language translations. Secondly, you won't catch the changed syntax if a developer makes the button labelled 'remove' no longer delete, but instead archive an artifact (but doesn't update the button label). As an aside, would it be spotted if a developer changed a button text and updated all the language files at the same time (to use the same translation with the new string)? -- mdavey

Perhaps tokens should only be used in limited circumstances, to add context to short strings such as button labels? mdavey

The major needs I see now are:

- to be able to specify the context of a translation.
Another case I had was in the chart feature: "perm", which is not perms but the abbreviation of "permanency".
- to optimize the access to the tra strings. sylvie

So I propose

string{/tr}
The context will be added by each translator each time he has a problem. For an abbreviation it will be the complete string. For a table header, it will be the indication "header".
get_strings will have to create the lines
//"context#string" => "translatedstring"
The translator can uncomment the line and add his translation if needed. sylvie
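
The context#string lookup could work roughly as sketched below. This is a Python illustration; the function name `tr`, the fallback to the plain string when no context entry exists, and the sample French translations are assumptions made for the example.

```python
def tr(lang, string, context=None):
    """Prefer the "context#string" entry; fall back to the plain string entry."""
    if context is not None:
        key = f"{context}#{string}"
        if key in lang:
            return lang[key]
    return lang.get(string, string)


# Stand-in for a language file where one context line has been uncommented.
fr = {
    "perm": "permission",
    "permanency#perm": "permanence",
}
print(tr(fr, "perm"))
print(tr(fr, "perm", context="permanency"))
print(tr(fr, "perm", context="header"))
```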

For the optimisation, I propose to compile the remaining strings into a (md5, translatedstring) table to speed up the process. sylvie
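
A minimal sketch of such a compiled table, in Python for illustration. The helper names `md5_key`, `compile_table` and `lookup` are assumptions; whether hashing the keys actually speeds things up would need measuring in the real PHP environment.

```python
import hashlib


def md5_key(string):
    """Fixed-length key derived from the source string."""
    return hashlib.md5(string.encode("utf-8")).hexdigest()


def compile_table(lang):
    """Turn {source: translation} into {md5(source): translation}."""
    return {md5_key(source): translated for source, translated in lang.items()}


def lookup(table, string):
    """Return the translation, or the source string if none is compiled."""
    return table.get(md5_key(string), string)


table = compile_table({"Edit": "Éditer", "Save": "Sauvegarder"})
print(lookup(table, "Edit"))
print(lookup(table, "Cancel"))
```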

	T	Name	Size	Created / Uploaded	Files	Hits
		Smarty Log of discussion of Chealer9 at #smarty	7.74 KB	20 Nov 2023		1360
