API Reference

This page documents every member exported by toolforge_i18n. Not all of them are useful to tools, and some arguably should not be used by tools at all. The members are also not listed in any useful order, so reading this page front to back is not recommended. Follow the other documentation instead, and consult this page only for details on a particular member.

class toolforge_i18n.CommaSeparatedListFormatter(*, locale_identifier: str, **kwargs: object)

A string formatter supporting a !l list conversion.

Format string example:

"We went to {cities!l}."

For iterable values converted with !l, the format spec is applied to each list element. Afterwards, the list elements are joined into a standard list using the locale specified in the constructor. (For English, this means separating most items with an ASCII comma plus a space, and the final two with an extra “and”; Chinese and Japanese, for instance, use a fullwidth comma instead.) Attempting to convert non-iterable values with !l is an error.
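The mechanism described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the names EnglishListFormatter, _ListValue and join_english are hypothetical, and the joining rule is hard-coded for English instead of being taken from locale data.

```python
import string


def join_english(parts):
    """Join strings as an English list (hard-coded rule, for illustration only)."""
    if len(parts) <= 2:
        return " and ".join(parts)
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


class _ListValue:
    """Marker wrapper so that format_field() can recognise !l values."""

    def __init__(self, items):
        self.items = list(items)  # raises TypeError for non-iterable values


class EnglishListFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        if conversion == "l":
            return _ListValue(value)
        return super().convert_field(value, conversion)

    def format_field(self, value, format_spec):
        if isinstance(value, _ListValue):
            # apply the format spec to each element, then join the results
            parts = [format(item, format_spec) for item in value.items]
            return join_english(parts)
        return super().format_field(value, format_spec)


formatter = EnglishListFormatter()
print(formatter.format("We went to {cities!l}.", cities=["Berlin", "Paris", "Rome"]))
```

Wrapping the converted value in a marker object is one way to let the format spec reach each element; other designs are possible.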

class toolforge_i18n.GenderFormatter(*, get_gender: Callable[[object], Literal['m', 'f', 'n']], **kwargs: object)

A string formatter supporting a !g grammatical gender conversion.

Format string examples:

"Leave a message on {user!g:m=his:f=her:n=their} talk page."
"Ci dispiace, ma non sei {user!g:m=autorizzato:f=autorizzata:n=autorizzato/a} a usare il caricamento di massa."

The formatted value, which can be anything as far as this formatter is concerned, is passed into a function specified in the constructor, which should return one of the values "m", "f", or "n", to select the grammatically masculine, feminine, or neutral replacement, respectively. The format spec specifies these three replacements separated by colons. Gender values not specified in the format spec fall back to "m".
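The selection logic described above might look roughly like the following standard-library sketch. The class SketchGenderFormatter, the wrapper _GenderValue and the lookup table in the example are hypothetical and only illustrate the idea; they are not the library's code.

```python
import string


class _GenderValue:
    """Marker wrapper carrying the gender determined for a value."""

    def __init__(self, gender):
        self.gender = gender


class SketchGenderFormatter(string.Formatter):
    def __init__(self, get_gender):
        super().__init__()
        self.get_gender = get_gender

    def convert_field(self, value, conversion):
        if conversion == "g":
            # the value itself is opaque; only get_gender() interprets it
            return _GenderValue(self.get_gender(value))
        return super().convert_field(value, conversion)

    def format_field(self, value, format_spec):
        if isinstance(value, _GenderValue):
            options = dict(part.split("=", 1) for part in format_spec.split(":"))
            if value.gender in options:
                return options[value.gender]
            return options["m"]  # unspecified genders fall back to "m"
        return super().format_field(value, format_spec)


# hypothetical gender lookup: unknown users are treated as "n"
known_genders = {"Alice": "f"}
gender_formatter = SketchGenderFormatter(lambda user: known_genders.get(user, "n"))
print(gender_formatter.format(
    "Leave a message on {user!g:m=his:f=her:n=their} talk page.", user="Alice"
))
```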

class toolforge_i18n.HyperlinkFormatter(**_kwargs: object)

A string formatter supporting an !h hyperlink conversion.

Format string example:

"You need to {url!h:log in} before you can edit."

The formatted value is interpreted as the href attribute of an HTML <a> element, whose inner HTML is given by the format spec.
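As a rough illustration of this conversion, the sketch below builds the <a> element with the standard library. SketchHyperlinkFormatter and _HrefValue are hypothetical names, and the sketch returns a plain str; how the library itself escapes and marks up the result is not shown here.

```python
import html
import string


class _HrefValue:
    """Marker wrapper for a value that should become an href attribute."""

    def __init__(self, href):
        self.href = str(href)


class SketchHyperlinkFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        if conversion == "h":
            return _HrefValue(value)
        return super().convert_field(value, conversion)

    def format_field(self, value, format_spec):
        if isinstance(value, _HrefValue):
            # escape the URL for use in an attribute; the format spec is
            # treated as inner HTML and inserted unchanged
            href = html.escape(value.href, quote=True)
            return f'<a href="{href}">{format_spec}</a>'
        return super().format_field(value, format_spec)


link_formatter = SketchHyperlinkFormatter()
print(link_formatter.format(
    "You need to {url!h:log in} before you can edit.",
    url="https://example.org/login?a=1&b=2",
))
```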

class toolforge_i18n.I18nFormatter(*, locale_identifier: str, **kwargs: object)

A string formatter supporting !p (plural), !l (list), !g (gender) and !h (hyperlink) conversions.

See PluralFormatter, CommaSeparatedListFormatter, HyperlinkFormatter and GenderFormatter for details.

Flask-based tools don’t need to use this class directly (it’s used by message()).

class toolforge_i18n.PluralFormatter(*, locale_identifier: str, **kwargs: object)

A string formatter supporting a !p plural conversion.

Format string examples:

"I ate {count!p:0=no apples:one={count} apple:other={count} apples}."
"{size} (0x{size:04X}) {size!p:one=bajt:two=bajtaj:few=bajty:other=bajtow}"

For numeric values converted with !p, the format spec is interpreted differently: it consists of a set of key=text specs, separated by colons. The key should be one of the CLDR plural rule tags, currently “zero”, “one”, “two”, “few”, “many”, or “other”, or an explicit value. The text for the matching value or tag, according to the plural rules of the locale specified in the constructor, is substituted into the message. Attempting to convert non-numeric values with !p is an error.

Note that most languages do not use all possible tags, and only the tags used in a language should occur in the format string. For example, even though there is a “zero” tag, English only uses “one” and “other”, so to special-case a value of zero with a PluralFormatter(locale_identifier='en'), you need to use the key “0”, not “zero”. Conversely, failing to specify all tags used in a language may make the formatter raise a KeyError: for instance, if the first example above used the key “1” instead of “one”, it would fail when given a count of -1 or 1.0.

Value keys always take precedence over tag keys, no matter in which order they are specified in the format spec. To match the value, they must be identical to the str() of the value: for instance, a “1” key will not match a 1.0 value or vice versa.
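The precedence rules above can be illustrated with a small standalone function. This is only a sketch under stated assumptions: select_plural_text and english_plural_tag are hypothetical names, and the English rule is a rough stand-in for the real CLDR plural rules of the configured locale.

```python
def english_plural_tag(value):
    """Rough stand-in for the English plural rule: 'one' for 1 and -1, else 'other'."""
    return "one" if abs(value) == 1 else "other"


def select_plural_text(value, format_spec, plural_tag=english_plural_tag):
    """Pick the text for value from a spec such as 'one=apple:other=apples'."""
    options = dict(part.split("=", 1) for part in format_spec.split(":"))
    value_key = str(value)
    if value_key in options:
        # explicit value keys win, regardless of their position in the spec
        return options[value_key]
    # a tag missing from the spec surfaces as a KeyError
    return options[plural_tag(value)]


spec = "0=no apples:one=one apple:other=several apples"
for count in (0, 1, 1.0, 5):
    print(count, "->", select_plural_text(count, spec))
```

Because the value key is compared against str(value), a 1.0 value skips a “1” key and is resolved through its plural tag instead.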

class toolforge_i18n.ToolforgeI18n(app: Flask | None = None, interface_language_code: Callable[[dict[str, dict[str, str]]], str] = <function interface_language_code_from_request>)

Flask extension for toolforge_i18n.

Basic usage:

import flask
from toolforge_i18n import ToolforgeI18n

app = flask.Flask(__name__)
i18n = ToolforgeI18n(app)

class toolforge_i18n.TranslationsConfig(directory: str = 'i18n/', variables: Mapping[str, Sequence[str]] = <factory>, derived_messages: Mapping[str, tuple[str, Callable[[str], str]]] = <factory>, language_code_to_babel: Callable[[str], str] = <function language_code_to_babel>, allowed_html_elements: dict[str, set[str]] = <factory>, allowed_global_attributes: set[str] = <factory>, get_gender: Callable[[Any], Literal['m', 'f', 'n']] = <function get_gender_by_user_name>, check_translations: bool = True)

Configuration for loading message translations.

To use this library, a tool should define a tool_translations_config module which exports a config member of this type, like so:

# tool_translations_config.py
from toolforge_i18n import TranslationsConfig
config = TranslationsConfig(
    # ...
)

The most important setting is variables, which a tool needs unless none of its messages use variables; the others may or may not be necessary depending on the tool.

allowed_global_attributes: set[str]

HTML attributes that should be allowed on any element in messages.

This is similar to allowed_html_elements, but the given attribute names are allowed regardless of element name.

allowed_html_elements: dict[str, set[str]]

HTML elements that should be allowed in messages.

The key is an element name, and the value is a set of attributes that are allowed on that element. All other elements and attributes will cause a test failure. (See also allowed_global_attributes.)
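To make the shape of these two settings concrete, here is a hedged sketch of how such an allow-list check could work, using only the standard library. The function find_html_problems and the class _ElementChecker are hypothetical; the actual checks performed by toolforge_i18n may differ.

```python
from html.parser import HTMLParser


class _ElementChecker(HTMLParser):
    """Collect elements and attributes that are not on the allow-lists."""

    def __init__(self, allowed_html_elements, allowed_global_attributes):
        super().__init__()
        self.allowed_html_elements = allowed_html_elements
        self.allowed_global_attributes = allowed_global_attributes
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_html_elements:
            self.problems.append(f"disallowed element <{tag}>")
            return
        allowed = self.allowed_html_elements[tag] | self.allowed_global_attributes
        for name, _value in attrs:
            if name not in allowed:
                self.problems.append(f"disallowed attribute {name} on <{tag}>")


def find_html_problems(message, allowed_html_elements, allowed_global_attributes):
    checker = _ElementChecker(allowed_html_elements, allowed_global_attributes)
    checker.feed(message)
    checker.close()
    return checker.problems


# example allow-lists in the same shape as the two config members
elements = {"a": {"href"}, "em": set()}
global_attributes = {"lang", "dir"}
print(find_html_problems('<a href="x" onclick="y">text</a>', elements, global_attributes))
```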

check_translations: bool = True

Whether to check translations when they are loaded.

By default, translations are checked as soon as they are loaded, and if there is a problem with the translations, an error is raised and the translations cannot be used. (This generally means that the tool cannot run; you will probably have to revert the latest localisation updates and fix the translation on translatewiki.net.) This protects against broken or even malicious messages.

If you have set up Continuous Integration (CI), e.g. using GitLab CI or GitHub Actions, and you run pytest as part of your tests, then the translation checks are also registered as tests (you should see various i18n/*.json files in pytest’s output). In this case, assuming CI also runs on translatewiki.net exports (and you won’t merge any localisation updates where CI fails), you can set this config to False to disable the runtime checks; this speeds up translation loading and therefore the tool’s startup (for a well-translated tool, by more than a second).

Beware that, if you set this to False, only the pytest integration in CI protects your tool from malicious translations. You must be confident that CI will run, and will run all pytest tests, and if possible you should configure your repository so that localisation updates cannot be merged if CI fails (in GitLab: Settings > Merge requests > Pipelines must succeed; no direct equivalent in GitHub). Otherwise, it is always safe to leave this set to True.

derived_messages: Mapping[str, tuple[str, Callable[[str], str]]]

Messages that are derived from other messages.

The key is a message key that is not expected in the JSON files, but that is instead generated by taking another message (whose key is the first element of the tuple) and sending it through the callable in the second element of the tuple. Examples for that callable include the identity function (to copy a message) or simple case transformations.
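The derivation described above amounts to a simple lookup and transform. The sketch below shows the idea on plain dicts; the function add_derived_messages and the message keys are hypothetical and not part of the library.

```python
def add_derived_messages(messages, derived_messages):
    """Return a copy of one language's messages with the derived messages added."""
    result = dict(messages)
    for key, (source_key, transform) in derived_messages.items():
        if source_key in messages:
            result[key] = transform(messages[source_key])
    return result


# same shape as the derived_messages config member
derived_messages = {
    "nav-home-copy": ("nav-home", lambda text: text),  # identity: plain copy
    "nav-home-capitalized": ("nav-home", lambda text: text[:1].upper() + text[1:]),
}

print(add_derived_messages({"nav-home": "home page"}, derived_messages))
```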

directory: str = 'i18n/'

The path to the directory to load message files from.

get_gender: Callable[[Any], Literal['m', 'f', 'n']] = <function get_gender_by_user_name>

Function used to determine the grammatical gender of values formatted with the !g conversion.

The default, get_gender_by_user_name(), gets the gender of a named user on Wikimedia sites from Meta-Wiki; see that function for details.

language_code_to_babel: Callable[[str], str] = <function language_code_to_babel>

Function that maps a MediaWiki language code to Babel.

The default, language_code_to_babel(), is conservative and only maps language codes where Babel has an alternative that does not lose any information. If your tool is translated into a language without such an alternative, configure a custom implementation here; see language_code_to_babel() for details and an example.

variables: Mapping[str, Sequence[str]]

Variable names used in messages.

The source messages use $1, $2 etc., but the Python format strings use named variables, whose names are specified here. The variable name (or its prefix) encodes the type:

  • url, url_* - hyperlink: [$1 text] => {url!h:text}

  • user_name, user_name_* - gender: {{GENDER:$1|he|she|they}} => {user_name!g:m=he:f=she:n=they}

  • num, num_* - plural: {{PLURAL:$1|one egg|$1 eggs}} => {num_eggs!p:one=one egg:other={num_eggs} eggs}

  • list, list_* - list: $1 => {list_chicken_names!l}

  • anything else - markup without further formatting: $1 => {description}
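The naming convention in the list above can be expressed as a small lookup. This is an illustrative sketch only; conversion_for_variable is a hypothetical helper, not a function exported by toolforge_i18n.

```python
# prefix -> conversion character, following the list above
PREFIX_CONVERSIONS = {
    "url": "h",
    "user_name": "g",
    "num": "p",
    "list": "l",
}


def conversion_for_variable(name):
    """Return the conversion implied by a variable name, or None for plain markup."""
    for prefix, conversion in PREFIX_CONVERSIONS.items():
        if name == prefix or name.startswith(prefix + "_"):
            return conversion
    return None


for name in ("url", "user_name_editor", "num_eggs", "list_chicken_names", "description"):
    print(name, "->", conversion_for_variable(name))
```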

exception toolforge_i18n.UnknownMessageWarning(message_code: str, language_codes: list[str])

Warning issued by message() when a message is not defined.

This warning usually indicates one of two problems:

  1. a typo in the message key (whether in the message() call or in i18n/en.json), or

  2. a message that was not added to i18n/en.json yet.

toolforge_i18n.add_lang_if_needed(message: Markup, language_code: str) -> Markup

Wrap the given message in a language-tagged element if necessary.

Given a (formatted) message in a certain language (MediaWiki language code), wrap it in a <span> with lang= and dir= attributes if the current language on top of the stack is different. Note that message() calls this function automatically, so you generally don’t need to use this function yourself.

toolforge_i18n.get_gender_by_user_name(user_name: str | None) -> Literal['m', 'f', 'n']

Get the gender of a named user on Wikimedia sites.

This gets the gender from Meta-Wiki – hopefully the user set it as a global preference, not just on one other wiki. None may be used to represent an unknown user (e.g. not logged in), who will be treated as having neuter gender.

toolforge_i18n.get_user_agent() -> str

Get the user agent string used by toolforge_i18n.

The user agent string may be set by set_user_agent(); otherwise, this function tries to get a user agent previously set up by toolforge.set_user_agent().

Code outside of toolforge_i18n generally shouldn’t use this function.

toolforge_i18n.interface_language_code_from_request(translations: dict[str, dict[str, str]]) -> str

Default implementation to determine the language code of a request.

This function supports the ?uselang= URL parameter and otherwise determines the language based on the request’s Accept-Language header. You may want to replace this function with your own implementation to support a persistent language preference; to keep the features mentioned above, your implementation should generally look like this:

import flask
from toolforge_i18n import ToolforgeI18n, interface_language_code_from_request

def interface_language_code(translations):
    # ?uselang= takes precedence if present
    if 'uselang' in flask.request.args:
        return interface_language_code_from_request(translations)
    # try persistent language preference (e.g. from flask.session) next
    # ...
    # finally, fall back to Accept-Language:
    return interface_language_code_from_request(translations)

# ...later, pass the implementation into ToolforgeI18n:
i18n = ToolforgeI18n(app, interface_language_code)

toolforge_i18n.lang_autonym(code: str) -> str | None

Get the autonym of the given language code, according to MediaWiki.

toolforge_i18n.lang_bcp47_to_mw(code: str) -> str

Get the MediaWiki language code of the given BCP-47 language code.

toolforge_i18n.lang_dir(code: str) -> Literal['ltr', 'rtl', 'auto']

Get the directionality of the given language code, according to MediaWiki.

toolforge_i18n.lang_fallbacks(code: str) -> list[str]

Get the fallback languages of the given language code, according to MediaWiki.

toolforge_i18n.lang_mw_to_bcp47(code: str) -> str

Get the BCP-47 language code of the given MediaWiki language code.

toolforge_i18n.language_code_to_babel(code: str) -> str

Default implementation to map a MediaWiki language code to Babel.

This implementation is conservative and only maps language codes where Babel has an alternative that does not lose any information (at least as far as toolforge_i18n is concerned). MediaWiki also supports many language codes that (as far as I know) have no lossless equivalent in Babel, such as (at the time of writing) sh-latn (Serbo-Croatian in Latin script). If your tool is translated into one of those languages, you will have to configure a custom language_code_to_babel implementation in your tool_translations_config and pick some lossy fallback (e.g. hr, Croatian, for sh-latn). Your implementation should generally delegate to this one first, for instance:

import toolforge_i18n

def language_code_to_babel(code: str) -> str:
    # delegate to the default implementation first
    mapped = toolforge_i18n.language_code_to_babel(code)
    if mapped != code:
        return mapped
    # otherwise pick a lossy fallback, or strip everything after the first hyphen
    return {
        'sh-latn': 'hr',
        # ...
    }.get(code, code.partition('-')[0])

toolforge_i18n.load_translations(config: TranslationsConfig) -> tuple[dict[str, dict[str, str]], dict[str, str]]

Load the translations according to the given config.

Returns a tuple (translations, documentation), where translations is a nested dict from language code to message key to message, and documentation is a dict from message key to message documentation. The messages in translations are Python format strings intended to be formatted by I18nFormatter.

If check_translations is enabled in the config, the translation checks are run before this function returns, ensuring that the translations are safe to use.

Flask-based tools don’t need to call this function directly (it’s called by ToolforgeI18n).

toolforge_i18n.message(message_code: str, **kwargs: object) -> Markup

Format an interface message in the user interface language.

The kwargs may contain (named) arguments, using the argument names defined in variables.

This function is available as a template global, and is usually used there (but may also be imported and called from Python code).

toolforge_i18n.pop_html_lang(language_code: str) -> Markup

Pop an HTML language code from the stack.

See push_html_lang() for details.

toolforge_i18n.push_html_lang(language_code: str) -> Markup

Push an HTML language code to the stack.

Many tools will not need to call this, as it’s called by the message() function automatically. However, if you also add localized text from sources other than messages, you should call this function with the MediaWiki language code you are using; for example, in a Jinja2 template:

<span {{ push_html_lang(label.language) }}>
  {{ label.value }}
</span{{ pop_html_lang(label.language) }}>

toolforge_i18n.set_user_agent(user_agent: str) -> None

Set the user agent string used by toolforge_i18n.

Most tools should call toolforge.set_user_agent() instead, which also sets the user agent for other code. It is typically called during early initialization.

See the User-Agent policy for the format.