derelict.utf8proc.statfun

Members

Functions

utf8proc_NFC
utf8proc_uint8_t* utf8proc_NFC(const utf8proc_uint8_t* str)

Returns a newly allocated, NFC-normalized copy of the NUL-terminated string str (UTF8PROC_COMPOSE). Free the result with free().

utf8proc_NFD
utf8proc_uint8_t* utf8proc_NFD(const utf8proc_uint8_t* str)

Returns a newly allocated, NFD-normalized copy of the NUL-terminated string str (UTF8PROC_DECOMPOSE). Free the result with free().

utf8proc_NFKC
utf8proc_uint8_t* utf8proc_NFKC(const utf8proc_uint8_t* str)

Returns a newly allocated, NFKC-normalized copy of the NUL-terminated string str (UTF8PROC_COMPOSE and UTF8PROC_COMPAT). Free the result with free().

utf8proc_NFKD
utf8proc_uint8_t* utf8proc_NFKD(const utf8proc_uint8_t* str)

Returns a newly allocated, NFKD-normalized copy of the NUL-terminated string str (UTF8PROC_DECOMPOSE and UTF8PROC_COMPAT). Free the result with free().
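For illustration, the four forms differ on two sample characters. "é" (U+00E9) stays composed under NFC and NFKC and becomes U+0065 U+0301 under NFD and NFKD. The ligature "ﬁ" (U+FB01) is left alone by NFC and NFD and becomes "fi" under NFKC and NFKD. The C sketch below does not call utf8proc. It only records those UTF-8 byte sequences (the array and helper names are hypothetical) and shows that canonically equivalent strings differ byte-wise. This is why strings are usually normalized before they are compared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* "é" in its composed (NFC/NFKC) and decomposed (NFD/NFKD) UTF-8 forms. */
static const unsigned char E_ACUTE_COMPOSED[]   = {0xC3, 0xA9};       /* U+00E9 */
static const unsigned char E_ACUTE_DECOMPOSED[] = {0x65, 0xCC, 0x81}; /* U+0065 U+0301 */

/* "ﬁ" is unchanged by NFC/NFD but becomes "fi" under NFKC/NFKD. */
static const unsigned char FI_LIGATURE[] = {0xEF, 0xAC, 0x81}; /* U+FB01 */
static const unsigned char FI_LETTERS[]  = {0x66, 0x69};       /* "fi" */

/* Plain byte comparison, i.e. what strcmp-style equality checks. */
static bool same_bytes(const unsigned char *a, size_t alen,
                       const unsigned char *b, size_t blen) {
    return alen == blen && memcmp(a, b, alen) == 0;
}
```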

utf8proc_category
utf8proc_category_t utf8proc_category(utf8proc_int32_t codepoint)

Returns the Unicode category of the codepoint (one of the utf8proc_category_t constants).

utf8proc_category_string
const(char)* utf8proc_category_string(utf8proc_int32_t codepoint)

Returns the two-letter, NUL-terminated Unicode category string for the codepoint (e.g. "Lu" or "Co").

utf8proc_charwidth
int utf8proc_charwidth(utf8proc_int32_t codepoint)

Given a codepoint, return a character width analogous to wcwidth(codepoint), except that a width of 0 is returned for non-printable codepoints instead of -1 as in wcwidth.

utf8proc_codepoint_valid
utf8proc_bool utf8proc_codepoint_valid(utf8proc_int32_t codepoint)

Checks whether a codepoint is valid, i.e. lies in the range U+0000 to U+10FFFF and is not a surrogate (U+D800 to U+DFFF). Validity does not depend on whether the current Unicode standard has assigned the codepoint.
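The check amounts to a range test. Below is a minimal C sketch. It assumes the rule is "within U+0000 to U+10FFFF and not a surrogate in U+D800 to U+DFFF". codepoint_valid is a hypothetical helper, not the bound function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: valid means inside the Unicode codespace and not a
   UTF-16 surrogate half. Unassigned codepoints still count as valid. */
static bool codepoint_valid(int32_t cp) {
    if (cp < 0 || cp > 0x10FFFF) return false;      /* outside U+0000..U+10FFFF */
    if (cp >= 0xD800 && cp <= 0xDFFF) return false; /* surrogate range */
    return true;
}
```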

utf8proc_decompose
utf8proc_ssize_t utf8proc_decompose(const utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_int32_t* buffer, utf8proc_ssize_t bufsize, utf8proc_option_t options)

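The same as utf8proc_decompose_char, but acts on a whole UTF-8 string and orders the decomposed sequences correctly. Returns the number of codepoints written, or a negative error code. If the result would not fit in bufsize codepoints, the required buffer size is returned instead and the buffer contents are undefined.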

utf8proc_decompose_char
utf8proc_ssize_t utf8proc_decompose_char(utf8proc_int32_t codepoint, utf8proc_int32_t* dst, utf8proc_ssize_t bufsize, utf8proc_option_t options, int* last_boundclass)

Decompose a codepoint into an array of codepoints.

utf8proc_decompose_custom
utf8proc_ssize_t utf8proc_decompose_custom(const utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_int32_t* buffer, utf8proc_ssize_t bufsize, utf8proc_option_t options, utf8proc_custom_func custom_func, void* custom_data)

The same as utf8proc_decompose, but also takes a custom_func mapping function that is called on each codepoint in str before any other transformations (along with a custom_data pointer that is passed through to custom_func). The custom_func argument is ignored if it is NULL. See also utf8proc_map_custom.

utf8proc_encode_char
utf8proc_ssize_t utf8proc_encode_char(utf8proc_int32_t codepoint, utf8proc_uint8_t* dst)

Encodes the codepoint as a UTF-8 string in the byte array pointed to by dst, which must be at least 4 bytes long. Returns the number of bytes written.
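The following C stand-in shows byte by byte what "encodes as UTF-8" means. encode_utf8 is hypothetical and assumes the contract stated in utf8proc's C header: dst has room for 4 bytes and the return value is the number of bytes written. This sketch returns 0 for negative codepoints or codepoints above U+10FFFF, and it does not reject surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder: writes cp as UTF-8 into dst (>= 4 bytes)
   and returns the number of bytes written, or 0 if cp is out of range. */
static ptrdiff_t encode_utf8(int32_t cp, uint8_t *dst) {
    assert(dst != NULL);
    if (cp < 0) return 0;
    if (cp < 0x80) {                        /* 0xxxxxxx */
        dst[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xC0 | (cp >> 6));
        dst[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xE0 | (cp >> 12));
        dst[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                    /* 11110xxx + three continuation bytes */
        dst[0] = (uint8_t)(0xF0 | (cp >> 18));
        dst[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond U+10FFFF */
}
```

For example, U+20AC (the euro sign) produces the three bytes E2 82 AC.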

utf8proc_errmsg
const(char)* utf8proc_errmsg(utf8proc_ssize_t errcode)

Returns an informative error string for the given utf8proc error code (e.g. the error codes returned by utf8proc_map).

utf8proc_get_property
const(utf8proc_property_t)* utf8proc_get_property(utf8proc_int32_t codepoint)

Look up the properties for a given codepoint.

utf8proc_grapheme_break
utf8proc_bool utf8proc_grapheme_break(utf8proc_int32_t codepoint1, utf8proc_int32_t codepoint2)

Same as utf8proc_grapheme_break_stateful, except without support for the Unicode 9 additions to the algorithm. Supported for legacy reasons.

utf8proc_grapheme_break_stateful
utf8proc_bool utf8proc_grapheme_break_stateful(utf8proc_int32_t codepoint1, utf8proc_int32_t codepoint2, utf8proc_int32_t* state)

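Given a pair of consecutive codepoints, returns whether a grapheme break is permitted between them (as defined by the extended grapheme clusters in UAX #29). The state argument should point to an integer that is initialized to 0 before the first call and carried across calls. Passing null disables the rules that require this state.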

utf8proc_iterate
utf8proc_ssize_t utf8proc_iterate(const utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_int32_t* codepoint_ref)

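Reads a single codepoint from the UTF-8 sequence pointed to by str and stores it in *codepoint_ref. At most strlen bytes are read, unless strlen is negative (in which case up to 4 bytes are read). Returns the number of bytes consumed. If the sequence is invalid, returns a negative error code and sets *codepoint_ref to -1.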
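Below is a simplified C decoder with the same shape as the description above, written from scratch for illustration. decode_utf8 is hypothetical. It signals malformed input by storing -1 and returning -1, whereas utf8proc returns its own negative error code. It rejects overlong forms, surrogates, truncated sequences and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder (not utf8proc's code): reads one codepoint from str,
   looking at no more than len bytes (up to 4 when len is negative).
   On success stores the codepoint in *out and returns the bytes consumed.
   Returns 0 if len is 0, and -1 on malformed or truncated input;
   in both of those cases *out is -1. */
static ptrdiff_t decode_utf8(const uint8_t *str, ptrdiff_t len, int32_t *out) {
    assert(str != NULL && out != NULL);
    *out = -1;
    if (len == 0) return 0;                 /* nothing to read */
    ptrdiff_t avail = len < 0 ? 4 : len;
    uint8_t b0 = str[0];
    ptrdiff_t n;                            /* expected sequence length */
    int32_t cp, min;                        /* payload bits, smallest legal value */
    if (b0 < 0x80) { *out = b0; return 1; }                   /* ASCII */
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
    else return -1;                         /* continuation byte or 0xF8..0xFF */
    if (n > avail) return -1;               /* sequence would run past len */
    for (ptrdiff_t i = 1; i < n; i++) {
        if ((str[i] & 0xC0) != 0x80) return -1;  /* not 10xxxxxx (also stops at NUL) */
        cp = (cp << 6) | (str[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                          /* overlong, out of range, or surrogate */
    *out = cp;
    return n;
}
```

A caller walks a string by advancing str by each positive return value until it reaches the end.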

utf8proc_map
utf8proc_ssize_t utf8proc_map(const utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_uint8_t** dstptr, utf8proc_option_t options)

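Maps the UTF-8 string pointed to by str to a new UTF-8 string, allocated with malloc and returned via dstptr. The caller must release it with free(). Returns the length of the result in bytes, or a negative error code (see utf8proc_errmsg).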

utf8proc_map_custom
utf8proc_ssize_t utf8proc_map_custom(const utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_uint8_t** dstptr, utf8proc_option_t options, utf8proc_custom_func custom_func, void* custom_data)

Like utf8proc_map, but also takes a custom_func mapping function that is called on each codepoint in str before any other transformations (along with a custom_data pointer that is passed through to custom_func). The custom_func argument is ignored if it is NULL.

utf8proc_normalize_utf32
utf8proc_ssize_t utf8proc_normalize_utf32(utf8proc_int32_t* buffer, utf8proc_ssize_t length, utf8proc_option_t options)

Normalizes, in place, the sequence of length codepoints pointed to by buffer (i.e., the result is also stored in buffer). Returns the length of the normalized sequence in codepoints, or a negative error code.

utf8proc_reencode
utf8proc_ssize_t utf8proc_reencode(utf8proc_int32_t* buffer, utf8proc_ssize_t length, utf8proc_option_t options)

Re-encodes, in place, the sequence of length codepoints pointed to by buffer as UTF-8 data (i.e., the result is also stored in buffer). Can optionally normalize the UTF-32 sequence prior to UTF-8 conversion. Returns the length of the UTF-8 result in bytes, or a negative error code.
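In-place re-encoding is possible because no codepoint needs more than 4 bytes of UTF-8. That is exactly the size of one utf8proc_int32_t, so the byte write position never overtakes the next codepoint still to be read. Below is a minimal C sketch of that idea, using the hypothetical names put_utf8 and reencode_in_place. It covers only the UTF-32 to UTF-8 step and skips the optional normalization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compact UTF-8 encoder for codepoints in 0..0x10FFFF; returns bytes written. */
static int put_utf8(int32_t cp, uint8_t *dst) {
    static const uint8_t lead[5] = {0, 0, 0xC0, 0xE0, 0xF0};
    assert(cp >= 0 && cp <= 0x10FFFF);
    if (cp < 0x80) { dst[0] = (uint8_t)cp; return 1; }
    int n = cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    for (int i = n - 1; i > 0; i--) {       /* fill continuation bytes from the end */
        dst[i] = (uint8_t)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    dst[0] = (uint8_t)(lead[n] | cp);       /* remaining high bits go in the lead byte */
    return n;
}

/* Hypothetical in-place re-encoder: turns len codepoints in buf into UTF-8
   bytes stored in the same memory and returns the number of bytes.
   Before buf[r] is read, at most 4*r bytes have been written, so its bytes
   are still intact; its own encoding (at most 4 bytes) then ends no later
   than the end of buf[r]. */
static ptrdiff_t reencode_in_place(int32_t *buf, ptrdiff_t len) {
    uint8_t *out = (uint8_t *)buf;
    ptrdiff_t w = 0;                        /* byte write position */
    for (ptrdiff_t r = 0; r < len; r++) {
        int32_t cp = buf[r];                /* read before overwriting */
        w += put_utf8(cp, out + w);
    }
    return w;
}
```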

utf8proc_tolower
utf8proc_int32_t utf8proc_tolower(utf8proc_int32_t c)

Given a codepoint c, return the codepoint of the corresponding lower-case character, if any; otherwise (if there is no lower-case variant, or if c is not a valid codepoint) return c.

utf8proc_totitle
utf8proc_int32_t utf8proc_totitle(utf8proc_int32_t c)

Given a codepoint c, return the codepoint of the corresponding title-case character, if any; otherwise (if there is no title-case variant, or if c is not a valid codepoint) return c.

utf8proc_toupper
utf8proc_int32_t utf8proc_toupper(utf8proc_int32_t c)

Given a codepoint c, return the codepoint of the corresponding upper-case character, if any; otherwise (if there is no upper-case variant, or if c is not a valid codepoint) return c.

utf8proc_version
const(char)* utf8proc_version()

Returns the utf8proc API version as a string MAJOR.MINOR.PATCH (http://semver.org format), possibly with a "-dev" suffix for development versions.
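A binding that needs to check the runtime library version can parse the returned string. Below is a hypothetical C helper, parse_version. It assumes the string is exactly MAJOR.MINOR.PATCH with an optional "-dev" suffix, as described above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for "MAJOR.MINOR.PATCH" with an optional "-dev" suffix.
   Returns false if the string does not have that shape. */
static bool parse_version(const char *s, int *major, int *minor, int *patch,
                          bool *dev) {
    int consumed = 0;                       /* characters matched by the numbers */
    if (sscanf(s, "%d.%d.%d%n", major, minor, patch, &consumed) != 3)
        return false;
    const char *rest = s + consumed;
    if (*rest == '\0') { *dev = false; return true; }
    if (strcmp(rest, "-dev") == 0) { *dev = true; return true; }
    return false;                           /* unexpected suffix */
}
```

For example, "2.10.0-dev" yields 2, 10 and 0 with the development flag set. The sample strings are invented for illustration.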

Variables

utf8proc_utf8class
utf8proc_int8_t[256] utf8proc_utf8class;

Lookup table giving the byte length of a UTF-8 encoded codepoint, indexed by its first byte.
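The table encodes the standard UTF-8 lead-byte rule. Below is a C sketch that rebuilds a table of the same shape. It assumes utf8proc stores 0 for bytes that cannot start a sequence. utf8_class and build_utf8class are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical rule: sequence length implied by a first byte, 0 if the byte
   cannot start a sequence. Only the first byte is inspected. */
static int8_t utf8_class(uint8_t first) {
    if (first < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (first < 0xC0) return 0;   /* 10xxxxxx: continuation byte */
    if (first < 0xE0) return 2;   /* 110xxxxx */
    if (first < 0xF0) return 3;   /* 1110xxxx */
    if (first < 0xF8) return 4;   /* 11110xxx */
    return 0;                     /* 0xF8..0xFF never start a sequence */
}

/* Fill a 256-entry table in the same shape as utf8proc_utf8class. */
static void build_utf8class(int8_t table[256]) {
    for (int b = 0; b < 256; b++) table[b] = utf8_class((uint8_t)b);
}
```

Because only the first byte is inspected, this sketch classes 0xC0 and 0xC1 as 2 even though they can only begin overlong sequences. A decoder therefore still has to validate the bytes that follow.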

Meta