utf8proc_option_t

Option flags used by several functions in the library.

Values

Value and meaning
UTF8PROC_NULLTERM = (1 << 0)

The given UTF-8 input is null-terminated.

UTF8PROC_STABLE = (1 << 1)

Unicode Versioning Stability has to be respected.

UTF8PROC_COMPAT = (1 << 2)

Compatibility decomposition (i.e. formatting information is lost).

UTF8PROC_COMPOSE = (1 << 3)

Return a result with composed characters.

UTF8PROC_DECOMPOSE = (1 << 4)

Return a result with decomposed characters.

UTF8PROC_IGNORE = (1 << 5)

Strip "default ignorable characters" such as SOFT-HYPHEN or ZERO-WIDTH-SPACE.

UTF8PROC_REJECTNA = (1 << 6)

Return an error if the input contains unassigned code points.

UTF8PROC_NLF2LS = (1 << 7)

Indicates that NLF sequences (LF, CRLF, CR, NEL) represent a line break and should be converted to the code point for line separation (LS).

UTF8PROC_NLF2PS = (1 << 8)

Indicates that NLF sequences represent a paragraph break and should be converted to the code point for paragraph separation (PS).

UTF8PROC_NLF2LF = (UTF8PROC_NLF2LS | UTF8PROC_NLF2PS)

Indicates that the meaning of NLF sequences is unknown; they are converted to line feed (LF).
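Because UTF8PROC_NLF2LF is the union of the NLF2LS and NLF2PS bits, a plain `options & UTF8PROC_NLF2LF` test is also true when only one of the two bits is set. The following is a minimal sketch of how calling code might tell the three modes apart. The enum is a local copy of the values listed in this section (real code would include `<utf8proc.h>` instead), and `nlf_target` is a hypothetical helper, not part of the library.

```c
/* Local mirror of the values documented in this section.
   Assumption: real code would include <utf8proc.h> instead. */
enum {
    UTF8PROC_NLF2LS = (1 << 7),
    UTF8PROC_NLF2PS = (1 << 8),
    UTF8PROC_NLF2LF = (UTF8PROC_NLF2LS | UTF8PROC_NLF2PS)
};

/* Hypothetical helper: return the code point that NLF sequences are
   converted to for a given option mask, or -1 if none of the NLF2*
   options is set. */
static int nlf_target(int options)
{
    /* Isolate the two NLF bits and compare the whole field, so that
       "both bits set" is distinguished from "one bit set". */
    switch (options & UTF8PROC_NLF2LF) {
    case UTF8PROC_NLF2LF: return 0x000A; /* LINE FEED (LF) */
    case UTF8PROC_NLF2LS: return 0x2028; /* LINE SEPARATOR (LS) */
    case UTF8PROC_NLF2PS: return 0x2029; /* PARAGRAPH SEPARATOR (PS) */
    default:              return -1;     /* no NLF conversion requested */
    }
}
```

Other option bits in the mask do not affect the result, since they are masked out before the comparison.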

UTF8PROC_STRIPCC = (1 << 9)

Strips and/or converts control characters.

NLF sequences are transformed into a space, unless one of the NLF2LS/PS/LF options is given. Horizontal tab (HT) and form feed (FF) are treated as an NLF sequence in this case. All other control characters are simply removed.

UTF8PROC_CASEFOLD = (1 << 10)

Performs Unicode case folding, which makes case-insensitive string comparison possible.

UTF8PROC_CHARBOUND = (1 << 11)

Inserts a 0xFF byte at the beginning of each sequence that represents a single grapheme cluster (see UAX#29).
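Since 0xFF never occurs in valid UTF-8, the inserted bytes can serve as unambiguous cluster delimiters. As a minimal sketch, assuming a buffer that was already produced with UTF8PROC_CHARBOUND, the number of grapheme clusters equals the number of 0xFF bytes. `count_clusters` is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Hypothetical helper: count grapheme clusters in a buffer produced with
   UTF8PROC_CHARBOUND, where every cluster is preceded by one 0xFF byte.
   Assumption: the buffer really is such output; on plain UTF-8 without
   markers this returns 0. */
static size_t count_clusters(const unsigned char *buf, size_t len)
{
    size_t clusters = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0xFF)   /* marker byte: a new cluster starts here */
            clusters++;
    }
    return clusters;
}
```

For a hand-built buffer such as `{0xFF, 'a', 0xFF, 'e', 0xCC, 0x81}` ("a" followed by "e" plus U+0301 COMBINING ACUTE ACCENT), the helper reports two clusters.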

UTF8PROC_LUMP = (1 << 12)

Lumps certain characters together.

E.g. HYPHEN U+2010 and MINUS U+2212 to ASCII "-". See lump.md for details.

If NLF2LF is set, this includes a transformation of paragraph and line separators to ASCII line-feed (LF).

UTF8PROC_STRIPMARK = (1 << 13)

Strips all character markings.

This includes non-spacing, spacing and enclosing marks (e.g. accents). Note: this option works only together with UTF8PROC_COMPOSE or UTF8PROC_DECOMPOSE.
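The dependency described in the note can be checked before an option mask is handed to the library. The sketch below assumes a local copy of the three relevant values (real code would include `<utf8proc.h>`), and `stripmark_options_ok` is a hypothetical helper, not part of the library.

```c
/* Local mirror of the values documented in this section.
   Assumption: real code would include <utf8proc.h> instead. */
enum {
    UTF8PROC_COMPOSE   = (1 << 3),
    UTF8PROC_DECOMPOSE = (1 << 4),
    UTF8PROC_STRIPMARK = (1 << 13)
};

/* Hypothetical helper: return 1 if the mask satisfies the documented rule
   that STRIPMARK needs COMPOSE or DECOMPOSE, and 0 otherwise. Masks that
   do not request STRIPMARK always pass this particular check. */
static int stripmark_options_ok(int options)
{
    if (!(options & UTF8PROC_STRIPMARK))
        return 1;
    return (options & (UTF8PROC_COMPOSE | UTF8PROC_DECOMPOSE)) != 0;
}
```

A caller would typically build the mask with bitwise OR, for example `UTF8PROC_DECOMPOSE | UTF8PROC_STRIPMARK`, and run a check like this one before passing it on.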

Meta