UTF8
Operations on UTF8 strings.
Most of the operations are optimized for speed, so they might still
succeed on some malformed UTF8 strings. The only function that completely
checks the UTF8 format is utf8_validate. Other functions might or might not
raise an exception, depending on the malformed data.
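As a rough illustration of the difference between a fast operation and full validation, the Python sketch below counts chars by skipping continuation bytes, which succeeds even on malformed input, and contrasts that with a strict check. The names `lenient_length` and `strict_validate` are hypothetical and not part of this library. Python's decoder only stands in for utf8_validate here and may not accept exactly the same inputs.

```python
def lenient_length(data: bytes) -> int:
    """Count chars by counting every byte that is not a continuation byte.

    Continuation bytes have the bit pattern 10xxxxxx. Nothing is validated,
    so malformed input still produces a number.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def strict_validate(data: bytes) -> bool:
    """Return True only if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


good = "héllo".encode("utf-8")
bad = b"h\xc3llo"  # lead byte 0xC3 with its continuation byte missing

print(lenient_length(good), strict_validate(good))  # 5 True
print(lenient_length(bad), strict_validate(bad))    # 5 False
```

The lenient count reports the same length for both inputs, which is why untrusted data is best validated first.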
'ubuf utf8_buf_alloc(size : int) Create a new buffer with an initial size in bytes
void utf8_buf_add('buf, int) Add a valid UTF8 char (0 - 0x10FFFF) to the buffer
string utf8_buf_content('buf)
Return the current content of the buffer.
This is not a copy of the buffer but the shared content.
Retrieving the content and then continuing to add chars is
possible but not very efficient.
int utf8_buf_length('buf) Return the number of UTF8 chars stored in the buffer
int utf8_buf_size('buf) Return the current size in bytes of the buffer
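The difference between the buffer length (chars) and size (bytes) can be sketched in Python as follows. The class `Utf8Buf` is a hypothetical model, not this library's implementation. It assumes that size means the number of bytes written so far, and its `content` method returns a copy rather than the shared content described above.

```python
class Utf8Buf:
    """Toy model of a UTF8 buffer that tracks chars and bytes separately."""

    def __init__(self, size: int = 16) -> None:
        # In this sketch the initial size is only a hint; bytearray grows as needed.
        self._bytes = bytearray()
        self._chars = 0

    def add(self, code: int) -> None:
        """Append one char given as a code point in the range 0 - 0x10FFFF."""
        if not 0 <= code <= 0x10FFFF:
            raise ValueError("code point out of range")
        self._bytes += chr(code).encode("utf-8")
        self._chars += 1

    def content(self) -> bytes:
        return bytes(self._bytes)  # a copy, unlike the shared content above

    def length(self) -> int:
        return self._chars

    def size(self) -> int:
        return len(self._bytes)


buf = Utf8Buf()
for code in (0x61, 0xE9, 0x20AC, 0x1F600):  # 1, 2, 3 and 4 bytes in UTF8
    buf.add(code)

print(buf.length(), buf.size())  # 4 10
```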
bool utf8_validate(string) Validate if a string is encoded using the UTF8 format
int utf8_length(string) Returns the number of UTF8 chars in the string.
string utf8_sub(string, pos : int, len : int) Returns a part of a UTF8 string.
int utf8_get(string, n : int) Returns the nth char in a UTF8 string.
This might be inefficient if n is big.
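A plausible reason for the cost is that UTF8 chars take between 1 and 4 bytes, so the nth char cannot be located without walking over the preceding ones. The Python sketch below shows such a linear scan. The names `char_width` and `get_nth` are hypothetical, and the code assumes well-formed input.

```python
def char_width(lead: int) -> int:
    """Byte width of a char, derived from its lead byte (input assumed valid)."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def get_nth(data: bytes, n: int) -> int:
    """Return the code point of the nth char (0-based) by scanning from the start."""
    i = 0
    for _ in range(n):  # skip n chars: the work grows linearly with n
        i += char_width(data[i])
    width = char_width(data[i])
    return ord(data[i:i + width].decode("utf-8"))


data = "a€b".encode("utf-8")  # 1-byte, 3-byte and 1-byte chars
print(hex(get_nth(data, 1)))  # 0x20ac
```

When every char is needed in order, a single pass such as utf8_iter avoids rescanning from the start for each index.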
void utf8_iter(string, f : ((int -> void))) Call f with each UTF8 char of the string.
int utf8_compare(s1 : string, s2 : string) Compare two UTF8 strings according to UTF8 char codes.
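Comparing by char codes means the order follows code point values rather than any locale-aware collation. A minimal Python sketch, in which the name `compare_code_points` is hypothetical and the -1/0/1 return convention is an assumption, since the exact return values are not stated above:

```python
def compare_code_points(s1: str, s2: str) -> int:
    """Compare two strings code point by code point; return -1, 0 or 1."""
    a = [ord(c) for c in s1]
    b = [ord(c) for c in s2]
    return (a > b) - (a < b)


print(compare_code_points("Z", "a"))      # -1: U+005A sorts before U+0061
print(compare_code_points("abc", "abc"))  # 0
print(compare_code_points("é", "z"))      # 1: U+00E9 sorts after U+007A
```

For well-formed UTF8, comparing the encoded bytes lexicographically gives the same order as comparing code points, so a byte-wise comparison is enough in that case.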