Aros/Developer/Docs/Libraries/Codesets
Introduction
Character set (charset) encoding is the process of assigning numbers to graphical characters, especially the written characters of human language.
Unicode v16.0 emojis are not supported, but codesets.library provides the following internally supported (hardcoded) charsets/codesets. Conversions are possible from and to each codeset:
AmigaPL – Polish (Amiga)
Amiga-1251 – Cyrillic (Amiga)
ISO-8859-1 – West European (Latin alphabet no. 1, ASCII-based)
ISO-8859-1+Euro – West European (with EURO)
ISO-8859-2 – Central/East European
ISO-8859-3 – South European
ISO-8859-4 – North European
ISO-8859-5 – Cyrillic (Slavic languages)
ISO-8859-9 – Turkish
ISO-8859-15 – West European II
ISO-8859-16 – South-Eastern European
KOI8-R – Russian
UTF-8 – Unicode
In addition, external charset table files can be stored in LIBS:Charsets or loaded by an application from PROGDIR:Charsets. The charset files included with this distribution are:
IBM866 – Cyrillic (cp866)
ISO-8859-7 – Greek (LatinGreek)
ISO-8859-10 – Nordic (Latin 6)
windows-1250 – Central/East Europe (Windows)
windows-1251 – Cyrillic (Windows)
windows-1252 – West European (Windows)
Windows-1252 was the first character set used in Windows. It is an 8-bit superset of ASCII: the lower 128 codes are identical to ASCII, and the eighth bit allows up to 256 different characters, which makes room for international letters. Windows-1252 is supported by all browsers.
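To make the idea of "assigning numbers to characters" concrete, the following minimal sketch shows how the same byte can stand for different characters depending on the charset. It is plain portable C and does not use codesets.library; the names `charset_id` and `byte_to_codepoint` are hypothetical and exist only for this illustration. It relies on two facts: ISO-8859-1 maps every byte to the Unicode code point with the same number, and ISO-8859-15 differs from it in eight positions (among them 0xA4, which becomes the euro sign).

```c
#include <stdint.h>

/* Hypothetical identifiers for this sketch only; not part of codesets.library. */
enum charset_id { CS_ISO_8859_1, CS_ISO_8859_15 };

/* Map one byte to its Unicode code point under the given single-byte charset. */
uint32_t byte_to_codepoint(enum charset_id cs, unsigned char b)
{
    if (cs == CS_ISO_8859_15) {
        /* The eight positions where ISO-8859-15 differs from ISO-8859-1. */
        switch (b) {
        case 0xA4: return 0x20AC; /* EURO SIGN */
        case 0xA6: return 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
        case 0xA8: return 0x0161; /* LATIN SMALL LETTER S WITH CARON */
        case 0xB4: return 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
        case 0xB8: return 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
        case 0xBC: return 0x0152; /* LATIN CAPITAL LIGATURE OE */
        case 0xBD: return 0x0153; /* LATIN SMALL LIGATURE OE */
        case 0xBE: return 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
        default:   break;
        }
    }
    /* ISO-8859-1 (and every other ISO-8859-15 position): byte value == code point. */
    return b;
}
```

Calling `byte_to_codepoint(CS_ISO_8859_1, 0xA4)` yields 0xA4 (the generic currency sign), while `byte_to_codepoint(CS_ISO_8859_15, 0xA4)` yields 0x20AC (the euro sign). This is why text must be converted, not just copied, when moving between codesets.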
Source Code
/***************************************************************************
codesets.library - Amiga shared library for handling different codesets
Copyright (C) 2001-2005 by Alfonso [alfie] Ranieri <alforan@tin.it>.
Copyright (C) 2005-2014 codesets.library Open Source Team
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
codesets.library project: http://sourceforge.net/projects/codesetslib/
Most of the code included in this file was relicensed from GPL to LGPL
from the source code of SimpleMail (http://www.sf.net/projects/simplemail)
with full permissions by its authors.
$Id$
***************************************************************************/
#include <exec/libraries.h>
#include <libraries/codesets.h>
#include <proto/codesets.h>
#include <proto/exec.h>
#include <stdio.h>
/* This is just a very quickly written test, not a full-featured convertor */
#define BUF_SIZE 102400
struct Library *CodesetsBase = NULL;
#if defined(__amigaos4__)
struct CodesetsIFace *ICodesets = NULL;
#endif
#if defined(__amigaos4__)
#define GETINTERFACE(iface, base) (iface = (APTR)GetInterface((struct Library *)(base), "main", 1L, NULL))
#define DROPINTERFACE(iface) (DropInterface((struct Interface *)iface), iface = NULL)
#else
#define GETINTERFACE(iface, base) TRUE
#define DROPINTERFACE(iface)
#endif
struct codeset *srcCodeset;
struct codeset *destCodeset;
int main(int argc, char **argv)
{
char *buf, *destbuf;
ULONG destlen;
FILE *f;
if (argc < 4)
{
fprintf(stderr, "Usage: %s <source codeset> <destination codeset> <source file>\n", argv[0]);
return 0;
}
if((CodesetsBase = OpenLibrary(CODESETSNAME,CODESETSVER)) &&
GETINTERFACE(ICodesets, CodesetsBase))
{
srcCodeset = CodesetsFind(argv[1], CSA_FallbackToDefault, FALSE, TAG_DONE);
if (srcCodeset)
{
destCodeset = CodesetsFind(argv[2], CSA_FallbackToDefault, FALSE, TAG_DONE);
if (destCodeset)
{
buf = AllocMem(BUF_SIZE, MEMF_CLEAR);
if (buf)
{
f = fopen(argv[3], "r");
if (f)
{
fread(buf, BUF_SIZE-1, 1, f);
fclose(f);
destbuf = CodesetsConvertStr(CSA_SourceCodeset, (IPTR)srcCodeset,
CSA_DestCodeset, (IPTR)destCodeset,
CSA_Source, (IPTR)buf,
CSA_DestLenPtr, (IPTR)&destlen,
TAG_DONE);
if (destbuf)
{
fprintf(stderr, "Result length: %u\n", (unsigned int)destlen);
fwrite(destbuf, destlen, 1, stdout);
fputc('\n', stderr);
CodesetsFreeA(destbuf, NULL);
}
else
fprintf(stderr, "Failed to convert text!\n");
}
else
fprintf(stderr, "Failed to open file %s\n", argv[3]);
FreeMem(buf, BUF_SIZE);
}
else
fprintf(stderr, "Failed to allocate %d bytes for buffer\n", BUF_SIZE);
}
else
fprintf(stderr, "Unknown destination codeset %s\n", argv[2]);
}
else
fprintf(stderr, "Unknown source codeset %s\n", argv[1]);
DROPINTERFACE(ICodesets);
CloseLibrary(CodesetsBase);
}
else
fprintf(stderr, "Failed to open codesets.library!\n");
return 0;
}
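The program above leaves the actual conversion to CodesetsConvertStr(). As a rough illustration of what such a conversion involves, here is a minimal portable sketch for one specific pair, ISO-8859-1 to UTF-8. It does not call codesets.library and is not the library's implementation; the function name `latin1_to_utf8` and its error convention are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
 * Returns the number of bytes written to dst, or (size_t)-1 if dst
 * (capacity dstcap) is too small. No terminating NUL is appended.
 */
size_t latin1_to_utf8(const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstcap)
{
    size_t out = 0;
    size_t i;

    for (i = 0; i < srclen; i++) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII range: one byte, unchanged. */
            if (out + 1 > dstcap) return (size_t)-1;
            dst[out++] = c;
        } else {
            /* 0x80..0xFF: code point equals the byte value and always
             * needs exactly two UTF-8 bytes (110xxxxx 10xxxxxx). */
            if (out + 2 > dstcap) return (size_t)-1;
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For the four input bytes 'b', 'r', 0xF6, 'd' ("bröd" in ISO-8859-1) the function writes five bytes, with 0xF6 expanded to 0xC3 0xB6. A destination buffer of twice the source length is always large enough for this particular conversion.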
/***************************************************************************
codesets.library - Amiga shared library for handling different codesets
Copyright (C) 2001-2005 by Alfonso [alfie] Ranieri <alforan@tin.it>.
Copyright (C) 2005-2014 codesets.library Open Source Team
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
codesets.library project: http://sourceforge.net/projects/codesetslib/
Most of the code included in this file was relicensed from GPL to LGPL
from the source code of SimpleMail (http://www.sf.net/projects/simplemail)
with full permissions by its authors.
$Id$
***************************************************************************/
#include <proto/exec.h>
#include <proto/codesets.h>
#include <stdio.h>
#include <string.h>
#define ISO8859_1_STR "Schmöre bröd, schmöre bröd, bröd bröd bräd."
#define CP1251_STR "1251 êîäèðîâêà äëÿ ïðèìåðà."
#define ASCII_STR "latin 1 bla bla bla."
#define KOI8R_STR "koi îÅ×ÏÚÍÏÖÎÏ ÐÅÒÅËÏÄÉÒÏ×ÁÔØ ÉÚ ËÏÄÉÒÏ×ËÉ"
struct Library *CodesetsBase = NULL;
#if defined(__amigaos4__)
struct CodesetsIFace* ICodesets = NULL;
#endif
#if defined(__amigaos4__)
#define GETINTERFACE(iface, base) (iface = (APTR)GetInterface((struct Library *)(base), "main", 1L, NULL))
#define DROPINTERFACE(iface) (DropInterface((struct Interface *)iface), iface = NULL)
#else
#define GETINTERFACE(iface, base) TRUE
#define DROPINTERFACE(iface)
#endif
int main(void)
{
int res;
if((CodesetsBase = OpenLibrary(CODESETSNAME,CODESETSVER)) &&
GETINTERFACE(ICodesets, CodesetsBase))
{
IPTR errNum = 0;
struct codeset *cs;
if((cs = CodesetsFindBest(CSA_Source, (IPTR)ISO8859_1_STR,
CSA_ErrPtr, (IPTR)&errNum,
TAG_DONE)))
{
printf("Identified ISO8859_1_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(ISO8859_1_STR));
}
else
printf("couldn't identify ISO8859_1_STR!\n");
if((cs = CodesetsFindBest(CSA_Source, (IPTR)CP1251_STR,
CSA_ErrPtr, (IPTR)&errNum,
CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
TAG_DONE)))
{
printf("Identified CP1251_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(CP1251_STR));
}
else
printf("couldn't identify CP1251_STR!\n");
if((cs = CodesetsFindBest(CSA_Source, (IPTR)ASCII_STR,
CSA_ErrPtr, (IPTR)&errNum,
CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
TAG_DONE)))
{
printf("Identified ASCII_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(ASCII_STR));
}
else
printf("couldn't identify ASCII_STR!\n");
if((cs = CodesetsFindBest(CSA_Source, (IPTR)KOI8R_STR,
CSA_ErrPtr, (IPTR)&errNum,
CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
TAG_DONE)))
{
printf("Identified KOI8R_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(KOI8R_STR));
}
else
printf("couldn't identify KOI8R_STR!\n");
res = 0;
DROPINTERFACE(ICodesets);
CloseLibrary(CodesetsBase);
CodesetsBase = NULL;
}
else
{
printf("can't open %s %d+\n",CODESETSNAME,CODESETSVER);
res = 20;
}
return res;
}
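The program above asks CodesetsFindBest() to guess the codeset of a string, optionally restricted to the Cyrillic family. How the library scores candidates is not shown here. The sketch below is a deliberately simple, well-known heuristic and should not be read as the library's algorithm: in windows-1251 the lowercase Cyrillic letters occupy 0xE0..0xFF, whereas in KOI8-R they occupy 0xC0..0xDF, and ordinary text is mostly lowercase. The names `cyr_guess` and `guess_cyrillic_codeset` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch only. */
enum cyr_guess { GUESS_UNKNOWN, GUESS_CP1251, GUESS_KOI8R };

/*
 * Guess between windows-1251 and KOI8-R by counting bytes in the two
 * letter ranges. Lowercase letters dominate normal text, so the range
 * with more hits indicates where the lowercase letters live.
 */
enum cyr_guess guess_cyrillic_codeset(const unsigned char *buf, size_t len)
{
    size_t c0_df = 0; /* lowercase in KOI8-R, uppercase in windows-1251 */
    size_t e0_ff = 0; /* lowercase in windows-1251, uppercase in KOI8-R */
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] >= 0xC0 && buf[i] <= 0xDF)
            c0_df++;
        else if (buf[i] >= 0xE0)
            e0_ff++;
    }

    if (e0_ff > c0_df) return GUESS_CP1251;
    if (c0_df > e0_ff) return GUESS_KOI8R;
    return GUESS_UNKNOWN; /* tie, or no high-range bytes at all (e.g. ASCII) */
}
```

Applied to the bytes 0xEA 0xEE 0xE4 0xE8 0xF0 0xEE 0xE2 0xEA 0xE0 (the word that appears in CP1251_STR) the function returns GUESS_CP1251. Pure ASCII input returns GUESS_UNKNOWN. A real detector needs far more evidence than this, for example letter-frequency tables.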
From SimpleMail
/***************************************************************************
SimpleMail - Copyright (C) 2000 Hynek Schlawack and Sebastian Bauer
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
***************************************************************************/
/**
* @brief Support of codesets.
*
* @file codesets.c
*/
#include "codesets.h"
#include <ctype.h>
#include <dirent.h> /* dir stuff */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "codesets_table.h"
#include "debug.h"
#include "punycode.h"
#include "smintl.h"
#include "support_indep.h"
/* from ConvertUTF.h */
/*
* Copyright 2001 Unicode, Inc.
*
* Disclaimer
*
* This source code is provided as is by Unicode, Inc. No claims are
* made as to fitness for any particular purpose. No warranties of any
* kind are expressed or implied. The recipient agrees to determine
* applicability of information provided. If this file has been
* purchased on magnetic or optical media from Unicode, Inc., the
* sole remedy for any claim will be exchange of defective media
* within 90 days of receipt.
*
* Limitations on Rights to Redistribute This Code
*
* Unicode, Inc. hereby grants the right to freely use the information
* supplied in this file in the creation of products supporting the
* Unicode Standard, and to make copies of this file in any form
* for internal or external distribution as long as this notice
* remains attached.
*/
/* ---------------------------------------------------------------------
Conversions between UTF32, UTF-16, and UTF-8. Header file.
Several functions are included here, forming a complete set of
conversions between the three formats. UTF-7 is not included
here, but is handled in a separate source file.
Each of these routines takes pointers to input buffers and output
buffers. The input buffers are const.
Each routine converts the text between *sourceStart and sourceEnd,
putting the result into the buffer between *targetStart and
targetEnd. Note: the end pointers are *after* the last item: e.g.
*(sourceEnd - 1) is the last item.
The return result indicates whether the conversion was successful,
and if not, whether the problem was in the source or target buffers.
(Only the first encountered problem is indicated.)
After the conversion, *sourceStart and *targetStart are both
updated to point to the end of last text successfully converted in
the respective buffers.
Input parameters:
sourceStart - pointer to a pointer to the source buffer.
The contents of this are modified on return so that
it points at the next thing to be converted.
targetStart - similarly, pointer to pointer to the target buffer.
sourceEnd, targetEnd - respectively pointers to the ends of the
two buffers, for overflow checking only.
These conversion functions take a ConversionFlags argument. When this
flag is set to strict, both irregular sequences and isolated surrogates
will cause an error. When the flag is set to lenient, both irregular
sequences and isolated surrogates are converted.
Whether the flag is strict or lenient, all illegal sequences will cause
an error return. This includes sequences such as: <F4 90 80 80>, <C0 80>,
or <A0> in UTF-8, and values above 0x10FFFF in UTF-32. Conformant code
must check for illegal sequences.
When the flag is set to lenient, characters over 0x10FFFF are converted
to the replacement character; otherwise (when the flag is set to strict)
they constitute an error.
Output parameters:
The value "sourceIllegal" is returned from some routines if the input
sequence is malformed. When "sourceIllegal" is returned, the source
value will point to the illegal value that caused the problem. E.g.,
in UTF-8 when a sequence is malformed, it points to the start of the
malformed sequence.
Author: Mark E. Davis, 1994.
Rev History: Rick McGowan, fixes & updates May 2001.
------------------------------------------------------------------------ */
/* ---------------------------------------------------------------------
The following 4 definitions are compiler-specific.
The C standard does not guarantee that wchar_t has at least
16 bits, so wchar_t is no less portable than unsigned short!
All should be unsigned values to avoid sign extension during
bit mask & shift operations.
------------------------------------------------------------------------ */
typedef unsigned long UTF32; /* at least 32 bits */
typedef unsigned short UTF16; /* at least 16 bits */
typedef unsigned char UTF8; /* typically 8 bits */
typedef unsigned char Boolean; /* 0 or 1 */
/* Some fundamental constants */
#define UNI_REPLACEMENT_CHAR (UTF32)0x0000FFFD
#define UNI_MAX_BMP (UTF32)0x0000FFFF
#define UNI_MAX_UTF16 (UTF32)0x0010FFFF
#define UNI_MAX_UTF32 (UTF32)0x7FFFFFFF
typedef enum {
conversionOK, /* conversion successful */
sourceExhausted, /* partial character in source, but hit end */
targetExhausted, /* insuff. room in target for conversion */
sourceIllegal, /* source sequence is illegal/malformed */
sourceCorrupt, /* source contains invalid UTF-7 */ /* added */
} ConversionResult;
typedef enum {
strictConversion = 0,
lenientConversion
} ConversionFlags;
ConversionResult ConvertUTF32toUTF16 (
UTF32** sourceStart, const UTF32* sourceEnd,
UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags);
ConversionResult ConvertUTF16toUTF32 (
UTF16** sourceStart, UTF16* sourceEnd,
UTF32** targetStart, const UTF32* targetEnd, const ConversionFlags flags);
ConversionResult ConvertUTF16toUTF8 (
UTF16** sourceStart, const UTF16* sourceEnd,
UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags);
ConversionResult ConvertUTF8toUTF16 (
UTF8** sourceStart, UTF8* sourceEnd,
UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags);
ConversionResult ConvertUTF32toUTF8 (
UTF32** sourceStart, const UTF32* sourceEnd,
UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags);
ConversionResult ConvertUTF8toUTF32 (
UTF8** sourceStart, UTF8* sourceEnd,
UTF32** targetStart, const UTF32* targetEnd, ConversionFlags flags);
static Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd);
/* --------------------------------------------------------------------- */
int utf8islegal(const char *source, const char *sourceend)
{
return isLegalUTF8Sequence((const UTF8*)source, (const UTF8*)sourceend);
}
/* --------------------------------------------------------------------- */
/* ConvertUTF.c */
/*
* Copyright 2001 Unicode, Inc.
*
* Disclaimer
*
* This source code is provided as is by Unicode, Inc. No claims are
* made as to fitness for any particular purpose. No warranties of any
* kind are expressed or implied. The recipient agrees to determine
* applicability of information provided. If this file has been
* purchased on magnetic or optical media from Unicode, Inc., the
* sole remedy for any claim will be exchange of defective media
* within 90 days of receipt.
*
* Limitations on Rights to Redistribute This Code
*
* Unicode, Inc. hereby grants the right to freely use the information
* supplied in this file in the creation of products supporting the
* Unicode Standard, and to make copies of this file in any form
* for internal or external distribution as long as this notice
* remains attached.
*/
/* ---------------------------------------------------------------------
Conversions between UTF32, UTF-16, and UTF-8. Source code file.
Author: Mark E. Davis, 1994.
Rev History: Rick McGowan, fixes & updates May 2001.
See the header file "ConvertUTF.h" for complete documentation.
------------------------------------------------------------------------ */
/*#include "ConvertUTF.h"*/
/*#ifdef CVTUTF_DEBUG*/
#include <stdio.h>
/*#endif*/
static const int halfShift = 10; /* used for shifting by 10 bits */
static const UTF32 halfBase = 0x0010000UL;
static const UTF32 halfMask = 0x3FFUL;
#define UNI_SUR_HIGH_START (UTF32)0xD800
#define UNI_SUR_HIGH_END (UTF32)0xDBFF
#define UNI_SUR_LOW_START (UTF32)0xDC00
#define UNI_SUR_LOW_END (UTF32)0xDFFF
#define false 0
#define true 1
/* --------------------------------------------------------------------- */
ConversionResult ConvertUTF32toUTF16 (
UTF32** sourceStart, const UTF32* sourceEnd,
UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags) {
ConversionResult result = conversionOK;
UTF32* source = *sourceStart;
UTF16* target = *targetStart;
while (source < sourceEnd) {
UTF32 ch;
if (target >= targetEnd) {
result = targetExhausted; break;
}
ch = *source++;
if (ch <= UNI_MAX_BMP) { /* Target is a character <= 0xFFFF */
if ((flags == strictConversion) && (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)) {
--source; /* return to the illegal value itself */
result = sourceIllegal;
break;
} else {
*target++ = ch; /* normal case */
}
} else if (ch > UNI_MAX_UTF16) {
if (flags == strictConversion) {
result = sourceIllegal;
} else {
*target++ = UNI_REPLACEMENT_CHAR;
}
} else {
/* target is a character in range 0xFFFF - 0x10FFFF. */
if (target + 1 >= targetEnd) {
result = targetExhausted; break;
}
ch -= halfBase;
*target++ = (ch >> halfShift) + UNI_SUR_HIGH_START;
*target++ = (ch & halfMask) + UNI_SUR_LOW_START;
}
}
*sourceStart = source;
*targetStart = target;
return result;
}
/* --------------------------------------------------------------------- */
ConversionResult ConvertUTF16toUTF32 (
UTF16** sourceStart, UTF16* sourceEnd,
UTF32** targetStart, const UTF32* targetEnd, const ConversionFlags flags) {
ConversionResult result = conversionOK;
UTF16* source = *sourceStart;
UTF32* target = *targetStart;
UTF32 ch, ch2;
while (source < sourceEnd) {
ch = *source++;
if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END && source < sourceEnd) {
ch2 = *source;
if (ch2 >= UNI_SUR_LOW_START && ch2 <= UNI_SUR_LOW_END) {
ch = ((ch - UNI_SUR_HIGH_START) << halfShift)
+ (ch2 - UNI_SUR_LOW_START) + halfBase;
++source;
} else if (flags == strictConversion) { /* it's an unpaired high surrogate */
--source; /* return to the illegal value itself */
result = sourceIllegal;
break;
}
} else if ((flags == strictConversion) && (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END)) {
/* an unpaired low surrogate */
--source; /* return to the illegal value itself */
result = sourceIllegal;
break;
}
if (target >= targetEnd) {
result = targetExhausted; break;
}
*target++ = ch;
}
*sourceStart = source;
*targetStart = target;
#ifdef CVTUTF_DEBUG
if (result == sourceIllegal) {
fprintf(stderr, "ConvertUTF16toUTF32 illegal seq 0x%04x,%04x\n", ch, ch2);
fflush(stderr);
}
#endif
return result;
}
/* --------------------------------------------------------------------- */
/*
* Index into the table below with the first byte of a UTF-8 sequence to
* get the number of trailing bytes that are supposed to follow it.
*/
static const char trailingBytesForUTF8[256] = {
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5
};
/*
* Magic values subtracted from a buffer value during UTF8 conversion.
* This table contains as many values as there might be trailing bytes
* in a UTF-8 sequence.
*/
static const UTF32 offsetsFromUTF8[6] = { 0x00000000UL, 0x00003080UL, 0x000E2080UL,
0x03C82080UL, 0xFA082080UL, 0x82082080UL };
/*
* Once the bits are split out into bytes of UTF-8, this is a mask OR-ed
* into the first byte, depending on how many bytes follow. There are
* as many entries in this table as there are UTF-8 sequence types.
* (I.e., one byte sequence, two byte... six byte sequence.)
*/
static const UTF8 firstByteMark[7] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
/* --------------------------------------------------------------------- */
/* The interface converts a whole buffer to avoid function-call overhead.
* Constants have been gathered. Loops & conditionals have been removed as
* much as possible for efficiency, in favor of drop-through switches.
* (See "Note A" at the bottom of the file for equivalent code.)
* If your compiler supports it, the "isLegalUTF8" call can be turned
* into an inline function.
*/
/* --------------------------------------------------------------------- */
ConversionResult ConvertUTF16toUTF8 (
UTF16** sourceStart, const UTF16* sourceEnd,
UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags) {
ConversionResult result = conversionOK;
UTF16* source = *sourceStart;
UTF8* target = *targetStart;
while (source < sourceEnd) {
UTF32 ch;
unsigned short bytesToWrite = 0;
const UTF32 byteMask = 0xBF;
const UTF32 byteMark = 0x80;
ch = *source++;
/* If we have a surrogate pair, convert to UTF32 first. */
if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END && source < sourceEnd) {
UTF32 ch2 = *source;
if (ch2 >= UNI_SUR_LOW_START && ch2 <= UNI_SUR_LOW_END) {
ch = ((ch - UNI_SUR_HIGH_START) << halfShift)
+ (ch2 - UNI_SUR_LOW_START) + halfBase;
++source;
} else if (flags == strictConversion) { /* it's an unpaired high surrogate */
--source; /* return to the illegal value itself */
result = sourceIllegal;
break;
}
} else if ((flags == strictConversion) && (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END)) {
--source; /* return to the illegal value itself */
result = sourceIllegal;
break;
}
/* Figure out how many bytes the result will require */
if (ch < (UTF32)0x80) { bytesToWrite = 1;
} else if (ch < (UTF32)0x800) { bytesToWrite = 2;
} else if (ch < (UTF32)0x10000) { bytesToWrite = 3;
} else if (ch < (UTF32)0x200000) { bytesToWrite = 4;
} else { bytesToWrite = 3; /* U+FFFD takes three bytes in UTF-8 */
ch = UNI_REPLACEMENT_CHAR;
}
target += bytesToWrite;
if (target > targetEnd) {
target -= bytesToWrite; result = targetExhausted; break;
}
switch (bytesToWrite) { /* note: everything falls through. */
case 4: *--target = (ch | byteMark) & byteMask; ch >>= 6;
case 3: *--target = (ch | byteMark) & byteMask; ch >>= 6;
case 2: *--target = (ch | byteMark) & byteMask; ch >>= 6;
case 1: *--target = ch | firstByteMark[bytesToWrite];
}
target += bytesToWrite;
}
*sourceStart = source;
*targetStart = target;
return result;
}
/* --------------------------------------------------------------------- */
/*
* Utility routine to tell whether a sequence of bytes is legal UTF-8.
* This must be called with the length pre-determined by the first byte.
* If not calling this from ConvertUTF8to*, then the length can be set by:
* length = trailingBytesForUTF8[*source]+1;
* and the sequence is illegal right away if there aren't that many bytes
* available.
* If presented with a length > 4, this returns false. The Unicode
* definition of UTF-8 goes up to 4-byte sequences.
*/
static Boolean isLegalUTF8(const UTF8 *source, int length) {
UTF8 a;
const UTF8 *srcptr = source+length;
switch (length) {
default: return false;
/* Everything else falls through when "true"... */
case 4: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;
case 3: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;
case 2: if ((a = (*--srcptr)) > 0xBF) return false;
switch (*source) {
/* no fall-through in this inner switch */
case 0xE0: if (a < 0xA0) return false; break;
case 0xF0: if (a < 0x90) return false; break;
case 0xF4: if (a > 0x8F) return false; break;
default: if (a < 0x80) return false;
}
case 1: if (*source >= 0x80 && *source < 0xC2) return false;
if (*source > 0xF4) return false;
}
return true;
}
/* --------------------------------------------------------------------- */
/*
* Exported function to return whether a UTF-8 sequence is legal or not.
* This is not used here; it's just exported.
*/
Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
int length = trailingBytesForUTF8[*source]+1;
if (source+length > sourceEnd) {
return false;
}
return isLegalUTF8(source, length);
}
/* --------------------------------------------------------------------- */
ConversionResult ConvertUTF8toUTF16 (
UTF8** sourceStart, UTF8* sourceEnd,
UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags) {
ConversionResult result = conversionOK;
UTF8* source = *sourceStart;
UTF16* target = *targetStart;
while (source < sourceEnd) {
UTF32 ch = 0;
unsigned short extraBytesToRead = trailingBytesForUTF8[*source];
if (source + extraBytesToRead >= sourceEnd) {
result = sourceExhausted; break;
}
/* Do this check whether lenient or strict */
if (! isLegalUTF8(source, extraBytesToRead+1)) {
result = sourceIllegal;
break;
}
/*
* The cases all fall through. See "Note A" below.
*/
switch (extraBytesToRead) {
case 3: ch += *source++; ch <<= 6;
case 2: ch += *source++; ch <<= 6;
case 1: ch += *source++; ch <<= 6;
case 0: ch += *source++;
}
ch -= offsetsFromUTF8[extraBytesToRead];
if (target >= targetEnd) {
result = targetExhausted; break;
}
if (ch <= UNI_MAX_BMP) { /* Target is a character <= 0xFFFF */
if ((flags == strictConversion) && (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)) {
--source; /* return to the illegal value itself */
result = sourceIllegal;
break;
} else {
*target++ = ch; /* normal case */
}
} else if (ch > UNI_MAX_UTF16) {
if (flags == strictConversion) {
result = sourceIllegal;
source -= extraBytesToRead; /* return to the start */
} else {
*target++ = UNI_REPLACEMENT_CHAR;
}
} else {
/* target is a character in range 0xFFFF - 0x10FFFF. */
if (target + 1 >= targetEnd) {
result = targetExhausted; break;
}
ch -= halfBase;
*target++ = (ch >> halfShift) + UNI_SUR_HIGH_START;
*target++ = (ch & halfMask) + UNI_SUR_LOW_START;
}
}
*sourceStart = source;
*targetStart = target;
return result;
}
/* --------------------------------------------------------------------- */
ConversionResult ConvertUTF32toUTF8 (
UTF32** sourceStart, const UTF32* sourceEnd,
UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags) {
ConversionResult result = conversionOK;
UTF32* source = *sourceStart;
UTF8* target = *targetStart;
while (source < sourceEnd) {
UTF32 ch;
unsigned short bytesToWrite = 0;
const UTF32 byteMask = 0xBF;
const UTF32 byteMark = 0x80;
ch = *source++;
/* surrogates of any stripe are not legal UTF32 characters */
if (flags == strictConversion ) {
if ((ch >= UNI_SUR_HIGH_START) && (ch <= UNI_SUR_LOW_END)) {
--source; /* return to the illegal value itself */
result = sourceIllegal;
break;
}
}
/* Figure out how many bytes the result will require */
if (ch < (UTF32)0x80) { bytesToWrite = 1;
} else if (ch < (UTF32)0x800) { bytesToWrite = 2;
} else if (ch < (UTF32)0x10000) { bytesToWrite = 3;
} else if (ch < (UTF32)0x200000) { bytesToWrite = 4;
} else { bytesToWrite = 3; /* U+FFFD takes three bytes in UTF-8 */
ch = UNI_REPLACEMENT_CHAR;
}
target += bytesToWrite;
if (target > targetEnd) {
target -= bytesToWrite; result = targetExhausted; break;
}
switch (bytesToWrite) { /* note: everything falls through. */
case 4: *--target = (ch | byteMark) & byteMask; ch >>= 6;
case 3: *--target = (ch | byteMark) & byteMask; ch >>= 6;
case 2: *--target = (ch | byteMark) & byteMask; ch >>= 6;
case 1: *--target = ch | firstByteMark[bytesToWrite];
}
target += bytesToWrite;
}
*sourceStart = source;
*targetStart = target;
return result;
}
/* --------------------------------------------------------------------- */
ConversionResult ConvertUTF8toUTF32 (
UTF8** sourceStart, UTF8* sourceEnd,
UTF32** targetStart, const UTF32* targetEnd, ConversionFlags flags) {
ConversionResult result = conversionOK;
UTF8* source = *sourceStart;
UTF32* target = *targetStart;
while (source < sourceEnd) {
UTF32 ch = 0;
unsigned short extraBytesToRead = trailingBytesForUTF8[*source];
if (source + extraBytesToRead >= sourceEnd) {
result = sourceExhausted; break;
}
/* Do this check whether lenient or strict */
if (! isLegalUTF8(source, extraBytesToRead+1)) {
result = sourceIllegal;
break;
}
/*
* The cases all fall through. See "Note A" below.
*/
switch (extraBytesToRead) {
case 3: ch += *source++; ch <<= 6;
case 2: ch += *source++; ch <<= 6;
case 1: ch += *source++; ch <<= 6;
case 0: ch += *source++;
}
ch -= offsetsFromUTF8[extraBytesToRead];
if (target >= targetEnd) {
result = targetExhausted; break;
}
if (ch <= UNI_MAX_UTF32) {
*target++ = ch;
} else if (ch > UNI_MAX_UTF32) {
*target++ = UNI_REPLACEMENT_CHAR;
} else {
if (target + 1 >= targetEnd) {
result = targetExhausted; break;
}
ch -= halfBase;
*target++ = (ch >> halfShift) + UNI_SUR_HIGH_START;
*target++ = (ch & halfMask) + UNI_SUR_LOW_START;
}
}
*sourceStart = source;
*targetStart = target;
return result;
}
/* ---------------------------------------------------------------------
Note A.
The fall-through switches in UTF-8 reading code save a
temp variable, some decrements & conditionals. The switches
are equivalent to the following loop:
{
int tmpBytesToRead = extraBytesToRead+1;
do {
ch += *source++;
--tmpBytesToRead;
if (tmpBytesToRead) ch <<= 6;
} while (tmpBytesToRead > 0);
}
In UTF-8 writing code, the switches on "bytesToWrite" are
similarly unrolled loops.
--------------------------------------------------------------------- */
/* Some code has been taken from the ConvertUTF7.c file (the utf7 stuff below),
this is the copyright notice */
/* ================================================================ */
/*
File: ConvertUTF7.c
Author: David B. Goldsmith
Copyright (C) 1994, 1996 IBM Corporation All rights reserved.
Revisions: Header update only July, 2001.
This code is copyrighted. Under the copyright laws, this code may not
be copied, in whole or part, without prior written consent of IBM Corporation.
IBM Corporation grants the right to use this code as long as this ENTIRE
copyright notice is reproduced in the code. The code is provided
AS-IS, AND IBM CORPORATION DISCLAIMS ALL WARRANTIES, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT
WILL IBM CORPORATION BE LIABLE FOR ANY DAMAGES WHATSOEVER (INCLUDING,
WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS
INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER PECUNIARY
LOSS) ARISING OUT OF THE USE OR INABILITY TO USE THIS CODE, EVEN
IF IBM CORPORATION HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
BECAUSE SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF
LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE
LIMITATION MAY NOT APPLY TO YOU.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the
government is subject to restrictions as set forth in subparagraph
(c)(l)(ii) of the Rights in Technical Data and Computer Software
clause at DFARS 252.227-7013 and FAR 52.227-19.
This code may be protected by one or more U.S. and International
Patents.
*/
/* ------------------------------------- */
struct list codesets_list;
/**************************************************************************
Returns the supported codesets as a NULL-terminated string array
**************************************************************************/
char **codesets_supported(void)
{
static char **array;
if (array) return array;
if ((array = (char**)malloc(sizeof(char*)*(list_length(&codesets_list)+1))))
{
struct codeset *code;
int i;
SM_DEBUGF(15,("%ld supported Codesets:\n",list_length(&codesets_list)));
code = (struct codeset*)list_first(&codesets_list);
i = 0;
while (code)
{
SM_DEBUGF(15,(" %p next=%p prev=%p list=%p name=%p %s alt=%p char=%p\n",code,code->node.next,code->node.prev,code->node.list,code->name,code->name,code->alt_name,code->characterization));
array[i++] = code->name;
code = (struct codeset*)node_next(&code->node);
}
array[i] = NULL;
}
return array;
}
/**************************************************************************
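Compare function for qsort() and bsearch(): orders two single_convert
entries by their UTF-8 sequence (utf8[0] holds the length and is skipped)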
**************************************************************************/
static int codesets_cmp_unicode(const void *arg1, const void *arg2)
{
char *a1 = (char*)((struct single_convert*)arg1)->utf8 + 1;
char *a2 = (char*)((struct single_convert*)arg2)->utf8 + 1;
return (int)strcmp(a1,a2);
}
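/**
* Reads the codeset table from the given filename and adds it to
* codesets_list.
*
* @param name name of the charset table file to read
* @return always 1; a file that cannot be opened or a failed
* allocation is silently ignored
*/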
static int codesets_read_table(char *name)
{
char buf[512];
FILE *fh = fopen(name,"r");
if (fh)
{
struct codeset *codeset;
if ((codeset = (struct codeset*)malloc(sizeof(struct codeset))))
{
int i;
memset(codeset,0,sizeof(struct codeset));
for (i=0;i<256;i++)
codeset->table[i].code = codeset->table[i].ucs4 = i;
while (myreadline(fh,buf))
{
char *result;
if ((result = get_key_value(buf,"Standard"))) codeset->name = mystrdup(result);
else if ((result = get_key_value(buf,"AltStandard"))) codeset->alt_name = mystrdup(result);
else if ((result = get_key_value(buf,"ReadOnly"))) codeset->read_only = !!atoi(result);
else if ((result = get_key_value(buf,"Characterization")))
{
if ((result[0] == '_') && (result[1] == '(') && (result[2] == '"'))
{
char *end = strchr(result+3,'"');
if (end)
{
char *txt = mystrndup(result+3,end-(result+3));
if (txt) codeset->characterization = mystrdup(_(txt));
free(txt);
}
}
} else
{
char *p = buf;
int fmt2 = 0;
/* A mapping line starts either with '=' or with a "0x" prefix */
if ((*p == '=') || (fmt2 = ((*p == '0') && (*(p+1)=='x'))))
{
p++;
p += fmt2;
i = strtol(p,&p,16);
if (i > 0 && i < 256)
{
while (isspace((unsigned char)*p)) p++;
if (!mystrnicmp(p,"U+",2))
{
p += 2;
codeset->table[i].ucs4 = strtol(p,&p,16);
} else
{
if (*p!='#') codeset->table[i].ucs4 = strtol(p,&p,0);
}
}
}
}
}
for (i=0;i<256;i++)
{
UTF32 src = codeset->table[i].ucs4;
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
}
fclose(fh);
}
return 1;
}
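/*
 * Illustrative sketch, not part of codesets.library: the mapping lines that
 * codesets_read_table() accepts appear to have the form "=80 U+20AC" or
 * "0x80 0x20AC", optionally followed by a '#' comment. The stand-alone helper
 * below parses one such line. The name example_parse_mapping() and its
 * signature are hypothetical and only serve this example.
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parses one mapping line. Returns 1 and fills *code and *ucs4 on success,
 * 0 if the line is not a mapping line (e.g. a "Standard=..." key). */
static int example_parse_mapping(const char *line, int *code, unsigned long *ucs4)
{
	char *end;
	long c;

	assert(line && code && ucs4);

	/* Accept the two prefixes: '=' or "0x" */
	if (line[0] == '=') line += 1;
	else if (line[0] == '0' && line[1] == 'x') line += 2;
	else return 0;

	/* The 8 bit code is always given in hexadecimal */
	c = strtol(line, &end, 16);
	if (end == line || c <= 0 || c > 255) return 0;

	while (isspace((unsigned char)*end)) end++;

	if ((end[0] == 'U' || end[0] == 'u') && end[1] == '+')
		*ucs4 = strtoul(end + 2, NULL, 16);   /* U+XXXX notation */
	else if (*end != '#' && *end != '\0')
		*ucs4 = strtoul(end, NULL, 0);        /* plain C style number */
	else
		return 0;                             /* no target code point given */

	*code = (int)c;
	return 1;
}
```

/*
 * Like the loader above, the sketch rejects code 0, so a line whose code
 * cannot be parsed is never mistaken for a mapping.
 */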
/*****************************************************************************/
int codesets_init(void)
{
int i;
struct codeset *codeset;
UTF32 src;
SM_ENTER;
list_init(&codesets_list);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 0;
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-1 + Euro");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("West European (with EURO)"));
codeset->read_only = 1;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i==164) src = 0x20AC; /* the EURO sign */
else src = i;
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1;
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-1");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("West European"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
src = i;
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-2");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("Central/East European"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = iso_8859_2_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-3");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("South European"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = iso_8859_3_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-4");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("North European"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = iso_8859_4_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("KOI8-R");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("Russian"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0x80) src = i;
else src = koi8r_to_ucs4[i-0x80];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-5");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("Slavic languages"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = iso_8859_5_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-9");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("Turkish"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = iso_8859_9_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-15");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("West European II"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = iso_8859_15_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("ISO-8859-16");
codeset->alt_name = NULL;
codeset->characterization = mystrdup(_("South-Eastern European"));
codeset->read_only = 0;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = iso_8859_16_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("AmigaPL");
codeset->alt_name = NULL;
codeset->characterization = mystrdup("AmigaPL");
codeset->read_only = 1;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = amigapl_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
memset(codeset,0,sizeof(*codeset));
codeset->name = mystrdup("Amiga-1251");
codeset->alt_name = NULL;
codeset->characterization = mystrdup("Amiga-1251");
codeset->read_only = 1;
for (i=0;i<256;i++)
{
UTF32 *src_ptr = &src;
UTF8 *dest_ptr = &codeset->table[i].utf8[1];
if (i < 0xa0) src = i;
else src = amiga1251_to_ucs4[i-0xa0];
codeset->table[i].code = i;
codeset->table[i].ucs4 = src;
ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
*dest_ptr = 0;
codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
}
memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
list_insert_tail(&codesets_list,&codeset->node);
SM_DEBUGF(15,("%ld internal charsets\n",list_length(&codesets_list)));
{
/* dynamically loaded */
DIR *dfd; /* directory descriptor */
struct dirent *dptr; /* dir entry */
char path[380];
/* Only change the directory if we know where to return to */
if (getcwd(path, sizeof(path)) && chdir(SM_CHARSET_DIR) != -1)
{
if ((dfd = opendir(SM_CURRENT_DIR)))
{
while ((dptr = readdir(dfd)) != NULL)
{
if (!strcmp(".",dptr->d_name) || !strcmp("..",dptr->d_name)) continue;
SM_DEBUGF(15,("Loading \"%s\" charset\n",dptr->d_name));
codesets_read_table(dptr->d_name);
}
closedir(dfd);
}
chdir(path);
}
}
SM_RETURN(1,"%ld");
}
/*****************************************************************************/
void codesets_cleanup(void)
{
struct codeset *codeset;
while ((codeset = (struct codeset*)list_remove_tail(&codesets_list)))
{
free(codeset->name);
free(codeset->alt_name);
free(codeset->characterization);
free(codeset);
}
}
/*****************************************************************************/
struct codeset *codesets_find(const char *name)
{
struct codeset *codeset = (struct codeset*)list_first(&codesets_list);
/* Return the first registered codeset ("ISO-8859-1 + Euro") as default */
if (!name) return codeset;
while (codeset)
{
if (!mystricmp(name,codeset->name) || !mystricmp(name,codeset->alt_name)) return codeset;
codeset = (struct codeset*)node_next(&codeset->node);
}
return NULL;
}
/*****************************************************************************/
int codesets_unconvertable_chars(struct codeset *codeset, const char *text, int text_len)
{
struct single_convert conv;
const char *text_ptr = text;
int i;
int errors = 0;
for (i=0;i < text_len;i++)
{
unsigned char c = *text_ptr++;
if (c)
{
int len = trailingBytesForUTF8[c];
conv.utf8[1] = c;
strncpy((char*)&conv.utf8[2],text_ptr,len);
conv.utf8[2+len] = 0;
text_ptr += len;
if (!bsearch(&conv,codeset->table_sorted,256,sizeof(codeset->table_sorted[0]),codesets_cmp_unicode))
errors++;
} else break;
}
return errors;
}
/*****************************************************************************/
struct codeset *codesets_find_best(const char *text, int text_len, int *error_ptr)
{
struct codeset *codeset = (struct codeset*)list_first(&codesets_list);
struct codeset *best_codeset = NULL;
int best_errors = text_len;
while (codeset)
{
if (!codeset->read_only)
{
int errors = codesets_unconvertable_chars(codeset, text, text_len);
if (errors < best_errors)
{
best_codeset = codeset;
best_errors = errors;
}
if (!best_errors) break;
}
codeset = (struct codeset*)node_next(&codeset->node);
}
if (!best_codeset) best_codeset = (struct codeset*)list_first(&codesets_list);
if (error_ptr) *error_ptr = best_errors;
return best_codeset;
}
/*****************************************************************************/
int utf8len(const utf8 *str)
{
int len ;
unsigned char c;
if (!str) return 0;
len = 0;
while ((c = *str++))
{
len++;
str += trailingBytesForUTF8[c];
}
return len;
}
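/*
 * Illustrative sketch, not part of codesets.library: utf8len() relies on the
 * trailingBytesForUTF8[] table, which is defined outside this section. The
 * stand-alone code below shows the same counting idea with a small function
 * instead of the table. The names example_trailing_bytes() and
 * example_utf8len() are hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Number of continuation bytes announced by a UTF-8 lead byte */
static int example_trailing_bytes(unsigned char lead)
{
	if (lead < 0xC0) return 0;   /* ASCII, or a stray continuation byte */
	if (lead < 0xE0) return 1;   /* 110xxxxx */
	if (lead < 0xF0) return 2;   /* 1110xxxx */
	if (lead < 0xF8) return 3;   /* 11110xxx */
	if (lead < 0xFC) return 4;   /* 111110xx, obsolete 5 byte form */
	return 5;                    /* 1111110x, obsolete 6 byte form */
}

/* Counts the characters of a NUL-terminated UTF-8 string. Unlike a plain
 * table lookup it never steps over the terminator of a truncated sequence. */
static int example_utf8len(const char *str)
{
	int len = 0;

	assert(str != NULL);

	while (*str)
	{
		int trail = example_trailing_bytes((unsigned char)*str++);
		while (trail-- > 0 && *str) str++;
		len++;
	}
	return len;
}
```

/*
 * The extra check for the terminator is a design choice of this sketch; it
 * trades a little speed for safety on malformed input.
 */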
/*****************************************************************************/
utf8 *utf8dup(const utf8 *str)
{
return (utf8*)mystrdup((char*)str);
}
/*****************************************************************************/
int utf8realpos(const utf8 *str, int pos)
{
const utf8 *str_save = str;
unsigned char c;
if (!str) return 0;
while (pos && (c = *str))
{
pos--;
str += trailingBytesForUTF8[c] + 1;
}
return str - str_save;
}
/*****************************************************************************/
int utf8charpos(const utf8 *str, int pos)
{
int cp = 0;
unsigned char c;
while (pos > 0 && (c = *str))
{
str += trailingBytesForUTF8[c] + 1;
pos -= trailingBytesForUTF8[c] + 1;
cp++;
}
return cp;
}
/*****************************************************************************/
int utf8bytes(const utf8 *str)
{
unsigned char c = *str;
return trailingBytesForUTF8[c] + 1;
}
/*****************************************************************************/
utf8 *utf8ncpy(utf8 *to, const utf8 *from, int n)
{
utf8 *saved_to = to;
for (;n;n--)
{
unsigned char c = *from++;
int len = trailingBytesForUTF8[c];
*to++ = c;
for (;len;len--)
{
*to++ = *from++;
}
}
return saved_to;
}
/*****************************************************************************/
utf8 *utf8create(const void *from, const char *charset)
{
/* utf8create_len() will stop on a null byte */
return utf8create_len(from,charset,0x7fffffff);
}
/*****************************************************************************/
int utf8fromstr(const char *from, struct codeset *codeset, utf8 *dest, unsigned int dest_size)
{
const char *src = from;
unsigned char c;
int conv = 0;
if (dest_size < 1)
return 0;
if (!codeset)
codeset = (struct codeset*)list_first(&codesets_list);
for (src = from;(c = (unsigned char)*src);src++)
{
unsigned char *utf8_seq;
unsigned int l;
utf8_seq = &codeset->table[c].utf8[0];
/* Recall that the first element holds the length
* of the UTF-8 sequence in bytes */
l = utf8_seq[0];
if (dest_size <= l)
break;
utf8_seq++;
for(;(c = *utf8_seq);utf8_seq++)
*dest++ = c;
dest_size -= l;
conv++;
}
*dest = 0;
return conv;
}
/*****************************************************************************/
utf8 *utf8create_len(const void *from, const char *charset, int from_len)
{
int dest_size = 0;
char *dest;
char *src = (char*)from;
unsigned char c;
int len;
struct codeset *codeset = codesets_find(charset);
if (!from) return NULL;
if (!codeset)
{
if (!mystricmp(charset,"utf-7"))
{
return (utf8*)utf7ntoutf8((char *)from,from_len);
}
if (!mystricmp(charset,"utf-8"))
{
return (utf8*)mystrdup((char *)from);
}
codeset = (struct codeset*)list_first(&codesets_list);
}
len = from_len;
while (((c = *src++) && (len--)))
dest_size += codeset->table[c].utf8[0];
if ((dest = (char*)malloc(dest_size+1)))
{
char *dest_ptr = dest;
for (src = (char*)from;from_len && (c = *src);src++,from_len--)
{
unsigned char *utf8_seq;
for(utf8_seq = &codeset->table[c].utf8[1];(c = *utf8_seq);utf8_seq++)
*dest_ptr++ = c;
}
*dest_ptr = 0;
return (utf8*)dest;
}
return NULL;
}
/*****************************************************************************/
int utf8tostr(const utf8 *str, char *dest, unsigned int dest_size, struct codeset *codeset)
{
unsigned int i;
struct single_convert *f;
char *dest_iter = dest;
if (!dest_size)
{
return 0;
}
if (!codeset) codeset = (struct codeset*)list_first(&codesets_list);
if (!codeset || !str)
{
*dest = 0;
return 0;
}
for (i=0;i < dest_size-1;i++)
{
unsigned char c = *str;
if (c)
{
if (c > 127)
{
unsigned int len_add = trailingBytesForUTF8[c];
unsigned int len_str = len_add + 1;
BIN_SEARCH(codeset->table_sorted,0,255,mystrncmp((unsigned char*)str,codeset->table_sorted[m].utf8+1,len_str),f);
if (f) *dest_iter++ = f->code;
else *dest_iter++ = '_';
str += len_add;
} else *dest_iter++ = c;
str++;
} else break;
}
*dest_iter = 0;
return i;
}
/*****************************************************************************/
char *utf8tostrcreate(const utf8 *str, struct codeset *codeset)
{
char *dest;
int len;
if (!str) return NULL;
len = strlen((char*)str);
if ((dest = (char*)malloc(len+1)))
utf8tostr(str,dest,len+1,codeset);
return dest;
}
/*****************************************************************************/
int utf8tochar(const utf8 *str, unsigned int *chr, struct codeset *codeset)
{
struct single_convert conv;
struct single_convert *f;
unsigned char c;
int len = 0;
if (!codeset) codeset = (struct codeset*)list_first(&codesets_list);
if (!codeset) return 0;
if ((c = *str++))
{
int i;
len = trailingBytesForUTF8[c];
conv.utf8[1] = c;
for (i=0;i<len;i++)
{
if (!(conv.utf8[i+2] = *str++))
{
/* We encountered a 0 byte although the lead byte announced
* a longer sequence. Hence the given utf8 sequence is not
* considered as valid */
*chr = 0;
return i+1;
}
}
conv.utf8[2+len] = 0;
if ((f = (struct single_convert*)bsearch(&conv,codeset->table_sorted,256,sizeof(codeset->table_sorted[0]),codesets_cmp_unicode)))
{
*chr = f->code;
} else *chr = 0;
} else *chr = 0;
return len+1;
}
/*****************************************************************************/
static inline int utf8cmp_single(unsigned char *a, unsigned char *b)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
int d;
if ((d = a[0] - b[0])) return d;
if ((d = a[1] - b[1])) return d;
if ((d = a[2] - b[2])) return d;
if ((d = a[3] - b[3])) return d;
return 0;
#else
return (*((unsigned int *)a) - *((unsigned int *)b));
#endif
}
/*****************************************************************************/
int utf8tolower(const char *str, char *dest)
{
unsigned char ch[4] = {0,0,0,0};
unsigned char c;
struct uniconv *uc;
int bytes;
int i;
c = *str++;
if (c<0x80)
{
*dest = tolower(c);
return 1;
}
bytes = trailingBytesForUTF8[c];
if (bytes > 3)
{
*dest++ = c;
/* str and dest already point behind the lead byte */
memcpy(dest,str,bytes);
return bytes + 1;
}
ch[3-bytes] = c;
for (i=bytes-1;i>=0;i--)
{
if (!(ch[3-i] = *str++))
return 0;
}
BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table),utf8cmp_single(utf8_tolower_table[m].from, ch),uc);
if (uc)
memcpy(dest, uc->to + 3 - bytes, bytes + 1);
else
memcpy(dest, ch + 3 - bytes, bytes + 1);
return bytes + 1;
}
/*****************************************************************************/
int utf8stricmp(const char *str1, const char *str2)
{
unsigned char c1;
unsigned char c2;
if (!str1)
{
if (!str2) return 0;
return -1;
}
if (!str2) return 1;
while (1)
{
int d;
char bytes1,bytes2;
c1 = *str1++;
c2 = *str2++;
if (!c1)
{
if (!c2) return 0;
return -1;
}
if (!c2) return 1;
if (c1 < 0x80)
{
if (c2 < 0x80)
{
d = tolower(c1) - tolower(c2);
if (d) return d;
continue;
} else
{
/* TODO: must use locale sensitive sorting */
return -1;
}
}
if (c2 < 0x80) return 1; /* TODO: must use locale sensitive sorting */
bytes1 = trailingBytesForUTF8[c1];
bytes2 = trailingBytesForUTF8[c2];
/* case mapping only happens within same number of bytes (currently) */
if ((d = bytes1 - bytes2)) return d;
if (bytes1 > 3)
{
/* case mapping relevant characters are only within 4 bytes */
while (bytes1)
{
if ((d = *str1++ - *str2++)) return d;
bytes1--;
}
} else
{
unsigned char ch1[4],ch2[4];
struct uniconv *uc1;
struct uniconv *uc2;
int ch1l;
int ch2l;
*((unsigned int *)ch1) = 0;
*((unsigned int *)ch2) = 0;
ch1[3-bytes1] = c1;
ch2[3-bytes1] = c2;
while (bytes1)
{
bytes1--;
ch1[3 - bytes1] = *str1++;
ch2[3 - bytes1] = *str2++;
}
BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table),(*((unsigned int *)utf8_tolower_table[m].from) - *((unsigned int *)ch1)),uc1);
BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table),(*((unsigned int *)utf8_tolower_table[m].from) - *((unsigned int *)ch2)),uc2);
if (uc1) ch1l = *((unsigned int *)uc1->to);
else ch1l = *((unsigned int *)ch1);
if (uc2) ch2l = *((unsigned int *)uc2->to);
else ch2l = *((unsigned int *)ch2);
if (ch1l != ch2l)
{
if (ch1l < ch2l) return -1;
return 1;
}
}
}
return 0;
}
/*****************************************************************************/
int utf8stricmp_len(const char *str1, const char *str2, int len)
{
unsigned char c1;
unsigned char c2;
if (!str1)
{
if (!str2) return 0;
return -1;
}
if (!str2) return 1;
while (len>0)
{
int d;
char bytes1,bytes2;
c1 = *str1++;
c2 = *str2++;
len--;
if (!c1)
{
if (!c2) return 0;
return -1;
}
if (!c2) return 1;
if (c1 < 0x80)
{
if (c2 < 0x80)
{
d = tolower(c1) - tolower(c2);
if (d) return d;
continue;
} else
{
/* TODO: must use locale sensitive sorting */
return -1;
}
}
if (c2 < 0x80) return 1; /* TODO: must use locale sensitive sorting */
bytes1 = trailingBytesForUTF8[c1];
bytes2 = trailingBytesForUTF8[c2];
/* case mapping only happens within same number of bytes (currently) */
if ((d = bytes1 - bytes2)) return d;
if (bytes1 > 3)
{
/* case mapping relevant characters are only within 4 bytes */
while (bytes1)
{
if ((d = *str1++ - *str2++)) return d;
bytes1--;
}
} else
{
unsigned char ch1[4],ch2[4];
struct uniconv *uc1;
struct uniconv *uc2;
int ch1l;
int ch2l;
*((unsigned int *)ch1) = 0;
*((unsigned int *)ch2) = 0;
ch1[3-bytes1] = c1;
ch2[3-bytes1] = c2;
while (bytes1)
{
bytes1--;
ch1[3 - bytes1] = *str1++;
ch2[3 - bytes1] = *str2++;
len--;
}
BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table),(*((unsigned int *)utf8_tolower_table[m].from) - *((unsigned int *)ch1)),uc1);
BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table),(*((unsigned int *)utf8_tolower_table[m].from) - *((unsigned int *)ch2)),uc2);
if (uc1) ch1l = *((unsigned int *)uc1->to);
else ch1l = *((unsigned int *)ch1);
if (uc2) ch2l = *((unsigned int *)uc2->to);
else ch2l = *((unsigned int *)ch2);
if (ch1l != ch2l)
{
if (ch1l < ch2l) return -1;
return 1;
}
}
}
return 0;
}
/*****************************************************************************/
int utf8match(const char *haystack, const char *needle, int case_insensitive, match_mask_t *match_mask)
{
int h, n;
int needle_len;
int haystack_len;
unsigned char hc;
unsigned char nc;
haystack_len = strlen(haystack);
needle_len = strlen(needle);
h = 0;
n = 0;
while (h < haystack_len && n < needle_len)
{
int match;
int hbytes;
int nbytes;
match = 0;
hc = haystack[h];
nc = needle[n];
hbytes = trailingBytesForUTF8[hc];
nbytes = trailingBytesForUTF8[nc];
if (hbytes == nbytes)
{
if (hc == nc)
{
int i;
match = 1;
for (i=0; i < hbytes; i++)
{
/* Compare the trailing bytes at the current positions, not at the string starts */
if (haystack[h+i+1] != needle[n+i+1])
match = 0;
}
} else
{
if (hbytes == 0 && case_insensitive)
{
if (tolower(hc) == tolower(nc))
{
match = 1;
}
}
}
if (!match && case_insensitive && hbytes > 0)
{
char hchars[6] = {0};
char nchars[6] = {0};
int hl, nl;
if ((hl = utf8tolower(&haystack[h], hchars)) > 0 &&
(nl = utf8tolower(&needle[n], nchars)) > 0)
{
if (hl == nl)
{
match = memcmp(hchars, nchars, nl) == 0;
}
}
}
}
if (match)
{
n += nbytes + 1;
}
if (match_mask)
{
unsigned int match_pos;
match_pos = match_bitmask_pos(h);
if (match)
{
match_mask[match_pos] |= match_bitmask(h);
} else
{
match_mask[match_pos] &= ~match_bitmask(h);
}
}
h += hbytes + 1;
}
if (n == needle_len)
{
if (match_mask)
{
/* Make sure that the remaining relevant positions are cleared */
for (;h < haystack_len; h++)
{
match_mask[match_bitmask_pos(h)] &= ~match_bitmask(h);
}
}
return 1;
}
return 0;
}
/*****************************************************************************/
char *utf8stristr(const char *str1, const char *str2)
{
int str2_len;
if (!str1 || !str2) return NULL;
str2_len = strlen(str2);
while (*str1)
{
if (!utf8stricmp_len(str1,str2,str2_len))
return (char*)str1;
str1++;
}
return NULL;
}
/*****************************************************************************/
const char *uft8toucs(const char *chr, unsigned int *code)
{
unsigned char c = *chr++;
unsigned int ucs = 0;
int i,bytes;
if (!(c & 0x80))
{
*code = c;
return chr;
} else
{
if (!(c & 0x20))
{
bytes = 2;
ucs = c & 0x1f;
}
else if (!(c & 0x10))
{
bytes = 3;
ucs = c & 0xf;
}
else if (!(c & 0x08))
{
bytes = 4;
ucs = c & 0x7;
}
else if (!(c & 0x04))
{
bytes = 5;
ucs = c & 0x3;
}
else /* if (!(c & 0x02)) */
{
bytes = 6;
ucs = c & 0x1;
}
for (i=1;i<bytes;i++)
ucs = (ucs << 6) | ((*chr++)&0x3f);
}
*code = ucs;
return chr;
}
static unsigned char base64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static short invbase64[128];
static unsigned char ibase64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
static short iinvbase64[128];
static unsigned char direct[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?";
static unsigned char optional[] = "!\"#$%&*;<=>@[]^_`{|}";
static unsigned char spaces[] = " \011\015\012"; /* space, tab, return, line feed */
static char mustshiftsafe[128];
static char mustshiftopt[128];
static int needtables = 1;
static void tabinit(void)
{
int i, limit;
for (i = 0; i < 128; ++i)
{
mustshiftopt[i] = mustshiftsafe[i] = 1;
/* Mark every character as invalid in both inverse tables first */
invbase64[i] = iinvbase64[i] = -1;
}
limit = strlen((char*)direct);
for (i = 0; i < limit; ++i)
mustshiftopt[direct[i]] = mustshiftsafe[direct[i]] = 0;
limit = strlen((char*)spaces);
for (i = 0; i < limit; ++i)
mustshiftopt[spaces[i]] = mustshiftsafe[spaces[i]] = 0;
limit = strlen((char*)optional);
for (i = 0; i < limit; ++i)
mustshiftopt[optional[i]] = 0;
limit = strlen((char*)base64);
for (i = 0; i < limit; ++i)
invbase64[base64[i]] = i;
/* that's for the modified imap utf7 stuff */
limit = strlen((char*)ibase64);
for (i = 0; i < limit; ++i)
iinvbase64[ibase64[i]] = i;
needtables = 0;
}
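/*
 * Illustrative sketch, not part of codesets.library: tabinit() builds inverse
 * lookup tables so that a base64 digit can be mapped back to its 6 bit value
 * in constant time. The stand-alone code below shows that idea for the
 * standard alphabet only. The names example_b64, example_inv,
 * example_build_inverse() and example_b64_value() are hypothetical.
 */

```c
#include <assert.h>
#include <string.h>

static const char example_b64[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static short example_inv[128];

/* Fills example_inv[] so that example_inv[c] is the value of digit c,
 * or -1 if c is not part of the alphabet */
static void example_build_inverse(void)
{
	size_t i;
	size_t n = strlen(example_b64);

	assert(n == 64);

	for (i = 0; i < 128; i++) example_inv[i] = -1;
	for (i = 0; i < n; i++)
		example_inv[(unsigned char)example_b64[i]] = (short)i;
}

/* Returns the 6 bit value of c, or -1 if c is not a base64 digit */
static int example_b64_value(unsigned char c)
{
	return (c < 128) ? example_inv[c] : -1;
}
```

/*
 * Initialising every entry to -1 first is what lets a decoder detect the end
 * of a shifted sequence with a simple "value < 0" test.
 */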
#if __cplusplus >= 201703L
#define DECLARE_BIT_BUFFER unsigned long BITbuffer = 0, buffertemp = 0; int bufferbits = 0
#else
#define DECLARE_BIT_BUFFER register unsigned long BITbuffer = 0, buffertemp = 0; int bufferbits = 0
#endif
#define BITS_IN_BUFFER bufferbits
#define WRITE_N_BITS(x, n) ((BITbuffer |= ( ((x) & ~(-1L<<(n))) << (32-(n)-bufferbits) ) ), bufferbits += (n) )
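/* The result is masked to n bits so that the macro also works if unsigned long is wider than 32 bits */
#define READ_N_BITS(n) ((buffertemp = ((BITbuffer >> (32-(n))) & ~(~0UL<<(n)))), (BITbuffer <<= (n)), (bufferbits -= (n)), buffertemp)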
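/*
 * Illustrative sketch, not part of codesets.library: the macros above keep a
 * 32 bit wide buffer in which new bits are appended below the bits already
 * stored and the oldest bits are read from the top. The stand-alone code
 * below shows the same scheme with a fixed-width type. The names
 * example_bitbuf, example_write_bits() and example_read_bits() are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stdint.h>

struct example_bitbuf
{
	uint32_t bits;  /* stored bits, oldest bit in the most significant position */
	int count;      /* number of valid bits */
};

/* Appends the low n bits of x (1 <= n <= 16, count + n <= 32) */
static void example_write_bits(struct example_bitbuf *b, uint32_t x, int n)
{
	assert(n >= 1 && n <= 16 && b->count + n <= 32);
	x &= (UINT32_C(1) << n) - 1;
	b->bits |= x << (32 - n - b->count);
	b->count += n;
}

/* Removes and returns the oldest n bits (1 <= n <= 16, n <= count) */
static uint32_t example_read_bits(struct example_bitbuf *b, int n)
{
	uint32_t v;

	assert(n >= 1 && n <= 16 && n <= b->count);
	v = b->bits >> (32 - n);
	b->bits <<= n;
	b->count -= n;
	return v;
}
```

/*
 * Usage: the UTF-7 sequence "+AKM-" carries the base64 digits A, K and M
 * (values 0, 10 and 12). Writing them with 6 bits each and reading 16 bits
 * back yields 0x00A3, the pound sign, with two padding bits left over.
 */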
/*****************************************************************************/
char *utf7ntoutf8(char *source, int sourcelen)
{
FILE *fh;
int base64value=0,base64EOF=0,first=0;
int shifted = 0;
char *dest = NULL;
DECLARE_BIT_BUFFER;
if (needtables) tabinit();
if ((fh = tmpfile()))
{
int dest_len;
while (sourcelen)
{
unsigned char c = *source++;
sourcelen--;
if (shifted)
{
if ((base64EOF = (!sourcelen) || (c > 0x7f) || (base64value = invbase64[c]) < 0))
{
shifted = 0;
/* If the character causing us to drop out was SHIFT_IN or
SHIFT_OUT, it may be a special escape for SHIFT_IN. The
test for SHIFT_IN is not necessary, but allows an alternate
form of UTF-7 where SHIFT_IN is escaped by SHIFT_IN. This
only works for some values of SHIFT_IN.
*/
if (c && sourcelen && (c == '+' || c == '-'))
{
/* get another character c */
unsigned char prevc = c;
c = *source++;
sourcelen--; /* the extra character has been consumed as well */
/* If no base64 characters were encountered, and the
character terminating the shift sequence was
SHIFT_OUT, then it's a special escape for SHIFT_IN.
*/
if (first && prevc == '-')
{
fputc('+',fh);
}
}
} else
{
/* Add another 6 bits of base64 to the bit buffer. */
WRITE_N_BITS(base64value, 6);
first = 0;
}
}
/* Extract as many full 16 bit characters as possible from the
bit buffer.
*/
while (BITS_IN_BUFFER >= 16)
{
UTF32 src_utf32 = READ_N_BITS(16);
UTF32 *src_utf32_ptr = &src_utf32;
UTF8 target_utf8[10];
UTF8 *target_utf8_ptr = target_utf8;
ConvertUTF32toUTF8(&src_utf32_ptr,src_utf32_ptr+1,&target_utf8_ptr,target_utf8+10, strictConversion);
fwrite(target_utf8,1,target_utf8_ptr - target_utf8,fh);
}
if (!c) break;
if (base64EOF) BITS_IN_BUFFER = 0;
if (!shifted)
{
if (c == '+')
{
shifted = first = 1;
} else
{
if (c <= 0x7f)
{
fputc(c,fh);
} /* else the source is invalid, so we ignore this */
}
}
}
if ((dest_len = ftell(fh)))
{
fseek(fh,0,SEEK_SET);
if ((dest = (char*)malloc(dest_len+1)))
{
fread(dest,1,dest_len,fh);
dest[dest_len]=0;
}
}
fclose(fh); /* release the temporary file */
}
return dest;
}
/*****************************************************************************/
char *utf8toiutf7(char *utf8, int sourcelen)
{
FILE *fh;
char *dest = NULL;
if (needtables) tabinit();
if ((fh = tmpfile()))
{
int dest_len;
int shifted = 0;
DECLARE_BIT_BUFFER;
while (1)
{
unsigned char c;
int noshift;
if (sourcelen)
{
c = *utf8;
noshift = (c >= 0x20 && c <= 0x7e) && (c != '&');
} else
{
c = 0;
noshift = 1;
}
if (shifted)
{
while (BITS_IN_BUFFER >= 6)
{
unsigned char bits = READ_N_BITS(6);
fputc(ibase64[bits],fh);
}
if (noshift)
{
int bits_in_buf = BITS_IN_BUFFER;
if (bits_in_buf)
{
unsigned char bits = READ_N_BITS(bits_in_buf);
bits <<= 6 - bits_in_buf;
fputc(ibase64[bits],fh);
}
shifted = 0;
fputc('-',fh);
}
}
if (!c) break;
if (noshift)
{
if (c == '&')
{
fputs("&-",fh);
} else fputc(c,fh);
utf8++;
sourcelen--;
} else
{
UTF8 *source = (UTF8*)utf8;
UTF16 dest = 0;
UTF16 *dest_ptr = &dest;
ConversionResult res;
res = ConvertUTF8toUTF16(&source, source + sourcelen, &dest_ptr, dest_ptr + 1, strictConversion);
if (res == conversionOK || res == targetExhausted)
{
sourcelen -= trailingBytesForUTF8[c] + 1;
utf8 += trailingBytesForUTF8[c] + 1;
if (!shifted)
{
fputc('&',fh);
shifted = 1;
}
WRITE_N_BITS(dest,16);
}
}
}
if ((dest_len = ftell(fh)))
{
fseek(fh,0,SEEK_SET);
if ((dest = (char*)malloc(dest_len+1)))
{
fread(dest,1,dest_len,fh);
dest[dest_len]=0;
}
}
fclose(fh); /* release the temporary file */
}
return dest;
}
/*****************************************************************************/
char *iutf7ntoutf8(char *source, int sourcelen)
{
FILE *fh;
int base64value=0,base64EOF=0,first=0;
int shifted = 0;
char *dest = NULL;
DECLARE_BIT_BUFFER;
if (needtables) tabinit();
if ((fh = tmpfile()))
{
int dest_len;
while (sourcelen)
{
unsigned char c = *source++;
sourcelen--;
if (shifted)
{
if ((base64EOF = (!sourcelen) || (c > 0x7f) || (base64value = invbase64[c]) < 0))
{
shifted = 0;
/* If the character causing us to drop out was SHIFT_IN or
SHIFT_OUT, it may be a special escape for SHIFT_IN. The
test for SHIFT_IN is not necessary, but allows an alternate
form of UTF-7 where SHIFT_IN is escaped by SHIFT_IN. This
only works for some values of SHIFT_IN.
*/
if (c && sourcelen && (c == '-'))
{
/* get another character c */
unsigned char prevc = c;
c = *source++;
/* If no base64 characters were encountered, and the
character terminating the shift sequence was
SHIFT_OUT, then it's a special escape for SHIFT_IN.
*/
if (first && prevc == '-')
{
fputc('&',fh);
}
}
} else
{
/* Add another 6 bits of base64 to the bit buffer. */
WRITE_N_BITS(base64value, 6);
first = 0;
}
}
/* Extract as many full 16 bit characters as possible from the
bit buffer.
*/
while (BITS_IN_BUFFER >= 16)
{
UTF32 src_utf32 = READ_N_BITS(16);
UTF32 *src_utf32_ptr = &src_utf32;
UTF8 target_utf8[10];
UTF8 *target_utf8_ptr = target_utf8;
ConvertUTF32toUTF8(&src_utf32_ptr,src_utf32_ptr+1,&target_utf8_ptr,target_utf8+10, strictConversion);
fwrite(target_utf8,1,target_utf8_ptr - target_utf8,fh);
}
if (!c) break;
if (base64EOF) BITS_IN_BUFFER = 0;
if (!shifted)
{
if (c == '&')
{
shifted = first = 1;
} else
{
if (c <= 0x7f)
{
fputc(c,fh);
} /* else the source is invalid, so we ignore this */
}
}
}
if ((dest_len = ftell(fh)))
{
fseek(fh,0,SEEK_SET);
if ((dest = (char*)malloc(dest_len+1)))
{
fread(dest,1,dest_len,fh);
dest[dest_len]=0;
}
}
fclose(fh); /* release the temporary file */
}
return dest;
}
/*****************************************************************************/
char *utf8topunycode(const utf8 *source, int sourcelen)
{
enum punycode_status status;
const utf8 *sourceend;
char *puny;
punycode_uint puny_len;
punycode_uint *dest, *target;
punycode_uint dest_len;
if (!(dest = (punycode_uint *)malloc(sourcelen * sizeof(punycode_uint))))
return NULL;
target = dest;
sourceend = source + sourcelen;
while (source < sourceend)
{
punycode_uint ch = 0;
unsigned short extraBytesToRead = trailingBytesForUTF8[*(UTF8*)source];
if (source + extraBytesToRead >= sourceend)
{
/* source exhausted */
free(dest);
return NULL;
}
/* Do this check whether lenient or strict */
if (!isLegalUTF8((UTF8*)source, extraBytesToRead+1))
{
free(dest);
return NULL;
}
/*
* The cases all fall through.
*/
switch (extraBytesToRead) {
case 3: ch += *source++; ch <<= 6;
case 2: ch += *source++; ch <<= 6;
case 1: ch += *source++; ch <<= 6;
case 0: ch += *source++;
}
ch -= offsetsFromUTF8[extraBytesToRead];
if (ch <= UNI_MAX_UTF32) {
*target++ = ch;
} else {
*target++ = UNI_REPLACEMENT_CHAR;
}
}
dest_len = target - dest; /* No 0 ending */
puny_len = dest_len * 2;
do
{
int stored_puny_len = puny_len;
if (!(puny = (char*)malloc(puny_len+5)))
{
free(dest);
return NULL;
}
status = punycode_encode(dest_len, dest, NULL /* case flags */, &puny_len, puny);
if (status == punycode_success)
{
puny[puny_len] = 0;
free(dest);
return puny;
}
/* Release the too-small buffer before retrying with twice the size. */
free(puny);
puny_len = stored_puny_len * 2;
} while (status == punycode_big_output);
free(dest);
return NULL;
}
/*****************************************************************************/
utf8 *punycodetoutf8(const char *source, int sourcelen)
{
enum punycode_status status;
punycode_uint *utf32;
punycode_uint length;
length = sourcelen;
if (!(utf32 = (punycode_uint*)malloc(sizeof(punycode_uint)*sourcelen)))
return NULL;
status = punycode_decode(sourcelen, source, &length, utf32, NULL);
if (status == punycode_success)
{
utf8 *dest = (utf8*)malloc(sourcelen * 4);
if (dest)
{
UTF8 *dest_start = (UTF8*)dest;
UTF32 *source_start = (UTF32*)utf32;
ConvertUTF32toUTF8((UTF32**)&source_start, (UTF32*)(utf32) + length, &dest_start, dest_start + sourcelen * 4 - 2, strictConversion);
*dest_start = 0;
free(utf32);
return dest;
}
}
free(utf32);
return NULL;
}
/*****************************************************************************/
int isascii7(const char *str)
{
char c;
if (!str) return 1;
while ((c = *str++))
{
if (c & 0x80) return 0;
}
return 1;
}
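The UTF-7 decoders in the listing above collect 6-bit base64 values in a 32-bit bit buffer (WRITE_N_BITS) and pull out 16-bit units as soon as enough bits are available (READ_N_BITS). The following is a minimal, self-contained sketch of that idea, not the library's code: the names `bitbuf`, `b64_value` and `decode_b64_utf16` are made up for illustration, and only the base64 run between the shift characters is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a 32-bit buffer that is filled from the top. */
struct bitbuf {
    uint32_t bits;   /* pending bits, left-aligned */
    int      count;  /* number of valid bits in 'bits' */
};

/* Append the low n bits of x (n < 32 and count + n <= 32). */
static void bitbuf_write(struct bitbuf *b, uint32_t x, int n)
{
    x &= (1u << n) - 1u;
    b->bits |= x << (32 - n - b->count);
    b->count += n;
}

/* Remove and return the oldest n bits (0 < n < 32 and n <= count). */
static uint32_t bitbuf_read(struct bitbuf *b, int n)
{
    uint32_t out = b->bits >> (32 - n);
    b->bits <<= n;
    b->count -= n;
    return out;
}

/* Value of a base64 digit, or -1 if c is not one (assumes ASCII). */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode a base64 run into 16-bit units and return how many were written
   (at most max_units). Decoding stops at the first non-base64 character;
   leftover bits that do not fill a whole unit are dropped. */
size_t decode_b64_utf16(const char *run, uint16_t *out, size_t max_units)
{
    struct bitbuf b = { 0, 0 };
    size_t n = 0;

    for (; *run != '\0'; run++) {
        int v = b64_value((unsigned char)*run);
        if (v < 0)
            break;
        bitbuf_write(&b, (uint32_t)v, 6);
        while (b.count >= 16) {
            if (n == max_units)
                return n;           /* output full, stop early */
            out[n++] = (uint16_t)bitbuf_read(&b, 16);
        }
    }
    return n;
}
```

For example, the run `AOQ` taken from the UTF-7 sequence `+AOQ-` yields the single unit 0x00E4. Draining the buffer after every 6-bit write keeps the pending bit count below 16, so the next write can never overflow the 32-bit buffer.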
From Wookiechat
Charsets: WookieChat no longer needs the incoming charset to be configured exactly. When someone types unusual characters, WookieChat scans the text for UTF-8 characters. If it finds any, it converts the text to ASCII as best an Amiga can using codesets.library. If there are none, it simply uses the CodesetsFindBestA() function of codesets.library.
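The scan described above can be sketched in portable C. This is an assumption about how such a check might look, not WookieChat's actual code: `contains_utf8` is a hypothetical name, and the function only checks the lead byte / continuation byte structure, so it does not reject overlong forms or surrogates.

```c
#include <stddef.h>

/* Returns 1 if the text holds at least one multi-byte UTF-8 sequence and no
   structurally malformed ones, 0 otherwise (pure 7-bit ASCII also gives 0). */
int contains_utf8(const char *text)
{
    const unsigned char *p = (const unsigned char *)text;
    int found = 0;

    while (*p != '\0') {
        int trail;

        if (*p < 0x80) {                /* plain ASCII */
            p++;
            continue;
        }
        else if (*p >= 0xC2 && *p <= 0xDF) trail = 1;
        else if (*p >= 0xE0 && *p <= 0xEF) trail = 2;
        else if (*p >= 0xF0 && *p <= 0xF4) trail = 3;
        else return 0;                  /* invalid lead byte */

        p++;
        while (trail-- > 0) {
            /* A NUL byte also fails this test, so we never run past the end. */
            if ((*p & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            p++;
        }
        found = 1;
    }
    return found;
}
```

A caller would treat a result of 1 as "this is UTF-8, convert it" and a result of 0 as "guess the codeset instead".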
Library Calls
TABLE OF CONTENTS
codesets.library/codesets.library
codesets.library/CodesetsSupportedA
codesets.library/CodesetsFindA
codesets.library/CodesetsFindBestA
codesets.library/CodesetsConvertStrA
codesets.library/CodesetsFreeA
codesets.library/CodesetsFreeVecPooledA
codesets.library/CodesetsSetDefaultA
codesets.library/CodesetsListCreateA
codesets.library/CodesetsListDeleteA
codesets.library/CodesetsListAddA
codesets.library/CodesetsListRemoveA
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8ToStrA
codesets.library/CodesetsUTF8Len
codesets.library/CodesetsIsValidUTF8
codesets.library/CodesetsIsLegalUTF8
codesets.library/CodesetsIsLegalUTF8Sequence
codesets.library/CodesetsStrLenA
codesets.library/CodesetsConvertUTF16toUTF32
codesets.library/CodesetsConvertUTF16toUTF8
codesets.library/CodesetsConvertUTF32toUTF16
codesets.library/CodesetsConvertUTF32toUTF8
codesets.library/CodesetsConvertUTF8toUTF16
codesets.library/CodesetsConvertUTF8toUTF32
codesets.library/CodesetsDecodeB64A
codesets.library/CodesetsEncodeB64A
codesets.library/codesets.library
*******************************************************************
Copyright (c) 2005-2008 by codesets.library Open Source Team
codesets.library is an AmigaOS shared library which provides
functions to deal with different kinds of codesets. It provides
general character conversion routines, e.g. for converting
from one charset (e.g. UTF8) into another (e.g. ISO-8859-1) or
vice versa.
codesets.library is mainly based on some code from UNICODE, some
code from the SimpleMail project as well as some additions done
by the codesets.library Open Source Team.
It is released and distributed under the terms of the GNU Lesser
General Public License (LGPL) and available free of charge.
Please visit http://www.sf.net/projects/codesetslib/ for
the very latest version and information regarding codesets.library.
*******************************************************************
As a short introduction on how to use codesets.library, the
following paragraphs should provide a good summary. What you
usually want to do with codesets.library is to convert strings from
one so-called "Source Codeset" into another "Destination Codeset".
The following list covers only the main functions provided to
developers who want to perform this conversion in their applications:
CodesetsSupportedA()
--------------------
Use this function to ask codesets.library which codesets/charsets
it supports, either through its internally available charsets or
through those obtained from the operating system (e.g. AmigaOS4).
E.g. in a MUI application you would do something like:
-- cut here --
STRPTR *array;
if((array = CodesetsSupportedA(NULL)))
{
DoMethod(list, MUIM_List_Insert, array, -1, MUIV_List_Insert_Sorted);
CodesetsFreeA(array, NULL);
}
-- cut here --
CodesetsFindA()
---------------
To process or convert a specific string, you normally have to
specify in which codeset this string has to be interpreted. For this
purpose you pass a so-called "Source Codeset" to the main
function of codesets.library. With the "CodesetsFindA()" function you
can ask codesets.library for a pointer to the corresponding codeset
structure, which you then forward to the main conversion routines.
To obtain the pointer to the Amiga-1251 codeset:
-- cut here --
struct codeset *cs;
if((cs = CodesetsFind("Amiga-1251",
CSA_FallbackToDefault, FALSE,
TAG_DONE)))
{
...
}
-- cut here --
To ask codesets.library for the current system-wide default codeset
of your running operating system:
-- cut here --
struct codeset *defaultCodeset;
if((defaultCodeset = CodesetsFindA(NULL, NULL)))
{
...
}
-- cut here --
CodesetsConvertStrA()
---------------------
This is the most commonly used function in codesets.library. It
converts a string from one "Source Codeset" to another
"Destination Codeset". It takes the source string, converts it
internally into UTF8 if necessary and then converts the UTF8
directly to the specified destination codeset.
To convert a string 'str' to a destination codeset:
-- cut here --
STRPTR destString;
if((destString = CodesetsConvertStr(CSA_SourceCodeset, srcCodeset,
CSA_DestCodeset, destCodeset,
CSA_Source, str,
TAG_DONE)))
{
....
CodesetsFreeA(destString, NULL);
}
-- cut here --
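The two-step route described above (source codeset to Unicode, then Unicode to destination codeset) can be illustrated without the library. The sketch below is a simplified stand-in under stated assumptions: it pivots through Unicode code points instead of UTF8 bytes, each codeset is modelled as a function from byte to code point, and `to_unicode_fn`, `from_unicode` and `convert_via_unicode` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-byte codeset: maps each byte to a Unicode code point. */
typedef uint32_t (*to_unicode_fn)(unsigned char byte);

/* ISO-8859-1: every byte equals its code point. */
uint32_t latin1_to_unicode(unsigned char b) { return b; }

/* ISO-8859-15: like ISO-8859-1 except for eight positions. */
uint32_t latin9_to_unicode(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC;   /* EURO SIGN */
    case 0xA6: return 0x0160;
    case 0xA8: return 0x0161;
    case 0xB4: return 0x017D;
    case 0xB8: return 0x017E;
    case 0xBC: return 0x0152;
    case 0xBD: return 0x0153;
    case 0xBE: return 0x0178;
    default:   return b;
    }
}

/* Reverse lookup by scanning all 256 bytes; returns '?' if unmappable. */
static unsigned char from_unicode(to_unicode_fn table, uint32_t cp)
{
    int b;
    for (b = 0; b < 256; b++)
        if (table((unsigned char)b) == cp)
            return (unsigned char)b;
    return '?';
}

/* Convert NUL-terminated src into dst; dst must hold strlen(src)+1 bytes. */
void convert_via_unicode(to_unicode_fn src_cs, to_unicode_fn dst_cs,
                         const char *src, char *dst)
{
    for (; *src != '\0'; src++)
        *dst++ = (char)from_unicode(dst_cs, src_cs((unsigned char)*src));
    *dst = '\0';
}
```

Because every codeset only has to know its mapping to and from Unicode, adding a new codeset does not require a table for each possible source/destination pair.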
The functions above should cover most of what an ordinary user of
codesets.library requires. The library supplies many more functions,
which are not described in detail here; examples are given in the
documentation section of each function.
If you find that the documentation is still too limited, or you feel
some major functionality for dealing with codesets is missing,
please let us know so that we, or even you, can improve it.
Your codesets.library Open Source Team.
February 2006
codesets.library/CodesetsSupportedA
NAME
CodesetsSupportedA - returns names of supported codesets
SYNOPSIS
array = CodesetsSupportedA(attrs);
A0
STRPTR * CodesetsSupportedA(struct TagItem *);
array = CodesetsSupported(tag1, ...);
A0
STRPTR * CodesetsSupported(Tag, ...);
FUNCTION
Returns a NULL terminated array of the supported codeset
names. The array _must_ be freed with CodesetsFreeA().
INPUTS
attrs - a list of additional tag items. Valid items are:
CSA_CodesetList (struct codesetList *)
You may supply an unlimited number of additional
codeset lists which you have previously allocated/loaded
with CodesetsListCreateA(). Otherwise just the internal
list of available codesets will be searched.
Default: NONE
CSA_AllowMultibyteCodesets (BOOL)
Include multibyte codesets (UTF8, UTF16, UTF32) in the
generated names array.
Default: TRUE
RESULT
array - the names array or NULL on an error.
EXAMPLE
For printing out all supported codeset names:
-- cut here --
STRPTR *array;
if((array = CodesetsSupportedA(NULL)))
{
int i;
for(i=0; array[i] != NULL; i++)
printf("%s\n", array[i]);
CodesetsFreeA(array, NULL);
}
-- cut here --
SEE ALSO
codesets.library/CodesetsListCreateA
codesets.library/CodesetsFindA
NAME
CodesetsFindA - finds a codeset
SYNOPSIS
codeset = CodesetsFindA(name, attrs);
D0 A0 A1
struct codeset * CodesetsFindA(STRPTR, struct TagItem *);
codeset = CodesetsFind(name, tag1, ...);
D0 A0 A1
struct codeset * CodesetsFind(STRPTR, Tag, ...);
FUNCTION
Finds and returns a codeset by its name. The data behind the
pointer should be considered read-only and must not be altered
in any way.
INPUTS
name - the codeset name (or alias) to find
attrs - a list of additional tag items. Valid items are:
CSA_FallbackToDefault (BOOL)
If TRUE the function never fails and returns the default
codeset if the supplied codeset name can't be found.
Default: TRUE
CSA_CodesetList (struct codesetList *)
You may supply an unlimited number of additional
codeset lists which you have previously allocated/loaded
with CodesetsListCreateA(). Otherwise just the internal
list of available codesets will be searched.
Default: NONE
RESULT
codeset - the codeset or NULL on an error
EXAMPLE
E.g. for receiving the pointer to the Amiga-1251 codeset:
-- cut here --
struct codeset *cs;
if((cs = CodesetsFind("Amiga-1251",
CSA_FallbackToDefault, FALSE,
TAG_DONE)))
{
...
}
-- cut here --
To ask codesets.library for the current system-wide default
codeset of your running operating system:
-- cut here --
struct codeset *defaultCodeset;
if((defaultCodeset = CodesetsFindA(NULL, NULL)))
{
...
}
-- cut here --
NOTE
Please note that the way the system's default codeset is found
depends heavily on how the operating system can be queried for
it. E.g. on AmigaOS4 the default codeset is queried with updated
system functions, but for AmigaOS3 a static list of
language<>codeset mappings is used.
SEE ALSO
codesets.library/CodesetsListCreateA
codesets.library/CodesetsFindBestA
NAME
CodesetsFindBestA - finds the best codeset matching a
string content.
SYNOPSIS
codeset = CodesetsFindBestA(attrs);
D0 A0
struct codeset * CodesetsFindBestA(struct TagItem *);
codeset = CodesetsFindBest(tag1, ...);
D0 A0
struct codeset * CodesetsFindBest(Tag, ...);
FUNCTION
Returns the best found codeset for the given text in the supplied
codeset family. In case no proper codeset for the supplied source string
could be found, NULL is returned, or the default codeset if the
CSA_FallbackToDefault attribute is set to TRUE. In addition, if
CSA_ErrPtr is given, the number of failed identifications (chars)
is returned through it.
INPUTS
attrs - a list of tag items. Valid items are:
CSA_Source (STRPTR)
The string which you want to convert. Must be supplied,
otherwise the function returns NULL.
CSA_SourceLen (ULONG)
Length of CSA_Source or less to check just a part
Default: string length of CSA_Source
CSA_ErrPtr (int *)
Pointer to an integer variable which will be filled with the
number of errors found (unidentifiable chars)
Default: NULL
CSA_CodesetList (struct codesetList *)
You may supply an unlimited number of additional
codeset lists which you have previously allocated/loaded
with CodesetsListCreateA(). Otherwise just the internal
list of available codesets will be searched.
Default: NONE
CSA_CodesetFamily (ULONG)
To narrow the analysis, a user may define the codeset family
the supplied text is likely to belong to. The reason for
this is that there is no unique identification algorithm
which can tell the codeset from a given text. So to narrow
the identification, the following values may be specified:
CSV_CodesetFamily_Latin - Latin codeset family (e.g. ISO-8859-X)
CSV_CodesetFamily_Cyrillic - Cyrillic codeset family (e.g. KOI8R)
Default: CSV_CodesetFamily_Latin
CSA_FallbackToDefault (BOOL)
If TRUE the function never fails and returns the default
codeset if the supplied text couldn't be identified
Default: FALSE
RESULT
codeset - the best matching codeset or NULL in case a NULL pointer
was supplied as the source string.
EXAMPLE
E.g. to obtain a pointer to the 'best matching' codeset for
a KOI8-R string:
-- cut here --
struct codeset *cs;
char str[] = "îÅ×ÏÚÍÏÖÎÏ ÐÅÒÅËÏÄÉÒÏ×ÁÔØ ÉÚ ËÏÄÉÒÏ×ËÉ"; /* KOI8-R encoded bytes, displayed here as Latin-1 */
int errPtr;
if((cs = CodesetsFindBest(CSA_Source, str,
CSA_ErrPtr, &errPtr,
CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
CSA_FallbackToDefault, FALSE,
TAG_DONE)))
{
/* ... should return the KOI8-R codeset ... */
}
-- cut here --
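How a detector might rank candidates can be sketched with a deliberately simple rule: for each candidate, count the bytes that are implausible as text in that codeset and pick the candidate with the fewest. This only illustrates the "number of errors" idea behind CSA_ErrPtr; it is not the algorithm codesets.library actually uses, and the `candidate` type, the predicates and `pick_best` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical candidate description: a name plus a predicate that says
   whether a byte is a plausible text character in that codeset. */
struct candidate {
    const char *name;
    int (*plausible)(unsigned char byte);
};

int ascii_plausible(unsigned char b)  { return b < 0x80; }

/* 0x80-0x9F are C1 control codes in ISO-8859-1, unlikely in ordinary text. */
int latin1_plausible(unsigned char b) { return b < 0x80 || b >= 0xA0; }

/* windows-1252 leaves exactly these five bytes unassigned. */
int cp1252_plausible(unsigned char b)
{
    return !(b == 0x81 || b == 0x8D || b == 0x8F || b == 0x90 || b == 0x9D);
}

/* Returns the candidate with the fewest implausible bytes (the first one
   wins a tie) and stores that count in *errors when errors is not NULL.
   Returns NULL if no candidates were given. */
const struct candidate *pick_best(const struct candidate *cands, size_t ncands,
                                  const char *text, int *errors)
{
    const struct candidate *best = NULL;
    int best_errors = 0;
    size_t i;

    for (i = 0; i < ncands; i++) {
        const unsigned char *p = (const unsigned char *)text;
        int bad = 0;

        for (; *p != '\0'; p++)
            if (!cands[i].plausible(*p))
                bad++;
        if (best == NULL || bad < best_errors) {
            best = &cands[i];
            best_errors = bad;
        }
    }
    if (errors != NULL)
        *errors = best_errors;
    return best;
}
```

Listing the most restrictive candidate first means that plain ASCII text is reported as ASCII rather than as one of its supersets, since the first candidate wins a tie.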
SEE ALSO
codesets.library/CodesetsListCreateA
codesets.library/CodesetsConvertStrA
NAME
CodesetsConvertStrA - converts a string from one source codeset to
another destination codeset.
SYNOPSIS
dest = CodesetsConvertStrA(attrs)
D0 A0
STRPTR CodesetsConvertStrA(struct TagItem *);
dest = CodesetsConvertStr(tag1, ...);
D0 A0
STRPTR CodesetsConvertStr(Tag, ...);
FUNCTION
The function takes a source string which is encoded in a so-called
'Source Codeset' and converts it immediately into an equivalent
string encoded in the corresponding 'Destination Codeset'.
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_Source (STRPTR)
The string which you want to convert. Must be supplied,
otherwise the function returns NULL.
CSA_SourceLen (ULONG)
Length of CSA_Source or less to convert just a part
Default: string length of CSA_Source
CSA_SourceCodeset (struct codeset *)
The codeset in which the source string is encoded.
Default: the system's default codeset
CSA_DestCodeset (struct codeset *)
The codeset to which the source string should be converted to.
Default: the system's default codeset
CSA_DestLenPtr (ULONG *)
If supplied, will contain the length of the converted string
which is returned.
CSA_MapForeignChars (BOOL)
If a character of the source string cannot be directly mapped
to the destination codeset a "?" character will normally be used
to signal this case. If this attribute is set, an internal
replacement table will be used which tries to replace these
"foreign" characters by look-alike ASCII character sequences.
Please note that this functionality is mostly useful for users of
Latin-based codesets, due to the straight mapping to ASCII (7bit).
Default: FALSE
CSA_MapForeignCharsHook (struct Hook *)
If a character of the source string cannot be directly mapped
to the destination codeset a "?" character will normally be used
to signal this case. By using this attribute, a hook can be
supplied which is called for every such foreign character.
Within this hook the UTF8 sequence which cannot be directly
mapped to the destination codeset is supplied. During the execution
of the hook a replacement string may be specified, which in turn
will be used by the internals of codesets.library to map this
"foreign" char to a different character or UTF8 sequence.
If both CSA_MapForeignChars and CSA_MapForeignCharsHook are
specified, the hook will only be executed in case the internal
routines don't supply their own mapping for the foreign UTF8 sequence.
The hook function should be declared as:
ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
REG(a2, struct replaceMsg *msg),
REG(a1, void *dummy))
struct Hook *hook
Your hook
msg->dst
place your desired replacement string here
msg->src
the UTF8 sequence to be replaced, this string is READ-ONLY!
msg->srclen
the length of the UTF8 sequence to be replaced, do NOT peek
beyond this limit.
The return value of this hook function is the length of the replacement
string. Return zero if no replacement took place. Positive values will
be treated as lengths of ASCII strings. Negative values signal a
replacement by another UTF8 sequence. Please note that in case you
supply a UTF8 sequence as a replacement for the "foreign" UTF8, your
hook might be called again if this sequence still cannot be mapped to
the destination codeset and is thus again a "foreign" sequence.
RESULT
either a pointer to the generated destination string or NULL
on error.
EXAMPLE
To convert an ISO-8859-1 encoded string 'src' into an Amiga-1251
equivalent 'dst' string:
-- cut here --
STRPTR src, dst;
struct codeset *srcCodeset, *dstCodeset;
srcCodeset = CodesetsFindA("ISO-8859-1", NULL);
dstCodeset = CodesetsFindA("Amiga-1251", NULL);
if((dst = CodesetsConvertStr(CSA_SourceCodeset, srcCodeset,
CSA_DestCodeset, dstCodeset,
CSA_Source, src,
TAG_DONE)))
{
....
CodesetsFreeA(dst, NULL);
}
-- cut here --
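The return-value convention of CSA_MapForeignCharsHook described above (zero for no replacement, a positive length for an ASCII replacement, a negative value for a UTF8 replacement) can be tried out in portable C. The sketch below leaves out the AmigaOS hook calling convention entirely. `replace_msg` is a hypothetical stand-in that mirrors only the `src`, `srclen` and `dst` fields named in the text, `dst` is assumed to be a pointer the callback sets, and a negative result is assumed to be the negated length of the UTF8 replacement.

```c
#include <string.h>

/* Hypothetical stand-in for the message passed to the hook. */
struct replace_msg {
    const char          *dst;     /* set by the callback: replacement text */
    const unsigned char *src;     /* UTF8 sequence that could not be mapped */
    int                  srclen;  /* length of src in bytes */
};

/* Returns 0 for "no replacement", >0 for the length of an ASCII replacement,
   <0 for the negated length of a UTF8 replacement (an assumption, see above). */
long map_foreign_char(struct replace_msg *msg)
{
    /* U+20AC EURO SIGN (E2 82 AC) becomes the ASCII text "EUR". */
    if (msg->srclen == 3 && memcmp(msg->src, "\xE2\x82\xAC", 3) == 0) {
        msg->dst = "EUR";
        return 3;
    }
    /* U+201C LEFT DOUBLE QUOTATION MARK (E2 80 9C) becomes U+00AB (C2 AB),
       which is itself a UTF8 sequence, hence the negative result. */
    if (msg->srclen == 3 && memcmp(msg->src, "\xE2\x80\x9C", 3) == 0) {
        msg->dst = "\xC2\xAB";
        return -2;
    }
    return 0;   /* no replacement: the caller falls back to "?" */
}
```

The callback never reads more than `srclen` bytes of `src` and never writes to it, in line with the READ-ONLY note in the hook description.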
SEE ALSO
codesets.library/CodesetsFreeA
codesets.library/CodesetsFreeA
NAME
CodesetsFreeA - frees objects previously internally allocated
by codesets.library
SYNOPSIS
CodesetsFreeA(obj, attrs)
A0 A1
void CodesetsFreeA(APTR, struct TagItem *);
CodesetsFree(obj, tag1, ...);
A0 A1
void CodesetsFree(APTR, Tag, ...);
FUNCTION
Frees an object previously allocated by codesets.library, e.g. by
functions like CodesetsSupportedA() or CodesetsConvertStrA().
INPUTS
obj - the object to free
attrs - a list of additional tag items. Currently none.
RESULT
no result
EXAMPLE
-- cut here --
STRPTR *array;
if((array = CodesetsSupportedA(NULL)))
{
...
CodesetsFreeA(array, NULL);
}
-- cut here --
SEE ALSO
codesets.library/CodesetsSupportedA
codesets.library/CodesetsConvertStrA
codesets.library/CodesetsFreeVecPooledA
NAME
CodesetsFreeVecPooledA - frees objects previously allocated
by methods supporting CSA_Pool
SYNOPSIS
CodesetsFreeVecPooledA(pool, obj, attrs)
A0 A1 A2
void CodesetsFreeVecPooledA(APTR, APTR, struct TagItem *);
CodesetsFreeVecPooled(pool, obj, tag1, ...);
A0 A1 A2
void CodesetsFreeVecPooled(APTR, APTR, Tag, ...);
FUNCTION
Frees an object previously allocated by codesets.library from a
private memory pool that was passed to a codesets function
via the CSA_Pool tag.
INPUTS
pool - pointer to the private memory pool
obj - the object to free
attrs - a list of additional tag items. Valid tags are:
CSA_PoolSem (struct SignalSemaphore *)
A semaphore to lock when using CSA_Pool
RESULT
no result
EXAMPLE
-- cut here --
UTF8 *utf8;
STRPTR str;
APTR pool;
if((utf8 = CodesetsUTF8Create(CSA_Source, str,
CSA_Pool, pool,
TAG_DONE)))
{
...
CodesetsFreeVecPooledA(pool,utf8,NULL);
}
-- cut here --
SEE ALSO
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8ToStrA
codesets.library/CodesetsSetDefaultA
NAME
CodesetsSetDefaultA - sets the default codeset, overwriting
the system default if necessary.
SYNOPSIS
codeset = CodesetsSetDefaultA(name, attrs);
A0 A1
struct codeset * CodesetsSetDefaultA(STRPTR, struct TagItem *);
codeset = CodesetsSetDefault(name, tag1, ...);
A0 A1
struct codeset * CodesetsSetDefault(STRPTR, Tag, ...);
FUNCTION
Sets the default codeset to name. The codeset will be stored in
the environment variable 'codeset_default'.
INPUTS
name - the name of the codeset to set as default
attrs - a list of additional tag items. Valid items are:
CSA_Save (BOOL)
If TRUE the codeset will be permanently saved and survives
a reset. Otherwise the default setting will just last until
the next reboot.
Default: FALSE
RESULT
codeset - the codeset or NULL
NOTE
In case the operating system supports the direct query of the
currently active system's default codeset, this function will
still overwrite this setting. So by using this function a user may
override all system settings and set a global default codeset
for their machine no matter what the OS suggests. However, in case
your operating system fully supports the querying of the
system's default codeset (e.g. AmigaOS4) you are advised to use
this function with care, or even to avoid using it at all.
SEE ALSO
codesets.library/CodesetsFindA
codesets.library/CodesetsListCreateA
NAME
CodesetsListCreateA - creates a private, task-wise codeset list
and returns it to the user for further reference.
SYNOPSIS
list = CodesetsListCreateA(attrs);
D0 A0
struct codesetList * CodesetsListCreateA(struct TagItem *);
list = CodesetsListCreate(tag1, ...);
D0 A0
struct codesetList * CodesetsListCreate(Tag, ...);
FUNCTION
This function creates a private, task-wise codeset list by
loading charset files from either a whole directory tree or a specific
charset file, or even by using an existing codeset structure.
By using this function, an application may load and carry its very
own private charsets in parallel to the internal charsets of
codesets.library. This way each application can provide a different
codeset list to the user without having to load and manage these
lists on its own.
INPUTS
attrs - a list of additional tag items. Valid items are:
CSA_CodesetDir (STRPTR)
The path to a whole directory which codesets library will
walk through for searching for proper charset files.
Default: NULL
CSA_CodesetFile (STRPTR)
The path to a specific file which codesets.library will try
to load as a standard charset translation file.
Default: NULL
CSA_SourceCodeset (struct codeset *)
The pointer to an already existing codeset structure which
will immediately be added to the created list. Please be
careful when adding one codeset to multiple lists, especially
when you call CodesetsListDelete() to free the list.
Default: NULL
RESULT
list - the private codeset list or NULL on an error condition
NOTE
For convenience, if no tag item attribute at all is supplied to the
function, codesets.library will try to load charsets from the
corresponding "PROGDIR:Charsets" directory and add any codesets found to
the list. However, in case a tag item is specified (no matter what
kind) the PROGDIR: scanning will be omitted.
EXAMPLE
For loading all found charset files from PROGDIR:Charsets:
-- cut here --
struct codesetList *csList;
if((csList = CodesetsListCreateA(NULL)))
{
STRPTR *codesetArray = CodesetsSupported(CSA_CodesetList, csList,
TAG_DONE);
// codesetArray should now also carry our private
// codesets from PROGDIR:Charsets
...
if(codesetArray != NULL)
CodesetsFreeA(codesetArray, NULL);
CodesetsListDelete(CSA_CodesetList, csList,
TAG_DONE);
}
-- cut here --
SEE ALSO
codesets.library/CodesetsListDeleteA
codesets.library/CodesetsListAddA
codesets.library/CodesetsListRemoveA
codesets.library/CodesetsListSupportedA
codesets.library/CodesetsListFindA
codesets.library/CodesetsListFindBestA
codesets.library/CodesetsListDeleteA
NAME
CodesetsListDeleteA - deletes/frees all resources of previously created
private codeset lists.
SYNOPSIS
result = CodesetsListDeleteA(attrs);
D0 A0
BOOL CodesetsListDeleteA(struct TagItem *);
result = CodesetsListDelete(tag1, ...);
D0 A0
BOOL CodesetsListDelete(Tag, ...);
FUNCTION
This function deletes all resources (also the contained codeset
structures per default) and frees the memory of previously allocated
private codeset lists.
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_CodesetList (struct codesetList *)
Pointer to a previously created, private codeset list whose
resources should be freed.
Default: NULL
CSA_FreeCodesets (BOOL)
If TRUE, all contained codesets should also be freed/deleted,
otherwise just frees the list object itself.
Default: TRUE
RESULT
result - TRUE on success otherwise FALSE
NOTE
Please note that if you added an explicit codeset structure to more
than one private codeset list you may run into problems if you
don't take care of this yourself. This is a dumb function which just
walks through the list and frees all resources. Set CSA_FreeCodesets
to FALSE in case you just want to free the list object.
SEE ALSO
codesets.library/CodesetsListCreateA
codesets.library/CodesetsListAddA
codesets.library/CodesetsListRemoveA
codesets.library/CodesetsListAddA
NAME
CodesetsListAddA - adds additional codesets to an already
existing private codeset list previously created with
CodesetsListCreateA().
SYNOPSIS
result = CodesetsListAddA(attrs);
D0 A0
BOOL CodesetsListAddA(struct TagItem *);
result = CodesetsListAdd(tag1, ...);
D0 A0
BOOL CodesetsListAdd(Tag, ...);
FUNCTION
This function adds additional codesets to an already existing
private codeset list. Either codesets themselves may be added directly, or
the path to either a file or a directory may be specified from which
additional codesets are loaded from known charset files.
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_CodesetDir (STRPTR)
The path to a whole directory which codesets library will
walk through for searching for proper charset files.
Default: NULL
CSA_CodesetFile (STRPTR)
The path to a specific file which codesets.library will try
to load as a standard charset translation file.
Default: NULL
CSA_SourceCodeset (struct codeset *)
The pointer to an already existing codeset structure which
will immediately be added to the list. Please be
careful when adding one codeset to multiple lists, especially
when you call CodesetsListDelete() to free the list.
Default: NULL
RESULT
result - TRUE on success otherwise FALSE
NOTE
Be careful when adding one codeset to more than one codeset list as
you may run into problems when freeing the list afterwards.
SEE ALSO
codesets.library/CodesetsListCreateA
codesets.library/CodesetsListDeleteA
codesets.library/CodesetsListAddA
codesets.library/CodesetsListRemoveA
NAME
CodesetsListRemoveA - removes a single or multiple codesets from a
previously created codeset list.
SYNOPSIS
result = CodesetsListRemoveA(attrs);
D0 A0
BOOL CodesetsListRemoveA(struct TagItem *);
result = CodesetsListRemove(tag1, ...);
D0 A0
BOOL CodesetsListRemove(Tag, ...);
FUNCTION
This function removes single or multiple codesets from a
previously created codeset list. The removed codeset structures will
also be freed/deleted by default.
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_SourceCodeset (struct codeset *)
Pointer to a codeset structure which should be removed from
its corresponding list. Per default its resources will also
be internally freed.
Default: NULL
CSA_FreeCodesets (BOOL)
If TRUE, all supplied codesets should also be freed/deleted,
otherwise the codesets will just be removed from their lists.
Default: TRUE
RESULT
result - TRUE on success otherwise FALSE
NOTE
The function will automatically prevent removal of codesets from the
internal codeset list of codesets.library and will return FALSE in
case a user tried to remove a codeset from the internal list.
SEE ALSO
codesets.library/CodesetsListDeleteA
codesets.library/CodesetsListAddA
codesets.library/CodesetsUTF8CreateA
NAME
CodesetsUTF8CreateA - creates a UTF8 compliant string
representation out of a supplied source
string.
SYNOPSIS
utf8 = CodesetsUTF8CreateA(attrs);
A0
UTF8 * CodesetsUTF8CreateA(struct TagItem *);
utf8 = CodesetsUTF8Create(tag1, ...);
A0
UTF8 * CodesetsUTF8Create(Tag, ...);
FUNCTION
Creates a UTF8 string from a string which is encoded in the
specified codeset.
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_Source (STRPTR)
The string which you want to convert. Must be supplied,
otherwise the function returns NULL.
CSA_SourceLen (ULONG)
Length of CSA_Source or less to convert just a part
Default: string length of CSA_Source
CSA_SourceCodeset (struct codeset *)
The codeset in which the source string is encoded.
Default: the system's default codeset
CSA_Dest (STRPTR)
Destination buffer. If you supply a valid buffer here, you
must also set CSA_DestLen to the length of your buffer. If
CSA_AllocIfNeeded is TRUE, CSA_DestLen is checked to see if
CSA_Dest can hold the whole utf8. If CSA_Dest can't
hold the utf8, a brand new buffer is allocated. If
CSA_AllocIfNeeded is FALSE, up to CSA_DestLen bytes (ending '\0'
included) are written to CSA_Dest. If CSA_DestHook is supplied,
CSA_Dest is ignored.
Default: NULL.
CSA_DestHook (struct Hook *)
Destination hook. If this is supplied, it is called with a
partially converted string.
The hook function should be declared as:
ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
REG(a2, struct convertMsg *msg),
REG(a1, STRPTR buf))
struct Hook *hook
Your hook
STRPTR buf
The partial '\0' terminated buffer
msg->state - one of
o CSV_Translating
More calls to come
o CSV_End
Last call
msg->len
length of string 'buf'
You may define the min length of the buffer via CSA_DestLen.
If so, accepted values are 16<=v<=sizeof_codeset_buffer.
Don't count on this size to be fixed, even if you used
CSA_DestLen !
CSA_DestLen (ULONG)
If CSA_DestHook is used, it represents the min length of the
buffer that causes hook calls. Otherwise it is the size of
the buffer supplied in CSA_Dest. So if CSA_DestHook is
supplied, CSA_DestLen is optional, otherwise it is required.
CSA_DestLenPtr (ULONG *)
If supplied, will contain the length of the utf8 string
CSA_AllocIfNeeded (BOOL)
If the destination buffer length is too small to contain
the UTF8 a new buffer is allocated
Default: TRUE
CSA_Pool (APTR)
If a new destination buffer needs to be allocated (this happens
only if CSA_DestHook is not used, CSA_AllocIfNeeded is TRUE, and
CSA_Dest is not supplied or is too small for the utf8) this
pool is used. The result must be freed via
CodesetsFreeVecPooledA(pool, utf8, NULL).
If CSA_Pool is not supplied, the destination buffer is allocated
from the internal memory pool and must be freed via
CodesetsFreeA(utf8, NULL).
CSA_PoolSem (struct SignalSemaphore *)
A semaphore to lock when using CSA_Pool
RESULT
utf8 - the utf8 string or NULL
If CSA_DestHook is used always NULL.
If CSA_DestHook is not used NULL means failure
to allocate mem.
EXAMPLE
The shortest invocation is:
-- cut here --
UTF8 *utf8;
STRPTR str;
if((utf8 = CodesetsUTF8Create(CSA_Source, str,
TAG_DONE)))
{
...
CodesetsFreeA(utf8,NULL);
}
-- cut here --
In case you want to use your pool to allocate mem:
-- cut here --
UTF8 *utf8;
STRPTR str;
APTR pool;
if((utf8 = CodesetsUTF8Create(CSA_Source, str,
CSA_Pool, pool,
TAG_DONE)))
{
...
CodesetsFreeVecPooledA(pool,utf8,NULL);
}
-- cut here --
If your pool is to be arbitrated via a semaphore:
-- cut here --
UTF8 *utf8;
STRPTR str;
APTR pool;
struct SignalSemaphore *sem;
if((utf8 = CodesetsUTF8Create(CSA_Source, str,
CSA_Pool, pool,
CSA_PoolSem, sem,
TAG_DONE)))
{
...
CodesetsFreeVecPooledA(pool,utf8,NULL);
}
-- cut here --
If you want to use your own buffer to reduce mem
allocation:
-- cut here --
UTF8 *utf8;
STRPTR str;
char buf[256];
if((utf8 = CodesetsUTF8Create(CSA_Source, str,
CSA_Dest, buf,
CSA_DestLen, sizeof(buf),
TAG_DONE)))
{
...
/* free only if the library had to allocate a new buffer */
if(utf8 != (UTF8 *)buf)
CodesetsFreeA(utf8,NULL);
}
-- cut here --
If your strings are at most MAXLEN chars long (e.g. imagine you are
in a MUI application and you know the max size of your
string gadgets), you had better supply your own buffer:
-- cut here --
UTF8 *utf8;
char buf[MAXLEN*6+1];
if((utf8 = CodesetsUTF8Create(CSA_Source, str,
CSA_Dest, buf,
CSA_DestLen, sizeof(buf),
TAG_DONE)))
{
...
}
-- cut here --
If your strings are very large, so that you cannot be sure there
is enough mem for them, or if you have your own reasons to do
so, use a destination hook:
-- cut here --
static ULONG ASM SAVEDS
destFun(REG(a0, struct Hook *hook),
REG(a2, struct convertMsg *msg),
REG(a1, STRPTR buf))
{
printf("[%3ld] [%s]\n",msg->len,buf);
if(msg->state == CSV_End)
printf("\n");
return 0;
}
struct Hook dest;
dest.h_Entry = (HOOKFUNC)destFun;
CodesetsUTF8Create(CSA_Source, str,
CSA_DestHook, &dest,
TAG_DONE);
-- cut here --
SEE ALSO
codesets.library/CodesetsUTF8ToStrA
codesets.library/CodesetsUTF8Len
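To make the conversion above more concrete, here is a minimal sketch in
portable C of what converting an ISO-8859-1 string to UTF8 involves. It is
not the code of codesets.library; the function name latin1_to_utf8 and its
parameters are made up for this illustration. For ISO-8859-1 input every
source byte becomes at most two UTF8 bytes, so the MAXLEN*6+1 buffer used in
the example above is a generous worst case.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not codesets.library code.
 * Converts srclen bytes of ISO-8859-1 text to UTF8.
 * Returns the number of bytes written (without the terminating '\0'),
 * or (size_t)-1 if dst is too small. */
size_t latin1_to_utf8(const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstsize)
{
    size_t i, o = 0;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < srclen; i++) {
        unsigned char c = src[i];
        size_t need = (c < 0x80) ? 1 : 2;

        /* always keep one byte free for the terminating '\0' */
        if (o + need + 1 > dstsize)
            return (size_t)-1;

        if (c < 0x80) {
            dst[o++] = c;                                  /* plain ASCII */
        } else {
            dst[o++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            dst[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }

    if (o + 1 > dstsize)
        return (size_t)-1;
    dst[o] = '\0';
    return o;
}
```

The caller owns the buffer, which corresponds to the CSA_Dest/CSA_DestLen
case with CSA_AllocIfNeeded set to FALSE.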
codesets.library/CodesetsUTF8ToStrA
NAME
CodesetsUTF8ToStrA - converts an UTF8 encoded string into
a specified destination codeset.
SYNOPSIS
str = CodesetsUTF8ToStrA(attrs);
D0 A0
STRPTR CodesetsUTF8ToStrA(struct TagItem *);
str = CodesetsUTF8ToStr(tag1, ...);
D0 A0
STRPTR CodesetsUTF8ToStr(Tag,...);
FUNCTION
Convert an utf8 string to a specified codeset.
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_Source (STRPTR)
The string which you want to convert. Must be supplied,
otherwise the function returns NULL.
CSA_SourceLen (ULONG)
Length of CSA_Source. Must be > 0 or the function returns
NULL.
Default: string length of CSA_Source - strlen()
CSA_Dest (STRPTR)
Destination buffer. If you supply a valid buffer here, you
must also set CSA_DestLen to the length of your buffer. If
CSA_AllocIfNeeded is TRUE, CSA_DestLen is checked to see if
CSA_Dest may contain the whole converted string. If CSA_Dest
can't contain the output string, a brand new buffer is allocated.
If CSA_AllocIfNeeded is FALSE, up to CSA_DestLen bytes (terminating
'\0' included) are written to CSA_Dest. If CSA_DestHook is supplied,
CSA_Dest is ignored.
Default: NULL.
CSA_DestCodeset (struct codeset *)
The codeset to which the UTF8 string should be encoded to.
Default: the system's default codeset
CSA_DestHook (struct Hook *)
Destination hook. If this is supplied, it is called with a
partial converted string.
The hook function should be declared as:
ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
REG(a2, struct convertMsg *msg),
REG(a1, STRPTR buf))
struct Hook *hook
Your hook
STRPTR buf
The partial '\0' terminated buffer
msg->state - one of
o CSV_Translating
More calls to come
o CSV_End
Last call
msg->len
length of string 'buf'
You may define the min length of the buffer via CSA_DestLen.
If so, accepted values are 16<=v<=sizeof_codeset_buffer.
Don't count on this size to be fixed, even if you used
CSA_DestLen !
CSA_DestLen (ULONG)
If CSA_DestHook is used, it represents the min length of the
buffer that causes hook calls. Otherwise it is the size of
the buffer supplied in CSA_Dest. So if CSA_DestHook is
supplied, CSA_DestLen is optional, otherwise it is required.
CSA_DestLenPtr (ULONG *)
If supplied, will contain the length of the converted string.
CSA_AllocIfNeeded (BOOL)
If the destination buffer length is too small to contain
the output string, a new buffer is allocated.
Default: TRUE
CSA_Pool (APTR)
If a new destination buffer needs to be allocated (this happens
only if CSA_DestHook is not used, CSA_AllocIfNeeded is TRUE, and
CSA_Dest is not supplied or is too small for the converted string)
this pool is used. The result must be freed via
CodesetsFreeVecPooledA(pool, string, NULL).
If CSA_Pool is not supplied, the destination buffer is allocated
from the internal memory pool and must be freed via
CodesetsFreeA(string, NULL).
CSA_PoolSem (struct SignalSemaphore *)
A semaphore to lock when using CSA_Pool
CSA_ErrPtr (int *)
Pointer to an integer variable which will be filled with the
number of issues found (number of non-convertible characters)
Default: NULL
CSA_MapForeignChars (BOOL)
If a character of the source string cannot be directly mapped
to the destination codeset a "?" character will normally be used
to signal this case. If this attribute is set, an internal
replacement table will be used which tries to replace these
"foreign" characters with lookalike ASCII character sequences.
Please note that this functionality is mostly useful only to
users of Latin scripts, due to the straight mapping to ASCII (7bit).
Default: FALSE
CSA_MapForeignCharsHook (struct Hook *)
If a character of the source string cannot be directly mapped
to the destination codeset a "?" character will normally be used
to signal this case. By using this attribute, a hook can be
supplied which is called for every such foreign character.
Within this hook the UTF8 sequence is supplied which cannot be
directly mapped to the destination codeset. During the execution
of the hook a replacement string might be specified, which in turn
will be used by the internals of codesets.library to map this
"foreign" char to a different character or UTF8 sequence.
If both CSA_MapForeignChars and CSA_MapForeignCharsHook are
specified, the hook will only be executed in case the internal
routines do not supply a mapping of their own for the foreign UTF8 sequence.
The hook function should be declared as:
ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
REG(a2, struct replaceMsg *msg),
REG(a1, void *dummy))
struct Hook *hook
Your hook
msg->dst
place your desired replacement string here
msg->src
the UTF8 sequence to be replaced, this string is READ-ONLY!
msg->srclen
the length of the UTF8 sequence to be replaced, do NOT peek
beyond this limit.
The return value of this hook function is the length of the replacement
string. Return zero if no replacement took place. Positive values will
be treated as lengths of ASCII strings. Negative values signal a
replacement by another UTF8 sequence. Please note that in case you
supply a UTF8 sequence as a replacement for the "foreign" UTF8, your
hook might be called again if this sequence still cannot be mapped to
the destination codeset and is thus again a "foreign" sequence.
RESULT
str - the string or NULL
If CSA_DestHook is used always NULL.
If CSA_DestHook is not used NULL means failure
to allocate mem.
SEE ALSO
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8Len
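The default handling of "foreign" characters described above (a "?" in the
output plus an error count, as with CSA_ErrPtr) can be sketched in portable
C. This is an illustration under stated assumptions, not the library's
implementation: the destination is fixed to ISO-8859-1 and the name
utf8_to_latin1 and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not codesets.library code.
 * Converts UTF8 to ISO-8859-1. Anything that has no ISO-8859-1
 * equivalent (or is malformed) becomes '?' and is counted in *errors.
 * Output is truncated to dstsize-1 bytes and always '\0' terminated.
 * Returns the number of bytes written without the '\0'. */
size_t utf8_to_latin1(const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstsize, int *errors)
{
    size_t i = 0, o = 0;
    int errs = 0;

    assert(src != NULL && dst != NULL && dstsize > 0);

    while (i < srclen && o + 1 < dstsize) {
        unsigned char c = src[i];

        if (c < 0x80) {                       /* ASCII maps to itself */
            dst[o++] = c;
            i++;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < srclen &&
                   (src[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence: U+0080..U+07FF */
            unsigned int cp = ((c & 0x1Fu) << 6) | (src[i + 1] & 0x3Fu);
            if (cp >= 0x80 && cp <= 0xFF) {
                dst[o++] = (unsigned char)cp;
            } else {                          /* overlong or not in Latin-1 */
                dst[o++] = '?';
                errs++;
            }
            i += 2;
        } else {
            /* longer or malformed sequence: one '?' for the whole thing */
            dst[o++] = '?';
            errs++;
            i++;
            while (i < srclen && (src[i] & 0xC0) == 0x80)
                i++;
        }
    }

    dst[o] = '\0';
    if (errors != NULL)
        *errors = errs;
    return o;
}
```

A replacement table or hook, as offered by CSA_MapForeignChars and
CSA_MapForeignCharsHook, would take the place of the plain '?' branches.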
codesets.library/CodesetsUTF8Len
NAME
CodesetsUTF8Len - returns the length of a supplied utf8 string.
SYNOPSIS
len = CodesetsUTF8Len(utf8);
D0 A0
ULONG CodesetsUTF8Len(UTF8 *);
FUNCTION
Returns the number of real characters stored in a supplied UTF8
string. This is _NOT_ the space required to store the UTF8 string,
it is the actual number of _real_ characters the UTF8 represents.
INPUTS
utf8 - pointer to the UTF8 string generated by the internal
functions of codesets.library
RESULT
len - length of utf8
SEE ALSO
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8ToStrA
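The difference between the character count and the storage size can be
shown with a few lines of portable C. This is only a sketch of the idea
(count every byte that is not a continuation byte) and assumes well-formed
input; the name utf8_char_count is hypothetical and this is not the
library's code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not codesets.library code.
 * Counts the characters (code points) in a '\0' terminated,
 * well-formed UTF8 string. Continuation bytes have the bit
 * pattern 10xxxxxx and are not counted. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the three-byte string "a" followed by 0xC3 0xA9 (an e with acute
accent), strlen() reports 3 while this function reports 2.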
codesets.library/CodesetsIsValidUTF8
NAME
CodesetsIsValidUTF8 - tells if a supplied standard string is meant to
carry a perfectly valid UTF8 sequence
SYNOPSIS
result = CodesetsIsValidUTF8(str);
D0 A0
BOOL CodesetsIsValidUTF8(STRPTR);
FUNCTION
Returns TRUE in case the supplied string only contains char sequences
which are compatible to the UTF8 standard.
INPUTS
str - a standard STRPTR string.
RESULT
result - TRUE in case the string contains valid UTF8 data.
NOTE
This function uses the common 'GOOD_UCS' macro together with parsing
the whole string. This means that it will only return TRUE in case
the supplied string only contains UTF8 sequences. A mixture of UTF8
and non-UTF8 sequences will result in the function returning FALSE.
SEE ALSO
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8ToStrA
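What "only contains UTF8 sequences" means can be sketched as follows. The
check below follows the strict rules of RFC 3629 and is an assumption about
what a reasonable validator does; it is not the GOOD_UCS based code of the
library, and the name utf8_is_valid is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not codesets.library code.
 * Returns 1 if the len bytes at s form well-formed UTF8, else 0.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, surrogates and values above U+10FFFF. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t n, k;

        if (c < 0x80) { i++; continue; }          /* ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                            /* invalid lead byte */

        if (len - i <= n)
            return 0;                             /* truncated sequence */

        for (k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                         /* not a continuation */
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                             /* overlong or illegal */

        i += n + 1;
    }
    return 1;
}
```

A single non-UTF8 byte anywhere in the input makes the whole check fail,
which matches the behaviour described in the NOTE above.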
codesets.library/CodesetsIsLegalUTF8
NAME
CodesetsIsLegalUTF8 - check a UTF8 sequence
SYNOPSIS
res = CodesetsIsLegalUTF8(source, length);
A0 D0
ULONG CodesetsIsLegalUTF8(UTF8 *, ULONG);
FUNCTION
Checks if source is a valid UTF8 sequence generated
by the internal functions of codesets.library
INPUTS
source - the char sequence to check
length - size of source
RESULT
res - TRUE or FALSE
SEE ALSO
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8ToStrA
codesets.library/CodesetsIsLegalUTF8Sequence
NAME
CodesetsIsLegalUTF8Sequence - check a char sequence
SYNOPSIS
res = CodesetsIsLegalUTF8Sequence(source, end);
A0 A1
ULONG CodesetsIsLegalUTF8Sequence(UTF8 *, UTF8 *);
FUNCTION
Check if source is a valid UTF8 sequence within the
source and end boundaries.
INPUTS
source - the char sequence to check
end - pointer to the end of the sequence to check
RESULT
res - TRUE or FALSE
SEE ALSO
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8ToStrA
codesets.library/CodesetsStrLenA
NAME
CodesetsStrLenA - returns the length of the source string
in case it will be converted to an UTF8
string.
SYNOPSIS
len = CodesetsStrLenA(str, attrs);
A0 A1
ULONG CodesetsStrLenA(STRPTR, struct TagItem *);
len = CodesetsStrLen(str, tag1, ...);
A0 A1
ULONG CodesetsStrLen(STRPTR, Tag, ...);
FUNCTION
Return the length (size) of str in case it will be converted to
an UTF8 compliant string.
INPUTS
str - the string to obtain length of
attrs - a list of additional tag items. Valid items are:
CSA_SourceCodeset (struct codeset *)
The codeset the source string is encoded in.
Default: the system's default codeset
CSA_SourceLen (ULONG)
The length of str
Default: string length of str
RESULT
len - the length of the string if it will be converted to
an UTF8 string.
SEE ALSO
codesets.library/CodesetsUTF8CreateA
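As a rough illustration of why the converted length can differ from the
source length, the following sketch computes the UTF8 size of an
ISO-8859-1 string. It assumes ISO-8859-1 as the source codeset, where
bytes below 0x80 stay one byte and all others become two; the name
latin1_utf8_size is hypothetical and this is not the library's code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not codesets.library code.
 * Returns the number of bytes (without the terminating '\0') that
 * srclen bytes of ISO-8859-1 text occupy once converted to UTF8. */
size_t latin1_utf8_size(const unsigned char *src, size_t srclen)
{
    size_t i, n = 0;

    assert(src != NULL || srclen == 0);

    for (i = 0; i < srclen; i++)
        n += (src[i] < 0x80) ? 1 : 2;
    return n;
}
```

Adding one to the result gives the size of a destination buffer that is
exactly large enough, including the terminating '\0'.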
codesets.library/CodesetsConvertUTF16toUTF32
NAME
CodesetsConvertUTF16toUTF32 - converts from UTF16 to UTF32
SYNOPSIS
res = CodesetsConvertUTF16toUTF32(sourceStart,sourceEnd,targetStart,targetEnd,flags );
D0 A0 A1 A2 A3 D0
ULONG CodesetsConvertUTF16toUTF32(const UTF16 **,const UTF16 *,UTF32 **,UTF32 *,ULONG);
FUNCTION
Converts UTF16 to UTF32.
INPUTS
RESULT
SEE ALSO
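Since the INPUTS and RESULT sections are empty, the following sketch only
illustrates the underlying technique: combining UTF16 surrogate pairs into
single UTF32 values. It uses a simplified signature with element counts
instead of the start/end pointers shown in the synopsis, and the name
utf16_to_utf32 and its return convention are hypothetical. It is not the
library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only, not codesets.library code.
 * Converts srclen UTF16 units to UTF32. Returns the number of UTF32
 * values written, or -1 on an unpaired surrogate or if dst is full. */
long utf16_to_utf32(const uint16_t *src, size_t srclen,
                    uint32_t *dst, size_t dstlen)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (i < srclen) {
        uint32_t c = src[i++];

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* high surrogate: must be followed by a low surrogate */
            uint32_t lo;
            if (i >= srclen)
                return -1;
            lo = src[i];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            i++;
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1;                 /* low surrogate without a high one */
        }

        if (o >= dstlen)
            return -1;                 /* target exhausted */
        dst[o++] = c;
    }
    return (long)o;
}
```

The pair 0xD83D 0xDE00, for example, combines to the single value 0x1F600.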
codesets.library/CodesetsConvertUTF16toUTF8
NAME
CodesetsConvertUTF16toUTF8 - converts from UTF16 to UTF8
SYNOPSIS
res = CodesetsConvertUTF16toUTF8(sourceStart,sourceEnd,targetStart,targetEnd,flags );
D0 A0 A1 A2 A3 D0
ULONG CodesetsConvertUTF16toUTF8(const UTF16 **,const UTF16 *,UTF8 **,UTF8 *,ULONG);
FUNCTION
Converts UTF16 to UTF8.
INPUTS
RESULT
SEE ALSO
codesets.library/CodesetsConvertUTF32toUTF16
NAME
CodesetsConvertUTF32toUTF16 - converts from UTF32 to UTF16
SYNOPSIS
res = CodesetsConvertUTF32toUTF16(sourceStart,sourceEnd,targetStart,targetEnd,flags );
D0 A0 A1 A2 A3 D0
ULONG CodesetsConvertUTF32toUTF16(const UTF32 **,const UTF32 *,UTF16 **,UTF16 *,ULONG);
FUNCTION
Converts UTF32 to UTF16.
INPUTS
RESULT
SEE ALSO
codesets.library/CodesetsConvertUTF32toUTF8
NAME
CodesetsConvertUTF32toUTF8 - converts from UTF32 to UTF8
SYNOPSIS
res = CodesetsConvertUTF32toUTF8(sourceStart,sourceEnd,targetStart,targetEnd,flags );
D0 A0 A1 A2 A3 D0
ULONG CodesetsConvertUTF32toUTF8(const UTF32 **,const UTF32 *,UTF8 **,UTF8 *,ULONG);
FUNCTION
Converts UTF32 to UTF8.
INPUTS
RESULT
SEE ALSO
codesets.library/CodesetsConvertUTF8toUTF16
NAME
CodesetsConvertUTF8toUTF16 - converts from UTF8 to UTF16
SYNOPSIS
res = CodesetsConvertUTF8toUTF16(sourceStart,sourceEnd,targetStart,targetEnd,flags );
D0 A0 A1 A2 A3 D0
ULONG CodesetsConvertUTF8toUTF16(const UTF8 **,const UTF8 *,UTF16 **,UTF16 *,ULONG);
FUNCTION
Converts UTF8 to UTF16.
INPUTS
RESULT
SEE ALSO
codesets.library/CodesetsConvertUTF8toUTF32
NAME
CodesetsConvertUTF8toUTF32 - converts from UTF8 to UTF32
SYNOPSIS
res = CodesetsConvertUTF8toUTF32(sourceStart,sourceEnd,targetStart,targetEnd,flags );
D0 A0 A1 A2 A3 D0
ULONG CodesetsConvertUTF8toUTF32(const UTF8 **,const UTF8 *,UTF32 **,UTF32 *,ULONG);
FUNCTION
Converts UTF8 to UTF32.
INPUTS
RESULT
SEE ALSO
codesets.library/CodesetsDecodeB64A
NAME
CodesetsDecodeB64A - decodes a supplied base64 encoded string
or file into plain text charwise.
SYNOPSIS
res = CodesetsDecodeB64A(attrs);
D0 A0
ULONG CodesetsDecodeB64A(struct TagItem *);
res = CodesetsDecodeB64(tag1, ...);
D0 A0
ULONG CodesetsDecodeB64(Tag, ...);
FUNCTION
Decodes a string or a complete base64 encoded file to a
plain text buffer or also a destination file
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_B64SourceString (STRPTR)
The source string to decode
CSA_B64SourceLen (ULONG)
The length of CSA_B64SourceString. Must be supplied if
CSA_B64SourceString is used.
CSA_B64SourceFile (STRPTR)
Source file name.
CSA_B64DestPtr (STRPTR *)
Destination buffer pointer. Set to the allocated buffer.
Must be supplied if CSA_B64DestFile is not used. To
free the buffer use CodesetsFreeA().
CSA_B64DestFile (STRPTR)
Destination file name. Must be supplied if
CSA_B64DestPtr is not used.
CSA_B64FLG_NtCheckErr (BOOL)
Don't stop on error.
RESULT
res - result, one of (if 0 OK, if >0 error)
CSR_B64_ERROR_OK
CSR_B64_ERROR_MEM
CSR_B64_ERROR_DOS
CSR_B64_ERROR_INCOMPLETE
CSR_B64_ERROR_ILLEGAL
NOTE
It operates strictly charwise and takes no account of the
individual codeset the decoded data may still be encoded in.
SEE ALSO
codesets.library/CodesetsEncodeB64A
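"Charwise" decoding, as mentioned in the NOTE above, can be sketched in
portable C as below. This is a minimal illustration of base64 decoding
into a caller-supplied buffer, not the library's implementation; the names
b64_value and b64_decode and the -1 error convention are hypothetical, and
ASCII character ordering is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Maps one base64 character to its 6 bit value, or -1 if illegal. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Illustration only, not codesets.library code.
 * Decodes srclen base64 characters into dst. CR and LF are skipped,
 * decoding stops at the first '=' padding character.
 * Returns the number of bytes written, or -1 on an illegal character
 * or if dst is too small. */
long b64_decode(const char *src, size_t srclen,
                unsigned char *dst, size_t dstsize)
{
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of pending bits in acc */
    size_t i, o = 0;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < srclen; i++) {
        unsigned char c = (unsigned char)src[i];
        int v;

        if (c == '\r' || c == '\n')
            continue;
        if (c == '=')
            break;

        v = b64_value(c);
        if (v < 0)
            return -1;

        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (o >= dstsize)
                return -1;
            dst[o++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return (long)o;
}
```

The output is raw bytes; interpreting them in a particular codeset is a
separate step, which is the point the NOTE makes.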
codesets.library/CodesetsEncodeB64A
NAME
CodesetsEncodeB64A - encodes a string or whole file
to base64
SYNOPSIS
res = CodesetsEncodeB64A(attrs);
D0 A0
ULONG CodesetsEncodeB64A(struct TagItem *);
res = CodesetsEncodeB64(tag1, ...);
D0 A0
ULONG CodesetsEncodeB64(Tag, ...);
FUNCTION
Encodes the supplied string or file to either a whole
buffer or also to a file.
INPUTS
attrs - a list of mandatory tag items. Valid items are:
CSA_B64SourceString (STRPTR)
The source string to encode
CSA_B64SourceLen (ULONG)
The length of CSA_B64SourceString. Must be supplied if
CSA_B64SourceString is used.
CSA_B64SourceFile (STRPTR)
Source file name.
CSA_B64DestPtr (STRPTR *)
Destination buffer pointer. Set to the allocated buffer.
Must be supplied if CSA_B64DestFile is not used. To
free the buffer use CodesetsFreeA().
CSA_B64DestFile (STRPTR)
Destination file name. Must be supplied if
CSA_B64DestPtr is not used.
CSA_B64MaxLineLen (ULONG)
Maximum length of encoded lines. 0<v<256
Default: 72
CSA_B64Unix (ULONG)
If TRUE eol is \n (LF), otherwise \r\n (CRLF).
Default: TRUE
RESULT
res - result, one of (if 0 OK, if >0 error)
CSR_B64_ERROR_OK
CSR_B64_ERROR_MEM
CSR_B64_ERROR_DOS
CSR_B64_ERROR_INCOMPLETE
CSR_B64_ERROR_ILLEGAL
NOTE
It operates strictly charwise and takes no account of the
individual codeset the source data may be encoded in.
SEE ALSO
codesets.library/CodesetsDecodeB64A
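The effect of a maximum line length and of the LF line ending can be
sketched in portable C as follows. This is an illustration only, not the
library's implementation; the name b64_encode, its parameters and the
(size_t)-1 error convention are hypothetical. The maxline parameter plays
the role of CSA_B64MaxLineLen and lines are separated by LF only, as with
CSA_B64Unix set to TRUE.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not codesets.library code.
 * Encodes srclen bytes to base64 in dst, inserting '\n' after every
 * maxline output characters (no wrapping if maxline is 0).
 * Returns the number of characters written without the terminating
 * '\0', or (size_t)-1 if dst is too small. */
size_t b64_encode(const unsigned char *src, size_t srclen,
                  char *dst, size_t dstsize, size_t maxline)
{
    static const char tab[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0, col = 0;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < srclen; i += 3) {
        size_t rem = srclen - i, k;
        unsigned long v = (unsigned long)src[i] << 16;
        char q[4];

        if (rem > 1) v |= (unsigned long)src[i + 1] << 8;
        if (rem > 2) v |= src[i + 2];

        /* three input bytes become four output characters */
        q[0] = tab[(v >> 18) & 63];
        q[1] = tab[(v >> 12) & 63];
        q[2] = (rem > 1) ? tab[(v >> 6) & 63] : '=';
        q[3] = (rem > 2) ? tab[v & 63] : '=';

        for (k = 0; k < 4; k++) {
            if (maxline > 0 && col == maxline) {
                if (o + 1 >= dstsize) return (size_t)-1;
                dst[o++] = '\n';
                col = 0;
            }
            if (o + 1 >= dstsize) return (size_t)-1;
            dst[o++] = q[k];
            col++;
        }
    }

    if (o >= dstsize)
        return (size_t)-1;
    dst[o] = '\0';
    return o;
}
```

A caller can size dst as 4 * ((srclen + 2) / 3) characters plus one byte
per line break plus one for the terminating '\0'.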