Aros/Developer/Docs/Libraries/Codesets

Introduction

Character set (charset) encoding is the process of assigning numbers to graphical characters, especially the written characters of human language.

Unicode v16.0 emojis are not supported, but codesets.library provides the following internally supported (hardcoded) charsets/codesets. Conversions are possible from and to each codeset:

    AmigaPL – Polish (Amiga)
    Amiga-1251 – Cyrillic (Amiga)
    ISO-8859-1 – Western European aka Latin alphabet no. 1 ASCII based 
    ISO-8859-1+Euro – West European (with EURO)
    ISO-8859-2 – Central/East European
    ISO-8859-3 – South European
    ISO-8859-4 – North European
    ISO-8859-5 – Slavic languages
    ISO-8859-9 – Turkish
    ISO-8859-15 – West European II
    ISO-8859-16 – South-Eastern European
    KOI8-R – Russian
    UTF-8 – Unicode

In addition, external charset table files can be stored in LIBS:Charsets or loaded by an application from PROGDIR:Charsets. The charset files included with this distribution are:

    IBM866 – Cyrillic (cp866)
    ISO-8859-7 – Greek (LatinGreek)
    ISO-8859-10 – Nordic (Latin 6)
    windows-1250 – Central/East Europe (Windows)
    windows-1251 – Cyrillic (Windows)
    windows-1252 – West European (Windows)

Windows-1252 was the first character set used in Windows. It is a superset of ASCII that uses 8 bits per character, giving 256 code values and room for international letters. Windows-1252 is supported by all browsers.
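
To make the idea of converting between two of the codesets listed above more concrete, here is a minimal sketch in portable C of an ISO-8859-1 to UTF-8 conversion. It relies on the fact that every ISO-8859-1 byte value equals its Unicode code point. The function name latin1_to_utf8 is hypothetical and is not part of codesets.library, whose own conversion routines are table driven and are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical helper, not part of codesets.library.
 * Encodes ISO-8859-1 bytes as UTF-8. Bytes below 0x80 are copied as-is,
 * bytes from 0x80 to 0xFF become a two-byte UTF-8 sequence.
 * Returns the number of bytes written, or 0 if dst is too small
 * (an empty input also yields 0). */
size_t latin1_to_utf8(const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstcap)
{
    size_t out = 0;
    size_t i;

    for (i = 0; i < srclen; i++) {
        unsigned char c = src[i];

        if (c < 0x80) {
            if (out + 1 > dstcap)
                return 0;
            dst[out++] = c;
        } else {
            if (out + 2 > dstcap)
                return 0;
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F)); /* trail byte */
        }
    }
    return out;
}
```

For example, the Latin-1 byte 0xF6 (the letter "ö") comes out as the two bytes 0xC3 0xB6. The reverse direction has to cope with code points above U+00FF that have no Latin-1 equivalent, which is one reason a real library needs a replacement policy.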


Source Code

/***************************************************************************

 codesets.library - Amiga shared library for handling different codesets
 Copyright (C) 2001-2005 by Alfonso [alfie] Ranieri <alforan@tin.it>.
 Copyright (C) 2005-2014 codesets.library Open Source Team

 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 2.1 of the License, or (at your option) any later version.

 This library is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 Lesser General Public License for more details.

 codesets.library project: http://sourceforge.net/projects/codesetslib/

 Most of the code included in this file was relicensed from GPL to LGPL
 from the source code of SimpleMail (http://www.sf.net/projects/simplemail)
 with full permissions by its authors.

 $Id$

***************************************************************************/

#include <exec/libraries.h>
#include <libraries/codesets.h>
#include <proto/codesets.h>
#include <proto/exec.h>

#include <stdio.h>

/* This is just a very quickly written test, not a full-featured convertor */
#define BUF_SIZE 102400

struct Library *CodesetsBase = NULL;
#if defined(__amigaos4__)
struct CodesetsIFace *ICodesets = NULL;
#endif

#if defined(__amigaos4__)
#define GETINTERFACE(iface, base)	(iface = (APTR)GetInterface((struct Library *)(base), "main", 1L, NULL))
#define DROPINTERFACE(iface)			(DropInterface((struct Interface *)iface), iface = NULL)
#else
#define GETINTERFACE(iface, base)	TRUE
#define DROPINTERFACE(iface)
#endif

struct codeset *srcCodeset;
struct codeset *destCodeset;

int main(int argc, char **argv)
{
  char *buf, *destbuf;
  ULONG destlen;
  FILE *f;

  if (argc < 4)
  {
    fprintf(stderr, "Usage: %s <source codeset> <destination codeset> <source file>\n", argv[0]);
    return 0;
  }
  if((CodesetsBase = OpenLibrary(CODESETSNAME,CODESETSVER)) &&
     GETINTERFACE(ICodesets, CodesetsBase))
  {
    srcCodeset = CodesetsFind(argv[1], CSA_FallbackToDefault, FALSE, TAG_DONE);
    if (srcCodeset)
    {
      destCodeset = CodesetsFind(argv[2], CSA_FallbackToDefault, FALSE, TAG_DONE);
      if (destCodeset)
      {
        buf = AllocMem(BUF_SIZE, MEMF_CLEAR);

        if (buf)
        {
          f = fopen(argv[3], "r");
          if (f)
          {
            fread(buf, BUF_SIZE-1, 1, f);
            fclose(f);
            destbuf = CodesetsConvertStr(CSA_SourceCodeset, (IPTR)srcCodeset,
                                         CSA_DestCodeset, (IPTR)destCodeset,
                                         CSA_Source, (IPTR)buf,
                                         CSA_DestLenPtr, (IPTR)&destlen,
                                         TAG_DONE);
            if (destbuf)
            {
              fprintf(stderr, "Result length: %u\n", (unsigned int)destlen);
              fwrite(destbuf, destlen, 1, stdout);
              fputc('\n', stderr);
              CodesetsFreeA(destbuf, NULL);
            }
            else
              fprintf(stderr, "Failed to convert text!\n");
          }
          else
            fprintf(stderr, "Failed to open %s\n", argv[3]);
          FreeMem(buf, BUF_SIZE);
        }
        else
          fprintf(stderr, "Failed to allocate %d bytes for buffer\n", BUF_SIZE);
      }
      else
        fprintf(stderr, "Unknown destination codeset %s\n", argv[2]);
    }
    else
      fprintf(stderr, "Unknown source codeset %s\n", argv[1]);

    DROPINTERFACE(ICodesets);
    CloseLibrary(CodesetsBase);
  }
  else
    fprintf(stderr, "Failed to open codesets.library!\n");

  return 0;
}
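
The test program above calls fread() with a single item of BUF_SIZE-1 bytes, so the return value is 0 for any file shorter than that, and the program depends on the MEMF_CLEAR zero fill to terminate the string. The following sketch, using only standard C, shows one way to get a byte count back instead. The helper name read_text is an assumption made for illustration and does not appear in the library sources.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to cap-1 bytes from f into buf and
 * NUL-terminates the result. Returns the number of bytes actually read. */
size_t read_text(FILE *f, char *buf, size_t cap)
{
    size_t n;

    if (cap == 0)
        return 0;

    /* With an item size of 1, fread's return value is a byte count. */
    n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    return n;
}
```

The returned length could then be used to reject empty input or to report how much of the file fitted into the buffer before the conversion is attempted.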


/***************************************************************************

 codesets.library - Amiga shared library for handling different codesets
 Copyright (C) 2001-2005 by Alfonso [alfie] Ranieri <alforan@tin.it>.
 Copyright (C) 2005-2014 codesets.library Open Source Team

 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 2.1 of the License, or (at your option) any later version.

 This library is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 Lesser General Public License for more details.

 codesets.library project: http://sourceforge.net/projects/codesetslib/

 Most of the code included in this file was relicensed from GPL to LGPL
 from the source code of SimpleMail (http://www.sf.net/projects/simplemail)
 with full permissions by its authors.

 $Id$

***************************************************************************/

#include <proto/exec.h>
#include <proto/codesets.h>
#include <stdio.h>
#include <string.h>

#define ISO8859_1_STR "Schmöre bröd, schmöre bröd, bröd bröd bräd."
#define CP1251_STR    "1251 êîäèðîâêà äëÿ ïðèìåðà."
#define ASCII_STR     "latin 1 bla bla bla."
#define KOI8R_STR     "koi îÅ×ÏÚÍÏÖÎÏ ÐÅÒÅËÏÄÉÒÏ×ÁÔØ ÉÚ ËÏÄÉÒÏ×ËÉ"

struct Library *CodesetsBase = NULL;
#if defined(__amigaos4__)
struct CodesetsIFace* ICodesets = NULL;
#endif

#if defined(__amigaos4__)
#define GETINTERFACE(iface, base)	(iface = (APTR)GetInterface((struct Library *)(base), "main", 1L, NULL))
#define DROPINTERFACE(iface)			(DropInterface((struct Interface *)iface), iface = NULL)
#else
#define GETINTERFACE(iface, base)	TRUE
#define DROPINTERFACE(iface)
#endif

int main(void)
{
  int res;

  if((CodesetsBase = OpenLibrary(CODESETSNAME,CODESETSVER)) &&
      GETINTERFACE(ICodesets, CodesetsBase))
  {
    IPTR errNum = 0;
    struct codeset *cs;

    if((cs = CodesetsFindBest(CSA_Source, (IPTR)ISO8859_1_STR,
                              CSA_ErrPtr, (IPTR)&errNum,
                              TAG_DONE)))
    {
      printf("Identified ISO8859_1_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(ISO8859_1_STR));
    }
    else
      printf("couldn't identify ISO8859_1_STR!\n");

    if((cs = CodesetsFindBest(CSA_Source, (IPTR)CP1251_STR,
                              CSA_ErrPtr, (IPTR)&errNum,
                              CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
                              TAG_DONE)))
    {
      printf("Identified CP1251_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(CP1251_STR));
    }
    else
      printf("couldn't identify CP1251_STR!\n");

    if((cs = CodesetsFindBest(CSA_Source, (IPTR)ASCII_STR,
                              CSA_ErrPtr, (IPTR)&errNum,
                              CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
                              TAG_DONE)))
    {
      printf("Identified ASCII_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(ASCII_STR));
    }
    else
      printf("couldn't identify ASCII_STR!\n");

    if((cs = CodesetsFindBest(CSA_Source, (IPTR)KOI8R_STR,
                              CSA_ErrPtr, (IPTR)&errNum,
                              CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
                              TAG_DONE)))
    {
      printf("Identified KOI8R_STR as %s with %d of %d errors\n", cs->name, (int)errNum, (int)strlen(KOI8R_STR));
    }
    else
      printf("couldn't identify KOI8R_STR!\n");

    res = 0;

    DROPINTERFACE(ICodesets);
    CloseLibrary(CodesetsBase);
    CodesetsBase = NULL;
  }
  else
  {
    printf("can't open %s %d+\n",CODESETSNAME,CODESETSVER);
    res = 20;
  }

  return res;
}


From SimpleMail
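
The listing that follows converts between UTF-32 and UTF-16 with the constants halfShift (10), halfBase (0x10000) and halfMask (0x3FF). As a reading aid, the surrogate pair arithmetic can be sketched in isolation as below. The function names to_surrogates and from_surrogates are hypothetical and exist only in this example; the sketch performs no range checking and assumes a code point between U+10000 and U+10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper: splits a code point in the range
 * 0x10000..0x10FFFF into a UTF-16 surrogate pair. */
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    cp -= 0x10000UL;                          /* subtract halfBase */
    *hi = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits, halfShift */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low 10 bits, halfMask */
}

/* Hypothetical helper: recombines a valid surrogate pair. */
uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    return (((uint32_t)hi - 0xD800) << 10)
         + ((uint32_t)lo - 0xDC00)
         + 0x10000UL;
}
```

For U+1D11E this yields the pair 0xD834 0xDD1E, and feeding that pair to from_surrogates gives back 0x1D11E.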

/***************************************************************************
 SimpleMail - Copyright (C) 2000 Hynek Schlawack and Sebastian Bauer

 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 2 of the License, or
 (at your option) any later version.

 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.

 You should have received a copy of the GNU General Public License
 along with this program; if not, write to the Free Software
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
***************************************************************************/

/**
 * @brief Support of codesets.
 *
 * @file codesets.c
 */

#include "codesets.h"

#include <ctype.h>
#include <dirent.h> /* dir stuff */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "codesets_table.h"
#include "debug.h"
#include "punycode.h"
#include "smintl.h"
#include "support_indep.h"

/* from ConvertUTF.h */

/*
 * Copyright 2001 Unicode, Inc.
 *
 * Disclaimer
 *
 * This source code is provided as is by Unicode, Inc. No claims are
 * made as to fitness for any particular purpose. No warranties of any
 * kind are expressed or implied. The recipient agrees to determine
 * applicability of information provided. If this file has been
 * purchased on magnetic or optical media from Unicode, Inc., the
 * sole remedy for any claim will be exchange of defective media
 * within 90 days of receipt.
 *
 * Limitations on Rights to Redistribute This Code
 *
 * Unicode, Inc. hereby grants the right to freely use the information
 * supplied in this file in the creation of products supporting the
 * Unicode Standard, and to make copies of this file in any form
 * for internal or external distribution as long as this notice
 * remains attached.
 */

/* ---------------------------------------------------------------------

    Conversions between UTF32, UTF-16, and UTF-8.  Header file.

    Several functions are included here, forming a complete set of
    conversions between the three formats.  UTF-7 is not included
    here, but is handled in a separate source file.

    Each of these routines takes pointers to input buffers and output
    buffers.  The input buffers are const.

    Each routine converts the text between *sourceStart and sourceEnd,
    putting the result into the buffer between *targetStart and
    targetEnd. Note: the end pointers are *after* the last item: e.g.
    *(sourceEnd - 1) is the last item.

    The return result indicates whether the conversion was successful,
    and if not, whether the problem was in the source or target buffers.
    (Only the first encountered problem is indicated.)

    After the conversion, *sourceStart and *targetStart are both
    updated to point to the end of last text successfully converted in
    the respective buffers.

    Input parameters:
	sourceStart - pointer to a pointer to the source buffer.
		The contents of this are modified on return so that
		it points at the next thing to be converted.
	targetStart - similarly, pointer to pointer to the target buffer.
	sourceEnd, targetEnd - respectively pointers to the ends of the
		two buffers, for overflow checking only.

    These conversion functions take a ConversionFlags argument. When this
    flag is set to strict, both irregular sequences and isolated surrogates
    will cause an error.  When the flag is set to lenient, both irregular
    sequences and isolated surrogates are converted.

    Whether the flag is strict or lenient, all illegal sequences will cause
    an error return. This includes sequences such as: <F4 90 80 80>, <C0 80>,
    or <A0> in UTF-8, and values above 0x10FFFF in UTF-32. Conformant code
    must check for illegal sequences.

    When the flag is set to lenient, characters over 0x10FFFF are converted
    to the replacement character; otherwise (when the flag is set to strict)
    they constitute an error.

    Output parameters:
	The value "sourceIllegal" is returned from some routines if the input
	sequence is malformed.  When "sourceIllegal" is returned, the source
	value will point to the illegal value that caused the problem. E.g.,
	in UTF-8 when a sequence is malformed, it points to the start of the
	malformed sequence.

    Author: Mark E. Davis, 1994.
    Rev History: Rick McGowan, fixes & updates May 2001.

------------------------------------------------------------------------ */

/* ---------------------------------------------------------------------
    The following 4 definitions are compiler-specific.
    The C standard does not guarantee that wchar_t has at least
    16 bits, so wchar_t is no less portable than unsigned short!
    All should be unsigned values to avoid sign extension during
    bit mask & shift operations.
------------------------------------------------------------------------ */

typedef unsigned long	UTF32;	/* at least 32 bits */
typedef unsigned short	UTF16;	/* at least 16 bits */
typedef unsigned char	UTF8;	/* typically 8 bits */
typedef unsigned char	Boolean; /* 0 or 1 */

/* Some fundamental constants */
#define UNI_REPLACEMENT_CHAR (UTF32)0x0000FFFD
#define UNI_MAX_BMP (UTF32)0x0000FFFF
#define UNI_MAX_UTF16 (UTF32)0x0010FFFF
#define UNI_MAX_UTF32 (UTF32)0x7FFFFFFF

typedef enum {
	conversionOK, 		/* conversion successful */
	sourceExhausted,	/* partial character in source, but hit end */
	targetExhausted,	/* insuff. room in target for conversion */
	sourceIllegal,		/* source sequence is illegal/malformed */
	sourceCorrupt,		/* source contains invalid UTF-7 (added) */
} ConversionResult;

typedef enum {
	strictConversion = 0,
	lenientConversion
} ConversionFlags;

ConversionResult ConvertUTF32toUTF16 (
		UTF32** sourceStart, const UTF32* sourceEnd,
		UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags);

ConversionResult ConvertUTF16toUTF32 (
		UTF16** sourceStart, UTF16* sourceEnd,
		UTF32** targetStart, const UTF32* targetEnd, const ConversionFlags flags);

ConversionResult ConvertUTF16toUTF8 (
		UTF16** sourceStart, const UTF16* sourceEnd,
		UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags);

ConversionResult ConvertUTF8toUTF16 (
		UTF8** sourceStart, UTF8* sourceEnd,
		UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags);

ConversionResult ConvertUTF32toUTF8 (
		UTF32** sourceStart, const UTF32* sourceEnd,
		UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags);

ConversionResult ConvertUTF8toUTF32 (
		UTF8** sourceStart, UTF8* sourceEnd,
		UTF32** targetStart, const UTF32* targetEnd, ConversionFlags flags);

static Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd);

/* --------------------------------------------------------------------- */

int utf8islegal(const char *source, const char *sourceend)
{
	return isLegalUTF8Sequence((const UTF8*)source, (const UTF8*)sourceend);
}

/* --------------------------------------------------------------------- */

/* ConvertUTF.c */

/*
 * Copyright 2001 Unicode, Inc.
 *
 * Disclaimer
 *
 * This source code is provided as is by Unicode, Inc. No claims are
 * made as to fitness for any particular purpose. No warranties of any
 * kind are expressed or implied. The recipient agrees to determine
 * applicability of information provided. If this file has been
 * purchased on magnetic or optical media from Unicode, Inc., the
 * sole remedy for any claim will be exchange of defective media
 * within 90 days of receipt.
 *
 * Limitations on Rights to Redistribute This Code
 *
 * Unicode, Inc. hereby grants the right to freely use the information
 * supplied in this file in the creation of products supporting the
 * Unicode Standard, and to make copies of this file in any form
 * for internal or external distribution as long as this notice
 * remains attached.
 */

/* ---------------------------------------------------------------------

    Conversions between UTF32, UTF-16, and UTF-8. Source code file.
	Author: Mark E. Davis, 1994.
	Rev History: Rick McGowan, fixes & updates May 2001.

    See the header file "ConvertUTF.h" for complete documentation.

------------------------------------------------------------------------ */


/*#include "ConvertUTF.h"*/
/*#ifdef CVTUTF_DEBUG*/
#include <stdio.h>
/*#endif*/

static const int halfShift	= 10; /* used for shifting by 10 bits */

static const UTF32 halfBase	= 0x0010000UL;
static const UTF32 halfMask	= 0x3FFUL;

#define UNI_SUR_HIGH_START	(UTF32)0xD800
#define UNI_SUR_HIGH_END	(UTF32)0xDBFF
#define UNI_SUR_LOW_START	(UTF32)0xDC00
#define UNI_SUR_LOW_END		(UTF32)0xDFFF
#define false			0
#define true			1

/* --------------------------------------------------------------------- */

ConversionResult ConvertUTF32toUTF16 (
		UTF32** sourceStart, const UTF32* sourceEnd,
		UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags) {
	ConversionResult result = conversionOK;
	UTF32* source = *sourceStart;
	UTF16* target = *targetStart;
	while (source < sourceEnd) {
		UTF32 ch;
		if (target >= targetEnd) {
			result = targetExhausted; break;
		}
		ch = *source++;
		if (ch <= UNI_MAX_BMP) { /* Target is a character <= 0xFFFF */
			if ((flags == strictConversion) && (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)) {
				--source; /* return to the illegal value itself */
				result = sourceIllegal;
				break;
			} else {
			    *target++ = ch;	/* normal case */
			}
		} else if (ch > UNI_MAX_UTF16) {
			if (flags == strictConversion) {
				result = sourceIllegal;
			} else {
				*target++ = UNI_REPLACEMENT_CHAR;
			}
		} else {
			/* target is a character in range 0xFFFF - 0x10FFFF. */
			if (target + 1 >= targetEnd) {
				result = targetExhausted; break;
			}
			ch -= halfBase;
			*target++ = (ch >> halfShift) + UNI_SUR_HIGH_START;
			*target++ = (ch & halfMask) + UNI_SUR_LOW_START;
		}
	}
	*sourceStart = source;
	*targetStart = target;
	return result;
}

/* --------------------------------------------------------------------- */

ConversionResult ConvertUTF16toUTF32 (
		UTF16** sourceStart, UTF16* sourceEnd,
		UTF32** targetStart, const UTF32* targetEnd, const ConversionFlags flags) {
	ConversionResult result = conversionOK;
	UTF16* source = *sourceStart;
	UTF32* target = *targetStart;
	UTF32 ch, ch2;
	while (source < sourceEnd) {
		ch = *source++;
		if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END && source < sourceEnd) {
			ch2 = *source;
			if (ch2 >= UNI_SUR_LOW_START && ch2 <= UNI_SUR_LOW_END) {
				ch = ((ch - UNI_SUR_HIGH_START) << halfShift)
					+ (ch2 - UNI_SUR_LOW_START) + halfBase;
				++source;
			} else if (flags == strictConversion) { /* it's an unpaired high surrogate */
				--source; /* return to the illegal value itself */
				result = sourceIllegal;
				break;
			}
		} else if ((flags == strictConversion) && (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END)) {
			/* an unpaired low surrogate */
			--source; /* return to the illegal value itself */
			result = sourceIllegal;
			break;
		}
		if (target >= targetEnd) {
			result = targetExhausted; break;
		}
		*target++ = ch;
	}
	*sourceStart = source;
	*targetStart = target;
#ifdef CVTUTF_DEBUG
if (result == sourceIllegal) {
    fprintf(stderr, "ConvertUTF16toUTF32 illegal seq 0x%04x,%04x\n", ch, ch2);
    fflush(stderr);
}
#endif
	return result;
}

/* --------------------------------------------------------------------- */

/*
 * Index into the table below with the first byte of a UTF-8 sequence to
 * get the number of trailing bytes that are supposed to follow it.
 */

static const char trailingBytesForUTF8[256] = {
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
	1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
	2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5
};

/*
 * Magic values subtracted from a buffer value during UTF8 conversion.
 * This table contains as many values as there might be trailing bytes
 * in a UTF-8 sequence.
 */
static const UTF32 offsetsFromUTF8[6] = { 0x00000000UL, 0x00003080UL, 0x000E2080UL,
					 0x03C82080UL, 0xFA082080UL, 0x82082080UL };

/*
 * Once the bits are split out into bytes of UTF-8, this is a mask OR-ed
 * into the first byte, depending on how many bytes follow.  There are
 * as many entries in this table as there are UTF-8 sequence types.
 * (I.e., one byte sequence, two byte... six byte sequence.)
 */
static const UTF8 firstByteMark[7] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };

/* --------------------------------------------------------------------- */

/* The interface converts a whole buffer to avoid function-call overhead.
 * Constants have been gathered. Loops & conditionals have been removed as
 * much as possible for efficiency, in favor of drop-through switches.
 * (See "Note A" at the bottom of the file for equivalent code.)
 * If your compiler supports it, the "isLegalUTF8" call can be turned
 * into an inline function.
 */

/* --------------------------------------------------------------------- */

ConversionResult ConvertUTF16toUTF8 (
		UTF16** sourceStart, const UTF16* sourceEnd,
		UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags) {
	ConversionResult result = conversionOK;
	UTF16* source = *sourceStart;
	UTF8* target = *targetStart;
	while (source < sourceEnd) {
		UTF32 ch;
		unsigned short bytesToWrite = 0;
		const UTF32 byteMask = 0xBF;
		const UTF32 byteMark = 0x80;
		ch = *source++;
		/* If we have a surrogate pair, convert to UTF32 first. */
		if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END && source < sourceEnd) {
			UTF32 ch2 = *source;
			if (ch2 >= UNI_SUR_LOW_START && ch2 <= UNI_SUR_LOW_END) {
				ch = ((ch - UNI_SUR_HIGH_START) << halfShift)
					+ (ch2 - UNI_SUR_LOW_START) + halfBase;
				++source;
			} else if (flags == strictConversion) { /* it's an unpaired high surrogate */
				--source; /* return to the illegal value itself */
				result = sourceIllegal;
				break;
			}
		} else if ((flags == strictConversion) && (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END)) {
			--source; /* return to the illegal value itself */
			result = sourceIllegal;
			break;
		}
		/* Figure out how many bytes the result will require */
		if (ch < (UTF32)0x80) {			bytesToWrite = 1;
		} else if (ch < (UTF32)0x800) {		bytesToWrite = 2;
		} else if (ch < (UTF32)0x10000) {	bytesToWrite = 3;
		} else if (ch < (UTF32)0x200000) {	bytesToWrite = 4;
		} else {				bytesToWrite = 3; /* U+FFFD needs three UTF-8 bytes */
							ch = UNI_REPLACEMENT_CHAR;
		}

		target += bytesToWrite;
		if (target > targetEnd) {
			target -= bytesToWrite; result = targetExhausted; break;
		}
		switch (bytesToWrite) {	/* note: everything falls through. */
			case 4:	*--target = (ch | byteMark) & byteMask; ch >>= 6;
			case 3:	*--target = (ch | byteMark) & byteMask; ch >>= 6;
			case 2:	*--target = (ch | byteMark) & byteMask; ch >>= 6;
			case 1:	*--target =  ch | firstByteMark[bytesToWrite];
		}
		target += bytesToWrite;
	}
	*sourceStart = source;
	*targetStart = target;
	return result;
}

/* --------------------------------------------------------------------- */

/*
 * Utility routine to tell whether a sequence of bytes is legal UTF-8.
 * This must be called with the length pre-determined by the first byte.
 * If not calling this from ConvertUTF8to*, then the length can be set by:
 *	length = trailingBytesForUTF8[*source]+1;
 * and the sequence is illegal right away if there aren't that many bytes
 * available.
 * If presented with a length > 4, this returns false.  The Unicode
 * definition of UTF-8 goes up to 4-byte sequences.
 */

static Boolean isLegalUTF8(const UTF8 *source, int length) {
	UTF8 a;
	const UTF8 *srcptr = source+length;
	switch (length) {
	default: return false;
		/* Everything else falls through when "true"... */
	case 4: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;
	case 3: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;
	case 2: if ((a = (*--srcptr)) > 0xBF) return false;
		switch (*source) {
		    /* no fall-through in this inner switch */
		    case 0xE0: if (a < 0xA0) return false; break;
		    case 0xF0: if (a < 0x90) return false; break;
		    case 0xF4: if (a > 0x8F) return false; break;
		    default:  if (a < 0x80) return false;
		}
	case 1: if (*source >= 0x80 && *source < 0xC2) return false;
		if (*source > 0xF4) return false;
	}
	return true;
}

/* --------------------------------------------------------------------- */

/*
 * Exported function to return whether a UTF-8 sequence is legal or not.
 * This is not used here; it's just exported.
 */
Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
	int length = trailingBytesForUTF8[*source]+1;
	if (source+length > sourceEnd) {
	    return false;
	}
	return isLegalUTF8(source, length);
}

/* --------------------------------------------------------------------- */

ConversionResult ConvertUTF8toUTF16 (
		UTF8** sourceStart, UTF8* sourceEnd,
		UTF16** targetStart, const UTF16* targetEnd, const ConversionFlags flags) {
	ConversionResult result = conversionOK;
	UTF8* source = *sourceStart;
	UTF16* target = *targetStart;
	while (source < sourceEnd) {
		UTF32 ch = 0;
		unsigned short extraBytesToRead = trailingBytesForUTF8[*source];
		if (source + extraBytesToRead >= sourceEnd) {
			result = sourceExhausted; break;
		}
		/* Do this check whether lenient or strict */
		if (! isLegalUTF8(source, extraBytesToRead+1)) {
			result = sourceIllegal;
			break;
		}
		/*
		 * The cases all fall through. See "Note A" below.
		 */
		switch (extraBytesToRead) {
			case 3:	ch += *source++; ch <<= 6;
			case 2:	ch += *source++; ch <<= 6;
			case 1:	ch += *source++; ch <<= 6;
			case 0:	ch += *source++;
		}
		ch -= offsetsFromUTF8[extraBytesToRead];

		if (target >= targetEnd) {
			result = targetExhausted; break;
		}
		if (ch <= UNI_MAX_BMP) { /* Target is a character <= 0xFFFF */
			if ((flags == strictConversion) && (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)) {
				--source; /* return to the illegal value itself */
				result = sourceIllegal;
				break;
			} else {
			    *target++ = ch;	/* normal case */
			}
		} else if (ch > UNI_MAX_UTF16) {
			if (flags == strictConversion) {
				result = sourceIllegal;
				source -= extraBytesToRead; /* return to the start */
			} else {
				*target++ = UNI_REPLACEMENT_CHAR;
			}
		} else {
			/* target is a character in range 0xFFFF - 0x10FFFF. */
			if (target + 1 >= targetEnd) {
				result = targetExhausted; break;
			}
			ch -= halfBase;
			*target++ = (ch >> halfShift) + UNI_SUR_HIGH_START;
			*target++ = (ch & halfMask) + UNI_SUR_LOW_START;
		}
	}
	*sourceStart = source;
	*targetStart = target;
	return result;
}

/* --------------------------------------------------------------------- */

ConversionResult ConvertUTF32toUTF8 (
		UTF32** sourceStart, const UTF32* sourceEnd,
		UTF8** targetStart, const UTF8* targetEnd, ConversionFlags flags) {
	ConversionResult result = conversionOK;
	UTF32* source = *sourceStart;
	UTF8* target = *targetStart;
	while (source < sourceEnd) {
		UTF32 ch;
		unsigned short bytesToWrite = 0;
		const UTF32 byteMask = 0xBF;
		const UTF32 byteMark = 0x80;
		ch = *source++;
		/* surrogates of any stripe are not legal UTF32 characters */
		if (flags == strictConversion ) {
			if ((ch >= UNI_SUR_HIGH_START) && (ch <= UNI_SUR_LOW_END)) {
				--source; /* return to the illegal value itself */
				result = sourceIllegal;
				break;
			}
		}
		/* Figure out how many bytes the result will require */
		if (ch < (UTF32)0x80) {			bytesToWrite = 1;
		} else if (ch < (UTF32)0x800) {		bytesToWrite = 2;
		} else if (ch < (UTF32)0x10000) {	bytesToWrite = 3;
		} else if (ch < (UTF32)0x200000) {	bytesToWrite = 4;
		} else {				bytesToWrite = 3; /* U+FFFD needs three UTF-8 bytes */
							ch = UNI_REPLACEMENT_CHAR;
		}

		target += bytesToWrite;
		if (target > targetEnd) {
			target -= bytesToWrite; result = targetExhausted; break;
		}
		switch (bytesToWrite) {	/* note: everything falls through. */
			case 4:	*--target = (ch | byteMark) & byteMask; ch >>= 6;
			case 3:	*--target = (ch | byteMark) & byteMask; ch >>= 6;
			case 2:	*--target = (ch | byteMark) & byteMask; ch >>= 6;
			case 1:	*--target =  ch | firstByteMark[bytesToWrite];
		}
		target += bytesToWrite;
	}
	*sourceStart = source;
	*targetStart = target;
	return result;
}

/* --------------------------------------------------------------------- */

ConversionResult ConvertUTF8toUTF32 (
		UTF8** sourceStart, UTF8* sourceEnd,
		UTF32** targetStart, const UTF32* targetEnd, ConversionFlags flags) {
	ConversionResult result = conversionOK;
	UTF8* source = *sourceStart;
	UTF32* target = *targetStart;
	while (source < sourceEnd) {
		UTF32 ch = 0;
		unsigned short extraBytesToRead = trailingBytesForUTF8[*source];
		if (source + extraBytesToRead >= sourceEnd) {
			result = sourceExhausted; break;
		}
		/* Do this check whether lenient or strict */
		if (! isLegalUTF8(source, extraBytesToRead+1)) {
			result = sourceIllegal;
			break;
		}
		/*
		 * The cases all fall through. See "Note A" below.
		 */
		switch (extraBytesToRead) {
			case 3:	ch += *source++; ch <<= 6;
			case 2:	ch += *source++; ch <<= 6;
			case 1:	ch += *source++; ch <<= 6;
			case 0:	ch += *source++;
		}
		ch -= offsetsFromUTF8[extraBytesToRead];

		if (target >= targetEnd) {
			/* Back up the source pointer: this character was read but not written */
			source -= (extraBytesToRead+1);
			result = targetExhausted; break;
		}
		if (ch <= UNI_MAX_UTF32) {
			*target++ = ch;
		} else {
			/* Out of range for UTF-32, substitute the replacement character */
			*target++ = UNI_REPLACEMENT_CHAR;
		}
	}
	*sourceStart = source;
	*targetStart = target;
	return result;
}

/* ---------------------------------------------------------------------

	Note A.
	The fall-through switches in UTF-8 reading code save a
	temp variable, some decrements & conditionals.  The switches
	are equivalent to the following loop:
		{
			int tmpBytesToRead = extraBytesToRead+1;
			do {
				ch += *source++;
				--tmpBytesToRead;
				if (tmpBytesToRead) ch <<= 6;
			} while (tmpBytesToRead > 0);
		}
	In UTF-8 writing code, the switches on "bytesToWrite" are
	similarly unrolled loops.

   --------------------------------------------------------------------- */
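
/*
 * Illustration only, not part of the original source: a minimal sketch of
 * the loop form that Note A describes, decoding one well-formed UTF-8
 * sequence with up to three trailing bytes. The name
 * example_decode_utf8_loop() and its local offsets[] table are assumptions
 * made for this sketch; the constants mirror the first four entries that
 * ConvertUTF.c-derived code commonly keeps in offsetsFromUTF8[].
 * No validation is done here, so the caller must pass a legal sequence.
 */
static unsigned long example_decode_utf8_loop(const unsigned char *s, int extra_bytes)
{
	/* Accumulated lead/continuation marker bits for 0..3 trailing bytes */
	static const unsigned long offsets[4] =
		{ 0x00000000UL, 0x00003080UL, 0x000E2080UL, 0x03C82080UL };
	unsigned long ch = 0;
	int remaining = extra_bytes + 1;

	do
	{
		ch += *s++;			/* add the raw byte, marker bits included */
		--remaining;
		if (remaining) ch <<= 6;	/* make room for the next six payload bits */
	} while (remaining > 0);

	/* Remove all marker bits in one subtraction */
	return ch - offsets[extra_bytes];
}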

/* Some code has been taken from the ConvertUTF7.c file (the utf7 stuff below),
   this is the copyright notice */

/* ================================================================ */
/*
File:   ConvertUTF7.c
Author: David B. Goldsmith
Copyright (C) 1994, 1996 IBM Corporation All rights reserved.
Revisions: Header update only July, 2001.

This code is copyrighted. Under the copyright laws, this code may not
be copied, in whole or part, without prior written consent of IBM Corporation.

IBM Corporation grants the right to use this code as long as this ENTIRE
copyright notice is reproduced in the code.  The code is provided
AS-IS, AND IBM CORPORATION DISCLAIMS ALL WARRANTIES, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  IN NO EVENT
WILL IBM CORPORATION BE LIABLE FOR ANY DAMAGES WHATSOEVER (INCLUDING,
WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS
INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER PECUNIARY
LOSS) ARISING OUT OF THE USE OR INABILITY TO USE THIS CODE, EVEN
IF IBM CORPORATION HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
BECAUSE SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF
LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE
LIMITATION MAY NOT APPLY TO YOU.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the
government is subject to restrictions as set forth in subparagraph
(c)(l)(ii) of the Rights in Technical Data and Computer Software
clause at DFARS 252.227-7013 and FAR 52.227-19.

This code may be protected by one or more U.S. and International
Patents.

*/


/* ------------------------------------- */

struct list codesets_list;

/**************************************************************************
 Returns the supported codesets as a NULL-terminated string array
**************************************************************************/
char **codesets_supported(void)
{
	static char **array;

	if (array) return array;

	if ((array = (char**)malloc(sizeof(char*)*(list_length(&codesets_list)+1))))
	{
		struct codeset *code;
		int i;

		SM_DEBUGF(15,("%ld supported Codesets:\n",list_length(&codesets_list)));

		code = (struct codeset*)list_first(&codesets_list);
		i = 0;

		while (code)
		{
			SM_DEBUGF(15,("  %p next=%p prev=%p list=%p name=%p %s alt=%p char=%p\n",code,code->node.next,code->node.prev,code->node.list,code->name,code->name,code->alt_name,code->characterization));
			array[i++] = code->name;
			code = (struct codeset*)node_next(&code->node);
		}
		array[i] = NULL;
	}
	return array;
}

/**************************************************************************
 The compare function
**************************************************************************/
static int codesets_cmp_unicode(const void *arg1, const void *arg2)
{
	char *a1 = (char*)((struct single_convert*)arg1)->utf8 + 1;
	char *a2 = (char*)((struct single_convert*)arg2)->utf8 + 1;
	return (int)strcmp(a1,a2);
}
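
/*
 * Illustration only, not part of the original source: a minimal sketch of
 * why the compare function above skips the first byte. It assumes the
 * layout that the surrounding code suggests for struct single_convert:
 * utf8[0] holds the length in bytes and utf8[1..] holds the NUL-terminated
 * UTF-8 sequence. struct example_entry and all example_* functions are
 * hypothetical names used only here.
 */
#include <stdlib.h>
#include <string.h>

struct example_entry
{
	unsigned char code;	/* the 8-bit code in the legacy charset */
	unsigned char utf8[8];	/* [0] = length, [1..] = NUL-terminated UTF-8 */
};

/* Orders entries by their UTF-8 bytes, ignoring the length prefix */
static int example_cmp(const void *a, const void *b)
{
	const struct example_entry *e1 = (const struct example_entry *)a;
	const struct example_entry *e2 = (const struct example_entry *)b;
	return strcmp((const char *)e1->utf8 + 1, (const char *)e2->utf8 + 1);
}

/* Fills one entry; seq must be at most six bytes long */
static void example_set(struct example_entry *e, unsigned char code, const char *seq)
{
	memset(e, 0, sizeof(*e));
	e->code = code;
	e->utf8[0] = (unsigned char)strlen(seq);
	strcpy((char *)e->utf8 + 1, seq);
}

/* Returns the 8-bit code for seq, or -1 if the sorted table has no entry */
static int example_lookup(const struct example_entry *sorted, size_t n, const char *seq)
{
	struct example_entry key;
	const struct example_entry *found;

	example_set(&key, 0, seq);
	found = bsearch(&key, sorted, n, sizeof(sorted[0]), example_cmp);
	return found ? found->code : -1;
}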

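/**
 * Reads the codeset table from the given filename and adds it to the
 * list of known codesets.
 *
 * @param name the filename of the charset table to read
 * @return always 1; a file that cannot be opened or a failed allocation
 *  is silently ignored
 */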
static int codesets_read_table(char *name)
{
	char buf[512];

	FILE *fh = fopen(name,"r");

	if (fh)
	{
		struct codeset *codeset;
		if ((codeset = (struct codeset*)malloc(sizeof(struct codeset))))
		{
			int i;
			memset(codeset,0,sizeof(struct codeset));

			for (i=0;i<256;i++)
				codeset->table[i].code = codeset->table[i].ucs4 = i;

			while (myreadline(fh,buf))
			{
				char *result;
				if ((result = get_key_value(buf,"Standard"))) codeset->name = mystrdup(result);
				else if ((result = get_key_value(buf,"AltStandard"))) codeset->alt_name = mystrdup(result);
				else if ((result = get_key_value(buf,"ReadOnly"))) codeset->read_only = !!atoi(result);
				else if ((result = get_key_value(buf,"Characterization")))
				{
					if ((result[0] == '_') && (result[1] == '(') && (result[2] == '"'))
					{
						char *end = strchr(result+3,'"');
						if (end)
						{
							char *txt = mystrndup(result+3,end-(result+3));
							if (txt) codeset->characterization = mystrdup(_(txt));
							free(txt);
						}
					}
				} else
				{
					char *p = buf;
					int fmt2 = 0;

					/* A mapping line starts with "=" or with "0x" */
					if ((*p == '=') || (fmt2 = ((*p == '0') && (*(p+1)=='x'))))
					{
						p++;
						p += fmt2;

						i = strtol(p,&p,16);
						if (i > 0 && i < 256)
						{
							while (isspace((unsigned char)*p)) p++;

							if (!mystrnicmp(p,"U+",2))
							{
								p += 2;
								codeset->table[i].ucs4 = strtol(p,&p,16);
							} else
							{
								if (*p!='#') codeset->table[i].ucs4 = strtol(p,&p,0);
							}
						}
					}
				}
			}

			for (i=0;i<256;i++)
			{
				UTF32 src = codeset->table[i].ucs4;
				UTF32 *src_ptr = &src;
				UTF8 *dest_ptr = &codeset->table[i].utf8[1];
				ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
				*dest_ptr = 0;
				codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
			}

			memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
			qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
			list_insert_tail(&codesets_list,&codeset->node);
		}
		fclose(fh);
	}
	return 1;
}
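
/*
 * Illustration only, not part of the original source: a minimal sketch of
 * the two mapping line shapes that the parser above appears to accept,
 * inferred from its code rather than from a format specification:
 *   "=A4 U+20AC"       ('=' + hex code, then "U+" + hex code point)
 *   "0x80 0x20AC # x"  ("0x" + hex code, then a number in strtol() base 0)
 * example_parse_mapping() is a hypothetical name. Unlike the parser above,
 * it rejects a line that has no code point after the code.
 */
#include <ctype.h>
#include <stdlib.h>

/* Returns 1 and fills *code and *ucs4 if the line is a mapping, else 0 */
static int example_parse_mapping(const char *line, int *code, long *ucs4)
{
	char *p;
	long value;

	if (line[0] == '=') line += 1;
	else if (line[0] == '0' && line[1] == 'x') line += 2;
	else return 0;

	/* Code 0 is not remappable, mirroring the "i > 0" test above */
	value = strtol(line, &p, 16);
	if (value <= 0 || value >= 256) return 0;

	while (isspace((unsigned char)*p)) p++;

	if (*p == '\0' || *p == '#') return 0;

	if ((p[0] == 'U' || p[0] == 'u') && p[1] == '+')
		*ucs4 = strtol(p + 2, NULL, 16);
	else
		*ucs4 = strtol(p, NULL, 0);

	*code = (int)value;
	return 1;
}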

/*****************************************************************************/

int codesets_init(void)
{
	int i;
	struct codeset *codeset;
	UTF32 src;

	SM_ENTER;

	list_init(&codesets_list);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 0;
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-1 + Euro");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("West European (with EURO)"));
	codeset->read_only = 1;

	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i==164) src = 0x20AC; /* the EURO sign */
		else src = i;
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1;
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-1");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("West European"));
	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		src = i;
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-2");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("Central/East European"));
	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = iso_8859_2_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-3");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("South European"));
	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = iso_8859_3_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-4");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("North European"));
	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = iso_8859_4_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("KOI8-R");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("Russian"));
	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0x80) src = i;
		else src = koi8r_to_ucs4[i-0x80];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-5");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("Slavic languages"));
	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = iso_8859_5_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-9");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("Turkish"));
	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = iso_8859_9_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-15");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("West European II"));

	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = iso_8859_15_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("ISO-8859-16");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup(_("South-Eastern European"));

	codeset->read_only = 0;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = iso_8859_16_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("AmigaPL");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup("AmigaPL");
	codeset->read_only = 1;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = amigapl_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	if (!(codeset = (struct codeset*)malloc(sizeof(struct codeset)))) return 1; /* One entry is enough */
	memset(codeset,0,sizeof(*codeset));
	codeset->name = mystrdup("Amiga-1251");
	codeset->alt_name = NULL;
	codeset->characterization = mystrdup("Amiga-1251");
	codeset->read_only = 1;
	for (i=0;i<256;i++)
	{
		UTF32 *src_ptr = &src;
		UTF8 *dest_ptr = &codeset->table[i].utf8[1];

		if (i < 0xa0) src = i;
		else src = amiga1251_to_ucs4[i-0xa0];
		codeset->table[i].code = i;
		codeset->table[i].ucs4 = src;
		ConvertUTF32toUTF8(&src_ptr, src_ptr + 1, &dest_ptr, dest_ptr + 6, strictConversion);
		*dest_ptr = 0;
		codeset->table[i].utf8[0] = (char*)dest_ptr - (char*)&codeset->table[i].utf8[1];
	}
	memcpy(codeset->table_sorted,codeset->table,sizeof(codeset->table));
	qsort(codeset->table_sorted,256,sizeof(codeset->table[0]),(int (*)(const void *arg1, const void *arg2))codesets_cmp_unicode);
	list_insert_tail(&codesets_list,&codeset->node);

	SM_DEBUGF(15,("%ld internal charsets\n",list_length(&codesets_list)));

	{
		/* dynamically loaded */

		DIR *dfd; /* directory descriptor */
		struct dirent *dptr; /* dir entry */
		char path[380];

		/* Only change directory if the current one can be restored afterwards */
		if (getcwd(path, sizeof(path)) && chdir(SM_CHARSET_DIR) != -1)
		{
			if ((dfd = opendir(SM_CURRENT_DIR)))
			{
				while ((dptr = readdir(dfd)) != NULL)
				{
					if (!strcmp(".",dptr->d_name) || !strcmp("..",dptr->d_name)) continue;
					SM_DEBUGF(15,("Loading \"%s\" charset\n",dptr->d_name));
					codesets_read_table(dptr->d_name);
				}
				closedir(dfd);
			}
			chdir(path);
		}
	}

	SM_RETURN(1,"%ld");
}

/*****************************************************************************/

void codesets_cleanup(void)
{
	struct codeset *codeset;

	while ((codeset = (struct codeset*)list_remove_tail(&codesets_list)))
	{
		free(codeset->name);
		free(codeset->alt_name);
		free(codeset->characterization);
		free(codeset);
	}
}

/*****************************************************************************/

struct codeset *codesets_find(const char *name)
{
	struct codeset *codeset = (struct codeset*)list_first(&codesets_list);

	/* Return the first codeset in the list (ISO-8859-1 + Euro) as default */
	if (!name) return codeset;

	while (codeset)
	{
		if (!mystricmp(name,codeset->name) || !mystricmp(name,codeset->alt_name)) return codeset;
		codeset = (struct codeset*)node_next(&codeset->node);
	}
	return NULL;
}

/*****************************************************************************/

int codesets_unconvertable_chars(struct codeset *codeset, const char *text, int text_len)
{
	struct single_convert conv;
	const char *text_ptr = text;
	int i;
	int errors = 0;

	for (i=0;i < text_len;i++)
	{
		unsigned char c = *text_ptr++;
		if (c)
		{
			int len = trailingBytesForUTF8[c];
			conv.utf8[1] = c;
			strncpy((char*)&conv.utf8[2],text_ptr,len);
			conv.utf8[2+len] = 0;
			text_ptr += len;
			i += len;	/* text_len is taken to count bytes, so skip the trailing bytes too */

			if (!bsearch(&conv,codeset->table_sorted,256,sizeof(codeset->table_sorted[0]),codesets_cmp_unicode))
				errors++;
		} else break;
	}

	return errors;
}

/*****************************************************************************/

struct codeset *codesets_find_best(const char *text, int text_len, int *error_ptr)
{
	struct codeset *codeset = (struct codeset*)list_first(&codesets_list);
	struct codeset *best_codeset = NULL;
	int best_errors = text_len;

	while (codeset)
	{
		if (!codeset->read_only)
		{
			int errors = codesets_unconvertable_chars(codeset, text, text_len);

			if (errors < best_errors)
			{
				best_codeset = codeset;
				best_errors = errors;
			}
			if (!best_errors) break;
		}
		codeset = (struct codeset*)node_next(&codeset->node);
	}

	if (!best_codeset) best_codeset = (struct codeset*)list_first(&codesets_list);
	if (error_ptr) *error_ptr = best_errors;

	return best_codeset;
}

/*****************************************************************************/

int utf8len(const utf8 *str)
{
	int len ;
	unsigned char c;

	if (!str) return 0;
	len = 0;

	while ((c = *str++))
	{
		len++;
		str += trailingBytesForUTF8[c];
	}

	return len;
}

/*****************************************************************************/

utf8 *utf8dup(const utf8 *str)
{
	return (utf8*)mystrdup((char*)str);
}

/*****************************************************************************/

int utf8realpos(const utf8 *str, int pos)
{
	const utf8 *str_save = str;
	unsigned char c;

	if (!str) return 0;

	while (pos && (c = *str))
	{
		pos--;
		str += trailingBytesForUTF8[c] + 1;
	}
	return str - str_save;
}
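
/*
 * Illustration only, not part of the original source: a minimal sketch of
 * how character counts and byte offsets differ in UTF-8, following the
 * approach of utf8len() and utf8realpos() above. The example_* names are
 * hypothetical, and example_trailing_bytes() stands in for the
 * trailingBytesForUTF8[] lookup table, which is assumed to hold the same
 * values. Input is assumed to be well-formed and NUL-terminated.
 */
static int example_trailing_bytes(unsigned char lead)
{
	if (lead < 0xC0) return 0;	/* ASCII, or a stray continuation byte */
	if (lead < 0xE0) return 1;
	if (lead < 0xF0) return 2;
	if (lead < 0xF8) return 3;
	if (lead < 0xFC) return 4;
	return 5;
}

/* Number of characters (not bytes) in str */
static int example_utf8_len(const unsigned char *str)
{
	int len = 0;
	unsigned char c;

	while ((c = *str++))
	{
		len++;
		str += example_trailing_bytes(c);
	}
	return len;
}

/* Byte offset of the character with index pos, clamped to the end of str */
static int example_utf8_realpos(const unsigned char *str, int pos)
{
	const unsigned char *start = str;

	while (pos && *str)
	{
		pos--;
		str += example_trailing_bytes(*str) + 1;
	}
	return (int)(str - start);
}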

/*****************************************************************************/

int utf8charpos(const utf8 *str, int pos)
{
	int cp = 0;
	unsigned char c;

	while (pos > 0 && (c = *str))
	{
		str += trailingBytesForUTF8[c] + 1;
		pos -= trailingBytesForUTF8[c] + 1;
		cp++;
	}
	return cp;
}

/*****************************************************************************/

int utf8bytes(const utf8 *str)
{
	unsigned char c = *str;
	return trailingBytesForUTF8[c] + 1;
}

/*****************************************************************************/

utf8 *utf8ncpy(utf8 *to, const utf8 *from, int n)
{
	utf8 *saved_to = to;
	for (;n;n--)
	{
		unsigned char c = *from++;
		int len = trailingBytesForUTF8[c];

		*to++ = c;
		for (;len;len--)
		{
			*to++ = *from++;
		}
	}
	return saved_to;
}

/*****************************************************************************/

utf8 *utf8create(const void *from, const char *charset)
{
	/* utf8create_len() will stop on a null byte */
	return utf8create_len(from,charset,0x7fffffff);
}

/*****************************************************************************/

int utf8fromstr(const char *from, struct codeset *codeset, utf8 *dest, unsigned int dest_size)
{
	const char *src = from;
	unsigned char c;
	int conv = 0;

	if (dest_size < 1)
		return 0;

	if (!codeset)
		codeset =  (struct codeset*)list_first(&codesets_list);

	for (src = from;(c = (unsigned char)*src);src++)
	{
		unsigned char *utf8_seq;
		unsigned int l;

		utf8_seq = &codeset->table[c].utf8[0];

		/* Recall that the first element holds the length of the
		 * UTF-8 sequence in bytes */
		l = utf8_seq[0];
		if (dest_size <= l)
			break;

		utf8_seq++;
		for(;(c = *utf8_seq);utf8_seq++)
			*dest++ = c;

		dest_size -= l;
		conv++;
	}

	*dest = 0;
	return conv;
}

/*****************************************************************************/

utf8 *utf8create_len(const void *from, const char *charset, int from_len)
{
	int dest_size = 0;
	char *dest;
	char *src = (char*)from;
	unsigned char c;
	int len;
	struct codeset *codeset = codesets_find(charset);

	if (!from) return NULL;

	if (!codeset)
	{
		if (!mystricmp(charset,"utf-7"))
		{
			return (utf8*)utf7ntoutf8((char *)from,from_len);
		}
		if (!mystricmp(charset,"utf-8"))
		{
			return (utf8*)mystrdup((char *)from);
		}
		codeset = (struct codeset*)list_first(&codesets_list);
	}

	len = from_len;

	while (((c = *src++) && (len--)))
		dest_size += codeset->table[c].utf8[0];

	if ((dest = (char*)malloc(dest_size+1)))
	{
		char *dest_ptr = dest;

		for (src = (char*)from;from_len && (c = *src);src++,from_len--)
		{
			unsigned char *utf8_seq;

			for(utf8_seq = &codeset->table[c].utf8[1];(c = *utf8_seq);utf8_seq++)
				*dest_ptr++ = c;
		}

		*dest_ptr = 0;
		return (utf8*)dest;
	}
	return NULL;
}

/*****************************************************************************/

int utf8tostr(const utf8 *str, char *dest, unsigned int dest_size, struct codeset *codeset)
{
	unsigned int i;
	struct single_convert *f;
	char *dest_iter = dest;

	if (!dest_size)
	{
		return 0;
	}

	if (!codeset) codeset = (struct codeset*)list_first(&codesets_list);
	if (!codeset || !str)
	{
		*dest = 0;
		return 0;
	}

	for (i=0;i < dest_size-1;i++)
	{
		unsigned char c = *str;
		if (c)
		{
			if (c > 127)
			{
				unsigned int len_add = trailingBytesForUTF8[c];
				unsigned int len_str = len_add + 1;

				BIN_SEARCH(codeset->table_sorted,0,255,mystrncmp((unsigned char*)str,codeset->table_sorted[m].utf8+1,len_str),f);

				if (f) *dest_iter++ = f->code;
				else *dest_iter++ = '_';

				str += len_add;
			} else *dest_iter++ = c;

			str++;
		} else break;
	}
	*dest_iter = 0;
	return i;
}

/*****************************************************************************/

char *utf8tostrcreate(const utf8 *str, struct codeset *codeset)
{
	char *dest;
	int len;
	if (!str) return NULL;
	len = strlen((char*)str);
	if ((dest = (char*)malloc(len+1)))
		utf8tostr(str,dest,len+1,codeset);
	return dest;
}

/*****************************************************************************/

int utf8tochar(const utf8 *str, unsigned int *chr, struct codeset *codeset)
{
	struct single_convert conv;
	struct single_convert *f;
	unsigned char c;
	int len = 0;

	if (!codeset) codeset = (struct codeset*)list_first(&codesets_list);
	if (!codeset) return 0;

	if ((c = *str++))
	{
		int i;

		len = trailingBytesForUTF8[c];
		conv.utf8[1] = c;

		for (i=0;i<len;i++)
		{
			if (!(conv.utf8[i+2] = *str++))
			{
				/* We encountered a 0 byte although the trailing byte suggested
				 * a different length. Hence the given utf8 sequence is not
				 * considered as valid */
				*chr = 0;
				return i+1;
			}
		}
		conv.utf8[2+len] = 0;

		if ((f = (struct single_convert*)bsearch(&conv,codeset->table_sorted,256,sizeof(codeset->table_sorted[0]),codesets_cmp_unicode)))
		{
			*chr = f->code;
		} else *chr = 0;
	} else *chr = 0;
	return len+1;
}

/*****************************************************************************/

static inline int utf8cmp_single(unsigned char *a, unsigned char *b)
{
	/* Compare byte by byte. This gives the same order regardless of the
	 * byte order of the machine and avoids both misaligned unsigned int
	 * reads and a difference that wraps around when converted to int */
	int d;

	if ((d = a[0] - b[0])) return d;
	if ((d = a[1] - b[1])) return d;
	if ((d = a[2] - b[2])) return d;
	if ((d = a[3] - b[3])) return d;
	return 0;
}

/*****************************************************************************/

int utf8tolower(const char *str, char *dest)
{
	unsigned char ch[4] = {0,0,0,0};
	unsigned char c;
	struct uniconv *uc;
	int bytes;
	int i;

	c = *str++;
	if (c<0x80)
	{
		*dest = tolower(c);
		return 1;
	}
	bytes = trailingBytesForUTF8[c];
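	if (bytes > 3)
	{
		/* Too long for the case table: copy the sequence unchanged.
		 * dest and str already point past the lead byte here */
		*dest++ = c;
		memcpy(dest,str,bytes);
		return bytes + 1;
	}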

	ch[3-bytes] = c;
	for (i=bytes-1;i>=0;i--)
	{
		if (!(ch[3-i] = *str++))
			return 0;
	}

	/* The upper bound is inclusive (compare the 0,255 call in utf8tostr()) */
	BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table)-1,utf8cmp_single(utf8_tolower_table[m].from, ch),uc);

	if (uc)
		memcpy(dest, uc->to + 3 - bytes, bytes + 1);
	else
		memcpy(dest, ch + 3 - bytes, bytes + 1);
	return bytes + 1;
}

/*****************************************************************************/

int utf8stricmp(const char *str1, const char *str2)
{
	unsigned char c1;
	unsigned char c2;

	if (!str1)
	{
		if (!str2) return 0;
		return -1;
	}

	if (!str2) return 1;

	while (1)
	{
		int d;
		char bytes1,bytes2;

		c1 = *str1++;
		c2 = *str2++;

		if (!c1)
		{
			if (!c2) return 0;
			return -1;
		}
		if (!c2) return 1;

		if (c1 < 0x80)
		{
			if (c2 < 0x80)
			{
				d = tolower(c1) - tolower(c2);
				if (d) return d;
				continue;
			} else
			{
				/* TODO: must use locale sensitive sorting */
				return -1;
			}
		}
		if (c2 < 0x80) return 1; /* TODO: must use locale sensitive sorting */

		bytes1 = trailingBytesForUTF8[c1];
		bytes2 = trailingBytesForUTF8[c2];

		/* case mapping only happens within same number of bytes (currently) */
		if ((d = bytes1 - bytes2)) return d;

		if (bytes1 > 3)
		{
			/* case mapping relevant characters are only within 4 bytes */
			if ((d = c1 - c2)) return d;	/* the lead bytes must match too */
			while (bytes1)
			{
				if ((d = *str1++ - *str2++)) return d;
				bytes1--;
			}
		} else
		{

			unsigned char ch1[4] = {0,0,0,0};
			unsigned char ch2[4] = {0,0,0,0};
			struct uniconv *uc1;
			struct uniconv *uc2;
			unsigned char *lower1;
			unsigned char *lower2;

			/* Right-align both sequences in their 4 byte buffers */
			ch1[3-bytes1] = c1;
			ch2[3-bytes1] = c2;

			while (bytes1)
			{
				bytes1--;
				ch1[3 - bytes1] = *str1++;
				ch2[3 - bytes1] = *str2++;
			}

			/* Search bytewise with utf8cmp_single(), as utf8tolower() does.
			 * Casting the buffers to unsigned int depends on byte order and
			 * alignment. The upper bound of BIN_SEARCH is inclusive. */
			BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table)-1,utf8cmp_single(utf8_tolower_table[m].from, ch1),uc1);
			BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table)-1,utf8cmp_single(utf8_tolower_table[m].from, ch2),uc2);

			/* Use the lower case form if the table has one */
			lower1 = uc1 ? (unsigned char*)uc1->to : ch1;
			lower2 = uc2 ? (unsigned char*)uc2->to : ch2;

			if ((d = utf8cmp_single(lower1, lower2)))
			{
				if (d < 0) return -1;
				return 1;
			}
		}
	}
	return 0;
}

/*****************************************************************************/

int utf8stricmp_len(const char *str1, const char *str2, int len)
{
	unsigned char c1;
	unsigned char c2;

	if (!str1)
	{
		if (!str2) return 0;
		return -1;
	}

	if (!str2) return 1;

	while (len>0)
	{
		int d;
		char bytes1,bytes2;

		c1 = *str1++;
		c2 = *str2++;
		len--;

		if (!c1)
		{
			if (!c2) return 0;
			return -1;
		}
		if (!c2) return 1;

		if (c1 < 0x80)
		{
			if (c2 < 0x80)
			{
				d = tolower(c1) - tolower(c2);
				if (d) return d;
				continue;
			} else
			{
				/* TODO: must use locale sensitive sorting */
				return -1;
			}
		}
		if (c2 < 0x80) return 1; /* TODO: must use locale sensitive sorting */

		bytes1 = trailingBytesForUTF8[c1];
		bytes2 = trailingBytesForUTF8[c2];

		/* case mapping only happens within same number of bytes (currently) */
		if ((d = bytes1 - bytes2)) return d;

		if (bytes1 > 3)
		{
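			/* case mapping relevant characters are only within 4 bytes */
			if ((d = c1 - c2)) return d;	/* the lead bytes must match too */
			while (bytes1)
			{
				if ((d = *str1++ - *str2++)) return d;
				bytes1--;
				len--;	/* trailing bytes count towards len as well */
			}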
		} else
		{

			unsigned char ch1[4] = {0,0,0,0};
			unsigned char ch2[4] = {0,0,0,0};
			struct uniconv *uc1;
			struct uniconv *uc2;
			unsigned char *lower1;
			unsigned char *lower2;

			/* Right-align both sequences in their 4 byte buffers */
			ch1[3-bytes1] = c1;
			ch2[3-bytes1] = c2;

			while (bytes1)
			{
				bytes1--;
				ch1[3 - bytes1] = *str1++;
				ch2[3 - bytes1] = *str2++;

				len--;
			}

			/* Search bytewise with utf8cmp_single(), as utf8tolower() does.
			 * Casting the buffers to unsigned int depends on byte order and
			 * alignment. The upper bound of BIN_SEARCH is inclusive. */
			BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table)-1,utf8cmp_single(utf8_tolower_table[m].from, ch1),uc1);
			BIN_SEARCH(utf8_tolower_table,0,ARRAY_LEN(utf8_tolower_table)-1,utf8cmp_single(utf8_tolower_table[m].from, ch2),uc2);

			/* Use the lower case form if the table has one */
			lower1 = uc1 ? (unsigned char*)uc1->to : ch1;
			lower2 = uc2 ? (unsigned char*)uc2->to : ch2;

			if ((d = utf8cmp_single(lower1, lower2)))
			{
				if (d < 0) return -1;
				return 1;
			}
		}
	}
	return 0;
}

/*****************************************************************************/

int utf8match(const char *haystack, const char *needle, int case_insensitive, match_mask_t *match_mask)
{
	int h, n;
	int needle_len;
	int haystack_len;

	unsigned char hc;
	unsigned char nc;

	haystack_len = strlen(haystack);
	needle_len = strlen(needle);

	h = 0;
	n = 0;
	while (h < haystack_len && n < needle_len)
	{
		int match;
		int hbytes;
		int nbytes;

		match = 0;
		hc = haystack[h];
		nc = needle[n];

		hbytes = trailingBytesForUTF8[hc];
		nbytes = trailingBytesForUTF8[nc];

		if (hbytes == nbytes)
		{
			if (hc == nc)
			{
				int i;

				match = 1;

				for (i=0; i < hbytes; i++)
				{
					/* Compare the trailing bytes at the current positions */
					if (haystack[h+i+1] != needle[n+i+1])
						match = 0;
				}
			} else
			{
				if (hbytes == 0 && case_insensitive)
				{
					if (tolower(hc) == tolower(nc))
					{
						match = 1;
					}
				}
			}

			if (!match && case_insensitive && hbytes > 0)
			{
				char hchars[6] = {0};
				char nchars[6] = {0};
				int hl, nl;

				if ((hl = utf8tolower(&haystack[h], hchars)) > 0 &&
					(nl = utf8tolower(&needle[n], nchars)) > 0)
				{
					if (hl == nl)
					{
						match = memcmp(hchars, nchars, nl) == 0;
					}
				}

			}
		}

		if (match)
		{
			n += nbytes + 1;
		}

		if (match_mask)
		{
			unsigned int match_pos;

			match_pos = match_bitmask_pos(h);
			if (match)
			{
				match_mask[match_pos] |= match_bitmask(h);
			} else
			{
				match_mask[match_pos] &= ~match_bitmask(h);
			}
		}

		h += hbytes + 1;
	}

	if (n == needle_len)
	{
		if (match_mask)
		{
			/* Make sure that the remaining relevant positions are cleared */
			for (;h < haystack_len; h++)
			{
				match_mask[match_bitmask_pos(h)] &= ~match_bitmask(h);
			}
		}
		return 1;
	}
	return 0;
}
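
The matching loop above performs an ordered (subsequence) match: every character of the needle has to occur in the haystack in the same order, but not necessarily adjacent. The following is a minimal, case-sensitive sketch of that idea in portable C. The names demo_utf8_char_len and demo_utf8_subseq are hypothetical and not part of SimpleMail or codesets.library, and the match mask is left out.

```c
#include <string.h>

/* Hypothetical helper: number of bytes in the UTF-8 character whose lead
 * byte is c (1 for ASCII and for anything that is not a valid lead byte). */
static int demo_utf8_char_len(unsigned char c)
{
	if ((c & 0xe0) == 0xc0) return 2;
	if ((c & 0xf0) == 0xe0) return 3;
	if ((c & 0xf8) == 0xf0) return 4;
	return 1;
}

/* Return 1 if every character of needle occurs in haystack in the same
 * order (not necessarily adjacent), else 0. Case-sensitive only. */
int demo_utf8_subseq(const char *haystack, const char *needle)
{
	size_t hlen = strlen(haystack), nlen = strlen(needle);
	size_t h = 0, n = 0;

	while (h < hlen && n < nlen)
	{
		size_t hb = (size_t)demo_utf8_char_len((unsigned char)haystack[h]);
		size_t nb = (size_t)demo_utf8_char_len((unsigned char)needle[n]);

		/* compare whole characters, never single bytes; the bounds checks
		 * guard against a truncated final sequence */
		if (hb == nb && h + hb <= hlen && n + nb <= nlen &&
		    memcmp(haystack + h, needle + n, hb) == 0)
			n += nb;

		h += hb;
	}
	return n == nlen;
}
```

Stepping by whole characters keeps a needle byte from matching in the middle of a multi-byte haystack character.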

/*****************************************************************************/

char *utf8stristr(const char *str1, const char *str2)
{
	int str2_len;

	if (!str1 || !str2) return NULL;

	str2_len = strlen(str2);

	while (*str1)
	{
		if (!utf8stricmp_len(str1,str2,str2_len))
			return (char*)str1;
		str1++;
	}
	return NULL;
}

/*****************************************************************************/

const char *uft8toucs(const char *chr, unsigned int *code)
{
	unsigned char c = *chr++;
	unsigned int ucs = 0;
	int i,bytes;
	if (!(c & 0x80))
	{
		*code = c;
		return chr;
	} else
	{
		if (!(c & 0x20))
		{
			bytes = 2;
			ucs = c & 0x1f;
		}
		else if (!(c & 0x10))
		{
			bytes = 3;
			ucs = c & 0xf;
		}
		else if (!(c & 0x08))
		{
			bytes = 4;
			ucs = c & 0x7;
		}
		else if (!(c & 0x04))
		{
			bytes = 5;
			ucs = c & 0x3;
		}
		else /* if (!(c & 0x02)) */
		{
			bytes = 6;
			ucs = c & 0x1;
		}

		for (i=1;i<bytes;i++)
			ucs = (ucs << 6) | ((*chr++)&0x3f);
	}
	*code = ucs;
	return chr;
}
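
As a cross-check of the bit masks used in uft8toucs(), here is a minimal standalone sketch of the same lead-byte decoding. It is an illustration under stated assumptions, not the SimpleMail code: the name demo_utf8_decode is hypothetical, only the one- to four-byte forms are accepted (current UTF-8 does not allow the five- and six-byte forms handled above), and malformed or truncated sequences are rejected instead of being decoded blindly.

```c
#include <stddef.h>

/* Hypothetical helper (not part of codesets.library): decode one UTF-8
 * sequence starting at s into *code and return the number of bytes
 * consumed, or 0 if the sequence is malformed or truncated. */
size_t demo_utf8_decode(const unsigned char *s, unsigned int *code)
{
	unsigned int ucs;
	size_t bytes, i;

	if (s[0] < 0x80)                { *code = s[0]; return 1; }
	else if ((s[0] & 0xe0) == 0xc0) { bytes = 2; ucs = s[0] & 0x1f; }
	else if ((s[0] & 0xf0) == 0xe0) { bytes = 3; ucs = s[0] & 0x0f; }
	else if ((s[0] & 0xf8) == 0xf0) { bytes = 4; ucs = s[0] & 0x07; }
	else return 0; /* stray continuation byte or unsupported lead byte */

	for (i = 1; i < bytes; i++)
	{
		/* every trailing byte must look like 10xxxxxx; this also stops at NUL */
		if ((s[i] & 0xc0) != 0x80) return 0;
		ucs = (ucs << 6) | (s[i] & 0x3f);
	}
	*code = ucs;
	return bytes;
}
```

Returning the byte count instead of an advanced pointer lets the caller tell a decoding failure (0) apart from a successful step.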

static unsigned char base64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static short invbase64[128];

static unsigned char ibase64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
static short iinvbase64[128];

static unsigned char direct[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?";
static unsigned char optional[] = "!\"#$%&*;<=>@[]^_`{|}";
static unsigned char spaces[] = " \011\015\012";		/* space, tab, return, line feed */
static char mustshiftsafe[128];
static char mustshiftopt[128];

static int needtables = 1;

static void tabinit(void)
{
	int i, limit;

	for (i = 0; i < 128; ++i)
	{
		mustshiftopt[i] = mustshiftsafe[i] = 1;
		invbase64[i] = -1;
		iinvbase64[i] = -1; /* mark non-digits in the IMAP table as well */
	}
	limit = strlen((char*)direct);
	for (i = 0; i < limit; ++i)
		mustshiftopt[direct[i]] = mustshiftsafe[direct[i]] = 0;
	limit = strlen((char*)spaces);
	for (i = 0; i < limit; ++i)
		mustshiftopt[spaces[i]] = mustshiftsafe[spaces[i]] = 0;
	limit = strlen((char*)optional);
	for (i = 0; i < limit; ++i)
		mustshiftopt[optional[i]] = 0;
	limit = strlen((char*)base64);
	for (i = 0; i < limit; ++i)
		invbase64[base64[i]] = i;

	/* that's for the modified imap utf7 stuff */
	limit = strlen((char*)ibase64);
	for (i = 0; i < limit; ++i)
		iinvbase64[ibase64[i]] = i;

	needtables = 0;
}
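
The inverse tables built by tabinit() map a character back to its 6-bit value, with -1 marking characters that are not base64 digits. A minimal sketch of that construction for the standard alphabet follows; demo_b64 and demo_build_inverse are hypothetical names used only for this illustration.

```c
#include <string.h>

static const char demo_b64[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Fill inv[0..127] so that inv[c] is the 6-bit value of the base64 digit c,
 * or -1 if c is not a base64 digit. */
void demo_build_inverse(short inv[128])
{
	size_t i, limit = strlen(demo_b64);

	for (i = 0; i < 128; i++)
		inv[i] = -1;
	for (i = 0; i < limit; i++)
		inv[(unsigned char)demo_b64[i]] = (short)i;
}
```

Initialising every slot to -1 first is what allows a decoder to detect the end of a shifted section with a single table lookup.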

#if __cplusplus >= 201703L
#define DECLARE_BIT_BUFFER unsigned long BITbuffer = 0, buffertemp = 0; int bufferbits = 0
#else
#define DECLARE_BIT_BUFFER register unsigned long BITbuffer = 0, buffertemp = 0; int bufferbits = 0
#endif

#define BITS_IN_BUFFER bufferbits
#define WRITE_N_BITS(x, n) ((BITbuffer |= ( ((x) & ~(-1L<<(n))) << (32-(n)-bufferbits) ) ), bufferbits += (n) )
#define READ_N_BITS(n) ((buffertemp = (BITbuffer >> (32-(n)))), (BITbuffer <<= (n)), (bufferbits -= (n)), buffertemp)
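
The macros above treat BITbuffer as a 32-bit, most-significant-bit-first queue: new bits are appended below the bits already stored, and reads take the oldest bits from the top. The sketch below expresses the same idea with functions and a fixed-width uint32_t, so it does not depend on the width of unsigned long. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical MSB-first bit buffer mirroring the macros above. */
struct demo_bitbuf
{
	uint32_t bits;  /* pending bits, left-aligned at bit 31 */
	int      count; /* number of valid bits in 'bits' */
};

/* Append the low n bits of x (1 <= n <= 31, count + n <= 32). */
void demo_write_bits(struct demo_bitbuf *b, uint32_t x, int n)
{
	x &= (UINT32_C(1) << n) - 1;
	b->bits |= x << (32 - n - b->count);
	b->count += n;
}

/* Remove and return the n oldest bits (1 <= n <= 31, n <= count). */
uint32_t demo_read_bits(struct demo_bitbuf *b, int n)
{
	uint32_t out = b->bits >> (32 - n);

	b->bits <<= n;
	b->count -= n;
	return out;
}
```

For example, writing the three base64 values 9, 35 and 40 (the digits "Jjo") leaves 18 bits in the buffer, and reading 16 of them yields 0x263A, with two padding bits remaining.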

/*****************************************************************************/

char *utf7ntoutf8(char *source, int sourcelen)
{
	FILE *fh;
	int base64value=0,base64EOF=0,first=0;
	int shifted = 0;
	char *dest = NULL;
	DECLARE_BIT_BUFFER;

	if (needtables) tabinit();

	if ((fh = tmpfile()))
	{
		int dest_len;

		while (sourcelen)
		{
			unsigned char c = *source++;
			sourcelen--;

			if (shifted)
			{
				if ((base64EOF = (!sourcelen) || (c > 0x7f) || (base64value = invbase64[c]) < 0))
				{
					shifted = 0;
					/* If the character causing us to drop out was SHIFT_IN or
					   SHIFT_OUT, it may be a special escape for SHIFT_IN. The
					   test for SHIFT_IN is not necessary, but allows an alternate
					   form of UTF-7 where SHIFT_IN is escaped by SHIFT_IN. This
					   only works for some values of SHIFT_IN.
					 */
					if (c && sourcelen && (c == '+' || c == '-'))
					{
						/* get another character c */
						unsigned char prevc = c;

						c = *source++;

						/* If no base64 characters were encountered, and the
							 character terminating the shift sequence was
							 SHIFT_OUT, then it's a special escape for SHIFT_IN.
						*/
						if (first && prevc == '-')
						{
							fputc('+',fh);
						}
					}
				} else
				{
					/* Add another 6 bits of base64 to the bit buffer. */
					WRITE_N_BITS(base64value, 6);
					first = 0;
				}
			}

			/* Extract as many full 16 bit characters as possible from the
			   bit buffer.
			 */
			while (BITS_IN_BUFFER >= 16)
			{
				UTF32 src_utf32 = READ_N_BITS(16);
				UTF32 *src_utf32_ptr = &src_utf32;
				UTF8 target_utf8[10];
				UTF8 *target_utf8_ptr = target_utf8;

				ConvertUTF32toUTF8(&src_utf32_ptr,src_utf32_ptr+1,&target_utf8_ptr,target_utf8+10, strictConversion);

				fwrite(target_utf8,1,target_utf8_ptr - target_utf8,fh);
			}

			if (!c) break;

			if (base64EOF) BITS_IN_BUFFER = 0;

			if (!shifted)
			{
				if (c == '+')
				{
					shifted = first = 1;
				} else
				{
					if (c <= 0x7f)
					{
						fputc(c,fh);
					} /* else the source is invalid, so we ignore this */
				}
			}
		}

		if ((dest_len = ftell(fh)) > 0)
		{
			fseek(fh,0,SEEK_SET);
			if ((dest = (char*)malloc(dest_len+1)))
			{
				fread(dest,1,dest_len,fh);
				dest[dest_len]=0;
			}
		}
		fclose(fh); /* close the temporary file, otherwise the stream leaks */
	}

	return dest;
}

/*****************************************************************************/

char *utf8toiutf7(char *utf8, int sourcelen)
{
	FILE *fh;
	char *dest = NULL;

	if (needtables) tabinit();

	if ((fh = tmpfile()))
	{
		int dest_len;
		int shifted = 0;
		DECLARE_BIT_BUFFER;

		while (1)
		{
			unsigned char c;
			int noshift;

			if (sourcelen)
			{
				c = *utf8;
				noshift = (c >= 0x20 && c <= 0x7e) && (c != '&');
			} else
			{
				c = 0;
				noshift = 1;
			}

			if (shifted)
			{
				while (BITS_IN_BUFFER >= 6)
				{
					unsigned char bits = READ_N_BITS(6);
					fputc(ibase64[bits],fh);
				}

				if (noshift)
				{
					int bits_in_buf = BITS_IN_BUFFER;

					if (bits_in_buf)
					{
						unsigned char bits = READ_N_BITS(bits_in_buf);
						bits <<= 6 - bits_in_buf;
						fputc(ibase64[bits],fh);
					}
					shifted = 0;
					fputc('-',fh);
				}
			}

			if (!c) break;

			if (noshift)
			{
				if (c == '&')
				{
					fputs("&-",fh);
				} else fputc(c,fh);
				utf8++;
				sourcelen--;
			} else
			{
				UTF8 *source = (UTF8*)utf8;
				UTF16 dest = 0;
				UTF16 *dest_ptr = &dest;
				ConversionResult res;

				res = ConvertUTF8toUTF16(&source, source + sourcelen, &dest_ptr, dest_ptr + 1, strictConversion);
				if (res == conversionOK || res == targetExhausted)
				{
					sourcelen -= trailingBytesForUTF8[c] + 1;
					utf8 += trailingBytesForUTF8[c] + 1;

					if (!shifted)
					{
						fputc('&',fh);
						shifted = 1;
					}

					WRITE_N_BITS(dest,16);
				}
			}
		}

		if ((dest_len = ftell(fh)) > 0)
		{
			fseek(fh,0,SEEK_SET);
			if ((dest = (char*)malloc(dest_len+1)))
			{
				fread(dest,1,dest_len,fh);
				dest[dest_len]=0;
			}
		}
		fclose(fh); /* close the temporary file, otherwise the stream leaks */
	}
	return dest;
}
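
The encoding rules applied by utf8toiutf7() are those of the modified UTF-7 used for IMAP mailbox names: printable ASCII is copied as-is, "&" becomes "&-", and everything else is written as base64 digits of its UTF-16 form (with "," in place of "/") between "&" and "-". The following is a compact sketch of those rules, not the SimpleMail routine. To stay short it takes UTF-16 code units as input rather than UTF-8, and the names demo_ib64 and demo_iutf7_encode are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static const char demo_ib64[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Hypothetical encoder: src holds n UTF-16 code units; out must have room
 * for at least 5 * n + 1 bytes. Returns the length of the result. */
size_t demo_iutf7_encode(const uint16_t *src, size_t n, char *out)
{
	uint32_t acc = 0; /* bit accumulator, newest bits at the bottom */
	int nbits = 0;    /* number of pending bits in acc */
	int shifted = 0;  /* inside an "&...-" section? */
	size_t i, o = 0;

	for (i = 0; i <= n; i++)
	{
		/* i == n is a virtual end marker that closes an open section */
		int direct = (i == n) || (src[i] >= 0x20 && src[i] <= 0x7e);

		if (direct && shifted)
		{
			/* flush leftover bits, padded with zeros to a full digit */
			if (nbits) out[o++] = demo_ib64[(acc << (6 - nbits)) & 0x3f];
			out[o++] = '-';
			acc = 0; nbits = 0; shifted = 0;
		}
		if (i == n) break;

		if (direct)
		{
			out[o++] = (char)src[i];
			if (src[i] == '&') out[o++] = '-'; /* "&-" is a literal ampersand */
		} else
		{
			if (!shifted) { out[o++] = '&'; shifted = 1; }
			acc = (acc << 16) | src[i];
			nbits += 16;
			while (nbits >= 6)
			{
				nbits -= 6;
				out[o++] = demo_ib64[(acc >> nbits) & 0x3f];
			}
		}
	}
	out[o] = '\0';
	return o;
}
```

With the input 'a', U+53F0, U+5317, '&' this sketch produces "a&U,BTFw-&-". Surrogate pairs pass through as two code units, which matches the UTF-16 basis of the format.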

/*****************************************************************************/

char *iutf7ntoutf8(char *source, int sourcelen)
{
	FILE *fh;
	int base64value=0,base64EOF=0,first=0;
	int shifted = 0;
	char *dest = NULL;
	DECLARE_BIT_BUFFER;

	if (needtables) tabinit();

	if ((fh = tmpfile()))
	{
		int dest_len;

		while (sourcelen)
		{
			unsigned char c = *source++;
			sourcelen--;

			if (shifted)
			{
				if ((base64EOF = (!sourcelen) || (c > 0x7f) || (base64value = invbase64[c]) < 0))
				{
					shifted = 0;
					/* If the character causing us to drop out was SHIFT_IN or
					   SHIFT_OUT, it may be a special escape for SHIFT_IN. The
					   test for SHIFT_IN is not necessary, but allows an alternate
					   form of UTF-7 where SHIFT_IN is escaped by SHIFT_IN. This
					   only works for some values of SHIFT_IN.
					 */
					if (c && sourcelen && (c == '-'))
					{
						/* get another character c */
						unsigned char prevc = c;

						c = *source++;

						/* If no base64 characters were encountered, and the
							 character terminating the shift sequence was
							 SHIFT_OUT, then it's a special escape for SHIFT_IN.
						*/
						if (first && prevc == '-')
						{
							fputc('&',fh);
						}
					}
				} else
				{
					/* Add another 6 bits of base64 to the bit buffer. */
					WRITE_N_BITS(base64value, 6);
					first = 0;
				}
			}

			/* Extract as many full 16 bit characters as possible from the
			   bit buffer.
			 */
			while (BITS_IN_BUFFER >= 16)
			{
				UTF32 src_utf32 = READ_N_BITS(16);
				UTF32 *src_utf32_ptr = &src_utf32;
				UTF8 target_utf8[10];
				UTF8 *target_utf8_ptr = target_utf8;

				ConvertUTF32toUTF8(&src_utf32_ptr,src_utf32_ptr+1,&target_utf8_ptr,target_utf8+10, strictConversion);

				fwrite(target_utf8,1,target_utf8_ptr - target_utf8,fh);
			}

			if (!c) break;

			if (base64EOF) BITS_IN_BUFFER = 0;

			if (!shifted)
			{
				if (c == '&')
				{
					shifted = first = 1;
				} else
				{
					if (c <= 0x7f)
					{
						fputc(c,fh);
					} /* else the source is invalid, so we ignore this */
				}
			}
		}

		if ((dest_len = ftell(fh)) > 0)
		{
			fseek(fh,0,SEEK_SET);
			if ((dest = (char*)malloc(dest_len+1)))
			{
				fread(dest,1,dest_len,fh);
				dest[dest_len]=0;
			}
		}
		fclose(fh); /* close the temporary file, otherwise the stream leaks */
	}

	return dest;
}

/*****************************************************************************/

char *utf8topunycode(const utf8 *source, int sourcelen)
{
	enum punycode_status status;
	const utf8 *sourceend;
	char *puny;
	punycode_uint puny_len;

	punycode_uint *dest, *target;
	punycode_uint dest_len;

	if (!(dest = (punycode_uint *)malloc(sourcelen * sizeof(punycode_uint))))
		return NULL;

	target = dest;
	sourceend = source + sourcelen;

	while (source < sourceend)
	{
		punycode_uint ch = 0;
		unsigned short extraBytesToRead = trailingBytesForUTF8[*(UTF8*)source];

		if (source + extraBytesToRead >= sourceend)
		{
			/* source exhausted */
			free(dest);
			return NULL;
		}

		/* Do this check whether lenient or strict */
		if (!isLegalUTF8((UTF8*)source, extraBytesToRead+1))
		{
			free(dest);
			return NULL;
		}

		/*
		 * The cases all fall through.
		 */
		switch (extraBytesToRead) {
			case 3:	ch += *source++; ch <<= 6;
			case 2:	ch += *source++; ch <<= 6;
			case 1:	ch += *source++; ch <<= 6;
			case 0:	ch += *source++;
		}
		ch -= offsetsFromUTF8[extraBytesToRead];

		if (ch <= UNI_MAX_UTF32) {
			*target++ = ch;
		} else if (ch > UNI_MAX_UTF32) {
			*target++ = UNI_REPLACEMENT_CHAR;
		}
	}

	dest_len = target - dest; /* No 0 ending */
	puny_len = dest_len * 2;

	do
	{
		int stored_puny_len = puny_len;

		if (!(puny = (char*)malloc(puny_len+5)))
		{
			free(dest);
			return NULL;
		}
		status = punycode_encode(dest_len, dest, NULL /* case flags */, &puny_len, puny);

		if (status == punycode_success)
		{
			puny[puny_len] = 0;
			free(dest);
			return puny;
		}

		/* Release the failed buffer before retrying with a larger one */
		free(puny);
		puny_len = stored_puny_len * 2;
	} while (status == punycode_big_output);

	free(dest);
	return NULL;
}
/*****************************************************************************/

utf8 *punycodetoutf8(const char *source, int sourcelen)
{
	enum punycode_status status;
	punycode_uint *utf32;
	punycode_uint length;

	length = sourcelen;

	if (!(utf32 = (punycode_uint*)malloc(sizeof(punycode_uint)*sourcelen)))
		return NULL;

	status = punycode_decode(sourcelen, source, &length, utf32, NULL);
	if (status == punycode_success)
	{
		utf8 *dest = (utf8*)malloc(sourcelen * 4);
		if (dest)
		{
			UTF8 *dest_start = (UTF8*)dest;
			UTF32 *source_start = (UTF32*)utf32;

			ConvertUTF32toUTF8((UTF32**)&source_start, (UTF32*)(utf32) + length, &dest_start, dest_start + sourcelen * 4 - 2, strictConversion);
			*dest_start = 0;
			free(utf32);
			return dest;
		}
	}
	free(utf32);
	return NULL;
}
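
punycodetoutf8() reserves four bytes per input character because that is the most a single code point can occupy in UTF-8. The sketch below shows the encoding step that ConvertUTF32toUTF8() performs for one code point. It is an independent illustration with a hypothetical name (demo_utf8_encode), not the converter used by the library.

```c
#include <stddef.h>

/* Hypothetical helper: write the UTF-8 form of code point cp to out and
 * return the number of bytes written (1 to 4), or 0 if cp is a surrogate
 * or lies beyond U+10FFFF. */
size_t demo_utf8_encode(unsigned long cp, unsigned char out[4])
{
	if (cp < 0x80)
	{
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800)
	{
		out[0] = (unsigned char)(0xc0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return 2;
	}
	if (cp >= 0xd800 && cp <= 0xdfff)
		return 0; /* UTF-16 surrogates are not valid scalar values */
	if (cp < 0x10000)
	{
		out[0] = (unsigned char)(0xe0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		out[2] = (unsigned char)(0x80 | (cp & 0x3f));
		return 3;
	}
	if (cp <= 0x10ffff)
	{
		out[0] = (unsigned char)(0xf0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		out[3] = (unsigned char)(0x80 | (cp & 0x3f));
		return 4;
	}
	return 0;
}
```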

/*****************************************************************************/

int isascii7(const char *str)
{
	char c;
	if (!str) return 1;
	while ((c = *str++))
	{
		if (c & 0x80) return 0;
	}
	return 1;
}

YAM



AmigaGPT



From Wookiechat

Charsets: WookieChat no longer needs the incoming charset to be configured exactly. When someone types unusual characters, WookieChat scans the text for UTF-8 sequences. If it finds any, it converts the text to ASCII as best an Amiga can using codesets.library. If there are none, it simply uses the codesets.library CodesetsFindBest() function.
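
The detection step can be sketched in portable C as follows. This is not WookieChat's actual code: the function name demo_looks_like_utf8 is hypothetical, and the check is deliberately simple (it does not reject overlong encodings). A caller would convert from UTF-8 when it returns 1 and otherwise fall back to charset guessing.

```c
#include <stddef.h>

/* Hypothetical check: return 1 if str contains at least one multi-byte
 * UTF-8 sequence and no malformed sequence, 0 otherwise (pure ASCII or
 * text in some other 8-bit charset). */
int demo_looks_like_utf8(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;
	int multibyte_seen = 0;

	while (*s)
	{
		int trail, i;

		if (*s < 0x80) { s++; continue; }
		else if ((*s & 0xe0) == 0xc0) trail = 1;
		else if ((*s & 0xf0) == 0xe0) trail = 2;
		else if ((*s & 0xf8) == 0xf0) trail = 3;
		else return 0; /* not a valid lead byte */

		/* each trailing byte must look like 10xxxxxx; a NUL fails the test,
		 * so the loop never reads past the end of the string */
		for (i = 1; i <= trail; i++)
			if ((s[i] & 0xc0) != 0x80) return 0;

		multibyte_seen = 1;
		s += trail + 1;
	}
	return multibyte_seen;
}
```

Text in ISO-8859-1 or KOI8-R almost never forms valid multi-byte sequences by accident, which is why a structural check like this works as a first filter.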




Library Calls

TABLE OF CONTENTS

codesets.library/codesets.library
codesets.library/CodesetsSupportedA
codesets.library/CodesetsFindA
codesets.library/CodesetsFindBestA
codesets.library/CodesetsConvertStrA
codesets.library/CodesetsFreeA
codesets.library/CodesetsFreeVecPooledA
codesets.library/CodesetsSetDefaultA
codesets.library/CodesetsListCreateA
codesets.library/CodesetsListDeleteA
codesets.library/CodesetsListAddA
codesets.library/CodesetsListRemoveA
codesets.library/CodesetsUTF8CreateA
codesets.library/CodesetsUTF8ToStrA
codesets.library/CodesetsUTF8Len
codesets.library/CodesetsIsValidUTF8
codesets.library/CodesetsIsLegalUTF8
codesets.library/CodesetsIsLegalUTF8Sequence
codesets.library/CodesetsStrLenA
codesets.library/CodesetsConvertUTF16toUTF32
codesets.library/CodesetsConvertUTF16toUTF8
codesets.library/CodesetsConvertUTF32toUTF16
codesets.library/CodesetsConvertUTF32toUTF8
codesets.library/CodesetsConvertUTF8toUTF16
codesets.library/CodesetsConvertUTF8toUTF32
codesets.library/CodesetsDecodeB64A
codesets.library/CodesetsEncodeB64A


codesets.library/codesets.library

    *******************************************************************
    Copyright (c) 2005-2008 by codesets.library Open Source Team

    codesets.library is an AmigaOS shared library which provides
    functions to deal with different kinds of codesets. It provides
    general character conversion routines, e.g. for converting
    from one charset (e.g. UTF8) into another (e.g. ISO-8859-1) or
    vice versa.

    codesets.library is mainly based on some code from UNICODE, some
    code from the SimpleMail project as well as some additions done
    by the codesets.library Open Source Team.

    It is released and distributed under the terms of the GNU Lesser
    General Public License (LGPL) and available free of charge.

    Please visit http://www.sf.net/projects/codesetslib/ for
    the very latest version and information regarding codesets.library.
    *******************************************************************

    The following paragraphs give a short introduction on how to use
    codesets.library. What you usually want to do with codesets.library
    is to convert strings from one so-called "Source Codeset" into
    another "Destination Codeset". The functions listed below are only
    the main ones provided to developers who want to perform this
    conversion in their applications:






    CodesetsSupportedA()
    --------------------

      Use this function to query codesets.library for the
      codesets/charsets it supports, either through its internally
      available charsets or through those obtained from the operating
      system (e.g. AmigaOS4).

      E.g. in a MUI application you would do something like:

      -- cut here --
      STRPTR *array;

      if((array = CodesetsSupportedA(NULL)))
      {
        DoMethod(list, MUIM_List_Insert, array, -1, MUIV_List_Insert_Sorted);
        CodesetsFreeA(array, NULL);
      }
      -- cut here --



    CodesetsFindA()
    ---------------

      For processing/converting a specific string, you normally have to
      specify in which codeset this string has to be interpreted. For this
      purpose you have to pass a so-called "Source Codeset" to the main
      function of codesets.library. With the "CodesetsFindA()" function you
      can query codesets.library for a pointer to the corresponding
      codeset structure, which you then pass on to the main conversion
      routines.

      For receiving the pointer to the Amiga-1251 codeset:
      -- cut here --
      struct codeset *cs;

      if((cs = CodesetsFind("Amiga-1251",
                            CSA_FallbackToDefault, FALSE,
                            TAG_DONE)))
      {
        ...
      }
      -- cut here --

      For querying codesets.library for the currently used system wide
      default of your running operating system:
      -- cut here --
      struct codeset *defaultCodeset; /* 'default' is a reserved word in C */

      if((defaultCodeset = CodesetsFindA(NULL, NULL)))
      {
        ...
      }
      -- cut here --



    CodesetsConvertStrA()
    ---------------------

      This is probably the most commonly used function in
      codesets.library. It converts a string from one "Source Codeset"
      to another "Destination Codeset". It takes the source string,
      converts it internally into UTF8 if necessary and then converts
      the UTF8 directly to the specified destination codeset.

      To convert a string 'str' to a destination codeset:
      -- cut here --
      STRPTR destString;

      if((destString = CodesetsConvertStr(CSA_SourceCodeset, srcCodeset,
                                          CSA_DestCodeset,   destCodeset,
                                          CSA_Source,        str,
                                          TAG_DONE)))
      {
        ....

        CodesetsFreeA(destString, NULL);
      }
      -- cut here --

   The above functions should cover most of the common functionality
   an ordinary user of codesets.library requires. The library supplies
   many more functions, which are not described in detail here; examples
   are given in the respective documentation section of each function.

   However, if you find the documentation is still too limited or you feel
   some major functionality is missing regarding dealing with codesets,
   please let us know so that we or even you can improve it.


   Your codesets.library Open Source Team.
   February 2006


codesets.library/CodesetsSupportedA

   NAME
    CodesetsSupportedA - returns names of supported codesets

   SYNOPSIS
    array = CodesetsSupportedA(attrs);
                               A0

    STRPTR * CodesetsSupportedA(struct TagItem *);

    array = CodesetsSupported(tag1, ...);
                              A0

    STRPTR * CodesetsSupported(Tag, ...);

   FUNCTION
    Returns a NULL terminated array of the supported codeset
    names. The array _must_ be freed with CodesetsFreeA().

   INPUTS
    attrs - a list of additional tag items. Valid items are:

      CSA_CodesetList (struct codesetList *)
        You may supply an unlimited number of additional
        codeset lists which you have previously allocated/loaded
        with CodesetsListCreateA(). Otherwise just the internal
        list of available codesets will be searched.
        Default: NONE

      CSA_AllowMultibyteCodesets (BOOL)
        Include multibyte codesets (UTF8, UTF16, UTF32) in the
        generated names array.
        Default: TRUE

   RESULT
    array - the names array or NULL on an error.

   EXAMPLE
    For printing out all supported codeset names:

    -- cut here --
    STRPTR *array;

    if((array = CodesetsSupportedA(NULL)))
    {
      int i;

      for(i=0; array[i] != NULL; i++)
        printf("%s\n", array[i]);

      CodesetsFreeA(array, NULL);
    }
    -- cut here --

   SEE ALSO
    codesets.library/CodesetsListCreateA


codesets.library/CodesetsFindA

   NAME
    CodesetsFindA - finds a codeset

   SYNOPSIS
    codeset = CodesetsFindA(name, attrs);
    D0                      A0    A1

    struct codeset * CodesetsFindA(STRPTR, struct TagItem *);

    codeset = CodesetsFind(name, tag1, ...);
    D0                     A0    A1

    struct codeset * CodesetsFind(STRPTR, Tag, ...);

   FUNCTION
    Finds and returns a codeset by its name. The data behind the
    pointer should be considered read-only and must not be altered
    in any way.

   INPUTS
    name - the codeset name (or alias) to find
    attrs - a list of additional tag items. Valid items are:

      CSA_FallbackToDefault (BOOL)
        If TRUE the function never fails and returns the default
        codeset if the supplied codeset name can't be found.
        Default: TRUE

      CSA_CodesetList (struct codesetList *)
        You may supply an unlimited number of additional
        codeset lists which you have previously allocated/loaded
        with CodesetsListCreateA(). Otherwise just the internal
        list of available codesets will be searched.
        Default: NONE

   RESULT
    codeset - the codeset or NULL on an error

   EXAMPLE
     E.g. for receiving the pointer to the Amiga-1251 codeset:

     -- cut here --
     struct codeset *cs;

     if((cs = CodesetsFind("Amiga-1251",
                           CSA_FallbackToDefault, FALSE,
                           TAG_DONE)))
     {
       ...
     }
     -- cut here --

     For querying codesets.library for the currently used system
     wide default of your running operating system:
     -- cut here --
     struct codeset *defaultCodeset; /* 'default' is a reserved word in C */

     if((defaultCodeset = CodesetsFindA(NULL, NULL)))
     {
       ...
     }
     -- cut here --

   NOTE
    Please note that the method used to find the system's default
    codeset depends heavily on how the operating system can be
    queried for it. E.g. on AmigaOS4 the default codeset is queried
    with updated system functions, but for AmigaOS3 a static list of
    language<>codeset mappings is used.

   SEE ALSO
    codesets.library/CodesetsListCreateA



codesets.library/CodesetsFindBestA

   NAME
    CodesetsFindBestA - finds the best codeset matching a
                        string content.

   SYNOPSIS
    codeset = CodesetsFindBestA(attrs);
    D0                          A0

    struct codeset * CodesetsFindBestA(struct TagItem *);

    codeset = CodesetsFindBest(tag1, ...);
    D0                         A0

    struct codeset * CodesetsFindBest(Tag, ...);

   FUNCTION
    Returns the best found codeset for the given text in the supplied
    codeset family. In case no proper codeset for the supplied source string
    could be found, NULL is returned or the default codeset if the
    CSA_FallbackToDefault attribute is set to TRUE. In addition, if
    CSA_ErrPtr is given, the number of failed identifications (chars)
    is returned through it.

   INPUTS
    attrs - a list of tag items. Valid items are:

      CSA_Source (STRPTR)
        The string which you want to convert. Must be supplied,
        otherwise the function returns NULL.

      CSA_SourceLen (ULONG)
        Length of CSA_Source or less to check just a part
        Default: string length of CSA_Source

      CSA_ErrPtr (int *)
        Pointer to an integer variable which will be filled with the
        number of found errors (unidentifiable chars)
        Default: NULL

      CSA_CodesetList (struct codesetList *)
        You may supply an unlimited number of additional
        codeset lists which you have previously allocated/loaded
        with CodesetsListCreateA(). Otherwise just the internal
        list of available codesets will be searched.
        Default: NONE

      CSA_CodesetFamily (ULONG)
        To narrow the analysis, a user might define the codeset family
        of which the supplied text might be composed. The reason for
        this is that there isn't a unique identification algorithm
        which can tell the codeset out of a given text. So to narrow
        the identification, the following values might be specified:

          CSV_CodesetFamily_Latin    - Latin codeset family (e.g. ISO-8859-X)
          CSV_CodesetFamily_Cyrillic - Cyrillic codeset family (e.g. KOI8R)

        Default: CSV_CodesetFamily_Latin

      CSA_FallbackToDefault (BOOL)
        If TRUE the function never fails and returns the default
        codeset if the supplied text couldn't be identified
        Default: FALSE

   RESULT
    codeset - the best matching codeset or NULL in case a NULL pointer
              was supplied as the source string.

   EXAMPLE
     E.g. for receiving the pointer to 'best matching' codeset matching
     a KOI8-R string:

     -- cut here --
     struct codeset *cs;
     /* raw KOI8-R bytes shown as Latin-1; the text reads
        "Невозможно перекодировать из кодировки" */
     char str[] = "îÅ×ÏÚÍÏÖÎÏ ÐÅÒÅËÏÄÉÒÏ×ÁÔØ ÉÚ ËÏÄÉÒÏ×ËÉ";
     int errPtr;

     if((cs = CodesetsFindBest(CSA_Source, str,
                               CSA_ErrPtr, &errPtr,
                               CSA_CodesetFamily, CSV_CodesetFamily_Cyrillic,
                               CSA_FallbackToDefault, FALSE,
                               TAG_DONE)))
     {
       ... should return the KOI8-R codeset ...
     }
     -- cut here --

   SEE ALSO
    codesets.library/CodesetsListCreateA



codesets.library/CodesetsConvertStrA

   NAME
    CodesetsConvertStrA - converts a string from one source codeset to
                          another destination codeset.

   SYNOPSIS
    dest = CodesetsConvertStrA(attrs)
    D0                         A0

    STRPTR CodesetsConvertStrA(struct TagItem *);

    dest = CodesetsConvertStr(tag1, ...);
    D0                        A0

    STRPTR CodesetsConvertStr(Tag, ...);

   FUNCTION
    The function takes a source string which is encoded in a so-called
    'Source codeset' and converts it immediately into an equivalent
    string which will be encoded in the corresponding 'Destination Codeset'.

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_Source (STRPTR)
        The string which you want to convert. Must be supplied,
        otherwise the function returns NULL.

      CSA_SourceLen (ULONG)
        Length of CSA_Source or less to convert just a part
        Default: string length of CSA_Source

      CSA_SourceCodeset (struct codeset *)
        The codeset in which the source string is encoded.
        Default: the system's default codeset

      CSA_DestCodeset (struct codeset *)
        The codeset to which the source string should be converted to.
        Default: the system's default codeset

      CSA_DestLenPtr (ULONG *)
        If supplied, will contain the length of the converted string
        which is returned.

      CSA_MapForeignChars (BOOL)
        If a character of the source string cannot be directly mapped
        to the destination codeset a "?" character will normally be used
        to signal this case. If this attribute is set, an internal
        replacement table will be used which tries to replace these
        "foreign" characters by "lookalike" ASCII character sequences.
        Please note that this functionality is mostly useful for
        Latin users only, due to the straight mapping to ASCII (7bit).
        Default: FALSE

      CSA_MapForeignCharsHook (struct Hook *)
        If a character of the source string cannot be directly mapped
        to the destination codeset a "?" character will normally be used
        to signal this case. By using this attribute, a hook can be
        supplied which is called for every such foreign character.
        Within this hook the UTF8 sequence is supplied which cannot be
        directly mapped to the destination codeset. During the execution
        of the hook a replacement string might be specified, which in turn
        will be used by the internals of codesets.library to map this
        "foreign" char to a different character or UTF8 sequence.

        If both CSA_MapForeignChars and CSA_MapForeignCharsHook are
        specified, the hook will only be executed in case the internal
        routines don't supply their own mapping for the foreign UTF8 sequence.

        The hook function should be declared as:

        ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
                             REG(a2, struct replaceMsg *msg),
                             REG(a1, void *dummy))

        struct Hook *hook
          Your hook

        msg->dst
          place your desired replacement string here

        msg->src
          the UTF8 sequence to be replaced, this string is READ-ONLY!

        msg->srclen
          the length of the UTF8 sequence to be replaced, do NOT peek
          beyond this limit.

        The return value of this hook function is the length of the replacement
        string. Return zero if no replacement took place. Positive values will
        be treated as lengths of ASCII strings. Negative values signal a
        replacement by another UTF8 sequence. Please note that in case you
        supply a UTF8 sequence as a replacement for the "foreign" UTF8, your
        hook might be called again if this sequence still cannot be mapped to
        the destination codeset and thus is again a "foreign" sequence.
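
The return-value convention of the hook can be modelled in plain C without any Amiga headers. The sketch below is only an illustration: struct demo_replace_msg is a simplified stand-in whose field types are assumptions (check the library headers for the real struct replaceMsg), demo_replace_foreign is a hypothetical name, and the register-based hook calling convention is left out. Only the ASCII (positive) and "no replacement" (zero) cases are shown.

```c
#include <string.h>

/* Simplified stand-in for the message described above. The field names
 * follow the documentation; the types are assumptions made for this sketch. */
struct demo_replace_msg
{
	const char **dst;    /* set *dst to the replacement string */
	const char  *src;    /* UTF-8 sequence that could not be mapped (read-only) */
	int          srclen; /* length of src in bytes */
};

/* Returns the length of an ASCII replacement, or 0 if none is known. */
int demo_replace_foreign(struct demo_replace_msg *msg)
{
	static const struct { const char *utf8; const char *ascii; } table[] =
	{
		{ "\xe2\x82\xac", "EUR" }, /* U+20AC EURO SIGN */
		{ "\xe2\x80\x99", "'"   }, /* U+2019 RIGHT SINGLE QUOTATION MARK */
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
	{
		size_t len = strlen(table[i].utf8);

		/* never look at more than srclen bytes of the source */
		if ((size_t)msg->srclen == len && memcmp(msg->src, table[i].utf8, len) == 0)
		{
			*msg->dst = table[i].ascii;
			return (int)strlen(table[i].ascii);
		}
	}
	return 0; /* no replacement; the caller falls back to its default */
}
```

A lookup table keeps the hook free of side effects, which matters because it may be called once for every unmappable character in the source string.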


   RESULT
    either a pointer to the generated destination string or NULL
    on error.

   EXAMPLE
    To convert an ISO-8859-1 encoded string 'src' into an Amiga-1251
    equivalent 'dst' string:
    -- cut here --
    STRPTR src, dst;
    struct codeset *srcCodeset, *dstCodeset;

    srcCodeset = CodesetsFindA("ISO-8859-1", NULL);
    dstCodeset = CodesetsFindA("Amiga-1251", NULL);

    if((dst = CodesetsConvertStr(CSA_SourceCodeset, srcCodeset,
                                 CSA_DestCodeset,   dstCodeset,
                                 CSA_Source,        src,
                                 TAG_DONE)))
    {
      ....

      CodesetsFreeA(dst, NULL);
    }
    -- cut here --

   SEE ALSO
    codesets.library/CodesetsFreeA



codesets.library/CodesetsFreeA

   NAME
    CodesetsFreeA - frees objects previously internally allocated
                    by codesets.library

   SYNOPSIS
    CodesetsFreeA(obj, attrs)
                  A0   A1
    void CodesetsFreeA(APTR, struct TagItem *);

    CodesetsFree(obj, tag1, ...);
                 A0   A1
    void CodesetsFree(APTR, Tag, ...);

   FUNCTION
    Frees an object previously allocated by codesets.library, e.g. by
    functions like CodesetsSupportedA() or CodesetsConvertStrA().

   INPUTS
    obj - the object to free
    attrs - a list of additional tag items. Currently none.

   RESULT
    no result

   EXAMPLE

    -- cut here --
    STRPTR *array;

    if((array = CodesetsSupportedA(NULL)))
    {
      ...

      CodesetsFreeA(array, NULL);
    }
    -- cut here --

   SEE ALSO
    codesets.library/CodesetsSupportedA
    codesets.library/CodesetsConvertStrA



codesets.library/CodesetsFreeVecPooledA

   NAME
    CodesetsFreeVecPooledA - frees objects previously allocated
                             by methods supporting CSA_Pool

   SYNOPSIS
    CodesetsFreeVecPooledA(pool, obj, attrs)
                           A0    A1   A2
    void CodesetsFreeVecPooledA(APTR, APTR, struct TagItem *);

    CodesetsFreeVecPooled(pool, obj, tag1, ...);
                          A0    A1   A2
    void CodesetsFreeVecPooled(APTR, APTR, Tag, ...);

   FUNCTION
    Frees an object previously allocated by codesets.library from a
    private memory pool which was passed to codesets functions
    via the CSA_Pool tag.

   INPUTS
    pool - pointer to the private memory pool
    obj - the object to free
    attrs - a list of additional tag items. Valid tags are:

      CSA_PoolSem (struct SignalSemaphore *)
        A semaphore to lock when using CSA_Pool

   RESULT
    no result

   EXAMPLE

    -- cut here --
    UTF8   *utf8;
    STRPTR str;
    APTR   pool;

    if((utf8 = CodesetsUTF8Create(CSA_Source,     str,
                                  CSA_Pool,       pool,
                                  TAG_DONE)))
    {
        ...

        CodesetsFreeVecPooledA(pool,utf8,NULL);
    }
    -- cut here --

   SEE ALSO
    codesets.library/CodesetsUTF8CreateA
    codesets.library/CodesetsUTF8ToStrA



codesets.library/CodesetsSetDefaultA

   NAME
    CodesetsSetDefaultA - sets the default codeset, overwriting
                          the system default if necessary.

   SYNOPSIS
    codeset = CodesetsSetDefaultA(name, attrs);
                                  A0    A1

    struct codeset * CodesetsSetDefaultA(STRPTR, struct TagItem *);

    codeset = CodesetsSetDefault(name, tag1, ...);
                                 A0    A1

    struct codeset * CodesetsSetDefault(STRPTR, Tag, ...);

   FUNCTION
    Sets the default codeset to name. The codeset will be stored in
    the environment variable 'codeset_default'.

   INPUTS
    name - the name of the codeset to set as default
    attrs - a list of additional tag items. Valid items are:

      CSA_Save (BOOL)
        If TRUE the codeset will be permanently saved and survives
        a reset. Otherwise the default setting will just last until
        the next reboot.
        Default: FALSE

   RESULT
    codeset - the codeset or NULL

   NOTE
    If the operating system supports querying the currently active
    system default codeset directly, this function still overrides
    that setting. A user can thus override all system settings and
    set a global default codeset for the machine, no matter what the
    OS suggests. However, if your operating system fully supports
    querying the system's default codeset (e.g. AmigaOS4), you are
    advised to use this function with care, or to avoid it
    altogether.

   SEE ALSO
    codesets.library/CodesetsFindA



codesets.library/CodesetsListCreateA

   NAME
    CodesetsListCreateA - creates a private, task-wise codeset list
                          and returns it to the user for further reference.

   SYNOPSIS
    list = CodesetsListCreateA(attrs);
    D0                         A0

    struct codesetList * CodesetsListCreateA(struct TagItem *);

    list = CodesetsListCreate(tag1, ...);
    D0                        A0

    struct codesetList * CodesetsListCreate(Tag, ...);

   FUNCTION
    This function creates a private, task-wise codeset list by
    loading charset files from a whole directory tree or a specific
    charset file, or by using an existing codeset structure.
    With this function, an application can load and carry its very
    own private charsets in parallel to the internal charsets of
    codesets.library. This way each application can provide a different
    codeset list to the user without having to load and manage these
    lists on its own.

   INPUTS
    attrs - a list of additional tag items. Valid items are:

      CSA_CodesetDir (STRPTR)
        The path to a whole directory which codesets.library will
        walk through, searching for proper charset files.
        Default: NULL

      CSA_CodesetFile (STRPTR)
        The path to a specific file which codesets.library will try
        to load as a standard charset translation file.
        Default: NULL

      CSA_SourceCodeset (struct codeset *)
        The pointer to an already existing codeset structure which
        will immediately be added to the created list. Be careful
        when adding one codeset to multiple lists, especially
        when you call CodesetsListDelete() to free a list.
        Default: NULL

   RESULT
    list - the private codeset list or NULL on an error condition

   NOTE
    For convenience, if no tag item attribute at all is supplied to the
    function, codesets.library will try to load charsets from the
    corresponding "PROGDIR:Charsets" directory and add any codesets found
    to the list. However, in case a tag item is specified (no matter what
    kind) the PROGDIR: scanning will be omitted.

   EXAMPLE
    For loading all found charset files from PROGDIR:Charsets:

    -- cut here --
    struct codesetList *csList;

    if((csList = CodesetsListCreateA(NULL)))
    {
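      STRPTR *codesetArray = CodesetsSupported(CSA_CodesetList, csList,
                                               TAG_DONE);

      // codesetArray should now also carry our private
      // codesets from PROGDIR:Charsets
      ...

      // free the array returned by CodesetsSupported()
      if(codesetArray)
        CodesetsFreeA(codesetArray, NULL);

      // varargs variant, because the tags are passed inline
      CodesetsListDelete(CSA_CodesetList, csList,
                         TAG_DONE);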
    }
    -- cut here --

   SEE ALSO
    codesets.library/CodesetsListDeleteA
    codesets.library/CodesetsListAddA
    codesets.library/CodesetsListRemoveA
    codesets.library/CodesetsListSupportedA
    codesets.library/CodesetsListFindA
    codesets.library/CodesetsListFindBestA



codesets.library/CodesetsListDeleteA

   NAME
    CodesetsListDeleteA - deletes/frees all resources of previously created
                          private codeset lists.

   SYNOPSIS
    result = CodesetsListDeleteA(attrs);
    D0                           A0

    BOOL CodesetsListDeleteA(struct TagItem *);

    result = CodesetsListDelete(tag1, ...);
    D0                          A0

    BOOL CodesetsListDelete(Tag, ...);

   FUNCTION
    This function deletes all resources (also the contained codeset
    structures per default) and frees the memory of previously allocated
    private codeset lists.

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_CodesetList (struct codesetList *)
        Pointer to a previously created, private codeset list whose
        resources should be freed.
        Default: NULL

      CSA_FreeCodesets (BOOL)
        If TRUE, all contained codesets should also be freed/deleted,
        otherwise just frees the list object itself.
        Default: TRUE

   RESULT
    result - TRUE on success otherwise FALSE

   NOTE
    Please note that if you added an explicit codeset structure to more
    than one private codeset list, you may run into problems if you
    don't take care of this yourself. This is a dumb function which just
    walks through the list and frees all resources. Set CSA_FreeCodesets
    to FALSE in case you just want to free the list object.

   SEE ALSO
    codesets.library/CodesetsListCreateA
    codesets.library/CodesetsListAddA
    codesets.library/CodesetsListRemoveA



codesets.library/CodesetsListAddA

   NAME
    CodesetsListAddA - adds additional codesets to an already
                       existing private codeset list previously created with
                       CodesetsListCreateA().

   SYNOPSIS
    result = CodesetsListAddA(attrs);
    D0                        A0

    BOOL CodesetsListAddA(struct TagItem *);

    result = CodesetsListAdd(tag1, ...);
    D0                       A0

    BOOL CodesetsListAdd(Tag, ...);

   FUNCTION
    This function adds additional codesets to an already existing
    private codeset list. Either codesets themselves may be added directly,
    or the path to a file or a directory may be specified from which
    additional codesets are loaded out of known charset files.

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_CodesetDir (STRPTR)
        The path to a whole directory which codesets.library will
        walk through, searching for proper charset files.
        Default: NULL

      CSA_CodesetFile (STRPTR)
        The path to a specific file which codesets.library will try
        to load as a standard charset translation file.
        Default: NULL

      CSA_SourceCodeset (struct codeset *)
        The pointer to an already existing codeset structure which
        will immediately be added to the list. Be careful
        when adding one codeset to multiple lists, especially
        when you call CodesetsListDelete() to free a list.
        Default: NULL

   RESULT
    result - TRUE on success otherwise FALSE

   NOTE
    Be careful when adding one codeset to more than one codeset list as
    you may run into problems when freeing the list afterwards.

   SEE ALSO
    codesets.library/CodesetsListCreateA
    codesets.library/CodesetsListDeleteA
    codesets.library/CodesetsListRemoveA



codesets.library/CodesetsListRemoveA

   NAME
    CodesetsListRemoveA - removes a single or multiple codesets from a
                          previously created codeset list.

   SYNOPSIS
    result = CodesetsListRemoveA(attrs);
    D0                           A0

    BOOL CodesetsListRemoveA(struct TagItem *);

    result = CodesetsListRemove(tag1, ...);
    D0                          A0

    BOOL CodesetsListRemove(Tag, ...);

   FUNCTION
    This function removes a single codeset or multiple codesets from a
    previously created codeset list. The removed codeset structures will
    also be freed/deleted by default.

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_SourceCodeset (struct codeset *)
        Pointer to a codeset structure which should be removed from
        its corresponding list. Per default its resources will also
        be internally freed.
        Default: NULL

      CSA_FreeCodesets (BOOL)
        If TRUE, all supplied codesets should also be freed/deleted,
        otherwise the codesets will just be removed from their lists.
        Default: TRUE

   RESULT
    result - TRUE on success otherwise FALSE

   NOTE
    The function will automatically prevent removal of codesets from the
    internal codeset list of codesets.library and will return FALSE in
    case a user tried to remove a codeset from the internal list.

   SEE ALSO
    codesets.library/CodesetsListDeleteA
    codesets.library/CodesetsListAddA



codesets.library/CodesetsUTF8CreateA

   NAME
    CodesetsUTF8CreateA - creates a UTF8 compliant string
                          representation of a supplied source
                          string.


   SYNOPSIS
    utf8 = CodesetsUTF8CreateA(attrs);
                               A0
    UTF8 * CodesetsUTF8CreateA(struct TagItem *);

    utf8 = CodesetsUTF8Create(tag1, ...);
                              A0
    UTF8 * CodesetsUTF8Create(Tag, ...);


   FUNCTION
    Creates a UTF8 string from a string which is encoded in the
    specified codeset.

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_Source (STRPTR)
        The string which you want to convert. Must be supplied,
        otherwise the function returns NULL.

      CSA_SourceLen (ULONG)
        Length of CSA_Source or less to convert just a part
        Default: string length of CSA_Source

      CSA_SourceCodeset (struct codeset *)
        The codeset in which the source string is encoded.
        Default: the system's default codeset

      CSA_Dest (STRPTR)
        Destination buffer. If you supply a valid buffer here, you
        must also set CSA_DestLen to the length of your buffer. If
        CSA_AllocIfNeeded is TRUE, CSA_DestLen is checked to see if
        CSA_Dest may contain the whole utf8. If CSA_Dest can't
        contain the utf8, a brand new buffer is allocated. If
        CSA_AllocIfNeeded is FALSE, up to CSA_DestLen bytes (ending
        '\0' included) are written to CSA_Dest. If CSA_DestHook is
        supplied, CSA_Dest is ignored.
        Default: NULL.

      CSA_DestHook (struct Hook *)
        Destination hook. If this is supplied, it is called with a
        partial converted string.

        The hook function should be declared as:

        ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
                             REG(a2, struct convertMsg *msg),
                             REG(a1, STRPTR buf))

        struct Hook *hook
          Your hook

        STRPTR buf
          The partial '\0' terminated buffer

        msg->state - one of

          o CSV_Translating
            More calls to come

          o CSV_End
            Last call

        msg->len
          length of string 'buf'

        You may define the min length of the buffer via CSA_DestLen.
        If so, accepted values are 16<=v<=sizeof_codeset_buffer.

        Don't count on this size being fixed, even if you used
        CSA_DestLen!

      CSA_DestLen (ULONG)
        If CSA_DestHook is used, it represents the min length of the
        buffer that causes hook calls. Otherwise it is the size of
        the buffer supplied in CSA_Dest. So if CSA_DestHook is
        supplied, CSA_DestLen is optional, otherwise it is required.

      CSA_DestLenPtr (ULONG *)
        If supplied, will contain the length of the utf8 string

      CSA_AllocIfNeeded (BOOL)
        If the destination buffer length is too small to contain
        the UTF8 a new buffer is allocated
        Default: TRUE

      CSA_Pool (APTR)
        If a new destination buffer needs to be allocated (it happens
        if and only if CSA_DestHook is not used, CSA_AllocIfNeeded
        is TRUE, or if CSA_Dest buffer is too small for the utf8) this
        pool is used. The result must be freed via
        CodesetsFreeVecPooledA(pool, utf8, NULL).
        If CSA_Pool is not supplied, the destination buffer is allocated
        from the internal memory pool and must be freed via
        CodesetsFreeA(utf8, NULL).

      CSA_PoolSem (struct SignalSemaphore *)
        A semaphore to lock when using CSA_Pool

   RESULT
    utf8 - the utf8 string or NULL
           If CSA_DestHook is used always NULL.
           If CSA_DestHook is not used NULL means failure
           to allocate mem.

   EXAMPLE
    The shortest invocation is:
    -- cut here --
    UTF8   *utf8;
    STRPTR str;

    if((utf8 = CodesetsUTF8Create(CSA_Source, str,
                                  TAG_DONE)))
    {
        ...

        CodesetsFreeA(utf8,NULL);
    }
    -- cut here --


    In case you want to use your pool to allocate mem:
    -- cut here --
    UTF8   *utf8;
    STRPTR str;
    APTR   pool;

    if((utf8 = CodesetsUTF8Create(CSA_Source,     str,
                                  CSA_Pool,       pool,
                                  TAG_DONE)))
    {
        ...

        CodesetsFreeVecPooledA(pool,utf8,NULL);
    }
    -- cut here --


    If your pool is to be arbitrated via a semaphore:
    -- cut here --
    UTF8   *utf8;
    STRPTR str;
    APTR   pool;
    struct SignalSemaphore *sem;

    if((utf8 = CodesetsUTF8Create(CSA_Source,     str,
                                  CSA_Pool,       pool,
                                  CSA_PoolSem,    sem,
                                  TAG_DONE)))
    {
        ...

        CodesetsFreeVecPooledA(pool,utf8,NULL);
    }
    -- cut here --


    If you want to use your own buffer to reduce mem
    allocation:
    -- cut here --
    UTF8   *utf8;
    STRPTR str;
    UTF8   buf[256];   // a byte buffer, not an array of pointers

    if((utf8 = CodesetsUTF8Create(CSA_Source,  str,
                                  CSA_Dest,    buf,
                                  CSA_DestLen, sizeof(buf),
                                  TAG_DONE)))
    {
        ...

        if(utf8 != buf)
          CodesetsFreeA(utf8,NULL);
    }
    -- cut here --


    If your strings are at most MAXLEN chars long (e.g. imagine a
    MUI application where you know the max size of your string
    gadgets), you should rather supply your own buffer:
    -- cut here --
    UTF8   *utf8;
    STRPTR str;
    UTF8   buf[MAXLEN*6+1];

    if((utf8 = CodesetsUTF8Create(CSA_Source,  str,
                                  CSA_Dest,    buf,
                                  CSA_DestLen, sizeof(buf),
                                  TAG_DONE)))
    {
        ...
    }
    -- cut here --


    If your strings are very large, so that you cannot be sure there
    is enough memory for them, or you have your own reasons to
    process them in chunks:
    -- cut here --
    static ULONG ASM SAVEDS
    destFun(REG(a0, struct Hook *hook),
            REG(a2, struct convertMsg *msg),
            REG(a1, STRPTR buf))
    {
      printf("[%3ld] [%s]\n",msg->len,buf);
      if(msg->state == CSV_End)
        printf("\n");

      return 0;
    }

    struct Hook dest;
    dest.h_Entry = (HOOKFUNC)destFun;

    CodesetsUTF8Create(CSA_Source,    str,
                       CSA_DestHook,  &dest,
                       TAG_DONE);
    -- cut here --

   SEE ALSO
    codesets.library/CodesetsUTF8ToStrA
    codesets.library/CodesetsUTF8Len



codesets.library/CodesetsUTF8ToStrA

   NAME
    CodesetsUTF8ToStrA - converts a UTF8 encoded string into
                         a specified destination codeset.


   SYNOPSIS
    str = CodesetsUTF8ToStrA(attrs);
    D0                       A0

    STRPTR CodesetsUTF8ToStrA(struct TagItem *);

    str = CodesetsUTF8ToStr(tag1, ...);
    D0                      A0

    STRPTR CodesetsUTF8ToStr(Tag,...);


   FUNCTION
    Converts a UTF8 string to a specified codeset.

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_Source (STRPTR)
        The string which you want to convert. Must be supplied,
        otherwise the function returns NULL.

      CSA_SourceLen (ULONG)
        Length of CSA_Source. Must be > 0 or the function returns
        NULL.
        Default: string length of CSA_Source - strlen()

      CSA_Dest (STRPTR)
        Destination buffer. If you supply a valid buffer here, you
        must also set CSA_DestLen to the length of your buffer. If
        CSA_AllocIfNeeded is TRUE, CSA_DestLen is checked to see if
        CSA_Dest may contain the whole converted string. If CSA_Dest
        can't contain the output string, a brand new buffer is allocated.
        If CSA_AllocIfNeeded is FALSE, up to CSA_DestLen bytes (ending
        '\0' included) are written to CSA_Dest. If CSA_DestHook is
        supplied, CSA_Dest is ignored.
        Default: NULL.

      CSA_DestCodeset (struct codeset *)
        The codeset to which the UTF8 string should be encoded to.
        Default: the system's default codeset

      CSA_DestHook (struct Hook *)
        Destination hook. If this is supplied, it is called with a
        partial converted string.

        The hook function should be declared as:

        ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
                             REG(a2, struct convertMsg *msg),
                             REG(a1, STRPTR buf))

        struct Hook *hook
          Your hook

        STRPTR buf
          The partial '\0' terminated buffer

        msg->state - one of

          o CSV_Translating
            More calls to come

          o CSV_End
            Last call

        msg->len
          length of string 'buf'

        You may define the min length of the buffer via CSA_DestLen.
        If so, accepted values are 16<=v<=sizeof_codeset_buffer.

        Don't count on this size being fixed, even if you used
        CSA_DestLen!

      CSA_DestLen (ULONG)
        If CSA_DestHook is used, it represents the min length of the
        buffer that causes hook calls. Otherwise it is the size of
        the buffer supplied in CSA_Dest. So if CSA_DestHook is
        supplied, CSA_DestLen is optional, otherwise it is required.

      CSA_DestLenPtr (ULONG *)
        If supplied, will contain the length of the converted string.

      CSA_AllocIfNeeded (BOOL)
        If the destination buffer length is too small to contain
        the output string, a new buffer is allocated.
        Default: TRUE

      CSA_Pool (APTR)
        If a new destination buffer needs to be allocated (it happens
        if and only if CSA_DestHook is not used, CSA_AllocIfNeeded
        is TRUE, or if CSA_Dest buffer is too small for the utf8) this
        pool is used. The result must be freed via
        CodesetsFreeVecPooledA(pool, string, NULL).
        If CSA_Pool is not supplied, the destination buffer is allocated
        from the internal memory pool and must be freed via
        CodesetsFreeA(string, NULL).

      CSA_PoolSem (struct SignalSemaphore *)
        A semaphore to lock when using CSA_Pool

      CSA_ErrPtr (int *)
        Pointer to an integer variable which will be filled with the
        number of issues found (number of non-convertible chars)
        Default: NULL

      CSA_MapForeignChars (BOOL)
        If a character of the source string cannot be directly mapped
        to the destination codeset a "?" character will normally be used
        to signal this case. If this attribute is set, an internal
        replacement table will be used which tries to replace these
        "foreign" characters by "looklike" ASCII character sequences.
        Please note, that this functionality is mostly just usable by
        Latin users due to the straight mapping to ASCII (7bit).
        Default: FALSE

      CSA_MapForeignCharsHook (struct Hook *)
        If a character of the source string cannot be directly mapped
        to the destination codeset a "?" character will normally be used
        to signal this case. By using this attribute, a hook can be
        supplied which is called for every such foreign character.
        Within this hook the UTF8 sequence is supplied which cannot be
        directly mapped to the destination codeset. During the execution
        of the hook a replacement string might be specified, which in turn
        will be used by the internals of codesets.library to map this
        "foreign" char to a different character or UTF8 sequence.

        If both, CSA_MapForeignChars and CSA_MapForeignCharsHook, are
        specified the hook will only be executed in case the internal
        routines don't supply an own mapping for the foreign UTF8 sequence.

        The hook function should be declared as:

        ULONG ASM SAVEDS fun(REG(a0, struct Hook *hook),
                             REG(a2, struct replaceMsg *msg),
                             REG(a1, void *dummy))

        struct Hook *hook
          Your hook

        msg->dst
          place your desired replacement string here

        msg->src
          the UTF8 sequence to be replaced, this string is READ-ONLY!

        msg->srclen
          the length of the UTF8 sequence to be replaced, do NOT peek
          beyond this limit.

        The return value of this hook function is the length of the replacement
        string. Return zero if no replacement took place. Positive values will
        be treated as lengths of ASCII strings. Negative values signal a
        replacement by another UTF8 sequence. Please note that in case you
        supply a UTF8 sequence as a replacement for the "foreign" UTF8, your
        hook might be called again if this sequence still cannot be mapped to
        the destination codeset and is thus again a "foreign" sequence.


   RESULT
    str - the string or NULL
          If CSA_DestHook is used always NULL.
          If CSA_DestHook is not used NULL means failure
          to allocate mem.
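
   EXAMPLE
    The return-value convention of a CSA_MapForeignCharsHook function
    can be sketched in portable C without the library. The struct and
    function below are hypothetical stand-ins: only the fields named
    above (dst, src, srclen) are modelled, their exact types in the
    real headers are an assumption, and the ASM/SAVEDS/REG hook
    calling convention is left out on purpose.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the message passed to the hook. The field
 * names follow the text above; the types are assumptions. */
struct replace_msg_sketch {
    const char          *dst;     /* replacement chosen by the hook */
    const unsigned char *src;     /* unmappable UTF-8 sequence, read-only */
    int                  srclen;  /* length of src in bytes */
};

/* Returns the replacement length: a positive value for an ASCII
 * replacement string, zero if no replacement is offered. */
long replace_foreign(struct replace_msg_sketch *msg)
{
    static const unsigned char euro[] = { 0xE2, 0x82, 0xAC }; /* U+20AC */

    /* Never read more than srclen bytes of src. */
    if (msg->srclen == 3 && memcmp(msg->src, euro, 3) == 0) {
        msg->dst = "EUR";
        return 3;
    }
    return 0;
}
```

    A hook that wanted to substitute another UTF8 sequence instead
    would return a negative value, as described above.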

   SEE ALSO
    codesets.library/CodesetsUTF8CreateA
    codesets.library/CodesetsUTF8Len



codesets.library/CodesetsUTF8Len

   NAME
    CodesetsUTF8Len - returns the length of a supplied utf8 string.

   SYNOPSIS
    len = CodesetsUTF8Len(utf8);
    D0                    A0

    ULONG CodesetsUTF8Len(UTF8 *);

   FUNCTION
    Returns the number of real characters stored in a supplied UTF8
    string. This is _NOT_ the space required to store the UTF8 string,
    it is the actual number of _real_ characters the UTF8 represents.

   INPUTS
    utf8 - pointer to the UTF8 string generated by the internal
           functions of codesets.library

   RESULT
    len - length of utf8
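
   EXAMPLE
    The difference between storage size and character count can be
    illustrated without the library. The helper below is a portable
    sketch, not the implementation used by codesets.library; the name
    utf8_codepoints is hypothetical and valid UTF8 input is assumed.

```c
#include <stddef.h>

/* Count the characters (code points) in a NUL-terminated UTF-8 string.
 * Every byte that is not a continuation byte (10xxxxxx) starts a new
 * character, so only those bytes are counted. */
size_t utf8_codepoints(const unsigned char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if ((*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

    For the three bytes E2 82 AC (the Euro sign) strlen() reports 3,
    while this count is 1.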

   SEE ALSO
    codesets.library/CodesetsUTF8CreateA
    codesets.library/CodesetsUTF8ToStrA



codesets.library/CodesetsIsValidUTF8

   NAME
    CodesetsIsValidUTF8 - tells if a supplied standard string is meant to
                          carry a perfectly valid UTF8 sequence

   SYNOPSIS
    result = CodesetsIsValidUTF8(str);
    D0                           A0

    BOOL CodesetsIsValidUTF8(STRPTR);

   FUNCTION
    Returns TRUE in case the supplied string only contains char sequences
    which are compatible to the UTF8 standard.

   INPUTS
    str - a standard STRPTR string.

   RESULT
    result - TRUE in case the string contains valid UTF8 data.

   NOTE
    This function uses the common 'GOOD_UCS' macro together with parsing
    the whole string. This means that it will only return TRUE in case
    the supplied string only contains UTF8 sequences. A mixture of UTF8
    and non-UTF8 sequences will result in the function returning FALSE.
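
   EXAMPLE
    What "only UTF8 sequences" means can be sketched in portable C.
    This is not the library's GOOD_UCS based code; the function name
    utf8_is_valid is hypothetical and the checks follow the general
    UTF-8 rules (correct continuation bytes, no overlong forms, no
    surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>

/* Return 1 if the len bytes at s are well-formed UTF-8, otherwise 0. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t need, k;

        if (c < 0x80) { i++; continue; }          /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                            /* stray or invalid lead byte */

        if (len - i - 1 < need)
            return 0;                             /* sequence is truncated */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                         /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min) return 0;                   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;              /* beyond Unicode range */

        i += need + 1;
    }
    return 1;
}
```

    A Latin-1 byte such as 0xE9 followed by ASCII text is rejected,
    which matches the mixed-content case described in the note above.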

   SEE ALSO
    codesets.library/CodesetsUTF8CreateA
    codesets.library/CodesetsUTF8ToStrA



codesets.library/CodesetsIsLegalUTF8

   NAME
    CodesetsIsLegalUTF8 - check a UTF8 sequence

   SYNOPSIS
    res = CodesetsIsLegalUTF8(source, length);
                              A0      D0

    ULONG CodesetsIsLegalUTF8(UTF8 *, ULONG);


   FUNCTION
    Checks if source is a valid UTF8 sequence generated
    by the internal functions of codesets.library

   INPUTS
    source - the char sequence to check
    length - size of source

   RESULT
    res - TRUE or FALSE

   SEE ALSO
    codesets.library/CodesetsUTF8CreateA
    codesets.library/CodesetsUTF8ToStrA



codesets.library/CodesetsIsLegalUTF8Sequence

   NAME
    CodesetsIsLegalUTF8Sequence - check a char sequence

   SYNOPSIS
    res = CodesetsIsLegalUTF8Sequence(source, end);
                                      A0      A1

    ULONG CodesetsIsLegalUTF8Sequence(UTF8 *, UTF8 *);

   FUNCTION
    Check if source is a valid UTF8 sequence within the
    source and end boundaries.

   INPUTS
    source - the char sequence to check
    end - pointer to the end of the sequence to check

   RESULT
    res - TRUE or FALSE

   SEE ALSO
    codesets.library/CodesetsUTF8CreateA
    codesets.library/CodesetsUTF8ToStrA



codesets.library/CodesetsStrLenA

   NAME
    CodesetsStrLenA - returns the length of the source string
                      in case it will be converted to an UTF8
                      string.

   SYNOPSIS
    len = CodesetsStrLenA(str, attrs)
                          A0   A1

    ULONG CodesetsStrLenA(STRPTR, struct TagItem *);

    len = CodesetsStrLen(str, tag1, ...);
                         A0   A1

    ULONG CodesetsStrLen(STRPTR, Tag, ...);

   FUNCTION
    Return the length (size) of str in case it will be converted to
    an UTF8 compliant string.

   INPUTS
    str - the string to obtain length of
    attrs - a list of additional tag items. Valid items are:

      CSA_SourceCodeset (struct codeset *)
        The codeset the source string is encoded in.
        Default: the system's default codeset

      CSA_SourceLen (ULONG)
        The length of str
        Default: string length of CSA_Source

   RESULT
    len - the length of the string if it will be converted to
          an UTF8 string.
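
   EXAMPLE
    For a source string in ISO-8859-1 the resulting size is easy to
    compute by hand, which the portable sketch below does. It is only
    an illustration for that one codeset, not the library's code; the
    name latin1_utf8_len is hypothetical, and the terminating '\0' is
    not counted here (whether the library counts it is not stated
    above).

```c
#include <stddef.h>

/* Number of bytes a NUL-terminated ISO-8859-1 string occupies once
 * converted to UTF-8: bytes below 0x80 stay one byte long, all other
 * bytes map to U+0080..U+00FF and need two bytes. */
size_t latin1_utf8_len(const unsigned char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        n += (*s < 0x80) ? 1 : 2;
    return n;
}
```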

   SEE ALSO
    codesets.library/CodesetsUTF8CreateA



codesets.library/CodesetsConvertUTF16toUTF32

   NAME
    CodesetsConvertUTF16toUTF32 - converts from UTF16 to UTF32

   SYNOPSIS
    res = CodesetsConvertUTF16toUTF32(sourceStart,sourceEnd,targetStart,targetEnd,flags );
    D0                                A0          A1        A2          A3        D0

    ULONG CodesetsConvertUTF16toUTF32(const UTF16 **,const UTF16 *,UTF32 **,UTF32 *,ULONG);

   FUNCTION
    Converts UTF16 to UTF32.

   INPUTS

   RESULT

   SEE ALSO
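
   EXAMPLE
    The inputs and result codes of this function are not documented
    above, so the following is a portable sketch of the conversion
    itself rather than a call to the library. The name utf16_to_utf32
    and its simplified interface (lengths instead of start/end
    pointers, no flags) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode srclen UTF-16 code units into UTF-32 code points.
 * Returns the number of code points written, or (size_t)-1 if a
 * surrogate is unpaired or dst is too small. */
size_t utf16_to_utf32(const uint16_t *src, size_t srclen,
                      uint32_t *dst, size_t dstcap)
{
    size_t i = 0, n = 0;

    while (i < srclen) {
        uint32_t cp = src[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* high surrogate: a low surrogate must follow */
            if (i >= srclen || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;      /* low surrogate without a high one */
        }

        if (n >= dstcap)
            return (size_t)-1;      /* target buffer exhausted */
        dst[n++] = cp;
    }
    return n;
}
```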



codesets.library/CodesetsConvertUTF16toUTF8

   NAME
    CodesetsConvertUTF16toUTF8 - converts from UTF16 to UTF8

   SYNOPSIS
    res = CodesetsConvertUTF16toUTF8(sourceStart,sourceEnd,targetStart,targetEnd,flags );
    D0                               A0          A1        A2          A3        D0

    ULONG CodesetsConvertUTF16toUTF8(const UTF16 **,const UTF16 *,UTF8 **,UTF8 *,ULONG);

   FUNCTION
    Converts UTF16 to UTF8.

   INPUTS

   RESULT

   SEE ALSO



codesets.library/CodesetsConvertUTF32toUTF16

   NAME
    CodesetsConvertUTF32toUTF16 - converts from UTF32 to UTF16

   SYNOPSIS
    res = CodesetsConvertUTF32toUTF16(sourceStart,sourceEnd,targetStart,targetEnd,flags );
    D0                                A0          A1        A2          A3        D0

    ULONG CodesetsConvertUTF32toUTF16(const UTF32 **,const UTF32 *,UTF16 **,UTF16 *,ULONG);

   FUNCTION
    Converts UTF32 to UTF16.

   INPUTS

   RESULT

   SEE ALSO
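
   EXAMPLE
    As the argument details are not documented above, this portable
    sketch shows only the core step of the conversion: turning one
    code point into one or two UTF-16 code units. The function name
    utf16_encode is hypothetical and the code does not call
    codesets.library.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16. Returns the number of code units
 * written to out (1 or 2), or 0 if cp is not a valid scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* Basic Multilingual Plane */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */

    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```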



codesets.library/CodesetsConvertUTF32toUTF8

   NAME
    CodesetsConvertUTF32toUTF8 - converts from UTF32 to UTF8

   SYNOPSIS
    res = CodesetsConvertUTF32toUTF8(sourceStart,sourceEnd,targetStart,targetEnd,flags );
    D0                               A0          A1        A2          A3        D0

    ULONG CodesetsConvertUTF32toUTF8(const UTF32 **,const UTF32 *,UTF8 **,UTF8 *,ULONG);

   FUNCTION
    Converts UTF32 to UTF8.

   INPUTS

   RESULT

   SEE ALSO
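
   EXAMPLE
    The byte layout produced when a code point is written as UTF8 can
    be sketched in portable C. This is an illustration of the encoding
    rules only; utf8_encode is a hypothetical helper and not part of
    codesets.library.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes needed).
 * Returns the number of bytes written, or 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```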



codesets.library/CodesetsConvertUTF8toUTF16

   NAME
    CodesetsConvertUTF8toUTF16 - converts from UTF8 to UTF16

   SYNOPSIS
    res = CodesetsConvertUTF8toUTF16(sourceStart,sourceEnd,targetStart,targetEnd,flags );
    D0                               A0          A1        A2          A3        D0

    ULONG CodesetsConvertUTF8toUTF16(const UTF8 **,const UTF8 *,UTF16 **,UTF16 *,ULONG);

   FUNCTION
    Converts UTF8 to UTF16.

   INPUTS

   RESULT

   SEE ALSO



codesets.library/CodesetsConvertUTF8toUTF32

   NAME
    CodesetsConvertUTF8toUTF32 - converts from UTF8 to UTF32

   SYNOPSIS
    res = CodesetsConvertUTF8toUTF32(sourceStart,sourceEnd,targetStart,targetEnd,flags );
    D0                               A0          A1        A2          A3        D0

    ULONG CodesetsConvertUTF8toUTF32(const UTF8 **,const UTF8 *,UTF32 **,UTF32 *,ULONG);

   FUNCTION
    Converts UTF8 to UTF32.

   INPUTS

   RESULT

   SEE ALSO



codesets.library/CodesetsDecodeB64A

   NAME
    CodesetsDecodeB64A - decodes a supplied base64 encoded string
                         or file into plain text charwise.

   SYNOPSIS
    res = CodesetsDecodeB64A(attrs);
    D0                       A0

    ULONG CodesetsDecodeB64A(struct TagItem *);

    res = CodesetsDecodeB64(tag1, ...);
    D0                      A0

    ULONG CodesetsDecodeB64(Tag, ...);

   FUNCTION
    Decodes a string or a complete base64 encoded file to a
    plain text buffer or also a destination file

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_B64SourceString (STRPTR)
        The source string to decode

      CSA_B64SourceLen (ULONG)
        The length of CSA_B64SourceString. Must be supplied if
        CSA_B64SourceString is used.

      CSA_B64SourceFile (STRPTR)
        Source file name.

      CSA_B64DestPtr (STRPTR *)
        Destination buffer pointer. Set to the allocated buffer.
        Must be supplied if CSA_B64DestFile is not used. To
        free the buffer use CodesetsFreeA().

      CSA_B64DestFile (STRPTR)
        Destination file name. Must be supplied if
        CSA_B64DestPtr is not used.

      CSA_B64FLG_NtCheckErr (BOOL)
        Don't stop on error.

   RESULT
    res - result, one of (if 0 OK, if >0 error)
        CSR_B64_ERROR_OK
        CSR_B64_ERROR_MEM
        CSR_B64_ERROR_DOS
        CSR_B64_ERROR_INCOMPLETE
        CSR_B64_ERROR_ILLEGAL

   NOTE
    It operates purely charwise and takes no account of the
    individual codeset the decoded data may still be encoded in.
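
   EXAMPLE
    The charwise decoding step can be sketched in portable C. This is
    not the library's implementation: the names b64_val and b64_decode
    are hypothetical, and the negative return values merely mirror the
    idea of the ILLEGAL and INCOMPLETE results listed above.

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 if illegal. */
static int b64_val(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode len characters of base64 text into dst, which must have room
 * for 3 bytes per 4 input characters. Line breaks and blanks are
 * skipped and '=' ends the data. Returns the number of bytes decoded,
 * -1 on an illegal character, -2 if the input stops after a lone
 * sextet that cannot form a byte. */
long b64_decode(const char *src, size_t len, unsigned char *dst)
{
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of valid bits in acc */
    long o = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        int c = (unsigned char)src[i];
        int v;

        if (c == '\r' || c == '\n' || c == ' ' || c == '\t')
            continue;
        if (c == '=')
            break;
        v = b64_val(c);
        if (v < 0)
            return -1;

        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            dst[o++] = (unsigned char)((acc >> bits) & 0xFF);
            acc &= (1UL << bits) - 1;   /* keep only the unused bits */
        }
    }
    return (bits == 6) ? -2 : o;
}
```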

   SEE ALSO
    codesets.library/CodesetsEncodeB64A



codesets.library/CodesetsEncodeB64A

   NAME
    CodesetsEncodeB64A - encodes a string or whole file
                         to base64

   SYNOPSIS
    res = CodesetsEncodeB64A(attrs);
    D0                       A0

    ULONG CodesetsEncodeB64A(struct TagItem *);

    res = CodesetsEncodeB64(tag1, ...);
    D0                      A0

    ULONG CodesetsEncodeB64(Tag, ...);

   FUNCTION
    Encodes the supplied string or file to either a whole
    buffer or also to a file.

   INPUTS
    attrs - a list of mandatory tag items. Valid items are:

      CSA_B64SourceString (STRPTR)
        The source string to encode

      CSA_B64SourceLen (ULONG)
        The length of CSA_B64SourceString. Must be supplied if
        CSA_B64SourceString is used.

      CSA_B64SourceFile (STRPTR)
        Source file name.

      CSA_B64DestPtr (STRPTR *)
        Destination buffer pointer. Set to the allocated buffer.
        Must be supplied if CSA_B64DestFile is not used. To
        free the buffer use CodesetsFreeA().

      CSA_B64DestFile (STRPTR)
        Destination file name. Must be supplied if
        CSA_B64DestPtr is not used.

      CSA_B64MaxLineLen (ULONG)
        Maximum length of encoded lines. 0<v<256
        Default: 72

      CSA_B64Unix (ULONG)
        If TRUE eol is \n (LF), otherwise \r\n (CRLF).
        Default: TRUE

   RESULT
    res - result, one of (if 0 OK, if >0 error)
        CSR_B64_ERROR_OK
        CSR_B64_ERROR_MEM
        CSR_B64_ERROR_DOS
        CSR_B64_ERROR_INCOMPLETE
        CSR_B64_ERROR_ILLEGAL

   NOTE
    It operates purely charwise and takes no account of the
    individual codeset the source data may be encoded in.
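
   EXAMPLE
    How a maximum line length and an end-of-line sequence shape the
    encoded output can be sketched in portable C. The function
    b64_encode and its parameters are hypothetical and only loosely
    modelled on CSA_B64MaxLineLen and CSA_B64Unix; whether the library
    also ends the last line with an eol is not stated above, and this
    sketch does not.

```c
#include <stddef.h>

static const char b64tab[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes of src as base64 into dst (NUL-terminated).
 * After every maxline output characters the string eol is inserted;
 * maxline == 0 disables wrapping. dst must be large enough.
 * Returns the number of characters written, excluding the NUL. */
size_t b64_encode(const unsigned char *src, size_t len,
                  char *dst, size_t maxline, const char *eol)
{
    size_t o = 0, col = 0, i;

    for (i = 0; i < len; i += 3) {
        size_t rem = len - i;             /* input bytes left, >= 1 */
        unsigned long v = (unsigned long)src[i] << 16;
        char quad[4];
        int k;

        if (rem > 1) v |= (unsigned long)src[i + 1] << 8;
        if (rem > 2) v |= src[i + 2];

        quad[0] = b64tab[(v >> 18) & 0x3F];
        quad[1] = b64tab[(v >> 12) & 0x3F];
        quad[2] = (rem > 1) ? b64tab[(v >> 6) & 0x3F] : '=';
        quad[3] = (rem > 2) ? b64tab[v & 0x3F] : '=';

        for (k = 0; k < 4; k++) {
            if (maxline != 0 && col == maxline) {
                const char *e;
                for (e = eol; *e != '\0'; e++)
                    dst[o++] = *e;        /* start a new line */
                col = 0;
            }
            dst[o++] = quad[k];
            col++;
        }
    }
    dst[o] = '\0';
    return o;
}
```

    Passing "\n" as eol corresponds to the LF setting and "\r\n" to
    the CRLF setting described for CSA_B64Unix.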

   SEE ALSO
    codesets.library/CodesetsDecodeB64A