All, I have some really dirty text being imported to SQL and I can't work out how to clean it up. After a lot of digging it seems the file contains some non ASCII characters, hex99 for example being the TM code.
My required output from this string
ENM / éææ¨FEE/\~`+=-
Would be
'ENM / FEE/\~`+=-'
I am sure there is a quick fix but I have not yet found it.
Database collation is SQL_Latin1_General_CP1_CI_AS
Here is the dirty string I have the issue with ENM/éææ¨FEE I don't know how it will render here but its basically something like ENM/金晓晨FEE
This is not rendering with the correct dodgy characters
A file with the text in is here: https://textsaver.flap.tv/lists/3fjv
I've tried various ways to try and get rid of these characters
This is my existing replace function, pretty standard stuff
 CREATE FUNCTION [dbo].[ufn_CleanText]
 (
     @String NVARCHAR(MAX) 
 )
 RETURNS NVARCHAR(MAX)
 AS
 BEGIN
  DECLARE @Result nvarchar(MAX)
     SET @Result =''
     DECLARE @character nvarchar(1)
     DECLARE @index int
     SET @index = 1
     WHILE @index <= LEN(@String)
     BEGIN
         SET @character = SUBSTRING(@String, @index, 1)
         IF (UNICODE(@character) between 32 and 127) --or UNICODE(@character) in (10,11)
             SET @Result = @Result + @character
         SET @index = @index + 1
     END
     RETURN @Result
 END;
GO
Selects
select dbo.ufn_CleanText( cast('ENM/éææ¨FEE' COLLATE Latin1_General_100_BIN2 as nvarchar ))
union all
select cast('ENM/éææ¨FEE' as nvarchar)
union all
SELECT REPLACE('ENM/éææ¨FEE',char(0x99),'')
returns
ENM/?????FEE
ENM/é??æ??æ?¨FEE
ENM/é??æ??æ?¨FEE
What is frustrating is I can do this pretty easily with c# so I could do a CLR but I would rather do this all in SQL if possible? I am not sure if there is some way to use pat index like this in SQL?
e.g
string stringvar = "ENM / 金晓晨FEE";
Console.WriteLine($"This is dirty: {stringvar}");
string clean = Regex.Replace(stringvar, @"[^\u0000-\u007F]+", string.Empty);
Console.WriteLine($"this is clean {clean}");
Console.ReadLine();
returns
This is dirty: ENM / é??æ??æ?"FEE
this is clean ENM / FEE
Edit1:
Ok so I found a patindex function: How to strip all non-alphabetic characters from string in SQL Server?
CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%['+@MatchExpression+']%'
    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')
    RETURN @String
END
and if I call it like this:
select dbo.fn_StripCharacters('ENM/éææ¨FEE/\~+=-' COLLATE Latin1_General_BIN, '^0-9a-z/+-*/=')`
Then I get back something which is closer 
ENM/éææFEE/=
But it still contains characters that I do not want e.g é and excludes some that I do but its a step in the right direction
Edit2:
here is the hexdump:
Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000: 45 4E 4D 2F C3 A9 C2 87 C2 91 C3 A6 C2 99 C2 93
00000010: C3 A6 C2 99 C2 A8 46 45 45                        
 
    