I am trying to find a way to filter emojis out of UTF-8 text files. There is a JavaScript regex (https://raw.githubusercontent.com/mathiasbynens/emoji-regex/master/index.js) that can be used to match emojis, but I could not translate it to the C# regex dialect (there seem to be some differences I don't understand). So I tried the following simple code to match every non-word, non-space character in my texts. The plan was to go over the results manually, pick out the emojis, put them into a regex, and replace them with an empty string.
    string input = @"some path\";
    List<char> emojis = new List<char>();
    foreach(FileInfo file in new DirectoryInfo(input).GetFiles("*.txt", SearchOption.AllDirectories))
    {
        MatchCollection matches = Regex.Matches(File.ReadAllText(file.FullName), @"[^\w\s]{1}");
        foreach(Match match in matches)
        {
            string value = match.Value;
            foreach(char c in value.ToCharArray())
            {
                if(!emojis.Contains(c))
                {
                    emojis.Add(c);
                }
            }
        }
    }
    foreach(char c in emojis)
    {
        File.AppendAllText(@"\\Emojis.txt", c.ToString()+"|");
    }
But I get the following exception in #develop:
System.Text.EncoderFallbackException: Unable to translate Unicode character \uD83D at index 0 to specified code page.
Apparently it is not a good idea to split the regex matches into individual chars. Any ideas how I can fix this? Regards
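One possible direction, as a rough sketch rather than a confirmed fix: \uD83D is the high half of a UTF-16 surrogate pair (for example, U+1F600 is stored as \uD83D\uDE00), and a lone half cannot be encoded as UTF-8. The sketch below assumes it is enough to match a full surrogate pair before falling back to a single non-word, non-space character, and to keep each match as a string. The class name `EmojiCandidates` and the method `Collect` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmojiCandidates
{
    // Assumption: trying a surrogate pair first (high half U+D800-U+DBFF followed by
    // low half U+DC00-U+DFFF) keeps characters outside the BMP in one match.
    // The second alternative is the original "non-word, non-space" class.
    private static readonly Regex Candidate =
        new Regex(@"[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\w\s]");

    // Hypothetical helper: returns the distinct candidates found in the text.
    public static SortedSet<string> Collect(string text)
    {
        var found = new SortedSet<string>(StringComparer.Ordinal);
        foreach (Match match in Candidate.Matches(text))
        {
            // Keep the whole match as a string; never split it into chars.
            found.Add(match.Value);
        }
        return found;
    }

    public static void Main()
    {
        // The sample text contains the same emoji twice plus one punctuation mark.
        SortedSet<string> found = Collect("hi \uD83D\uDE00! ok \uD83D\uDE00");
        Console.WriteLine(found.Count);
    }
}
```

The collected strings could then be joined with `string.Join("|", found)` and written once with `File.WriteAllText(path, joined, Encoding.UTF8)`. This sketch would still split emoji sequences built from several code points (for example those using zero-width joiners or variation selectors), so it is only a starting point for the manual review.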