I want to use PySpark to efficiently remove emoji (e.g., :-)) from 1 billion records. How can I achieve this with PySpark?
        smci
        william007
- Do you mean emoji or emoticons? Those are two different things. – Ranoiaetep Jun 27 '20 at 06:45
- Also, you should probably create a [mcve](https://stackoverflow.com/help/minimal-reproducible-example); see the references [here](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples). – anky Jun 27 '20 at 07:41
- This topic is super interesting, but your question is way too broad, hence off-topic for SO. To make it on-topic, can you add example data and example code? Do you a) have a list of all the emojis you might encounter, b) want a pretrained model that has a decent list, or c) want to learn them (hard, but doable)? (I've been working on this exact task recently, and I can tell you a) is manual, b) is seriously fallible, and c) is pretty hard.) – smci Jun 28 '20 at 20:57
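For what it's worth, here is a minimal sketch of option a) from the comment above, covering both readings of the question (Unicode emoji and ASCII emoticons such as `:-)`). It relies on PySpark's built-in `regexp_replace`, which runs inside the JVM and so avoids the per-row overhead of a Python UDF, which matters at a billion records. Everything specific here is an assumption for illustration: the emoji code point ranges are hand-picked and not exhaustive, the emoticon pattern only catches a few common forms surrounded by whitespace, and the names `EMOJI_PATTERN`, `EMOTICON_PATTERN`, `strip_emoji_and_emoticons`, and `clean_column` are hypothetical.

```python
import re

# ASSUMPTION: a hand-picked, non-exhaustive set of emoji code point ranges.
# Literal code points are used so that the same pattern string is valid both
# for Python's `re` and for the Java regex engine behind Spark's regexp_replace.
EMOJI_PATTERN = (
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flag pairs)
    "\U0001F300-\U0001FAFF"  # pictographs, faces, transport, newer symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)

# ASSUMPTION: only simple emoticons like :-) ;) =D :P, and only when they are
# delimited by whitespace or the string boundary, to avoid touching "3:1" etc.
EMOTICON_PATTERN = r"(?<!\S)[:;=]-?[()DPp](?!\S)"


def strip_emoji_and_emoticons(text):
    """Pure-Python check of the two patterns, handy for testing on samples."""
    text = re.sub(EMOJI_PATTERN, "", text)
    return re.sub(EMOTICON_PATTERN, "", text)


def clean_column(df, column):
    """Return df with emoji and emoticons removed from the given string column."""
    # Imported lazily so the helper above can be tried without Spark installed.
    from pyspark.sql import functions as F

    cleaned = F.regexp_replace(F.col(column), EMOJI_PATTERN, "")
    cleaned = F.regexp_replace(cleaned, EMOTICON_PATTERN, "")
    return df.withColumn(column, cleaned)
```

With a DataFrame `df` that has a string column called, say, `text` (a made-up name), usage would be `df = clean_column(df, "text")`. Removal leaves the surrounding whitespace untouched, so a follow-up whitespace cleanup may be wanted. Any emoji outside the listed ranges will slip through, which is the "manual" cost of this approach.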
