Questions tagged [python-module-unicodedata]
15 questions
                    
22 votes, 1 answer
        Why doesn't unicodedata recognise certain characters?
In Python 2.7 at least, unicodedata.name() doesn't recognise certain characters.
>>> from unicodedata import name
>>> name(u'\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
>>> name(u'a')
'LATIN…  
         
    
    
        Hammerite
        
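A minimal sketch of the behaviour this question describes, assuming Python 3 (the excerpt itself uses Python 2.7). Control characters such as the newline have no name in the Unicode database, so `unicodedata.name()` raises `ValueError` unless a default is passed. The variable names and the `"<unnamed>"` placeholder are illustrative only.

```python
import unicodedata

newline = "\n"

# name() raises ValueError for characters that have no name,
# which includes control characters such as the newline.
try:
    unicodedata.name(newline)
    raised = False
except ValueError:
    raised = True

# Passing a second argument returns that default instead of raising.
fallback = unicodedata.name(newline, "<unnamed>")

# The general category identifies the newline as a control character.
category = unicodedata.category(newline)

# Ordinary letters do have names.
letter_name = unicodedata.name("a")
```

Checking `unicodedata.category()` first is one way to tell in advance that a character is a control character and so has no name.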
3 votes, 1 answer
        Determine if a unicode character exists in a unicode subset
I'd like to find a way to determine if a Unicode character exists in a standardized subset of Unicode characters, specifically Basic Latin and Latin-1. I am using Python 2 and the unicodedata module but need a solution that works in 3 as well…
         
    
    
        rustinpeace91
        
3 votes, 1 answer
        Python convert this utf8 string to latin1
I have this UTF-8 string: 
s = "Naděždaüäö"
I'd like to convert it to a UTF-8 string that can be encoded as "latin-1" without raising an exception, by replacing every character that cannot be found in latin-1 with its closest…
         
    
    
        Dominik Neise
        
3 votes, 1 answer
        What is the difference between unicodedata.digit and unicodedata.numeric?
From unicodedata doc:
unicodedata.digit(chr[, default]) Returns the digit value assigned to
  the character chr as integer. If no such value is defined, default is
  returned, or, if not given, ValueError is raised.
unicodedata.numeric(chr[,…
        user1785721
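A small sketch of the difference the quoted documentation points at, assuming Python 3. The sample characters are chosen for illustration: SUPERSCRIPT TWO has a digit value, while VULGAR FRACTION ONE HALF has only a numeric value.

```python
import unicodedata

superscript_two = "\u00b2"  # SUPERSCRIPT TWO
one_half = "\u00bd"         # VULGAR FRACTION ONE HALF

# digit() returns an int for characters that carry a digit value.
digit_value = unicodedata.digit(superscript_two)

# numeric() returns a float and also covers fractions.
numeric_of_two = unicodedata.numeric(superscript_two)
numeric_of_half = unicodedata.numeric(one_half)

# A fraction has no digit value, so the supplied default comes back
# (without a default, ValueError would be raised).
digit_of_half = unicodedata.digit(one_half, None)
```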
2 votes, 1 answer
        What are the differences between the modules unicode and unicodedata?
I have a large dataset with over 2 million rows of textual data. Now I want to remove the accents from the strings. 
In the link below, two different modules are described to remove the accents:
What is the best way to remove accents in a Python…
         
    
    
        Emil
        
2 votes, 1 answer
        Get a list of all Greek unicode characters
I would like to know how to obtain a list of all Greek characters (upper and lowercase letters). I know how to find specific characters (unicodedata.lookup(name)), but I want all upper and lowercase letters.
Is there any way to do this?
         
    
    
        Microlith57
        
1 vote, 0 answers
        UnicodeEncodeError printing Hangul characters in the terminal
This application runs on a Mac only and I'm stuck with Python 2.
I have an input string '한글' which when decoded through an online unicode converter shows as \u1112\u1161\u11ab\u1100\u1173\u11af
For my application to work, I need to convert this…
         
    
    
        Lewis
        
1 vote, 2 answers
        Remove special characters from string such as smileys but keep German special characters
I know how to remove unwanted characters in a string, like smileys etc. However, some languages like German have special characters, too.
This is my current code:
import unicodedata
string = "süß "
uni_str = str(unicodedata.normalize('NFKD', \
      …
         
    
    
        Kev1n91
        
0 votes, 1 answer
        More efficient way to replace special chars with their unicode name in pandas df
I have a large pandas dataframe and would like to perform a thorough text cleaning on it. For this, I have crafted the code below, which evaluates whether a character is an emoji, a number, a Roman numeral, or a currency symbol, and replaces these with…
         
    
    
        lazarea
        
0 votes, 2 answers
        Capture output including control characters of subprocess
I have the following simple program to run a subprocess and tee its output to both stdout and some buffer
import subprocess
import sys
import time
import unicodedata
p = subprocess.Popen(
    "top",
    shell=True,
    stdout=subprocess.PIPE,
   …
         
    
    
        Mugen
        
0 votes, 1 answer
        Convert check mark in Python
I have a dataframe which has, in a certain column, a check mark (unicode: '\u2714'). I have been trying to replace it with the following command:
import unicodedata
df['Column'].str.replace(unicodedata.lookup("\u2714"), '')
But I keep on reading…
         
    
    
        bellotto
        
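A hedged sketch relevant to the check mark question above, assuming Python 3. `unicodedata.lookup()` expects a character *name* and raises `KeyError` when given anything else, while `unicodedata.name()` goes the other way. A plain string stands in for the dataframe column here, and the sample text is made up for illustration.

```python
import unicodedata

check = "\u2714"

# name() maps a character to its Unicode name...
check_name = unicodedata.name(check)

# ...and lookup() maps a name back to the character.
same_char = unicodedata.lookup(check_name)

# Passing the character itself to lookup() fails, because it is not a name.
try:
    unicodedata.lookup(check)
    lookup_failed = False
except KeyError:
    lookup_failed = True

# To strip the character, the literal itself is enough; no lookup is needed.
cleaned = "done \u2714".replace(check, "")
```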
0 votes, 1 answer
        Understanding unistr of unicodedata.normalize()
Wikipedia basically says the following for the four values of unistr.
- NFC (Normalization Form Canonical Composition)
    - Characters are decomposed
    - then recomposed by canonical equivalence.
- NFKC (Normalization Form Compatibility…
         
    
    
        user1424739
        
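A short sketch of the normalization forms mentioned in the question above, assuming Python 3. Note that in `unicodedata.normalize(form, unistr)` the form name is the first argument and the string to normalize is the second. The sample characters are chosen for illustration.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

# NFD splits canonically equivalent characters into base + combining marks.
nfd = unicodedata.normalize("NFD", composed)

# NFC decomposes, then recomposes by canonical equivalence.
nfc = unicodedata.normalize("NFC", decomposed)

# Canonical forms leave compatibility characters such as ligatures alone;
# the compatibility forms (NFKC, NFKD) replace them.
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
```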
0 votes, 3 answers
        How to remove every possible accent from a column in Python
I am new to Python. I have a data frame with a column named 'Name'. The column contains different types of accents. I am trying to remove those accents, for example rubén => ruben, zuñiga => zuniga, etc. I wrote the following code:
import numpy as…
         
    
    
        user3642360
        
-1 votes, 1 answer
        C++ implementation of python unicodedata library
New user here, please be gentle.
We are looking to implement a piece of Python code in C++, but it involves an intricate Unicode library called unicodedata, in particular this function:
unicodedata.category('A')  # 'L'etter, 'u'ppercase
'Lu'
Any…
         
    
    
        John Jiang
        
-1 votes, 1 answer
        how to return values from map function on dataframe
I am trying to return values from a map function, but instead it gives me the memory address. I tried using list, but then it gives me an error stating that a str object doesn't have an attribute decode. Is there a way around this?
         
    
    
        via2
        