I have a UTF-8 text with many non-ASCII characters such as š, œ and Ó.
There are many questions on this platform about converting UTF-8 to ASCII, but they focus on the byte encoding.
I would like to know whether there is a method in Python to replace each of these characters with its
most similar analog in the regex character class
[a-zA-Z0-9.,:?!@$€] (or [α-ωΑ-Ωa-zA-Z0-9.,:?!@$€]), i.e. with Latin (or Greek) letters, digits and punctuation marks.
That would yield š -> s, œ -> oe, Ó -> O, but † -> <nothing>.
If no close match can be found, e.g. for symbols or smileys, the character should be deleted.
I know it is subjective which characters can be identified with each other, but maybe there is an approximate solution.
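
To make the desired behaviour concrete, here is a rough sketch of what I have in mind. It is only my assumption that Unicode decomposition (`unicodedata.normalize` with NFKD) combined with a small hand-made table is a reasonable starting point. The names `MANUAL`, `DISALLOWED` and `simplify` are made up for illustration, the table is far from complete, and keeping whitespace is my own addition to the pattern above.

```python
import re
import unicodedata

# Hypothetical hand-made table for characters that have no Unicode
# decomposition (ligatures, stroked letters); deliberately incomplete.
MANUAL = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE", "ß": "ss", "ø": "o", "Ø": "O"}

# Everything outside the target range gets deleted.
# Assumption: whitespace is kept so that words stay separated.
DISALLOWED = re.compile(r"[^a-zA-Z0-9.,:?!@$€\s]")


def simplify(text):
    # 1. Apply the manual replacements first.
    text = "".join(MANUAL.get(ch, ch) for ch in text)
    # 2. NFKD splits e.g. "š" into "s" plus a combining caron.
    decomposed = unicodedata.normalize("NFKD", text)
    # 3. Drop combining marks, symbols and anything else not allowed.
    return DISALLOWED.sub("", decomposed)


print(simplify("Óscar œuvre š† 5€"))  # prints: Oscar oeuvre s 5€
```

This covers the examples above, but it relies on a table that has to be maintained by hand, which is what I would like to avoid if a more general method exists.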
 
    