To me, a return statement would do the exact same thing.
Using return instead wouldn't be the same as yield, as explained in ShadowRanger's comment.
With yield, calling the function gives you a generator object:
>>> standardize_text("ABCD")
<generator object standardize_text at 0x10561f740>
Generators can produce more than one result (unlike functions that use return). This generator happens to produce exactly one item, which is a string (the result of re.sub). You can collect the generator's results into a list(), for example, or just grab the first result with next():
>>> list(standardize_text("ABCD"))
['XD']
>>> g = standardize_text("ABCD")
>>> next(g)
'XD'
>>> next(g) # raises StopIteration, indicating the generator has finished
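To make the "more than one result" point concrete, here is a minimal sketch. The function standardize_variants is hypothetical (it is not from your code); it only illustrates that a generator can keep producing values after the first yield, which a function using return cannot do.

```python
import re


def standardize_variants(text: str):
    # Hypothetical generator for illustration only: it yields two results.
    yield re.sub(r"ABC", "X", text)  # first result: the substituted text
    yield text.lower()               # second result: the lowercased text


variants = standardize_variants("ABCD")
first = next(variants)      # resumes the generator up to the first yield
second = next(variants)     # resumes again, up to the second yield
remaining = list(variants)  # the generator is exhausted, so nothing is left

print(first, second, remaining)
```

Each call to next() runs the function body only as far as the next yield, so the values are computed one at a time rather than all at once.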
If we change the function to use return:
import re

def standardize_text(text: str):
    pattern = r"ABC"  # some regex
    return re.sub(pattern, "X", text)
Then calling the function gives us the single result directly, with no list() or next() needed:
>>> standardize_text("ABCD")
'XD'
Is there a reason why that yield would be useful?
In the standardize_text function, no, not really. But your preprocess_docs function does benefit from yield, because it produces more than one value: calling it returns a generator that yields one result for each item in docs. Each of those results is either a generator itself (in your original code, where standardize_text uses yield) or a string (if we change standardize_text to use return).
from typing import List

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)
# returns a generator because the implementation uses "yield"
>>> preprocess_docs(["ABCD", "AAABC"])
<generator object preprocess_docs at 0x10561f820>
# with standardize_text using "yield re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
...
<generator object standardize_text at 0x1056cce40>
<generator object standardize_text at 0x1056cceb0>
# with standardize_text using "return re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
...
XD
AAX
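The two sessions above can be reproduced side by side with the sketch below. The suffixed names (standardize_text_yield, standardize_text_return, preprocess_docs_nested, preprocess_docs_flat) are my own, introduced only so that both variants can live in one file; the bodies follow the code in your question and in this answer.

```python
import re
from typing import Iterator, List


def standardize_text_yield(text: str) -> Iterator[str]:
    # Variant from the question: calling this returns a generator.
    yield re.sub(r"ABC", "X", text)


def standardize_text_return(text: str) -> str:
    # Variant using return: calling this returns the string itself.
    return re.sub(r"ABC", "X", text)


def preprocess_docs_nested(docs: List[str]):
    for doc in docs:
        yield standardize_text_yield(doc)  # yields one generator per doc


def preprocess_docs_flat(docs: List[str]):
    for doc in docs:
        yield standardize_text_return(doc)  # yields one string per doc


docs = ["ABCD", "AAABC"]

# With the yield variant, each item must itself be consumed to get the string.
nested = [list(inner) for inner in preprocess_docs_nested(docs)]

# With the return variant, the items are already strings.
flat = list(preprocess_docs_flat(docs))

print(nested)
print(flat)
```

With the yield variant, every consumer of preprocess_docs has to unwrap an extra generator per document, which is why return is the simpler choice for standardize_text.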
Note: Before async/await was added in Python 3.5, some concurrency libraries used yield the way await is used now; Twisted's @inlineCallbacks is one example. I don't think this is directly relevant to your question, but I've included it for completeness.