Profiling my code shows that the split and strip methods of str objects are among the most frequently called functions.
This is because I use constructs such as:
with open(filename, "r") as my_file:
    for line in my_file:
        fields = line.strip("\n").split("\t")
Some of the files this is applied to have a lot of lines.
So I tried using the "avoid dots" advice in https://wiki.python.org/moin/PythonSpeed/PerformanceTips as follows:
from functools import partial
split = str.split
tabsplit = partial(split, "\t")
strip = str.strip
endlinestrip = partial(strip, "\n")
def get_fields(tab_sep_line):
    return tabsplit(endlinestrip(tab_sep_line))
with open(filename, "r") as my_file:
    for line in my_file:
        fields = get_fields(line)
However, this raised ValueError: empty separator on the return line of my get_fields function.
After investigating, I think I understand the cause. When str.split is called as a plain function, the string object itself is the first positional argument and the separator is the second. functools.partial therefore bound "\t" as the string to be split, and the result of "\n".strip(tab_sep_line) (an empty string whenever the line contains "\n") was passed as the separator. Hence the error.
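The argument binding described above can be reproduced in isolation. This is a minimal sketch: the sample line is hypothetical, and the comments describe what each partial object is equivalent to.

```python
from functools import partial

tabsplit = partial(str.split, "\t")      # behaves like: lambda s: "\t".split(s)
endlinestrip = partial(str.strip, "\n")  # behaves like: lambda s: "\n".strip(s)

line = "a\tb\tc\n"  # hypothetical sample line

# "\n".strip(line) strips from "\n" every leading/trailing character that
# occurs in `line`; `line` contains "\n", so nothing is left.
stripped = endlinestrip(line)
print(repr(stripped))  # ''

try:
    tabsplit(stripped)  # i.e. "\t".split(""), an empty separator
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # empty separator
```

In other words, partial fills positional arguments from the left, so it cannot be used to fix the second argument of an unbound method while leaving the first one open.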
What would you suggest doing instead?
Edit:
I tried to compare three ways to implement the get_fields function.
Approach 1: Using plain .strip and .split
def get_fields(tab_sep_line):
    return tab_sep_line.strip("\n").split("\t")
Approach 2: Using lambda
split = str.split
strip = str.strip
tabsplit = lambda s: split(s, "\t")
endlinestrip = lambda s: strip(s, "\n")
def get_fields(tab_sep_line):
    return tabsplit(endlinestrip(tab_sep_line))
Approach 3: Using the answer provided by Jason S
split = str.split
strip = str.strip
def get_fields(tab_sep_line):
    return split(strip(tab_sep_line, "\n"), "\t")
Profiling reports the following cumulative times for get_fields:
Approach 1: 13.027
Approach 2: 16.487
Approach 3: 9.714
So avoiding dots makes a difference, but wrapping the calls in lambdas seems counterproductive.
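A profiler adds overhead to every Python-level function call, which may exaggerate the cost of the lambda version (it makes two extra calls per line). One way to cross-check the figures is to time the three approaches with timeit. The following is only a sketch: the function names, the sample line and the repetition count are assumptions chosen for illustration, and the timings will vary by machine and Python version.

```python
import timeit

_split = str.split
_strip = str.strip
_tabsplit = lambda s: _split(s, "\t")
_endlinestrip = lambda s: _strip(s, "\n")

def get_fields_plain(line):
    # Approach 1: ordinary method calls
    return line.strip("\n").split("\t")

def get_fields_lambda(line):
    # Approach 2: lambdas wrapping the unbound methods
    return _tabsplit(_endlinestrip(line))

def get_fields_unbound(line):
    # Approach 3: unbound methods called directly
    return _split(_strip(line, "\n"), "\t")

SAMPLE = "a\tb\tc\n"  # hypothetical sample line

def time_all(number=100_000):
    """Return {function name: seconds for `number` calls on SAMPLE}."""
    candidates = (get_fields_plain, get_fields_lambda, get_fields_unbound)
    return {
        func.__name__: timeit.timeit(lambda func=func: func(SAMPLE), number=number)
        for func in candidates
    }

if __name__ == "__main__":
    for name, seconds in time_all().items():
        print(f"{name}: {seconds:.3f} s")
```

Each measurement includes the cost of the small timing lambda, but that cost is the same for all three candidates, so the relative ordering is still informative.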