I want to speed up splitting a list. I already have a way to split it, but it is not as fast as I expected.
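import time

def split_list(lines):
    # Flatten: split every 'a-b' string on '-' and collect all the parts
    return [x for xs in lines for x in xs.split('-')]

lst = ['320000-320000'] * 1000000
# time.clock() was removed in Python 3.8; perf_counter() is the replacement for timing
start = time.perf_counter()
lst_new = split_list(lst)
end = time.perf_counter()
print('time\n', str(end - start))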
For example, input:
lst
 ['320000-320000', '320000-320000']
Output:
lst_new
 ['320000', '320000', '320000', '320000']
I'm not satisfied with the splitting speed, because my data contains many lists.
I don't know whether there is a more efficient way to do it.
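Two variants that might be worth timing against the list comprehension are sketched below. The names split_list_join and split_list_chain are made up for this illustration, and nothing here has been benchmarked, so whether either one is actually faster depends on the data and the Python version. The sketch assumes lines is a list of strings.

```python
from itertools import chain


def split_list(lines):
    """Original approach: nested list comprehension."""
    return [x for xs in lines for x in xs.split('-')]


def split_list_join(lines):
    """Join all items with the same separator, then split once."""
    # Joining an empty list and splitting would give [''], so handle it explicitly.
    return '-'.join(lines).split('-') if lines else []


def split_list_chain(lines):
    """Let itertools.chain flatten the per-item splits."""
    return list(chain.from_iterable(xs.split('-') for xs in lines))


lst = ['320000-320000', '320000-320000']
print(split_list_join(lst))
print(split_list_chain(lst))
```

All three functions should return the same flat list, so they can be swapped in and timed on the real data.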
Following the advice I received, here is a more specific description of the whole problem.
import pandas as pd
df = pd.DataFrame({ 'line':["320000-320000, 340000-320000, 320000-340000",
                            "380000-320000",
                            "380000-320000,380000-310000",
                            "370000-320000,370000-320000,320000-320000",
                            "320000-320000, 340000-320000, 320000-340000",
                            "380000-320000",
                            "380000-320000,380000-310000",
                            "370000-320000,370000-320000,320000-320000",
                            "320000-320000, 340000-320000, 320000-340000",
                            "380000-320000",
                            "380000-320000,380000-310000",
                            "370000-320000,370000-320000,320000-320000"], 'id':[1,2,3,4,5,6,7,8,9,10,11,12],})
def most_common(lst):
    return max(set(lst), key=lst.count)
def split_list(lines):
    return [x for xs in lines for x in xs.split('-')]
df['line']=df['line'].str.split(',')
col_ix=df['line'].index.values
df['line_start'] = pd.Series(0, index=df.index)
df['line_destination'] = pd.Series(0, index=df.index)
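import time
# time.clock() was removed in Python 3.8; use perf_counter() for timing
start = time.perf_counter()
for ix in col_ix:
    col = df['line'][ix]
    col_split = split_list(col)
    # Even positions hold the starting points
    even_col_split = col_split[::2]
    even_col_split_most = most_common(even_col_split)
    # df.at avoids chained assignment; int() keeps the column numeric
    df.at[ix, 'line_start'] = int(even_col_split_most)
    # Odd positions hold the destinations
    odd_col_split = col_split[1::2]
    odd_col_split_most = most_common(odd_col_split)
    df.at[ix, 'line_destination'] = int(odd_col_split_most)
end = time.perf_counter()
print('time\n', str(end - start))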
del df['line']
print('df\n',df)
Input:
df
    id                                         line
0    1  320000-320000, 340000-320000, 320000-340000
1    2                                380000-320000
2    3                  380000-320000,380000-310000
3    4    370000-320000,370000-320000,320000-320000
4    5  320000-320000, 340000-320000, 320000-340000
5    6                                380000-320000
6    7                  380000-320000,380000-310000
7    8    370000-320000,370000-320000,320000-320000
8    9  320000-320000, 340000-320000, 320000-340000
9   10                                380000-320000
10  11                  380000-320000,380000-310000
11  12    370000-320000,370000-320000,320000-320000
Output:
df
    id  line_start  line_destination
0    1      320000            320000
1    2      380000            320000
2    3      380000            320000
3    4      370000            320000
4    5      320000            320000
5    6      380000            320000
6    7      380000            320000
7    8      370000            320000
8    9      320000            320000
9   10      380000            320000
10  11      380000            320000
11  12      370000            320000
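Each value in line (e.g. 320000-320000) represents a route: the number before the hyphen is the starting point and the number after it is the destination. For each row I want the most common starting point and the most common destination.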
Expected:
The code should run faster; at the moment it is too slow for my data.
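One direction that might help, given only as a rough sketch and not benchmarked: count starts and destinations with collections.Counter instead of calling lst.count for every distinct value, and work on the raw line strings so that no flattened intermediate list is needed. The helper name most_common_endpoints is made up for this illustration. Note that on ties Counter returns the value seen first, whereas max(set(lst), key=lst.count) breaks ties in set iteration order, so tied rows may come out differently.

```python
from collections import Counter


def most_common_endpoints(line):
    """Return (most common start, most common destination) for one raw 'line' string."""
    starts, destinations = [], []
    for route in line.split(','):
        # strip() drops the space that follows some commas in the sample data
        start, destination = route.strip().split('-')
        starts.append(start)
        destinations.append(destination)
    # Counter counts in one pass; most_common(1) returns [(value, count)]
    top_start = Counter(starts).most_common(1)[0][0]
    top_destination = Counter(destinations).most_common(1)[0][0]
    return int(top_start), int(top_destination)


# Sample rows copied from the DataFrame in the question
lines = [
    "320000-320000, 340000-320000, 320000-340000",
    "380000-320000",
    "380000-320000,380000-310000",
    "370000-320000,370000-320000,320000-320000",
]

pairs = [most_common_endpoints(line) for line in lines]
line_start = [start for start, _ in pairs]
line_destination = [destination for _, destination in pairs]
print(line_start)
print(line_destination)
```

The two result lists could then be assigned to the line_start and line_destination columns in one step each, which also avoids the per-row writes into the DataFrame.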