I have a tab separated text file like this small example:
small example:
chr15   60644581    60644675    ANXA2   ENST00000332680.4   annotated
chr15   60643922    60644018    ANXA2   ENST00000332680.4   annotated
chr15   60643391    60643450    ANXA2   ENST00000332680.4   annotated
chr15   60641273    60641396    ANXA2   ENST00000332680.4   annotated
chr15   60639831    60639888    ANXA2   ENST00000332680.4   annotated
chr14   25443870    25444024    STXBP6  ENST00000550887.1   annotated
chr14   25326232    25326363    STXBP6  ENST00000550887.1   annotated
chr14   25325141    25325307    STXBP6  ENST00000550887.1   annotated
chr14   25288242    25288400    STXBP6  ENST00000550887.1   annotated
I want to make another tab separated file with 6 columns but less rows. in fact I want to summarize the rows. rows with the same 4th column must be summarized in one row in fact at first I need to group the rows based on 4th column. the new row has the same 1st, 4th, 5th and 6th column but its 2nd column comes from the first row of each group and the 3rd column comes from the last row of each group. in the small example we have 2 groups based on 4th column so for the first group we would have this row:
chr15   60644581    60639888    ANXA2   ENST00000332680.4   annotated
and the expected output for the small example would look like this:
expected output:
chr15   60644581    60639888    ANXA2   ENST00000332680.4   annotated
chr14   25443870    25288400    STXBP6  ENST00000550887.1   annotated
I am trying to do that in python using the following command but it does not return the expected output. do you know how to fix it?
import pandas as pd
p = pd.read_csv("myfile.txt", sep='\t')
data = p.groupby('4th column')
mylist = []
for i in range(len(data)):
    new_line = 'data[0], data[1], data[2], data[3], data[4], data[5]'
    mylist.append(new_line)
    df = pd.DataFrame(mylist)
    df.to_csv('outfile.txt', sep='\t', index=None, header=False)