I am trying to write a program that stores multiple tRNA sequences in a dictionary. My code extracts the sequences and stores each one under the name associated with it:
    class Unique():
        def __init__(self, seq = ''):
            for s in range(len(seq)):
                for e in range(s + 1, len(seq) + 1):
                    self.add(seq[s:e])
            self.head = head
            self.sequence = seq
            self.original = {}

        def cleaner(self):
            for (header, sequence) in myReader.readFasta():
                clean = sequence.replace('-','').replace('_','')
                self.original[self.head] = clean
            return self.original

        def sites(self):
            Unique.cleaner(self)
I call the sites method (which is why it runs cleaner as its first step), but I am lost on how to write code that finds the unique strings in each stored sequence.
As an example, if I have 2 sequences:
UCGUUAGCAGCGCAUU
The program would be able to tell me that the first sequence's unique string is UCG and the second's is AGC, since UCG is ONLY present in the first sequence and AGC is only present in the second.
EDIT: What I mean by unique sequence: Any strand of the sequence I can see and automatically know which sequence it came from. So if the strand UCGA only exists in one sequence, it is counted and saved as a unique strand associated with that sequence.
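To make that definition concrete, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes the cleaned sequences are already in a dict of name to sequence; the function names `substrings`, `unique_substrings` and `minimal`, and the example names `tRNA1` and `tRNA2`, are hypothetical and not part of my code above. The idea is to collect every substring of every sequence, count how many sequences contain each one, and keep those that occur in exactly one sequence.

```python
from collections import defaultdict


def substrings(seq):
    """Return the set of every contiguous, non-empty substring of seq."""
    return {seq[s:e] for s in range(len(seq)) for e in range(s + 1, len(seq) + 1)}


def unique_substrings(sequences):
    """Map each name to the substrings found in that sequence and in no other.

    `sequences` is an assumed dict of {name: cleaned sequence}.
    """
    subs = {name: substrings(seq) for name, seq in sequences.items()}

    # Count how many different sequences contain each substring.
    owners = defaultdict(int)
    for found in subs.values():
        for sub in found:
            owners[sub] += 1

    return {name: {sub for sub in found if owners[sub] == 1}
            for name, found in subs.items()}


def minimal(unique):
    """Drop any unique substring that contains a shorter unique substring.

    Every proper substring of u lies inside u[1:] or u[:-1], and extending a
    unique substring keeps it unique, so checking those two is enough.
    """
    return {u for u in unique if u[1:] not in unique and u[:-1] not in unique}


# Made-up example data, not real tRNA sequences.
example = {"tRNA1": "UCGA", "tRNA2": "AGCA"}
result = unique_substrings(example)
shortest = {name: minimal(found) for name, found in result.items()}
```

With this made-up input, `result["tRNA1"]` contains `UCG` and `result["tRNA2"]` contains `AGC`, while single bases such as `C`, `G` and `A` are excluded because both sequences share them. Building every substring takes memory quadratic in the sequence length, which should be acceptable for tRNA-sized sequences but would not scale to long genomes.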
The sequences extracted look like this:
GAGAGAGACAUAGAGGDUAUGAPGPPGG'UUGAACCAAUAGUAGGGGGUPCG"UUCCUUCCUUUCUUACCA