The problem was that I was not using QThread properly.
The result of printing
print("(Current Thread)", QThread.currentThread(),"\n")
print("(Current Thread)", int(QThread.currentThreadId()),"\n")
showed me that the PickleDumpingThread I created was running in the main thread, not in a separate thread.
The reason is that run() is the only method of a QThread that executes in the separate thread, so a method like savePickle defined on a QThread subclass still runs in the main thread.
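A minimal sketch illustrating this point (the class and names here are only for demonstration, not my actual code):
import sys
from PyQt5.QtCore import QCoreApplication, QThread

app = QCoreApplication(sys.argv)

class DemoThread(QThread):
    def run(self):
        # Only run() executes in the new thread.
        print("run():", QThread.currentThread())

    def savePickle(self):
        # Called directly from the main thread, so it executes there.
        print("savePickle():", QThread.currentThread())

thread = DemoThread()
thread.start()
thread.savePickle()  # prints the main thread, not the new one
thread.wait()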
First Solution
The proper way to use signals is with a separate worker object, as follows.
import pickle
from PyQt5.QtCore import QObject, QThread, pyqtSignal

class GenericThread(QThread):
    def run(self, *args):
        # print("Current Thread: (GenericThread)", QThread.currentThread(), "\n")
        # exec_() starts an event loop in this thread so that queued
        # signals to objects living here are actually delivered.
        self.exec_()

class PickleDumpingWorker(QObject):
    pickleDumpingSignal = pyqtSignal(dict)

    def __init__(self):
        super().__init__()
        self.pickleDumpingSignal[dict].connect(self.savePickle)

    def savePickle(self, signal_dict):
        # `file` is the target pickle path defined elsewhere
        pickle.dump(signal_dict["deque"], open(file, "wb"))

pickleDumpingThread = GenericThread()
pickleDumpingThread.start()
pickleDumpingWorker = PickleDumpingWorker()
pickleDumpingWorker.moveToThread(pickleDumpingThread)
import time
from collections import deque

class Analyzer():
    def __init__(self):
        self.cnt = 0
        self.dataDeque = deque(maxlen=10000)

    def onData(self, data):
        self.dataDeque.append({
            "data": data,
            "createdTime": time.time()
        })
        self.cnt += 1
        if self.cnt % 10000 == 0:
            # Emitting the signal hands the dump off to the worker thread.
            pickleDumpingWorker.pickleDumpingSignal.emit({
                "action": "savePickle",
                "deque": self.dataDeque
            })
            # pickle.dump(dataDeque, open(file, 'wb'))
This solution worked (the pickle was dumped in a separate thread), but its drawback is that the data stream was still delayed by about 0.5~1 seconds because of the signal's emit() call.
I found that the best solution for my case is @PYPL's code, but it needs a few modifications to work.
Final Solution
The final solution is modifying @PYPL's code from
thread = PickleDumpingThread(self.dataDeque)
thread.start()
to
self.thread = PickleDumpingThread(self.dataDeque)
self.thread.start()
The original code had a runtime error. It seems the thread was being garbage collected before it finished dumping the pickle, because there was no reference to it after the onData() function returned.
Keeping a reference to the thread via self.thread solved this issue.
Also, it seems that the old PickleDumpingThread is garbage collected once a new PickleDumpingThread is assigned to self.thread (because the old one loses its last reference).
However, I have not verified this claim (as I don't know how to inspect the currently active threads)..
Whatever the case, this solution solved the problem.
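For reference, here is roughly what that thread looks like as I understand @PYPL's approach (this is my reconstruction; the file path and attribute names are illustrative):
import pickle
from PyQt5.QtCore import QThread

class PickleDumpingThread(QThread):
    def __init__(self, data_deque, path="data.pickle"):
        super().__init__()
        self.data_deque = data_deque
        self.path = path

    def run(self):
        # run() executes in the new thread, so the dump happens off the
        # main thread (keep a reference such as self.thread so the thread
        # object is not garbage collected before run() finishes).
        with open(self.path, "wb") as f:
            pickle.dump(self.data_deque, f)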
EDIT
My final solution has a delay too: it takes some amount of time to call Thread.start()..
The real final solution I chose is to run an infinite loop inside the thread and have it monitor some of its attributes to decide when to save the pickle. A bare infinite loop in a thread uses a lot of CPU, so I added time.sleep(0.1) to reduce the CPU usage.
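A sketch of that polling approach (the flag name and file path are illustrative):
import pickle
import time
from PyQt5.QtCore import QThread

class PickleDumpingThread(QThread):
    def __init__(self, data_deque, path="data.pickle"):
        super().__init__()
        self.data_deque = data_deque
        self.path = path
        self.dump_requested = False  # the main thread sets this to True

    def run(self):
        while True:
            if self.dump_requested:
                with open(self.path, "wb") as f:
                    pickle.dump(self.data_deque, f)
                self.dump_requested = False
            time.sleep(0.1)  # keeps the busy loop from eating the CPU
The main thread then just sets self.thread.dump_requested = True instead of starting a new thread each time.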
FINAL EDIT
OK.. my 'real final solution' also had a delay..
Even though I moved the dumping job to another QThread, the main thread was still delayed for roughly the duration of the pickle dump! That was weird.
But I found the reason. The reason was neither emit() performance nor anything else I had suspected.
The reason was, embarrassingly, that Python's Global Interpreter Lock (GIL) prevents two threads in the same process from running Python code at the same time.
So I should probably use the multiprocessing module in this case.
I'll post the result after modifying my code to use the multiprocessing module.
Edit after using multiprocessing module and future attempts
Using the multiprocessing module
Using the multiprocessing module solved the issue of running Python code concurrently, but a new essential problem arose: passing shared-memory variables between processes takes a considerable amount of time (in my case, passing the deque object to the child process took 1~2 seconds). I found that this problem cannot be avoided as long as I use the multiprocessing module, so I gave up on the multiprocessing module.
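For illustration, a minimal sketch of the kind of thing I tried (the data and file path here are placeholders):
import multiprocessing as mp
import pickle

def dump_worker(data, path):
    # Runs in a child process with its own interpreter and its own GIL.
    with open(path, "wb") as f:
        pickle.dump(data, f)

if __name__ == "__main__":
    data = list(range(1_000_000))  # stand-in for the deque contents
    p = mp.Process(target=dump_worker, args=(data, "data.pickle"))
    # Handing the data to the child process is itself expensive (under the
    # 'spawn' start method the arguments are pickled and copied over); this
    # hand-off is where the 1~2 second delay came from.
    p.start()
    p.join()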
Possible future attempts
1. Doing only File I/O in QThread
The essential problem with pickle dumping is not writing to the file but serializing the object before writing it. Python releases the GIL while it writes to a file, so disk I/O can be done concurrently in a QThread. The problem is that serializing the deque object to a string inside pickle.dump takes some amount of time, and during that time the main thread is blocked because of the GIL.
Hence, the following approach should effectively reduce the delay (see the sketch after these steps).
Step 1. Every time onData() is called, stringify the incoming data object and push the resulting string onto the deque.
Step 2. In PickleDumpingThread, stringify the whole deque by simply joining list(deque).
Step 3. Write the result with file.write(stringified_deque_object); this can be done concurrently.
Step 1 takes very little time, so it barely blocks the main thread.
Step 2 may take some time, but obviously less time than serializing the Python objects inside pickle.dump.
Step 3 doesn't block the main thread, since the GIL is released during file I/O.
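A sketch of this approach, using JSON lines as one possible way to stringify each item (the class names and file format are my own choices, and it assumes data is JSON-serializable):
import json
import time
from collections import deque
from PyQt5.QtCore import QThread

class DumpingThread(QThread):
    def __init__(self, lines, path="data.jsonl"):
        super().__init__()
        self.lines = lines
        self.path = path

    def run(self):
        # Step 2: joining strings is much cheaper than serializing
        # the Python objects themselves.
        payload = "\n".join(self.lines)
        # Step 3: the GIL is released during the actual write,
        # so the main thread is not blocked here.
        with open(self.path, "w") as f:
            f.write(payload)

class Analyzer():
    def __init__(self):
        self.cnt = 0
        self.dataDeque = deque(maxlen=10000)  # holds pre-serialized strings

    def onData(self, data):
        # Step 1: serialize each item as it arrives (cheap per call).
        self.dataDeque.append(json.dumps({"data": data, "createdTime": time.time()}))
        self.cnt += 1
        if self.cnt % 10000 == 0:
            self.thread = DumpingThread(list(self.dataDeque))
            self.thread.start()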
2. Using C extension
We can manually release and reacquire the GIL in a custom C extension module, but this might get messy.
3. Porting the code from CPython to Jython or IronPython
Jython and IronPython are alternative Python implementations built on Java and C#, respectively. They don't have a GIL, which means threads really run in parallel.
One problem is that PyQt is not supported in these implementations..
4. Porting to another language
..
Note:
json.dump also took 1~2 seconds for my data.
Cython is not an option in this case. Although Cython has with nogil:, only non-Python objects can be accessed in that block (the deque object cannot be accessed there), and the pickle.dump method cannot be used in it.