Introduction
In Python, if you run a CPU-heavy calculation with multiple threads, you will find that it takes almost the same time no matter how many threads you use. For example, the code below just does an addition in a for loop. On my 4-core laptop it took almost 4 seconds whether I used 1 thread or 4 threads.
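import threading
from datetime import datetime

def run(n):
    # CPU-bound work: count up to n in pure Python.
    tot = 0
    for i in range(0, n):
        tot += 1

start = datetime.now()
a = 100000000  # total number of iterations
n = 10         # number of threads (try 1 and 4 to compare); each does a // n iterations
th = []
for i in range(0, n):
    t = threading.Thread(target=run, args=(a // n,))
    t.start()
    th.append(t)
    print(f"Thread {i} started.")
# Wait for every thread to finish before stopping the clock.
for t in th:
    t.join()
end = datetime.now()
print(f"Time: {end - start}")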
GIL (Global Interpreter Lock)
What is the GIL? The GIL is not a feature of the Python language. It is an implementation detail of CPython, the reference Python interpreter. It is a lock that prevents multiple threads from executing Python code at the same time, so threads in CPython run concurrently (taking turns) but not in parallel. The Python wiki describes it this way:
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
Why Does the GIL Exist?
CPython uses reference counting for memory management: when an object's reference count drops to zero, the memory occupied by that object is released. Threads in the same process share the same memory space, so without a lock the reference count would be subject to race conditions whenever two threads increase or decrease it simultaneously. A lost update could cause a memory leak (the count never reaches zero) or, worse, release memory that is still in use.
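To make the reference count concrete, here is a minimal sketch that watches it change with sys.getrefcount. The names obj and alias are just for illustration, and the absolute numbers printed vary between Python versions, so only the differences matter.

```python
import sys

obj = []                          # a new list, referenced by the name `obj`
before = sys.getrefcount(obj)

alias = obj                       # a second name bound to the same object
after = sys.getrefcount(obj)      # one more reference than before

del alias                         # dropping the name removes that reference
restored = sys.getrefcount(obj)

print(before, after, restored)
```

Every one of these increments and decrements is a read-modify-write on a shared counter, which is the update the GIL protects from interleaving between threads.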
How to Avoid the GIL
- If your program is I/O bound, multithreading and multiprocessing perform about the same, because a thread releases the GIL while it waits on I/O, which lets the other threads keep running.
- If your program is CPU bound, multiprocessing is the better choice, because each process has its own interpreter and its own GIL.
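The two cases above can be tried out with a rough sketch like the one below. It assumes time.sleep as a stand-in for real I/O (sleep releases the GIL while waiting), and the helper names cpu_task, io_task and timed are made up for this example. Actual timings depend on your machine, so none are shown.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    """CPU-bound: the same counting loop as in the introduction."""
    tot = 0
    for _ in range(n):
        tot += 1
    return tot


def io_task(seconds):
    """Stand-in for I/O: time.sleep releases the GIL while it waits."""
    time.sleep(seconds)
    return seconds


def timed(executor_cls, func, args, workers=4):
    """Run func over args in a pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # I/O-bound: the four waits can overlap, so threads are enough.
    _, elapsed = timed(ThreadPoolExecutor, io_task, [0.5] * 4)
    print(f"I/O-bound, 4 threads:   {elapsed:.2f}s")

    # CPU-bound: threads take turns holding the GIL,
    # while each process has its own interpreter and GIL.
    chunks = [5_000_000] * 4
    _, elapsed = timed(ThreadPoolExecutor, cpu_task, chunks)
    print(f"CPU-bound, 4 threads:   {elapsed:.2f}s")
    _, elapsed = timed(ProcessPoolExecutor, cpu_task, chunks)
    print(f"CPU-bound, 4 processes: {elapsed:.2f}s")
```

The `if __name__ == "__main__":` guard matters here: on platforms where worker processes are started by re-importing the script, it keeps the workers from launching pools of their own.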