Threads are the smallest unit of execution that an operating system can schedule inside a process.
People (myself included) often confuse threads with processes. This article treats threading as a programming concept rather than the theory behind processes and threads, but we'll need a quick look at both to set the stage.
Processes and threads
Process A
├── Thread 1
├── Thread 2
└── Thread 3
Process B
├── Thread 1
└── Thread 2
A process is an independent instance of a running program. It has its own memory and resources. Think of it as a self-contained agent.
A process can have one or more threads in it. Threads share the process memory and resources, though each thread has its own execution state and stack.
How processes and threads are implemented depends on the programming language and operating system, so I highly recommend checking out the Wikipedia article on threads.
There's also a quick summary table at the end of this article.
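To make "threads share the process memory" concrete, here's a minimal sketch (the variable names are made up for illustration): a worker thread writes to a list the main thread created, and the main thread sees that write directly, with no copying or inter-process communication.

```python
import threading

# Threads share the process's memory: the worker appends to a list
# created in the main thread, and the main thread sees the change.
results = []

def worker():
    results.append("written by the worker thread")

t = threading.Thread(target=worker)
t.start()
t.join()
print(results)  # the worker's write is visible here
```

Two separate processes would each have their own copy of `results`; getting data between them would require pipes, sockets, or shared-memory segments.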
Now, using threads we can speed up some programs, especially when the work is I/O-bound. We can ask independent threads to do independent tasks, given that we take care of the race conditions.
Here's the plan: we will scrape 10 websites (some with intentional latency) and time the requests done sequentially. Then we will do the same thing again, this time using threads. In two different languages.
NOTE: the two languages are Python and C++. Strictly speaking one language would have sufficed because, as we will see, both yield approximately the same time for this I/O-bound example. Python and C++ threads do not generally perform the same for every kind of workload, though.
Setup
We have a list of websites:
```python
sites = [
    "https://httpbin.org/delay/2",
    "https://httpbin.org/delay/2",
    "https://httpbin.org/delay/2",
    "https://httpbin.org/delay/2",
    "https://httpbin.org/delay/2",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://en.wikipedia.org/wiki/Thread_(computing)",
    "https://en.wikipedia.org/wiki/Global_interpreter_lock",
]
```
- Request a response from each website sequentially
- Do the same using threads
- Measure the time difference
Here’s the Python code to do it.
Python
```python
import time

import requests

start = time.time()
count_seq, failed_seq = 0, 0
for n in sites:
    try:
        x = requests.get(n, timeout=10)
        # print(x.status_code)
    except Exception as e:
        print(f"Couldn't fetch from {n}: error {e}")
        failed_seq += 1
    else:
        count_seq += 1
end = time.time()
```
The code is pretty direct: we record the time, then go through the sites one by one and request each using Python's requests library. Everything else is exception handling.
For the threading code:
```python
import threading
import time

import requests

lock = threading.Lock()
count, failed = 0, 0

def fetch(site):
    global count
    global failed
    try:
        response = requests.get(site, timeout=10)
        with lock:
            count += 1
        return response
    except Exception as e:
        print(f"got error in {site}: error message: {e}")
        with lock:
            failed += 1
        return None

if __name__ == "__main__":
    start = time.time()
    threads = []
    for i in range(len(sites)):
        t = threading.Thread(target=fetch, args=(sites[i],))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    end = time.time()
```
Here we have to understand a few things:
- To prevent race conditions, we use a `threading.Lock()` object. It guards access to a shared variable so that at any given time only one thread can read or update it. The same idea will show up in the C++ version too, so keep it in mind.
- For each site, we create a new thread and hand it the target function `fetch`.
- So we have n threads for n websites.
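One thread per site is fine for 10 sites, but for a larger list you'd usually cap the thread count with a pool. Here's a sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`; the URLs and the `fetch_one` function are stand-ins for illustration (a real version would call `requests.get` inside it).

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URLs; fetch_one below stands in for a real HTTP request.
sites = [f"https://example.org/page/{i}" for i in range(10)]

def fetch_one(site):
    time.sleep(0.1)  # mimic network latency instead of a real request
    return (site, 200)

# The pool reuses at most 5 worker threads instead of one thread per site.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch_one, sites))

count = sum(1 for _, status in results if status == 200)
print(f"fetched {count} of {len(sites)} sites")  # fetched 10 of 10 sites
```

The pool also handles the start/join bookkeeping for us, which is why it's the idiomatic choice for this kind of fan-out.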
The results
```
(base) threads\ $ python threads_pgm.py
Sequential Data
------------------
time taken without threads : 25.756 seconds
------------------
Threads data
------------------
time taken with threads: 3.787 seconds
```
We can see a huge speedup.
Each thread goes to a website concurrently and comes back with a response. So the ~3.7 seconds we see is approximately the time of the slowest single request, plus some scheduling and network overhead. In the sequential version the total is the sum of all the response times.
Also notice the fetch function: we acquire the lock with a `with lock:` block, which releases it automatically when the block ends.
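This max-vs-sum intuition can be demonstrated without any network at all, using sleeps as stand-ins for request latency (the delays here are made-up, scaled-down values, not the real httpbin ones):

```python
import threading
import time

# Scaled-down stand-ins for the per-site latencies (seconds).
delays = [0.5, 0.5, 0.2]

# Sequential: the total is the sum of the delays.
start = time.time()
for d in delays:
    time.sleep(d)
sequential = time.time() - start

# Threaded: the total is roughly the largest single delay.
start = time.time()
threads = [threading.Thread(target=time.sleep, args=(d,)) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Sequential lands near 1.2 s (the sum), threaded near 0.5 s (the max), mirroring the 25.7 s vs 3.7 s results above.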
C++
The process will remain the same.
For sequential, do one site at a time. Using threads, create n threads for n websites.
```cpp
cout << "---- starting threads ... " << endl;
count = 0;
failed = 0;
const auto s2 = chrono::high_resolution_clock::now();

vector<thread> threads;
for (size_t i = 0; i < sites.size(); i++) {
    thread t(fetch, sites[i]);   // construct the thread AND start it
    threads.push_back(move(t));  // threads can't be copied, only moved
}
for (size_t i = 0; i < threads.size(); i++) {
    threads[i].join();
}
```
Full code snippet is here.
The result:
```
(base) threads\ $ ./scraper
Fetched(without threads) 10 sites in 27.8024 seconds
---- starting threads ...
Fetched using threads 10 sites in 3.72653 seconds
```
As we can see, it’s about the same as Python for this I/O-bound example. And again the time difference is approximately max(time of individual threads) vs sum(time of each website’s response).
Conclusion
There’s a lot that can be studied after this.
Threads are one of the basic building blocks of concurrent software.
Thanks for reading
~ Aayushya Tiwari
Difference between processes and threads - table summary
| Feature | Process | Thread |
|---|---|---|
| Memory | Separate | Shared |
| Communication | Slow (IPC) | Fast (shared memory) |
| Creation | Heavy | Lightweight |
| Context Switch | Expensive | Cheap |
| Isolation | High | Low |
| Failure Impact | Independent | Affects whole process |
Race condition
A race condition occurs when multiple threads access and modify shared data concurrently, leading to incorrect or unpredictable results due to lack of proper synchronization.
int count = 0;
Thread 1: count++;
Thread 2: count++;
Both threads try to update the same variable, and `count++` is not atomic: it reads the value, increments it, and writes it back, so two threads can interleave those steps and lose an update.
There are ways to fix this:
- Locks: Lock the data, only one thread can work on it at a time.
- Semaphores: synchronization primitives that use a counter to control access to a resource.
- Atomic operations: hardware-supported read-modify-write operations that complete as a single indivisible step.
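The lock-based fix can be shown in a few lines; here's a minimal sketch (the counts are arbitrary): four threads each increment a shared counter, and the lock guarantees no increment is lost.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # only one thread at a time performs the read-modify-write
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: every increment survives
```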
References
- Wikipedia, "Thread (computing)": https://en.wikipedia.org/wiki/Thread_(computing)
- Arpit Bhayani's video on threads: https://youtu.be/2PjlaUnrAMQ?si=Rq5JgCZSvKK_GCn1