notes on quantization

“Compute solves a lot of problems.” If we just had enough compute, many of the problems we face today would be solved: longer contexts, smarter weights and biases, and so on. But right now we don’t have infinite compute. That’s the sad reality. So we optimize. Quantization is our attempt at just that. history Reference: https://arxiv.org/abs/2103.13630 Quantization is a form of compression: the process of mapping a large set of continuous or high-precision values onto a smaller, discrete set of values. ...
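The mapping the excerpt describes can be sketched in a few lines. Below is a toy affine quantizer; the 8-bit range, function names, and sample values are my own illustrative choices, not from the post:

```python
# Toy affine quantization sketch: map floats onto 256 discrete levels
# and back. Real quantization schemes are far more involved.

def quantize(values, num_levels=256):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (num_levels - 1)  # step size between levels
    zero_point = lo
    q = [round((v - zero_point) / scale) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [qi * scale + zero_point for qi in q]

vals = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, s, z = quantize(vals)
approx = dequantize(q, s, z)
# Each reconstructed value lands within half a step of the original.
```

The round trip is lossy by design: that bounded reconstruction error is exactly the compression trade-off the post is about.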

April 7, 2026 · 10 min

notes on PolyBlocks

Recently, at PyTorch Day India in Bangalore, I saw a talk on AI compilers. Here is the link: YouTube Picture from the session I didn't know there were Indian labs working on the AI compiler problem. But it turns out there are. PolyMage Labs is an IISc lab in Bangalore working on PolyBlocks. Since AI is moving fast, there is a clear need for efficient AI compilers that can lower high-level tensor programs to IR for GPUs, TPUs, and other backends. PolyBlocks minimizes dependency on external vendor libraries like cuBLAS/cuDNN while still generating highly optimized code via compiler-driven transformations and tiling. ...

March 29, 2026 · 8 min

MIME-ish implementation to share images over ssh

Some days back I was studying for my computer networks exam. I came across a few protocols that were very interesting, like SMTP (Simple Mail Transfer Protocol), Telnet, and SCP (Secure Copy Protocol), just to name a few. SMTP and a little bit of theory Simple Mail Transfer Protocol is a protocol used to transfer mail between servers. It was specified in 1981. It works on port 25 for server-to-server relay, while clients submit outgoing mail on port 587. ...
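As a taste of the MIME side of the story, here is a hedged sketch of wrapping an image in a MIME envelope with Python's standard library; the addresses and image bytes are placeholders, not anything from the post:

```python
# Sketch: pack stand-in image bytes into a MIME message, the kind of
# envelope SMTP servers relay between each other.

from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"    # placeholder addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "screenshot"
msg.set_content("image attached")

fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # stand-in image data
msg.add_attachment(fake_png, maintype="image", subtype="png",
                   filename="shot.png")
# msg.as_bytes() is what you would hand to an SMTP client to send.
```

Attaching bytes makes the message multipart and base64-encodes the payload, which is the core MIME trick for shipping binary data over a text protocol.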

March 13, 2026 · 4 min

Python's argparse

In this article I’d like to introduce you to a rather useful Python library. It’s called argparse, and recently it has become my go-to for a couple of things. I first learned about this library while participating in a Kaggle competition. It was pretty intimidating at first because you’re not sure what’s going on, but after this article I hope you’ll know how to deal with code that uses argparse. We’ll also talk about config files and how this library can be used to write them. ...
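As a preview, here is a minimal argparse sketch of the kind you see in Kaggle training scripts; the flag names and defaults are just examples of mine:

```python
# Minimal argparse usage: declare flags once, get typed values back.

import argparse

def build_parser():
    p = argparse.ArgumentParser(description="toy training script")
    p.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--debug", action="store_true")
    return p

# Passing a list parses it directly; with no argument it reads sys.argv.
args = build_parser().parse_args(["--lr", "0.1", "--debug"])
# args.lr is the float 0.1, args.epochs falls back to the default 10,
# and args.debug flips to True because the flag was present.
```

Note that `type=float` does the string-to-number conversion for you, which is most of what makes argparse nicer than reading `sys.argv` by hand.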

March 4, 2026 · 5 min

Thoughts on AI; updates on essays

If you didn’t know, I recently started writing more. Published a new website. Yes, the website you’re reading this at. The reason was to get good at understanding and learning. With the coming of AI, writing code has never been easier. And to be honest, I don’t think AI has any role in this; the urge came way before AI did. The main thing that drives the world imo is an idea. Ideas and implementations. Now, the way we implement things has been changing for ages. The one example I like to think about is of compilers and assembly programmers when the C language came. Pretty sure all of them were in the same position developers today are. But that’s another story. Implementations change, but the main thing that drives technology, science, math, and all the important stuff is, as I said, ideas. And to get better ideas, we don’t just need intellect. No. We need creativity; we need people who can understand deeply. Who can think. And I don’t use the word think lightly. Thinking was never easy. And in today’s world, it’s even harder. Which is why I started writing. Because believe it or not, writing is thinking. Every week I have this essay that I have to think about, learn, and write about. ...

February 27, 2026 · 2 min

Notes on torch code compilation

Before we see what torch.compile does, we should first understand PyTorch’s default mode and why we’d ever want to move away from it. PyTorch runs in eager mode by default. Think of it as PyTorch reading and executing your code op by op, as Python encounters each line. It’s immediate, flexible, and great for prototyping, but it pays a Python interpreter cost on every single operation. For production and deployment, we want to skip that cost. That’s where compilation comes in. ...
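That per-op cost can be illustrated without PyTorch at all. This is a toy analogy of my own, not real torch internals: in "eager" mode every op goes through a dispatch layer, while "compiling" fuses the ops so the overhead is paid once:

```python
# Toy analogy for eager vs compiled execution (not real PyTorch code).

def dispatch(op, x):
    # stand-in for the per-op interpreter/dispatcher overhead
    return op(x)

def eager_forward(x):
    x = dispatch(lambda v: v * 2, x)   # op 1: pays dispatch cost
    x = dispatch(lambda v: v + 3, x)   # op 2: pays dispatch cost
    x = dispatch(lambda v: v ** 2, x)  # op 3: pays dispatch cost
    return x

def compiled_forward(x):
    # the same three ops fused into a single call
    return (x * 2 + 3) ** 2

assert eager_forward(5) == compiled_forward(5)  # same math, fewer dispatches
```

The real compiler does much more (graph capture, kernel fusion, codegen), but "same result, fewer trips through the interpreter" is the core intuition.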

February 18, 2026 · 4 min

Notes on SIMD

Today we look at matrix multiplication (matmul, as we’ll call it in this essay). Since the last essay was on backprop, it was only logical to look next at the most fundamental math operation behind that algorithm: matmul. Also, the numbers in this essay are going to shock you. Like, really. So if you think I am making this up, you should check out my code for this essay. ...
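For reference, here is the naive triple-loop matmul that such measurements usually start from; this is a plain-Python sketch of mine, not the essay's benchmark code:

```python
# Naive O(n^3) matmul: no SIMD, no cache blocking, one scalar
# multiply-add at a time. This is the baseline everything beats.

def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
# matmul(A, B) == [[19.0, 22.0], [43.0, 50.0]]
```

SIMD speedups come from replacing the innermost scalar loop with vector instructions that do several of those multiply-adds per cycle.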

February 9, 2026 · 7 min

Backpropagation: first draft

I’m assuming you understand the basic idea of neural networks. This essay focuses purely on the backpropagation algorithm itself. What is Backpropagation? Backpropagation is an algorithm that computes how much each weight and bias should change to reduce the loss. It tells us not just whether parameters should go up or down, but by how much, based on their actual impact on the loss function. We use calculus to figure that out. ...
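A worked one-neuron example of that idea, with toy numbers of my own and a squared loss, where the chain rule yields both the direction and the magnitude of each gradient:

```python
# One neuron, squared loss: y_hat = w*x + b, L = (y_hat - y)^2.
# Backprop is just the chain rule applied backwards.

x, y = 2.0, 1.0      # one training example
w, b = 0.5, 0.1      # current parameters

# forward pass
y_hat = w * x + b            # prediction: 1.1
loss = (y_hat - y) ** 2      # loss: ~0.01

# backward pass (chain rule)
dL_dyhat = 2 * (y_hat - y)   # dL/dy_hat = 2(y_hat - y) = ~0.2
dL_dw = dL_dyhat * x         # dy_hat/dw = x, so dL/dw = ~0.4
dL_db = dL_dyhat * 1.0       # dy_hat/db = 1, so dL/db = ~0.2
```

The gradients say both parameters should decrease, and that `w` matters twice as much as `b` here, precisely because its input `x = 2.0` amplifies its effect on the loss.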

January 30, 2026 · 7 min

Hashing and stuff

what is this about? Today we’re looking at hashing. We’ll get into the process, the data structures, and some applications. Through this article, we’ll see how we can implement the main dictionary operations: insert, delete, and search. what is Hashing? Hashing is a technique for identifying an object out of a group of similar objects. Analogy for hashing: imagine you took your name, tiwariji, and ran it through a complex mathematical function that produced 7a3f9c2e. You couldn’t look at 7a3f9c2e and figure out it came from tiwariji, but every time you hash tiwariji you’d get the same result. That is hashing. Ideally, no two different inputs should produce the same hash output. We’ll talk about that later when we discuss hash functions. ...
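Here is a minimal sketch of those three dictionary operations, using separate chaining for collisions; the chaining strategy and class names are my own choices, and the article may handle collisions differently:

```python
# Minimal hash table with separate chaining: each bucket is a list of
# (key, value) pairs, so colliding keys coexist in the same bucket.

class HashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # the hash maps any key to one of `size` buckets
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None                   # not found

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False

t = HashTable()
t.insert("tiwariji", 42)
# t.search("tiwariji") returns 42; after delete it returns None.
```

All three operations only ever touch one bucket, which is why they average O(1) when the hash spreads keys evenly.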

January 22, 2026 · 7 min

Stacks and Heaps

Data management during compilation When a program is compiled, its data is organized into segments. There are five types of segments: stack, heap, data, code, and BSS. We will focus on the stack and the heap in this essay. Stack Stack allocation is the process of allocating memory for local variables and function calls on the call stack. Each function gets some memory on the stack to store its variables. Since this memory is managed automatically by the system, it’s fast, but the stack is much smaller than the heap. The size required is known before execution, so the compiler can allocate it up front. ...

January 16, 2026 · 3 min