Casual Talk
Golang is quite enjoyable to write, aside from the tedious if err != nil checks. One of the fundamental reasons for that joy is the goroutine, a core feature of Golang. Understanding goroutines in detail is worthwhile, as they contribute significantly to the pleasure of working with Golang. So let's talk about goroutines, hoping to offer the reader some insights.
TL;DR: We'll start by talking about assembling a computer, then delve into some concepts of the operating system, such as processes and threads, before exploring goroutines themselves.
Assembling a Computer
Let's focus on desktop computers, not laptops. Think about the common components of a desktop computer: a monitor, a keyboard, a mouse, and the main case. Now, let's take a closer look inside the case, which typically holds several essential components: the motherboard, the CPU, a graphics card (enthusiasts usually opt for a dedicated graphics card, while budget users often rely on the graphics integrated into the CPU :( ), a hard drive, and memory modules.
Let's pay attention to the CPU and the memory modules, starting with how the CPU is put together.
How does the CPU work? The CPU is like a simpleton; it has a fixed instruction set. We pick instructions from that set, arrange them, and feed them to the CPU, and it obediently executes them, moving east when told and west when instructed. The CPU consists of two main components, the arithmetic logic unit (ALU) and the control unit, along with various specialized registers.
Let's consider the operation 1 + 1 = 2 (see the sketch after this list):
- Retrieve 1 from memory and store it in register A.
- Retrieve another 1 from memory and store it in register B.
- Perform the addition operation and write the result to register A.
- Copy the result in register A to the memory location where the result is stored.
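Here is a minimal sketch of those four steps in Go, modelling the registers and memory as plain variables (the names regA, regB, and the memory slice are purely illustrative):

package main

import "fmt"

func main() {
    memory := []int64{1, 1, 0} // mem[0] = 1, mem[1] = 1, mem[2] will hold the result

    var regA, regB int64

    regA = memory[0]   // retrieve 1 from memory and store it in register A
    regB = memory[1]   // retrieve another 1 from memory and store it in register B
    regA = regA + regB // the ALU performs the addition; the result goes to register A
    memory[2] = regA   // copy the result in register A back to memory

    fmt.Println(memory[2]) // 2
}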
Now, let's look at string concatenation, like "hello" + "world" = "helloworld" (a Go sketch follows the list):
- Retrieve the string "hello" from memory and calculate its length (5).
- Retrieve the string "world" from memory and calculate its length (5).
- Add the lengths and allocate a suitable-sized memory block (somewhere).
- Copy "hello" to the first 5 slots of the allocated memory (somewhere).
- Copy "world" to the slots 6-10 of the allocated memory (somewhere).
- Return the address of the allocated memory (somewhere).
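A hedged Go sketch of those steps (Go's runtime does this differently under the hood; the concat function below simply mirrors the list):

package main

import "fmt"

func concat(a, b string) string {
    buf := make([]byte, len(a)+len(b)) // add the lengths and allocate a suitable-sized block
    copy(buf, a)                       // copy "hello" into the first len(a) bytes
    copy(buf[len(a):], b)              // copy "world" into the bytes right after it
    return string(buf)                 // return the result held in the allocated memory
}

func main() {
    fmt.Println(concat("hello", "world")) // helloworld
}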
Is this exactly how the CPU performs these operations? In terms of the specific low-level details, not exactly, as it deals with chunks of 0s and 1s; but abstractly, yes. All von Neumann machines follow a similar process: physical CPU architectures like Intel's and ARM's, as well as virtual machines like the JVM and the massive switch...case in Python's ceval.c (since a virtual machine simulates a physical CPU).
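To make the "virtual machine simulating a CPU" point concrete, here is a toy dispatch loop in Go in the spirit of ceval.c's giant switch; the opcodes are made up for illustration:

package main

import "fmt"

// made-up opcodes for a toy stack machine
const (
    opPush  = iota // push the following word onto the stack
    opAdd          // pop two values, push their sum
    opPrint        // pop a value and print it
    opHalt         // stop the machine
)

func run(code []int) {
    var stack []int
    for pc := 0; ; {
        switch code[pc] {
        case opPush:
            stack = append(stack, code[pc+1])
            pc += 2
        case opAdd:
            n := len(stack)
            stack = append(stack[:n-2], stack[n-2]+stack[n-1])
            pc++
        case opPrint:
            fmt.Println(stack[len(stack)-1])
            stack = stack[:len(stack)-1]
            pc++
        case opHalt:
            return
        }
    }
}

func main() {
    run([]int{opPush, 1, opPush, 1, opAdd, opPrint, opHalt}) // computes and prints 1 + 1
}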
Now that we understand how the CPU works, let's explore some concepts of the operating system.
Installing an Operating System
Time to install the system; it's nerve-wracking. After a few steps, Linux is up and running. But why do we need an operating system? Without it, the life of a programmer would be challenging, involving tasks such as:
- Studying the instruction set of the purchased CPU and reading the manual.
- Inspecting the first 512 bytes of the hard drive to understand the partitioning.
- Locating the desired program starting at 1024 bytes and loading it into memory.
- Realizing the memory is too small, loading only part of the program.
- Attempting a simple operation like 1+1=2, but accidentally overwriting another program.
- Rebooting and starting over.
Fortunately, over the past few decades, great programmers have developed operating systems, sparing us from such difficulties.
As mentioned earlier, our programs reside on the hard drive. To run them, we must read the program from the hard drive into memory and then execute specific instructions. If I were to write an operating system, I would organize all the information of a program in one place for efficient management. It would look something like this:
type process struct {
    instructions     unsafe.Pointer // where the program's instructions live in memory
    instruction_size int64          // how large the instruction block is
    current_offset   int64          // offset of the currently executing instruction
}
This is a process. Next, consider the memory layout of a process in Linux.
However, people later found that processes were a bit too heavyweight, and thus the concept of threads emerged; a thread has its own, lighter memory layout in Linux.
We will only explain the memory layout of a process; readers can work out the memory layout of a thread on their own. Looking at the memory layout of a process, there are several concepts (a code illustration follows the list):
- text: the instructions mentioned earlier; this is our code, the instructions generated after compilation.
- data: variables that are initialized in the code, for example int i = 1.
- bss: variables in the code that are not initialized, for example int a.
- heap: we'll talk about the heap in a moment.
- stack: we'll discuss the stack shortly.
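To tie these to code, here is a rough Go illustration. The mapping follows the classic C layout; Go's escape analysis and garbage collector blur the picture in practice, so treat the comments as approximations:

package main

import "fmt"

var initialized int64 = 1 // initialized package-level variable: data segment
var uninitialized int64   // zero-valued package-level variable: bss

var keep *int64 // storing a pointer here forces the pointee below onto the heap

func main() {
    local := int64(2) // local variable: typically lives on the stack
    p := new(int64)   // escapes to the heap, because its address outlives main's frame
    *p = local + initialized + uninitialized
    keep = p
    fmt.Println(*keep) // 3
}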
First, let's think about why we need a stack. The characteristic of the stack data structure is last in, first out (LIFO), right? This is exactly how function calls work, isn't it? For example:
package main

func foo() {
    println("foo")
}

func main() {
    foo()
}
The execution sequence of this code is undoubtedly: main is executed first, followed by foo, then println; println returns, foo returns, and finally, since there is no other code in main, it returns as well.
Do you see it? main is the function called first, but it is the last to return (for now, let's temporarily ignore other code in the runtime). In the function call stack, each function call is placed in a data structure called a frame.
Let's reconstruct the above function call process (in terms of its representation in memory):
First, a function in the runtime calls the main function:
| runtime | main
Then, the main function calls foo:
| runtime | main | foo
Next, the foo function calls println:
| runtime | main | foo | println
After that, println returns:
| runtime | main | foo
Following that, foo returns:
| runtime | main
Finally, main returns:
| runtime
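If you want to watch these frames with your own eyes, Go's standard library can dump the current goroutine's call stack; a minimal sketch:

package main

import "runtime/debug"

func foo() {
    debug.PrintStack() // dumps the current call stack: foo, called by main (plus runtime frames)
}

func main() {
    foo()
}

The output lists foo first and main beneath it, the same frames as in the trace above.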
As you can see, after a function returns, the variables and other data that were present in the function are no longer there (technically, they still exist, but typically, well-behaved programs do not access them). So, what if we have something we don't want to lose after a function returns? In other words, we want some data to be independent of the function call lifecycle. This is where our heap comes into play.
The heap exists for this purpose. What kind of memory allocated by code ends up on the heap? In C, memory allocated by malloc ends up on the heap. In Golang, escape analysis decides; for example, in something like:
func open(name string) (*os.File, error) {
    f := &os.File{} // f's address is returned to the caller, so it must escape to the heap
    return f, nil
}
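As an aside, you can ask the compiler to show its escape-analysis decisions. Assuming the snippet above is saved in a file (hypothetically main.go), run:

go build -gcflags=-m main.go

The output will include a line noting that the allocation in open escapes to the heap.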
The variable f will be allocated on the heap. Why? If f were allocated on the stack, it would be gone after open returns; how could the caller happily continue using it? Understanding this, let's now take a look at goroutines.
Goroutine
Golang was initially written in C and later became self-hosting, bootstrapped in Golang itself. Now, let's think: if we were to implement goroutines ourselves, how would we do it?
Hold your horses! What is a goroutine? We haven't clarified that yet. A goroutine in Go is a coroutine. In simple terms, a coroutine is the smallest unit of execution scheduled by the user rather than by the operating system: threads are the smallest units the operating system schedules, while coroutines are the smallest units of execution that programmers schedule themselves. Coroutines resemble function calls, but with a key difference: multiple coroutines can each suspend and save their state, whereas in a plain function call, once the function exits, its state is lost.
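A small sketch of that "saved state" property, using goroutines and an unbuffered channel; each send or receive is a point where one side suspends with its state intact and the other resumes:

package main

import "fmt"

func counter(ch chan<- int) {
    for i := 1; i <= 3; i++ {
        ch <- i // suspend here until main receives; the loop variable i is preserved
    }
    close(ch)
}

func main() {
    ch := make(chan int)
    go counter(ch) // schedule counter as a goroutine
    for v := range ch {
        fmt.Println(v) // 1, 2, 3: counter resumes after each receive
    }
}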
So, we certainly can't use the stack allocated by the operating system, right? But functions must have a stack for execution. Therefore, we allocate a block of memory in the heap and use it as a stack. That should work, right? This is why the structure of a goroutine looks like this:
type g struct {
// Stack parameters.
// stack describes the actual stack memory: [stack.lo, stack.hi).
// stackguard0 is the stack pointer compared in the Go stack growth prologue.
// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
// stackguard1 is the stack pointer compared in the C stack growth prologue.
// It is stack.lo+StackGuard on g0 and gsignal stacks.
// It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
stack stack // offset known to runtime/cgo
stackguard0 uintptr // offset known to liblink
stackguard1 uintptr // offset known to liblink
_panic *_panic // innermost panic - offset known to liblink
_defer *_defer // innermost defer
m *m // current m; offset known to arm liblink
sched gobuf
...
}
Golang implements goroutines using the GMP scheduling model: G is the goroutine, M is an OS thread (machine), and P is a processor, the scheduling context that matches runnable Gs to Ms.
As mentioned earlier, the von Neumann architecture relies on these elements: things like stacks and heaps are necessary for carrying state from one step to the next. Since a goroutine is the smallest unit that a programmer can schedule, it needs its own stack, while the heap it can share. The heap is shared among the threads of a process, but each thread has its own stack, which goroutines cannot all share. Therefore, we allocate a block of memory on the heap to serve as each goroutine's stack. That's the essence of goroutines.
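As a closing sketch, here is one observable consequence of heap-backed, growable goroutine stacks: recursion far deeper than the initial stack (a few kilobytes) could ever hold still works, because the runtime grows the stack on demand. The depth below is an arbitrary choice:

package main

import "fmt"

func depth(n int) int {
    if n == 0 {
        return 0
    }
    var pad [128]byte // burn some stack space in every frame
    _ = pad
    return 1 + depth(n-1)
}

func main() {
    done := make(chan int)
    go func() { done <- depth(100000) }() // ~100k frames, far beyond the initial stack size
    fmt.Println(<-done)                   // 100000
}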