A Guide to Apache Arrow for Go Developers: Memory Management (Part 2)


$ go run reuse_string_builder.go
00000000  03                                                |.|
00000000  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000000  00 00 00 00 05 00 00 00  11 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000000  68 65 6c 6c 6f 61 70 61  63 68 65 20 61 72 72 6f  |helloapache arro|
00000010  77 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |w...............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
["hello" "apache arrow"]
00000000  03                                                |.|
00000000  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000000  00 00 00 00 0e 00 00 00  17 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000000  68 61 70 70 79 20 62 69  72 74 68 64 61 79 6c 65  |happy birthdayle|
00000010  6f 20 6d 65 73 73 69 00  00 00 00 00 00 00 00 00  |o messi.........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
["happy birthday" "leo messi"]

By this point you should have a rough picture of how the Go implementation of Arrow works. Next, let's look at how it manages memory with reference counting.
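2. Reference-counted memory management in Go Arrow

In the diagram above, we saw that the main interfaces of the Go Arrow implementation, Builder, Array, and ArrayData, all include Release and Retain methods. In other words, every type that implements these interfaces supports tracking and managing memory through reference counting. Retain increments the reference count by 1, and Release decrements it by 1. Because the count is updated with atomic operations, both methods are safe for concurrent use. When the count drops to 0, the memory block guarded by that count can be freed.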
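To make the Retain/Release semantics concrete, here is a minimal sketch of an atomically reference-counted object. The `refCounted` type, its fields, and `newRefCounted` are hypothetical names invented for illustration; this is not the Arrow library's code, only the general technique the paragraph describes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// refCounted is a hypothetical stand-in for an Arrow object such as a buffer.
type refCounted struct {
	refs  int64  // reference count, only modified through sync/atomic
	data  []byte // the memory guarded by the count
	freed bool   // set once the count reaches zero
}

// newRefCounted returns an object whose count starts at 1: the creator owns it.
func newRefCounted(size int) *refCounted {
	return &refCounted{refs: 1, data: make([]byte, size)}
}

// Retain increments the reference count by 1.
func (r *refCounted) Retain() { atomic.AddInt64(&r.refs, 1) }

// Release decrements the count by 1 and drops the memory when it hits 0.
func (r *refCounted) Release() {
	if atomic.AddInt64(&r.refs, -1) == 0 {
		r.data = nil
		r.freed = true
	}
}

func main() {
	obj := newRefCounted(64)

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		obj.Retain() // take a reference on behalf of the goroutine
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer obj.Release() // give the reference back when done
			_ = len(obj.data)   // read-only use of the shared object
		}()
	}
	wg.Wait()

	fmt.Println(obj.freed) // false: the creator still holds one reference
	obj.Release()
	fmt.Println(obj.freed) // true: the count reached zero
}
```

The atomic add returns the new count, so exactly one caller observes the transition to zero and performs the cleanup.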
The home page of the Go Arrow implementation [3] describes when and how to use reference counting:

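  • If you are passed an object and wish to take ownership of it, you must call Retain. When you no longer need the object, you must call the matching Release. "Taking ownership" means that you intend to access the object beyond the scope of the current function call.
  • You own any object you create through a function whose name begins with New or Copy, and any object you receive over a channel. Therefore, you must call Release once you no longer need it.
  • If you send an object over a channel, you must call Retain before sending it, because the receiver is assumed to own it. The receiver is then obliged to call Release once it no longer needs the object.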
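The channel rules above can be sketched as follows. The `tracked` type and the `produce`/`consume` functions are hypothetical and exist only to show where each Retain and Release goes; real Arrow objects such as arrays or records would take the place of `tracked`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tracked is a hypothetical reference-counted object used only for this sketch.
type tracked struct {
	refs int64
	name string
}

// newTracked follows the "New*" rule: the caller owns the returned object.
func newTracked(name string) *tracked { return &tracked{refs: 1, name: name} }

func (t *tracked) Retain()     { atomic.AddInt64(&t.refs, 1) }
func (t *tracked) Release()    { atomic.AddInt64(&t.refs, -1) }
func (t *tracked) Refs() int64 { return atomic.LoadInt64(&t.refs) }

// produce creates an object, hands one reference to the receiver,
// and gives up its own reference on return.
func produce(ch chan<- *tracked) {
	obj := newTracked("batch-0") // created via New*: we own it
	defer obj.Release()          // so we must release it when done

	obj.Retain() // retain before sending: the receiver will own this reference
	ch <- obj
}

// consume owns whatever it receives and releases it after use.
// It returns the pointer only so that main can inspect the final count.
func consume(ch <-chan *tracked) *tracked {
	obj := <-ch
	_ = obj.name  // use the object
	obj.Release() // the receiver is responsible for this Release
	return obj
}

func main() {
	ch := make(chan *tracked, 1)
	produce(ch)
	obj := consume(ch)
	fmt.Println(obj.Refs()) // 0: every Retain was matched by a Release
}
```

Forgetting the Retain in `produce` would let the deferred Release drop the count to zero while the object is still sitting in the channel, which is exactly the situation the third rule is meant to prevent.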
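With these rules in hand, we have a fairly good idea of when to call Retain and Release. One question remains, though: Go is a garbage-collected language, so why layer reference counting on top of the GC?
I found the answer in this issue [4]. A committer on the Go Arrow implementation wrote in reply: "In theory, if you know you are using the default Go allocator, you don't actually have to call Retain/Release in your consumer code (that is, code that uses the Arrow Go package API), and you can simply let the Go garbage collector manage everything. We just need to make sure that we call Retain/Release inside the library, so that if a consumer uses a non-Go-GC allocator, we can ensure they don't end up with memory leaks."
Here is the implementation of the default Go allocator: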
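To see why Release matters once the allocator is not backed by the Go GC, consider this sketch of an allocator that tracks outstanding bytes. The `Allocator` interface here copies the three-method set of Arrow's `memory.Allocator`; `countingAllocator`, `buffer`, and `newBuffer` are hypothetical names made up for illustration, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Allocator mirrors the method set of Arrow's memory.Allocator interface.
type Allocator interface {
	Allocate(size int) []byte
	Reallocate(size int, b []byte) []byte
	Free(b []byte)
}

// countingAllocator is a hypothetical allocator that records how many bytes
// are still outstanding. A real non-GC allocator (for example one backed by
// C malloc) would leak those bytes for good.
type countingAllocator struct {
	live int // bytes allocated but not yet freed
}

func (a *countingAllocator) Allocate(size int) []byte {
	a.live += size
	return make([]byte, size)
}

func (a *countingAllocator) Reallocate(size int, b []byte) []byte {
	if size == len(b) {
		return b
	}
	nb := a.Allocate(size)
	copy(nb, b)
	a.Free(b)
	return nb
}

func (a *countingAllocator) Free(b []byte) { a.live -= len(b) }

// buffer is a hypothetical reference-counted buffer that hands its memory
// back to the allocator when the count reaches zero.
type buffer struct {
	refs int64
	mem  Allocator
	data []byte
}

func newBuffer(mem Allocator, size int) *buffer {
	return &buffer{refs: 1, mem: mem, data: mem.Allocate(size)}
}

func (b *buffer) Retain() { atomic.AddInt64(&b.refs, 1) }

func (b *buffer) Release() {
	if atomic.AddInt64(&b.refs, -1) == 0 {
		b.mem.Free(b.data)
		b.data = nil
	}
}

func main() {
	mem := &countingAllocator{}

	released := newBuffer(mem, 128)
	released.Release() // memory goes back to the allocator

	_ = newBuffer(mem, 64) // never released: the allocator never hears about it

	fmt.Println(mem.live) // 64 bytes are still outstanding
}
```

With the Go allocator the forgotten buffer would simply be garbage collected; with an allocator that manages memory outside the Go heap, the 64 bytes would stay allocated.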
package memory

// DefaultAllocator is a default implementation of Allocator and can be used anywhere
// an Allocator is required.
//
// DefaultAllocator is safe to use from multiple goroutines.
var DefaultAllocator Allocator = NewGoAllocator()

type GoAllocator struct{}

func NewGoAllocator() *GoAllocator { return &GoAllocator{} }

func (a *GoAllocator) Allocate(size int) []byte {
	buf := make([]byte, size+alignment) // padding for 64-byte alignment
	addr := int(addressOf(buf))
	next := roundUpToMultipleOf64(addr)
	if addr != next {
		shift := next - addr
		return buf[shift : size+shift : size+shift]
	}
	return buf[:size:size]
}

func (a *GoAllocator) Reallocate(size int, b []byte) []byte {
	if size == len(b) {
		return b
	}
	newBuf := a.Allocate(size)
	copy(newBuf, b)
	return newBuf
}

func (a *GoAllocator) Free(b []byte) {}
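The listing calls `addressOf`, `roundUpToMultipleOf64`, and the `alignment` constant without showing them. The sketch below fills them in so the alignment trick can be run on its own. The helper names come from the listing, but their bodies here are assumptions, not the library's source, and `allocateAligned` is a hypothetical standalone version of `Allocate`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignment is assumed to be 64, matching the "64-byte alignment" comment.
const alignment = 64

// roundUpToMultipleOf64 rounds v up to the next multiple of 64
// (assumed body: add 63, then clear the low six bits).
func roundUpToMultipleOf64(v int) int { return (v + 63) &^ 63 }

// addressOf returns the address of the first byte of b (assumed body).
// b must not be empty.
func addressOf(b []byte) uintptr { return uintptr(unsafe.Pointer(&b[0])) }

// allocateAligned over-allocates by 64 bytes, then slices the buffer so that
// its first byte sits on a 64-byte boundary and its capacity equals size.
func allocateAligned(size int) []byte {
	buf := make([]byte, size+alignment)
	addr := int(addressOf(buf))
	shift := roundUpToMultipleOf64(addr) - addr // 0 when already aligned
	return buf[shift : size+shift : size+shift]
}

func main() {
	b := allocateAligned(100)
	fmt.Println(len(b), cap(b), addressOf(b)%alignment) // 100 100 0
}
```

The three-index slice expression caps the capacity at `size`, so an `append` on the returned slice cannot silently grow into the padding bytes. Note also that `Free` in the listing is empty: the slice is ordinary Go heap memory, and the garbage collector reclaims it once it is unreachable.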

