dict
dict的数据定义如下:
1/* This is our hash table structure. Every dictionary has two of this as we
2 * implement incremental rehashing, for the old to the new table. */
3typedef struct dictht {
4 dictEntry **table;
5 unsigned long size;
6 unsigned long sizemask;
7 unsigned long used;
8} dictht;
9
10typedef struct dict {
11 dictType *type;
12 void *privdata;
13 dictht ht[2];
14 long rehashidx; /* rehashing not in progress if rehashidx == -1 */
15 int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
16} dict;
dict 定义了一个hash table对象,包括如下的字段:
字段 | 类型 | 说明 |
---|---|---|
type | dictType* | 这里定义了一组对Key-Value的操作函数 |
privdata | void* | Key-Value操作函数需要用到的自定义数据 |
ht | dictht [2] | 哈希数组,这里定义了两个,用于dict扩容时的rehash |
rehashidx | long | 值为-1时表示不在rehashing,否则记录下一次rehash要操作的hash table下标 |
pauserehash | int16_t | > 0 表示rehashing暂停,< 0 表示rehashing出错 |
数据保存在ht字段中,dictht类型解释如下:
字段 | 类型 | 说明 |
---|---|---|
table | dictEntry** | 哈希表数组,通过dictType中的hashFunction哈希的结果与sizemask进行与操作,获得的既是该数组的下标 |
size | unsigned long | 数组的大小,值一定是2的n次方 |
sizemask | unsigned long | 快速计算哈希值对应的数组下标的掩码,值为size-1 |
used | unsigned long | 节点计数,记录这个hash table里面一共有多少个节点。 |
先上图,来窥一下dict的全貌
先来看dict的创建:
1/* Create a new hash table */
2dict *dictCreate(dictType *type,
3 void *privDataPtr)
4{
5 dict *d = zmalloc(sizeof(*d));
6
7 _dictInit(d,type,privDataPtr);
8 return d;
9}
10
11/* Initialize the hash table */
12int _dictInit(dict *d, dictType *type,
13 void *privDataPtr)
14{
15 _dictReset(&d->ht[0]);
16 _dictReset(&d->ht[1]);
17 d->type = type;
18 d->privdata = privDataPtr;
19 d->rehashidx = -1;
20 d->pauserehash = 0;
21 return DICT_OK;
22}
dictCreate为dict的数据结构分配空间并为各个变量赋初值。其中两个哈希表ht[0]和ht[1]起始都没有分配空间,table指针都赋为NULL。这意味着要等第一个数据插入时才会真正分配空间。
在看dict是如何操作的,我们先看看dict的查找:
1dictEntry *dictFind(dict *d, const void *key)
2{
3 dictEntry *he;
4 uint64_t h, idx, table;
5
6 if (dictSize(d) == 0) return NULL; /* dict is empty */
7 if (dictIsRehashing(d)) _dictRehashStep(d);
8 h = dictHashKey(d, key);
9 for (table = 0; table <= 1; table++) {
10 idx = h & d->ht[table].sizemask;
11 he = d->ht[table].table[idx];
12 while(he) {
13 if (key==he->key || dictCompareKeys(d, key, he->key))
14 return he;
15 he = he->next;
16 }
17 if (!dictIsRehashing(d)) return NULL;
18 }
19 return NULL;
20}
上述dictFind的源码,根据dict当前是否正在重哈希,依次做了这么几件事:
- 如果当前正在进行重哈希,那么将重哈希过程向前推进一步(即调用_dictRehashStep)。实际上,除了查找,插入和删除也都会触发这一动作。这就将重哈希过程分散到各个查找、插入和删除操作中去了,而不是集中在某一个操作中一次性做完。
- 计算key的哈希值(调用dictHashKey,里面的实现会调用前面提到的hashFunction)。
- 先在第一个哈希表ht[0]上进行查找。在table数组上定位到哈希值对应的位置(如前所述,通过哈希值与sizemask进行按位与),然后在对应的dictEntry链表上进行查找。查找的时候需要对key进行比较,这时候调用dictCompareKeys,它里面的实现会调用到前面提到的keyCompare。如果找到就返回该项。否则,进行下一步。
- 判断当前是否在重哈希,如果没有,那么在ht[0]上的查找结果就是最终结果(没找到,返回NULL)。否则,在ht[1]上进行查找(过程与上一步相同)。
rehash是个什么过程呢?还是看代码
1/* This function performs just a step of rehashing, and only if hashing has
2 * not been paused for our hash table. When we have iterators in the
3 * middle of a rehashing we can't mess with the two hash tables otherwise
4 * some element can be missed or duplicated.
5 *
6 * This function is called by common lookup or update operations in the
7 * dictionary so that the hash table automatically migrates from H1 to H2
8 * while it is actively used. */
9static void _dictRehashStep(dict *d) {
10 if (d->pauserehash == 0) dictRehash(d,1);
11}
12
13/* Performs N steps of incremental rehashing. Returns 1 if there are still
14 * keys to move from the old to the new hash table, otherwise 0 is returned.
15 *
16 * Note that a rehashing step consists in moving a bucket (that may have more
17 * than one key as we use chaining) from the old to the new hash table, however
18 * since part of the hash table may be composed of empty spaces, it is not
19 * guaranteed that this function will rehash even a single bucket, since it
20 * will visit at max N*10 empty buckets in total, otherwise the amount of
21 * work it does would be unbound and the function may block for a long time. */
22int dictRehash(dict *d, int n) {
23 int empty_visits = n*10; /* Max number of empty buckets to visit. */
24 if (!dictIsRehashing(d)) return 0;
25
26 while(n-- && d->ht[0].used != 0) {
27 dictEntry *de, *nextde;
28
29 /* Note that rehashidx can't overflow as we are sure there are more
30 * elements because ht[0].used != 0 */
31 assert(d->ht[0].size > (unsigned long)d->rehashidx);
32 while(d->ht[0].table[d->rehashidx] == NULL) {
33 d->rehashidx++;
34 if (--empty_visits == 0) return 1;
35 }
36 de = d->ht[0].table[d->rehashidx];
37 /* Move all the keys in this bucket from the old to the new hash HT */
38 while(de) {
39 uint64_t h;
40
41 nextde = de->next;
42 /* Get the index in the new hash table */
43 h = dictHashKey(d, de->key) & d->ht[1].sizemask;
44 de->next = d->ht[1].table[h];
45 d->ht[1].table[h] = de;
46 d->ht[0].used--;
47 d->ht[1].used++;
48 de = nextde;
49 }
50 d->ht[0].table[d->rehashidx] = NULL;
51 d->rehashidx++;
52 }
53
54 /* Check if we already rehashed the whole table... */
55 if (d->ht[0].used == 0) {
56 zfree(d->ht[0].table);
57 d->ht[0] = d->ht[1];
58 _dictReset(&d->ht[1]);
59 d->rehashidx = -1;
60 return 0;
61 }
62
63 /* More to rehash... */
64 return 1;
65}
- dict先去判断步数已经走完,之后再判断是否表中所有的项都已经rehash过了。不满足则进行一次rehash。
- rehash的过程很简单,主要分为两步:
- 根据rehashindex找到要准备rehash的节点
- 遍历节点中的所有项,并将其插入到ht[1]中,之后再从ht[0]中删除掉。
- 如果rehash做完了(used==0)则ht[0] = ht[1] 并 重置 ht[1]
那么dict如何新增一条数据呢?我们可以看下dictAdd和dictReplace的代码
1/* Add an element to the target hash table */
2int dictAdd(dict *d, void *key, void *val)
3{
4 dictEntry *entry = dictAddRaw(d,key,NULL);
5
6 if (!entry) return DICT_ERR;
7 dictSetVal(d, entry, val);
8 return DICT_OK;
9}
10
11/* Low level add or find:
12 * This function adds the entry but instead of setting a value returns the
13 * dictEntry structure to the user, that will make sure to fill the value
14 * field as they wish.
15 *
16 * This function is also directly exposed to the user API to be called
17 * mainly in order to store non-pointers inside the hash value, example:
18 *
19 * entry = dictAddRaw(dict,mykey,NULL);
20 * if (entry != NULL) dictSetSignedIntegerVal(entry,1000);
21 *
22 * Return values:
23 *
24 * If key already exists NULL is returned, and "*existing" is populated
25 * with the existing entry if existing is not NULL.
26 *
27 * If key was added, the hash entry is returned to be manipulated by the caller.
28 */
29dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
30{
31 long index;
32 dictEntry *entry;
33 dictht *ht;
34
35 if (dictIsRehashing(d)) _dictRehashStep(d);
36
37 /* Get the index of the new element, or -1 if
38 * the element already exists. */
39 if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
40 return NULL;
41
42 /* Allocate the memory and store the new entry.
43 * Insert the element in top, with the assumption that in a database
44 * system it is more likely that recently added entries are accessed
45 * more frequently. */
46 ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
47 entry = zmalloc(sizeof(*entry));
48 entry->next = ht->table[index];
49 ht->table[index] = entry;
50 ht->used++;
51
52 /* Set the hash entry fields. */
53 dictSetKey(d, entry, key);
54 return entry;
55}
56
57/* Add or Overwrite:
58 * Add an element, discarding the old value if the key already exists.
59 * Return 1 if the key was added from scratch, 0 if there was already an
60 * element with such key and dictReplace() just performed a value update
61 * operation. */
62int dictReplace(dict *d, void *key, void *val)
63{
64 dictEntry *entry, *existing, auxentry;
65
66 /* Try to add the element. If the key
67 * does not exists dictAdd will succeed. */
68 entry = dictAddRaw(d,key,&existing);
69 if (entry) {
70 dictSetVal(d, entry, val);
71 return 1;
72 }
73
74 /* Set the new value and free the old one. Note that it is important
75 * to do that in this order, as the value may just be exactly the same
76 * as the previous one. In this context, think to reference counting,
77 * you want to increment (set), and then decrement (free), and not the
78 * reverse. */
79 auxentry = *existing;
80 dictSetVal(d, existing, val);
81 dictFreeVal(d, &auxentry);
82 return 0;
83}
其实过程也很简单:
-
首先使用dictAddRaw添加一个新节点,若节点已存在则返回existing。
- 先进行一步rehash
- 通过调用_dictKeyIndex来获取插入key的index,在没有rehash的情况下会触发表扩容,而扩容总是成倍增加hashtable的长度。
- 由这行代码可知,如果正在rehashing则会把所有的数据都添加到ht[1]中,否则就插入到ht[0]
1ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
-
使用dictSetVal赋值,这是个宏,扩展后如下:
1#define dictSetVal(d, entry, _val_) do { \ 2 if ((d)->type->valDup) \ 3 (entry)->v.val = (d)->type->valDup((d)->privdata, _val_); \ 4 else \ 5 (entry)->v.val = (_val_); \ 6} while(0)
-
新增的dictEntry总是被插入到链表的头部,我想这应该是单向链表最快的插入方式。有文章说是因为新插入数据更容易被访问,我并不认同。
-
dictReplace还会使用dictFreeVal对原值进行释放。
1auxentry = *existing; 2dictSetVal(d, existing, val); 3dictFreeVal(d, &auxentry);
另外我们还需要关注一下dict扩充hash table的条件。来看下dict的扩充函数
1/* Expand the hash table if needed */
2static int _dictExpandIfNeeded(dict *d)
3{
4 /* Incremental rehashing already in progress. Return. */
5 if (dictIsRehashing(d)) return DICT_OK;
6
7 /* If the hash table is empty expand it to the initial size. */
8 if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);
9
10 /* If we reached the 1:1 ratio, and we are allowed to resize the hash
11 * table (global setting) or we should avoid it but the ratio between
12 * elements/buckets is over the "safe" threshold, we resize doubling
13 * the number of buckets. */
14 if (d->ht[0].used >= d->ht[0].size &&
15 (dict_can_resize ||
16 d->ht[0].used/d->ht[0].size > dict_force_resize_ratio) &&
17 dictTypeExpandAllowed(d))
18 {
19 return dictExpand(d, d->ht[0].used + 1);
20 }
21 return DICT_OK;
22}
可以看到的是,只要used > size则会触发扩容,也就是说redis不会让一个ht的索引下挂载过多的节点。因为列表遍历还是比hash来的要慢一点。
dict的数据结构和算法相对来说都是比较简单的,但是其在redis中却占据了重要的位置,很多地方都可以看到对dict的使用。
评论