Redis基本数据结构-2

压缩列表

简述

压缩列表的本质，是一个数组，只不过在这个数组中，存储了一些数据，指向上一个数据块的下标

实际上，也可以认为它是一个链表，然后把这条链表数组化了

数据结构

zlbytes

4字节
zltail

4 bytes
zllen

2 len, max 65535
entryX

压缩列表的元素，可以是字节数组或者整型
zlend

1个字节，magic num, 0xFF

entry的数据结构

previous_entry_length

表示前一个entry的字节长度，1个或者5个字节，当前一个的长度小于254，1个字节，否则5个字节(此时第一个字节恒为0xFE，剩余4个字节表示长度)

当存储0~12的整数，content被合并进encoding

zlentry

/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;

prevrawlensize

previous_entry_length字段的长度，1还是5
prevrawlen

previous_entry_length表示的数据
lensize

encoding字段的长度
len

数据(content)的长度
headersize

previous_entry_length+encoding字段的总长度
encoding

encoding的编码

创建

对zlbytes、zltail等初始化，共分配11个字节

/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}

插入

将元素编码
重新分配空间
复制数据

由于底层是数组形式，这就导致了，只要发生插入，就需要重新分配空间，(为新数据腾出地方)并且复制剩余的数据

有几种基本的情况

P0/P2

很明显，直接插入
P1

找到位置，扩容(realloc)，move，插入，更新entryX+1的previous_entry_length

删除

插入的反向操作

删除，move，realloc

遍历

支持从前往后遍历，和从后往前遍历

从前往后

previous_entry_length占1个或5个字节(5个的时候第一个字节是0xFE)

encoding，1-5个字节，且看前四个比特就可以区分，同时从encoding里也可以获取content的长度

这意味着可以从前往后遍历
从后往前

根据previous_entry_length就可以实现

连锁更新

当发生插入到两个entry中间，或者从中间删除的时候，

就导致下一个entry的previous_entry_length要更新，一种可能的情况是，下一个previous_entry_length原本占用一个字节，然后插入数据1000的长度，导致要扩充到5个字节，从而引起下一个entry的总长度的变化

对于下图的情况，将导致后面所有的entry都要更新(不过这种概率很低，redis没有为避免这种情况采取措施)

字典

redis字节实现了一套字典(hash)，用的是链表定址法防止冲突

redis是靠数组，hash计算出一个index

数据结构

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    dictEntry **table;
    //table大小
    unsigned long size;
    // 掩码= size - 1
    // 计算hash后的index，index = hash % sizemask
    unsigned long sizemask;
    // 用了多少
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata; //私有数据
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;

dictht

hash数组
dictEntry

存放key，一个union，如果是数字，可以直接存数据，否则存指针

next是当hash冲突时，存放下一个entry的

dictType，存放hash的callback

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

dictht ht[2]，有两个，两个的目的是扩容实现摊还复制的

创建

没什么好看的，创建dict的数据结构，table都是NULL

添加元素

dictFind检查key是否存在，存在直接覆盖
dictAdd加入键值对

/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    //如果key存在返回NULL，不存在创建一个entry并返回
    dictEntry *entry = dictAddRaw(d,key,NULL);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

可以看到dictAddRaw，根据hash计算出index，并且将entry放到table的index上

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    /* Allocate the memory and store the new entry.
     * Insert the element in top, with the assumption that in a database
     * system it is more likely that recently added entries are accessed
     * more frequently. */
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

扩容

当hash table超过装载因子后，就会扩容

扩容发生在_dictKeyIndex中

static long _dictKeyIndex(dict *d, const void *key, uint64_t hash, dictEntry **existing)
{
    unsigned long idx, table;
    dictEntry *he;
    if (existing) *existing = NULL;

    /* Expand the hash table if needed */
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    for (table = 0; table <= 1; table++) {
        idx = hash & d->ht[table].sizemask;
        /* Search if this slot does not already contain the given key */
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                if (existing) *existing = he;
                return -1;
            }
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return idx;
}

通过_dictExpandIfNeeded，判断是否需要扩容

扩容后字典和掩码都会发生变化

redis采用的渐进式rehash，rehash后，会把rehashidx标记为-1

刚开始使用ht[0]，扩容后，对于旧数据，并不是一次性复制到扩容后的数据中，而是采用如下策略：

新数据将被加入到ht[1]中
旧数据被访问的时候，从ht[0]中删除，放到ht[1]中
所有旧数据从ht[0]中删除，swap ht[0], ht[1]

摊还分析，时间复杂度是$O(1)$

查找

从两个table里一次查找

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

    if (d->ht[0].used + d->ht[1].used == 0) return NULL; /* dict is empty */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        // 如果没有rehash，直接返回
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

删除

删除会导致缩容，反向扩容，也是调用dictExpand，和扩容操作一致

全遍历

遍历的期间，可能会对字典进行增删改查，从而导致rehash，有可能发生ht[0]读了一遍数据，rehash后ht[1]又读了一遍数据

redis提供了迭代器保证遍历的正确性

typedef struct dictIterator {
    dict *d;
    long index;
    int table, safe;
    dictEntry *entry, *nextEntry;
    /* unsafe iterator fingerprint for misuse detection. */
    long long fingerprint;
} dictIterator;

普通迭代器

fingerprint记录当前字典的状态，之后迭代都检查fingerprint是否发生变化，如果变化了，则抛异常

普通迭代器需要严格确保字典没有发生变化，它只遍历数据

dictNext
安全迭代器

遍历的同时可以删除数据

通过dictNext遍历，安全迭代器限制了rehash(暂停了rehash)，来确保字典的正确性

这两种都是全遍历

间接遍历

dictScan，这个API是可以再遍历的时候进行rehash的

有三种情况

1. 迭代开始到结束，没有rehash
1. 迭代开始到结束，进行扩容或者缩容操作，且敲好为两次迭代间隔期间完成了rehash操作
1. 迭代开始到结束，某次或某几次迭代，进行了rehash操作

第一种，只需要遍历ht[0]

第二种，这个意思是，ht[0]原先size是4，resha完成，ht[0]被换成了8

redis采用了一个很神奇的算法

v |= ~m0;

/* Increment the reverse cursor */
v = rev(v);
v++;
v = rev(v);

简单说一下，就比如，size是4(sizemash=3)，已经遍历了0和2，然后扩容到8(sizemask=7)，那么，经过rehash后，游标为0、2的数据，可能会分布到0|4、2|6上，那么扩容后的4，6就不需要在地带了

第三种，没rehash完，遍历ht[0]和ht[1]

Tags: redis源码阅读数据库

← Redis基本数据结构-1 Redis基本数据结构-3 →

扫描二维码，分享此文章