
Redis Source Code: The String Object

August 6, 2020 04:36

Summary: Redis has five common data types. How are these structures actually used inside Redis?

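    The string is the simplest of Redis's five data types, and its bytes are stored in an sds (simple dynamic string). Redis does not pass sds values around directly, though: it wraps every type in an object, so what actually circulates inside Redis is the object structure, robj.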

    typedef struct redisObject { 
        unsigned type:4; 
        unsigned encoding:4; 
        unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or 
                                * LFU data (least significant 8 bits frequency 
                                * and most significant 16 bits access time). */ 
        int refcount; 
        void *ptr; 
    } robj; 
    

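    As shown, the structure contains a type field and an encoding field. type is the data type in the usual sense: string, list, hash, set, or zset (sorted set). How the value is actually stored is identified by the encoding field.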

    Take the simplest case, the string. The source uses createStringObject to wrap a string value in an object. The relevant code is:

     /* Create a string object with EMBSTR encoding if it is smaller than 
      * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is 
      * used. 
      * 
      * The current limit of 44 is chosen so that the biggest string object 
      * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */ 
    #define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44 
    robj *createStringObject(const char *ptr, size_t len) { 
        if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) 
            return createEmbeddedStringObject(ptr,len); 
        else 
            return createRawStringObject(ptr,len); 
    } 
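
    The boundary in the check above is easy to misread, so here is a minimal sketch that isolates it. string_encoding_for_len is a hypothetical helper, not a Redis function; it only mirrors the comparison made by createStringObject.

```c
#include <stddef.h>

/* Same threshold as in the excerpt above. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44

/* Hypothetical helper (not part of Redis): reports which constructor
 * createStringObject would pick for a string of the given length. */
static const char *string_encoding_for_len(size_t len) {
    /* The comparison is inclusive: exactly 44 bytes is still embstr. */
    return len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT ? "embstr" : "raw";
}
```

    A 44-byte string maps to "embstr" and a 45-byte string to "raw".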
    

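    So when the string length is at most 44 bytes (the check is len <= 44), Redis creates an embstr. What exactly is an embstr? Looking at the robj structure, the ptr field points to the actual data (the all-purpose void *), so the usual approach is to create the value somewhere else and point ptr at it. Let's first look at how a raw string is created.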

    /* Create a string object with encoding OBJ_ENCODING_RAW, that is a plain 
     * string object where o->ptr points to a proper sds string. */ 
    robj *createRawStringObject(const char *ptr, size_t len) { 
        return createObject(OBJ_STRING, sdsnewlen(ptr,len)); 
    } 
    
    ... ... 
    
    robj *createObject(int type, void *ptr) { 
        robj *o = zmalloc(sizeof(*o)); 
        o->type = type; 
        o->encoding = OBJ_ENCODING_RAW; 
        o->ptr = ptr; 
        o->refcount = 1; 
    
        /* Set the LRU to the current lruclock (minutes resolution), or 
         * alternatively the LFU counter. */ 
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) { 
            o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL; 
        } else { 
            o->lru = LRU_CLOCK(); 
        } 
        return o; 
    } 
    

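    As expected, creating a raw string builds the sds with sdsnewlen and allocates a separate object with createObject, whose ptr is then set to the sds. This is the usual approach, and it takes two allocations. So what about embstr? Let's look at the source.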

    /* Create a string object with encoding OBJ_ENCODING_EMBSTR, that is 
     * an object where the sds string is actually an unmodifiable string 
     * allocated in the same chunk as the object itself. */ 
    robj *createEmbeddedStringObject(const char *ptr, size_t len) { 
        robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1); 
        struct sdshdr8 *sh = (void*)(o+1); 
    
        o->type = OBJ_STRING; 
        o->encoding = OBJ_ENCODING_EMBSTR; 
        o->ptr = sh+1; 
        o->refcount = 1; 
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) { 
            o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL; 
        } else { 
            o->lru = LRU_CLOCK(); 
        } 
    
        sh->len = len; 
        sh->alloc = len; 
        sh->flags = SDS_TYPE_8; 
        if (ptr == SDS_NOINIT) 
            sh->buf[len] = '\0'; 
        else if (ptr) { 
            memcpy(sh->buf,ptr,len); 
            sh->buf[len] = '\0'; 
        } else { 
            memset(sh->buf,0,len+1); 
        } 
        return o; 
    } 
    

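    Here, when the embstr object is created, a single zmalloc call allocates an extra sizeof(struct sdshdr8)+len+1 bytes beyond the robj itself. Why? It exploits the principle of locality: the object and its sds sit in one contiguous block, so reading the object usually brings the string bytes in with it. With a raw string, the object has to be read first and then obj->ptr followed to a separately allocated sds, which is a second memory access. In other words, embstr reduces the number of memory reads by doing in one step what otherwise takes two.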
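
    The layout can be checked with a small stand-alone sketch. demo_obj, demo_hdr8, demo_create_embedded and demo_data_offset are hypothetical, simplified stand-ins for robj, sdshdr8 and createEmbeddedStringObject; plain malloc replaces zmalloc, and the packed attribute assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for robj: only the fields this sketch needs. */
typedef struct demo_obj {
    int refcount;
    void *ptr;                 /* points at the string bytes */
} demo_obj;

/* Simplified stand-in for sdshdr8 (3-byte packed header). */
struct __attribute__((__packed__)) demo_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* embstr-style creation: one allocation holds the object, the header
 * and the bytes back to back. Callers must keep len <= 255. */
static demo_obj *demo_create_embedded(const char *s, size_t len) {
    demo_obj *o = malloc(sizeof(demo_obj) + sizeof(struct demo_hdr8) + len + 1);
    if (o == NULL) return NULL;
    struct demo_hdr8 *sh = (void *)(o + 1);  /* header starts right after the object */
    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = 0;
    memcpy(sh->buf, s, len);
    sh->buf[len] = '\0';
    o->refcount = 1;
    o->ptr = sh->buf;                        /* same address as sh + 1 */
    return o;
}

/* Byte distance from the start of the object to its string data. */
static size_t demo_data_offset(const demo_obj *o) {
    return (size_t)((const char *)o->ptr - (const char *)o);
}
```

    For an object built this way, demo_data_offset always equals sizeof(demo_obj) + sizeof(struct demo_hdr8), and a single free(o) releases the object and the string together. With the raw approach the two pieces come from separate allocations and can land anywhere on the heap.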

    Why is OBJ_ENCODING_EMBSTR_SIZE_LIMIT defined as 44?

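    On a 64-bit build robj occupies 16 bytes. sdshdr8 is defined as follows; buf is a flexible array member that adds nothing to sizeof, so the packed header takes only 3 bytes.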

    struct __attribute__ ((__packed__)) sdshdr8 { 
        uint8_t len; /* used */ 
        uint8_t alloc; /* excluding the header and null terminator */ 
        unsigned char flags; /* 3 lsb of type, 5 unused bits */ 
        char buf[]; 
    }; 
    

    In sizeof(robj)+sizeof(struct sdshdr8)+len+1, everything except len is known: size = 16 + 3 + len + 1 = 20 + len. With len = 44, size = 64, which is exactly what the comment says: the biggest string object "we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc."
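
    The arithmetic can be verified by letting the compiler compute the sizes. The sketch below restates the two structures from the excerpts so that it compiles on its own; it assumes a 64-bit build with GCC or Clang and LRU_BITS equal to 24, as in Redis. embstr_alloc_size is a hypothetical helper that repeats the expression passed to zmalloc.

```c
#include <stddef.h>
#include <stdint.h>

#define LRU_BITS 24   /* restated here so the sketch compiles alone */

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;   /* 4 + 4 + 24 bits share one 4-byte unsigned */
    int refcount;            /* 4 bytes */
    void *ptr;               /* 8 bytes on a 64-bit build */
} robj;

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];              /* flexible array member, contributes 0 to sizeof */
};

/* Hypothetical helper: bytes requested for an embstr of length len,
 * using the same expression as createEmbeddedStringObject. */
static size_t embstr_alloc_size(size_t len) {
    return sizeof(robj) + sizeof(struct sdshdr8) + len + 1;
}
```

    Under those assumptions sizeof(robj) is 16, sizeof(struct sdshdr8) is 3, and embstr_alloc_size(44) is 64, while a 45-byte string would need 65 bytes and no longer fit.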

    Finally, here are the values that encoding can take in Redis:

    /* Objects encoding. Some kind of objects like Strings and Hashes can be 
     * internally represented in multiple ways. The 'encoding' field of the object 
     * is set to one of this fields for this object. */ 
    #define OBJ_ENCODING_RAW 0     /* Raw representation */ 
    #define OBJ_ENCODING_INT 1     /* Encoded as integer */ 
    #define OBJ_ENCODING_HT 2      /* Encoded as hash table */ 
    #define OBJ_ENCODING_ZIPMAP 3  /* Encoded as zipmap */ 
    #define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */ 
    #define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */ 
    #define OBJ_ENCODING_INTSET 6  /* Encoded as intset */ 
    #define OBJ_ENCODING_SKIPLIST 7  /* Encoded as skiplist */ 
    #define OBJ_ENCODING_EMBSTR 8  /* Embedded sds string encoding */ 
    #define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of ziplists */ 
    #define OBJ_ENCODING_STREAM 10 /* Encoded as a radix tree of listpacks */
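
    Of the values listed above, a string object uses RAW, INT or EMBSTR. As a rough sketch, a helper like the one below could turn the field into a readable name. string_encoding_name is hypothetical and written only for illustration; the three names are the ones the OBJECT ENCODING command reports for string values.

```c
/* Values copied from the list above. */
#define OBJ_ENCODING_RAW 0
#define OBJ_ENCODING_INT 1
#define OBJ_ENCODING_EMBSTR 8

/* Hypothetical helper: readable name for the encodings a string
 * object can have; anything else is reported as "unknown". */
static const char *string_encoding_name(unsigned encoding) {
    switch (encoding) {
    case OBJ_ENCODING_RAW:    return "raw";
    case OBJ_ENCODING_INT:    return "int";
    case OBJ_ENCODING_EMBSTR: return "embstr";
    default:                  return "unknown";
    }
}
```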