Cassandra Collection Types and Their Underlying Storage Format
Cassandra collection types
Create a user table with complex cells
CREATE TABLE ks.user (
    id int PRIMARY KEY,
    addr map<text, frozen<set<text>>>,
    complex map<text, frozen<map<text, text>>>,
    listcolumn list<text>,
    setcolumn set<text>
);
After inserting some data, a query returns the following
cassandra@cqlsh:ks> select * from user;

 id | addr                                           | complex                          | listcolumn      | setcolumn
----+------------------------------------------------+----------------------------------+-----------------+-----------------
  1 | {'bj': {'ba', 'bb'}, 'shanghai': {'sa', 'sb'}} | {'bj': {'ka': 'va', 'kb': 'vb'}} | ['a', 'b', 'c'] | {'a', 'b', 'c'}
Run bin/nodetool flush to write an SSTable
Dump the SSTable as text
tools/bin/sstabledump /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-17-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 233,
        "liveness_info" : { "tstamp" : "2019-07-18T02:43:36.011497Z" },
        "cells" : [
          { "name" : "addr", "deletion_info" : { "marked_deleted" : "2019-07-18T02:43:36.011496Z", "local_delete_time" : "2019-07-18T02:43:36Z" } },
          { "name" : "addr", "path" : [ "bj" ], "value" : ["ba", "bb"] },
          { "name" : "addr", "path" : [ "shanghai" ], "value" : ["sa", "sb"] },
          { "name" : "complex", "deletion_info" : { "marked_deleted" : "2019-07-18T02:47:55.888562Z", "local_delete_time" : "2019-07-18T02:47:55Z" } },
          { "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
          { "name" : "listcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:56:09.386468Z", "local_delete_time" : "2019-07-18T02:56:09Z" } },
          { "name" : "listcolumn", "path" : [ "982493f0-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "a", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f1-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "b", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f2-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "c", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "setcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:55:03.280578Z", "local_delete_time" : "2019-07-18T02:55:03Z" } },
          { "name" : "setcolumn", "path" : [ "a" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" },
          { "name" : "setcolumn", "path" : [ "b" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" },
          { "name" : "setcolumn", "path" : [ "c" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" }
        ]
      }
    ]
  }
]
At the storage level, each element of a collection is uniquely identified by its cell name plus a path.
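The idea of addressing elements by cell name plus path can be sketched in Python. This is only an illustrative model, not Cassandra's implementation: the function name flatten_row and the dictionary layout are assumptions made for this example. It mirrors what the dump above shows: a map element's path is its key, a set element's path is the element itself with an empty value, and a list element's path is a time-based UUID.

```python
import uuid


def flatten_row(row):
    """Flatten collection columns into {(column, path): value} entries.

    Hypothetical helper for illustration only.
    """
    elements = {}
    for column, value in row.items():
        if isinstance(value, dict):
            # map: the path is the map key, the value is the map value
            for key, item in value.items():
                elements[(column, key)] = item
        elif isinstance(value, (set, frozenset)):
            # set: the path is the element itself, the stored value is empty
            for item in value:
                elements[(column, item)] = ""
        elif isinstance(value, list):
            # list: each element gets a time-based UUID as its path
            for item in value:
                elements[(column, str(uuid.uuid1()))] = item
        else:
            # simple column: a single cell without a path
            elements[(column, None)] = value
    return elements


row = {
    "addr": {"bj": frozenset({"ba", "bb"}), "shanghai": frozenset({"sa", "sb"})},
    "listcolumn": ["a", "b", "c"],
    "setcolumn": {"a", "b", "c"},
}
flat = flatten_row(row)
for key in sorted(flat, key=str):
    print(key, "->", flat[key])
```

Each printed line corresponds to one cell entry with a path in the sstabledump output.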
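Now look at the addr and complex columns. Both are nested collections: a map whose values are a frozen set (addr) or a frozen map (complex). A complex element looks like this: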
{ "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
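The inner frozen<map<text, text>>, however, is stored essentially as a single blob: individual entries of the inner map cannot be modified on their own. That is what frozen means.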
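A small Python sketch of the frozen behaviour described above. It is a model under stated assumptions: JSON is used only as a stand-in for the serialized form (it is not Cassandra's actual encoding), and freeze and thaw are hypothetical helper names. The point is that the inner map occupies a single (column, path) entry, so changing one inner key means rewriting the whole value.

```python
import json


def freeze(inner_map):
    """Serialize a whole inner map into one opaque value (JSON is a stand-in)."""
    return json.dumps(inner_map, sort_keys=True).encode("utf-8")


def thaw(blob):
    """Decode the opaque value back into a dict."""
    return json.loads(blob.decode("utf-8"))


# The outer map element complex['bj'] holds one blob for the whole inner map.
cells = {("complex", "bj"): freeze({"ka": "va", "kb": "vb"})}

# There is no path that addresses 'ka' alone, so an update is a
# read-modify-write of the entire frozen value.
inner = thaw(cells[("complex", "bj")])
inner["ka"] = "va2"
cells[("complex", "bj")] = freeze(inner)

print(len(cells), thaw(cells[("complex", "bj")]))
```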
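C* has a masterless architecture, so multiple nodes can write to the same collection concurrently. How are conflicts resolved? The smallest storage unit is not the column-level cell but the element uniquely identified by cell name plus path. Data is merged per (cell, path) pair, and the entry with the latest timestamp becomes the final value.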
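The per-element merge can be sketched as a last-write-wins reduction keyed by (column, path). This is a simplified model: merge_lww, the node dictionaries, and the integer timestamps are made up for the example, and tie-breaking on equal timestamps is left out.

```python
def merge_lww(*replicas):
    """Merge per-element writes; the highest timestamp wins for each (column, path)."""
    merged = {}
    for replica in replicas:
        for key, (value, ts) in replica.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


# Two nodes accept writes to the same row concurrently (hypothetical data).
node_a = {("setcolumn", "a"): ("", 100), ("addr", "bj"): (("ba",), 100)}
node_b = {("setcolumn", "b"): ("", 105), ("addr", "bj"): (("ba", "bb"), 110)}

merged = merge_lww(node_a, node_b)
for key in sorted(merged):
    print(key, merged[key])
```

Writes to different paths of setcolumn do not conflict at all, so both survive. Only addr['bj'], which both nodes wrote, needs the timestamp comparison.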
Delete one element from setcolumn
cassandra@cqlsh:ks> update user set setcolumn = setcolumn - {'a'} where id =1;
After a flush, inspect the newly generated SSTable: a deletion_info entry has been written for setcolumn.a
[root@Cassandra8c32GTest005 cassandra]# tools/bin/sstabledump /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-19-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 27,
        "cells" : [
          { "name" : "setcolumn", "path" : [ "a" ], "deletion_info" : { "local_delete_time" : "2019-07-18T10:03:17Z" }, "tstamp" : "2019-07-18T10:03:17.038519Z" }
        ]
      }
    ]
  }
]
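Trigger a merge manually by running a compaction. In the merged SSTable the value of setcolumn.a is gone, and a deletion_info has been written in its place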
bin/nodetool compact
tools/bin/sstabledump /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-22-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 236,
        "liveness_info" : { "tstamp" : "2019-07-18T02:43:36.011497Z" },
        "cells" : [
          { "name" : "addr", "deletion_info" : { "marked_deleted" : "2019-07-18T02:43:36.011496Z", "local_delete_time" : "2019-07-18T02:43:36Z" } },
          { "name" : "addr", "path" : [ "bj" ], "value" : ["ba", "bb"] },
          { "name" : "addr", "path" : [ "shanghai" ], "value" : ["sa", "sb"] },
          { "name" : "complex", "deletion_info" : { "marked_deleted" : "2019-07-18T02:47:55.888562Z", "local_delete_time" : "2019-07-18T02:47:55Z" } },
          { "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
          { "name" : "listcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:56:09.386468Z", "local_delete_time" : "2019-07-18T02:56:09Z" } },
          { "name" : "listcolumn", "path" : [ "982493f0-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "a", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f1-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "b", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f2-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "c", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "setcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:55:03.280578Z", "local_delete_time" : "2019-07-18T02:55:03Z" } },
          { "name" : "setcolumn", "path" : [ "a" ], "deletion_info" : { "local_delete_time" : "2019-07-18T10:03:17Z" }, "tstamp" : "2019-07-18T10:03:17.038519Z" },
          { "name" : "setcolumn", "path" : [ "b" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" },
          { "name" : "setcolumn", "path" : [ "c" ], "value" : "", "tstamp" : "2019-07-18T02:55:03.280579Z" }
        ]
      }
    ]
  }
]
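What the compaction did to setcolumn can be approximated in a few lines of Python. The cell dictionaries below are copied from the two dumps (md-17 and md-19). The merge_cells function is a hypothetical simplification for illustration, not Cassandra's compaction code: for each (name, path) it keeps the entry with the newest tstamp, so the element tombstone for 'a' replaces the older value.

```python
# setcolumn elements from the first SSTable (md-17)
old_cells = [
    {"name": "setcolumn", "path": ["a"], "value": "", "tstamp": "2019-07-18T02:55:03.280579Z"},
    {"name": "setcolumn", "path": ["b"], "value": "", "tstamp": "2019-07-18T02:55:03.280579Z"},
    {"name": "setcolumn", "path": ["c"], "value": "", "tstamp": "2019-07-18T02:55:03.280579Z"},
]
# element tombstone from the SSTable written after the delete (md-19)
new_cells = [
    {"name": "setcolumn", "path": ["a"],
     "deletion_info": {"local_delete_time": "2019-07-18T10:03:17Z"},
     "tstamp": "2019-07-18T10:03:17.038519Z"},
]


def merge_cells(*sstables):
    """Keep, per (name, path), the entry with the newest timestamp."""
    merged = {}
    for cells in sstables:
        for cell in cells:
            key = (cell["name"], tuple(cell["path"]))
            # Fixed-width UTC timestamps in this format compare correctly as strings.
            if key not in merged or cell["tstamp"] > merged[key]["tstamp"]:
                merged[key] = cell
    return merged


merged = merge_cells(old_cells, new_cells)
live = sorted(path[0] for (_, path), cell in merged.items()
              if "deletion_info" not in cell)
print(live)  # ['b', 'c']
```

As in the md-22 dump, the entry for path 'a' is still present after the merge, but it now carries deletion_info instead of a value.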
Now try deleting the entire setcolumn column
update user set setcolumn = null where id =1;
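Flush to an SSTable, run nodetool compact, and inspect the result with the dump tool: all former elements of setcolumn are gone, and only the column-level deletion_info remains.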
tools/bin/sstabledump /data/data2/ks/user-a92ce790a8ff11e99a3d8963a5d3f9b4/md-27-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 210,
        "liveness_info" : { "tstamp" : "2019-07-18T02:43:36.011497Z" },
        "cells" : [
          { "name" : "addr", "deletion_info" : { "marked_deleted" : "2019-07-18T02:43:36.011496Z", "local_delete_time" : "2019-07-18T02:43:36Z" } },
          { "name" : "addr", "path" : [ "bj" ], "value" : ["ba", "bb"] },
          { "name" : "addr", "path" : [ "shanghai" ], "value" : ["sa", "sb"] },
          { "name" : "complex", "deletion_info" : { "marked_deleted" : "2019-07-18T02:47:55.888562Z", "local_delete_time" : "2019-07-18T02:47:55Z" } },
          { "name" : "complex", "path" : [ "bj" ], "value" : {"ka": "va", "kb": "vb"}, "tstamp" : "2019-07-18T02:47:55.888563Z" },
          { "name" : "listcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T02:56:09.386468Z", "local_delete_time" : "2019-07-18T02:56:09Z" } },
          { "name" : "listcolumn", "path" : [ "982493f0-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "a", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f1-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "b", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "listcolumn", "path" : [ "982493f2-a907-11e9-9a3d-8963a5d3f9b4" ], "value" : "c", "tstamp" : "2019-07-18T02:56:09.386469Z" },
          { "name" : "setcolumn", "deletion_info" : { "marked_deleted" : "2019-07-18T10:11:06.966534Z", "local_delete_time" : "2019-07-18T10:11:06Z" } }
        ]
      }
    ]
  }
]
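The effect of the column-level deletion can be sketched as a filter on element timestamps. The timestamps below are taken from the dumps; surviving_elements is a hypothetical helper, and the rule it encodes (an element survives only if its timestamp is strictly newer than marked_deleted) is a simplified model of the behaviour observed here. It also matches a detail visible in the earlier dumps: when the set was first written, marked_deleted was one microsecond older (…280578) than the elements written with it (…280579), so those elements stayed visible.

```python
def surviving_elements(elements, marked_deleted):
    """Keep only elements written strictly after the column-level deletion."""
    return {path: ts for path, ts in elements.items() if ts > marked_deleted}


# Element timestamps of setcolumn after the single-element delete
# (fixed-width UTC strings compare correctly as plain strings).
elements = {
    "a": "2019-07-18T10:03:17.038519Z",  # the element tombstone
    "b": "2019-07-18T02:55:03.280579Z",
    "c": "2019-07-18T02:55:03.280579Z",
}

# Deletion marker written together with the original set value.
after_overwrite = surviving_elements(elements, "2019-07-18T02:55:03.280578Z")
# Deletion marker written by "set setcolumn = null".
after_null = surviving_elements(elements, "2019-07-18T10:11:06.966534Z")

print(sorted(after_overwrite))  # ['a', 'b', 'c']
print(sorted(after_null))       # []
```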
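Summary
Cassandra's wide-row model stores columns flattened into individual cells. For collection types, each cell is flattened once more into entries addressed by path, so the table effectively has a two-level structure. A collection cell carries its own deletion time, and every path beneath it carries its own deletion time, timestamp, and so on.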
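The two-level structure can be made explicit by regrouping sstabledump-style cells into a nested dictionary. The sample cells are the setcolumn entries from the md-22 dump; two_level_view and the shape of its result are assumptions chosen for illustration.

```python
def two_level_view(cells):
    """Group cells into {column: {"deletion": ..., "elements": {path: value}}}."""
    view = {}
    for cell in cells:
        column = view.setdefault(cell["name"], {"deletion": None, "elements": {}})
        if "path" in cell:
            # Lower level: one entry per path; a tombstoned element has no value.
            column["elements"][tuple(cell["path"])] = cell.get("value")
        elif "deletion_info" in cell:
            # Upper level: the collection column's own deletion marker.
            column["deletion"] = cell["deletion_info"]["marked_deleted"]
        else:
            # Simple (non-collection) column: a single value with an empty path.
            column["elements"][()] = cell.get("value")
    return view


cells = [
    {"name": "setcolumn",
     "deletion_info": {"marked_deleted": "2019-07-18T02:55:03.280578Z",
                       "local_delete_time": "2019-07-18T02:55:03Z"}},
    {"name": "setcolumn", "path": ["a"],
     "deletion_info": {"local_delete_time": "2019-07-18T10:03:17Z"},
     "tstamp": "2019-07-18T10:03:17.038519Z"},
    {"name": "setcolumn", "path": ["b"], "value": "", "tstamp": "2019-07-18T02:55:03.280579Z"},
    {"name": "setcolumn", "path": ["c"], "value": "", "tstamp": "2019-07-18T02:55:03.280579Z"},
]

view = two_level_view(cells)
print(view["setcolumn"]["deletion"])
print(view["setcolumn"]["elements"])
```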
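Original article: https://yq.aliyun.com/articles/715496?utm_content=g_1000073204
This article is original content from the Yunqi Community (云栖社区) and may not be reproduced without permission.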