Translog文件损坏
相关文档
故障原因
- 机房突然断电导致
es
的 translog
文件损坏,分片无法分片
- 通过
_cluster/allocation/explain
可以看到具体损坏的 translog
文件
- 出现这种情况如果无法从副本或者快照恢复数据,就意味着可能会丢失部分数据甚至是该分片的全部数据
故障解决
- 官方用处理
translog
文件损坏的工具 elasticsearch-shard
- 该工具会将损坏的部分删除
es$ bin/elasticsearch-shard remove-corrupted-data -h
Removes corrupted shard files
This tool attempts to detect and remove unrecoverable corrupted data in a shard.
Option Description
------ -----------
-E <KeyValuePair> Configure a setting
-d, --dir Index directory location on disk
-h, --help show help
--index Index name
-s, --silent show minimal output
--shard-id <Integer> Shard id
-v, --verbose show verbose output
- 有两种方式处理
translog
文件:
- 一是:指定索引和分片号
- 二是:指定
translog
文件完整路径
1. bin/elasticsearch-shard remove-corrupted-data --index index_name --shard-id num
2. bin/elasticsearch-shard remove-corrupted-data --dir translog_file_path
- 该工具会将损坏的文件列出来,然后让你选择是否
remove
........
--> translog-1799.ckp
--> translog-1799.tlog
--> translog-1800.ckp
--> translog-1800.tlog
--> translog-1801.ckp
--> translog-1801.tlog
--> translog-1802.ckp
--> translog-1802.tlog
--> translog-1803.tlog
--> translog.ckp
WARNING: YOU MAY LOSE DATA.
-----------------------------------------------------------------------
Continue and remove corrupted data from the shard ?
Confirm [y/N] y
Checking existing translog files
POST /_cluster/reroute'
{
"commands" : [
{
"allocate_stale_primary" : {
"index" : "index_name",
"shard" : num,
"node" : "c7aAKt_2QCC0RwZCj9ZL1Q",
"accept_data_loss" : false
}
}
]
}'
You must accept the possibility of data loss by changing parameter `accept_data_loss` to `true`.