Diagnosing latency issues
Finding the causes of slow responses
This document will help you understand what the problem could be if you are experiencing latency problems with Redis.
In this context latency is the maximum delay between the time a client issues a command and the time the reply to the command is received by the client. Usually Redis processing time is extremely low, in the sub microsecond range, but there are certain conditions leading to higher latency figures.
I’ve little time, give me the checklist
The following documentation is very important in order to run Redis in a low latency fashion. However I understand that we are busy people, so let’s start with a quick checklist. If you fail following these steps, please return here to read the full documentation.
- Make sure you are not running slow commands that are blocking the server. Use the Redis Slow Log feature to check this.
- For EC2 users, make sure you use HVM based modern EC2 instances, like m3.medium. Otherwise fork() is too slow.
- Transparent huge pages must be disabled from your kernel. Use echo never > /sys/kernel/mm/transparent_hugepage/enabled to disable them, and restart your Redis process.
- If you are using a virtual machine, it is possible that you have an intrinsic latency that has nothing to do with Redis. Check the minimum latency you can expect from your runtime environment using ./redis-cli --intrinsic-latency 100. Note: you need to run this command on the server, not on the client.
- Enable and use the Latency monitor feature of Redis in order to get a human readable description of the latency events and causes in your Redis instance.
In general, use the following table for durability VS latency/performance tradeoffs, ordered from stronger safety to better latency.
- AOF + fsync always: this is very slow, you should use it only if you know what you are doing.
- AOF + fsync every second: this is a good compromise (see the sketch after this list).
- AOF + fsync every second + no-appendfsync-on-rewrite option set to yes: this is as the above, but avoids fsyncing during rewrites to lower the disk pressure.
- AOF + fsync never. Fsyncing is up to the kernel in this setup, with even less disk pressure and risk of latency spikes.
- RDB. Here you have a vast spectrum of tradeoffs depending on the save triggers you configure.
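As a sketch, the common middle-ground choice from this table can be applied at runtime with CONFIG SET (both directives can also be set in redis.conf):

$ redis-cli CONFIG SET appendfsync everysec
$ redis-cli CONFIG SET no-appendfsync-on-rewrite yes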
And now for people with 15 minutes to spend, the details…
Measuring latency
If you are experiencing latency problems, you probably know how to measure it in the context of your application, or maybe your latency problem is very evident even macroscopically. However redis-cli can be used to measure the latency of a Redis server in milliseconds, just try:
redis-cli --latency -h `host` -p `port`
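The output is a continuously updated summary line. The figures below are purely illustrative, with 127.0.0.1 and 6379 as placeholder host and port:

$ redis-cli --latency -h 127.0.0.1 -p 6379
min: 0, max: 15, avg: 0.22 (1793 samples)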
Using the internal Redis latency monitoring subsystem
Since Redis 2.8.13, Redis provides latency monitoring capabilities that are able to sample different execution paths to understand where the server is blocking. This makes debugging of the problems illustrated in this documentation much simpler, so we suggest enabling latency monitoring ASAP. Please refer to the Latency monitor documentation.
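As a quick sketch of enabling and querying it (the 100 millisecond threshold is just an example value):

$ redis-cli CONFIG SET latency-monitor-threshold 100
$ redis-cli LATENCY LATEST
$ redis-cli LATENCY DOCTOR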
While the latency monitoring sampling and reporting capabilities will make it simpler to understand the source of latency in your Redis system, it is still advised that you read this documentation extensively to better understand the topic of Redis and latency spikes.
Latency baseline
There is a kind of latency that is inherently part of the environment where you run Redis, that is the latency provided by your operating system kernel and, if you are using virtualization, by the hypervisor you are using.
While this latency can’t be removed it is important to study it because it is the baseline, or in other words, you won’t be able to achieve a Redis latency that is better than the latency that every process running in your environment will experience because of the kernel or hypervisor implementation or setup.
We call this kind of latency intrinsic latency, and redis-cli starting from Redis version 2.8.7 is able to measure it. This is an example run under Linux 3.11.0 running on an entry level server.
Note: the argument 100 is the number of seconds the test will be executed. The longer we run the test, the more likely we'll be able to spot latency spikes. 100 seconds is usually appropriate, however you may want to perform a few runs at different times. Please note that the test is CPU intensive and will likely saturate a single core in your system.
$ ./redis-cli --intrinsic-latency 100
Max latency so far: 1 microseconds.
Max latency so far: 16 microseconds.
Max latency so far: 50 microseconds.
Max latency so far: 53 microseconds.
Max latency so far: 83 microseconds.
Max latency so far: 115 microseconds.
Note: redis-cli in this special case needs to run in the server where you run or plan to run Redis, not in the client. In this special mode redis-cli does not connect to a Redis server at all: it will just try to measure the largest time the kernel does not provide CPU time to run to the redis-cli process itself.
In the above example, the intrinsic latency of the system is just 0.115 milliseconds (or 115 microseconds), which is good news, however keep in mind that the intrinsic latency may change over time depending on the load of the system.
Virtualized environments will not show so good numbers, especially with high load or if there are noisy neighbors. The following is a run on a Linode 4096 instance running Redis and Apache:
$ ./redis-cli --intrinsic-latency 100
Max latency so far: 573 microseconds.
Max latency so far: 695 microseconds.
Max latency so far: 919 microseconds.
Max latency so far: 1606 microseconds.
Max latency so far: 3191 microseconds.
Max latency so far: 9243 microseconds.
Max latency so far: 9671 microseconds.
Here we have an intrinsic latency of 9.7 milliseconds: this means that we can't expect better than that from Redis. However other runs at different times in different virtualization environments with higher load or with noisy neighbors can easily show even worse values. We were able to measure up to 40 milliseconds in systems otherwise apparently running normally.
Latency induced by network and communication
Clients connect to Redis using a TCP/IP connection or a Unix domain connection. The typical latency of a 1 Gbit/s network is about 200 us, while the latency with a Unix domain socket can be as low as 30 us. It actually depends on your network and system hardware. On top of the communication itself, the system adds some more latency (due to thread scheduling, CPU caches, NUMA placement, etc …). System induced latencies are significantly higher on a virtualized environment than on a physical machine.
The consequence is even if Redis processes most commands in sub microsecond range, a client performing many roundtrips to the server will have to pay for these network and system related latencies.
An efficient client will therefore try to limit the number of roundtrips by pipelining several commands together. This is fully supported by the servers and most clients. Aggregated commands like MSET/MGET can be also used for that purpose. Starting with Redis 2.4, a number of commands also support variadic parameters for all data types.
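For example, with hypothetical keys, two roundtrips:

$ redis-cli SET user:1:name alice
$ redis-cli SET user:1:city rome

can be collapsed into a single one with an aggregated command:

$ redis-cli MSET user:1:name alice user:1:city rome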
Here are some guidelines:
- If you can afford it, prefer a physical machine over a VM to host the server.
- Do not systematically connect/disconnect to the server (especially true for web based applications). Keep your connections as long lived as possible.
- If your client is on the same host as the server, use Unix domain sockets.
- Prefer to use aggregated commands (MSET/MGET), or commands with variadic parameters (if possible) over pipelining.
- Prefer to use pipelining (if possible) over a sequence of roundtrips.
- Redis supports Lua server-side scripting to cover cases that are not suitable for raw pipelining (for instance when the result of a command is an input for the following commands), as in the sketch below.
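A minimal sketch of that last point, assuming a hypothetical key src that already holds a value: the script copies it into dst entirely server-side, so the intermediate result never travels back to the client:

$ redis-cli EVAL "local v = redis.call('GET', KEYS[1]); return redis.call('SET', KEYS[2], v)" 2 src dst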
On Linux, some people can achieve better latencies by playing with process placement (taskset), cgroups, real-time priorities (chrt), NUMA configuration (numactl), or by using a low-latency kernel. Please note vanilla Redis is not really suitable to be bound to a single CPU core. Redis can fork background tasks that can be extremely CPU consuming like BGSAVE or BGREWRITEAOF. These tasks must never run on the same core as the main event loop.
In most situations, these kind of system level optimizations are not needed. Only do them if you require them, and if you are familiar with them.
Single threaded nature of Redis
Redis uses a mostly single threaded design. This means that a single process serves all the client requests, using a technique called multiplexing. This means that Redis can serve a single request in every given moment, so all the requests are served sequentially. This is very similar to how Node.js works as well. However, both products are not often perceived as being slow. This is caused in part by the small amount of time to complete a single request, but primarily because these products are designed to not block on system calls, such as reading data from or writing data to a socket.
I said that Redis is mostly single threaded since actually from Redis 2.4 we use threads in Redis in order to perform some slow I/O operations in the background, mainly related to disk I/O, but this does not change the fact that Redis serves all the requests using a single thread.
Latency generated by slow commands
A consequence of being single threaded is that when a request is slow to serve, all the other clients will wait for this request to be served. When executing normal commands, like GET or SET or LPUSH, this is not a problem at all since these commands are executed in constant (and very small) time. However there are commands operating on many elements, like SORT, LREM, SUNION and others. For instance taking the intersection of two big sets can take a considerable amount of time.
The algorithmic complexity of all commands is documented. A good practice is to systematically check it when using commands you are not familiar with.
If you have latency concerns you should either not use slow commands against values composed of many elements, or you should run a replica using Redis replication where you run all your slow queries.
It is possible to monitor slow commands using the Redis Slow Log feature.
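For example, to log every command slower than 10 milliseconds (the threshold is expressed in microseconds; 10000 is just an example value) and then inspect the latest entries:

$ redis-cli CONFIG SET slowlog-log-slower-than 10000
$ redis-cli SLOWLOG GET 10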
Additionally, you can use your favorite per-process monitoring program (top, htop, prstat, etc …) to quickly check the CPU consumption of the main Redis process. If it is high while the traffic is not, it is usually a sign that slow commands are used.
IMPORTANT NOTE: a VERY common source of latency generated by the execution of slow commands is the use of the KEYS command in production environments. KEYS, as documented in the Redis documentation, should only be used for debugging purposes. Since Redis 2.8 new commands were introduced in order to iterate the key space and other large collections incrementally, please check the SCAN, SSCAN, HSCAN and ZSCAN commands for more information.
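As a sketch, the kind of scan that people reach for KEYS to do can be performed incrementally, either via the redis-cli helper or with explicit cursor calls (the user:* pattern is just an example):

$ redis-cli --scan --pattern 'user:*'
$ redis-cli SCAN 0 MATCH 'user:*' COUNT 100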
Latency generated by fork
In order to generate the RDB file in background, or to rewrite the Append Only File if AOF persistence is enabled, Redis has to fork background processes. The fork operation (running in the main thread) can induce latency by itself.
Forking is an expensive operation on most Unix-like systems, since it involves copying a good number of objects linked to the process. This is especially true for the page table associated with the virtual memory mechanism.
For instance on a Linux/AMD64 system, the memory is divided in 4 kB pages. To convert virtual addresses to physical addresses, each process stores a page table (actually represented as a tree) containing at least a pointer per page of the address space of the process. So a large 24 GB Redis instance requires a page table of 24 GB / 4 kB * 8 = 48 MB.
When a background save is performed, this instance will have to be forked, which will involve allocating and copying 48 MB of memory. It takes time and CPU, especially on virtual machines where allocation and initialization of a large memory chunk can be expensive.
Fork time in different systems
Modern hardware is pretty fast at copying the page table, but Xen is not. The problem with Xen is not virtualization-specific, but Xen-specific. For instance using VMware or Virtual Box does not result in slow fork time. The following is a table that compares fork time for different Redis instance sizes. Data is obtained by performing a BGSAVE and looking at the latest_fork_usec field in the INFO command output.
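For example, on your own instance (the reported value is in microseconds):

$ redis-cli BGSAVE
$ redis-cli info | grep latest_fork_usec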
However the good news is that new types of EC2 HVM based instances are much better with fork times, almost on par with physical servers, so for example using m3.medium (or better) instances will provide good results.
- Linux beefy VM on VMware 6.0GB RSS forked in 77 milliseconds (12.8 milliseconds per GB).
- Linux running on physical machine (Unknown HW) 6.1GB RSS forked in 80 milliseconds (13.1 milliseconds per GB).
- Linux running on physical machine (Xeon @ 2.27Ghz) 6.9GB RSS forked in 62 milliseconds (9 milliseconds per GB).
- Linux VM on 6sync (KVM) 360 MB RSS forked in 8.2 milliseconds (23.3 milliseconds per GB).
- Linux VM on EC2, old instance types (Xen) 6.1GB RSS forked in 1460 milliseconds (239.3 milliseconds per GB).
- Linux VM on EC2, new instance types (Xen) 1GB RSS forked in 10 milliseconds (10 milliseconds per GB).
- Linux VM on Linode (Xen) 0.9GB RSS forked in 382 milliseconds (424 milliseconds per GB).
As you can see certain VMs running on Xen have a performance hit that is between one order to two orders of magnitude. For EC2 users the suggestion is simple: use modern HVM based instances.
Latency induced by transparent huge pages
Unfortunately when a Linux kernel has transparent huge pages enabled, Redis incurs a big latency penalty after the fork call is used in order to persist on disk. Huge pages are the cause of the following issue:
- Fork is called, two processes with shared huge pages are created.
- In a busy instance, a few event loop runs will cause commands to target a few thousand pages, causing the copy on write of almost the whole process memory.
- This will result in big latency and big memory usage.
Make sure to disable transparent huge pages using the following command:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
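You can then verify the setting; the active value is the one shown in brackets:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]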
Latency induced by swapping (operating system paging)
Linux (and many other modern operating systems) is able to relocate memory pages from the memory to the disk, and vice versa, in order to use the system memory efficiently.
If a Redis page is moved by the kernel from the memory to the swap file, when the data stored in this memory page is used by Redis (for example accessing a key stored into this memory page) the kernel will stop the Redis process in order to move the page back into the main memory. This is a slow operation involving random I/Os (compared to accessing a page that is already in memory) and will result into anomalous latency experienced by Redis clients.
The kernel relocates Redis memory pages on disk mainly for three reasons:
- The system is under memory pressure since the running processes are demanding more physical memory than the amount that is available. The simplest instance of this problem is simply Redis using more memory than is available.
- The Redis instance data set, or part of the data set, is mostly completely idle (never accessed by clients), so the kernel could swap idle memory pages on disk. This problem is very rare since even a moderately slow instance will touch all the memory pages often, forcing the kernel to retain all the pages in memory.
- Some processes are generating massive read or write I/Os on the system. Because files are generally cached, it tends to put pressure on the kernel to increase the filesystem cache, and therefore generate swapping activity. Please note it includes Redis RDB and/or AOF background threads which can produce large files.
Fortunately Linux offers good tools to investigate the problem, so when latency due to swapping is suspected, the simplest thing to do is to check whether this is the case.
The first thing to do is to check the amount of Redis memory that is swapped on disk. In order to do so you need to obtain the Redis instance pid:
$ redis-cli info | grep process_id
process_id:5454
Now enter the /proc file system directory for this process:
$ cd /proc/5454
Here you’ll find a file called smaps that describes the memory layout of the Redis process (assuming you are using Linux 2.6.16 or newer). This file contains very detailed information about our process memory maps, and one field called Swap is exactly what we are looking for. However there is not just a single swap field since the smaps file contains the different memory maps of our Redis process (The memory layout of a process is more complex than a simple linear array of pages).
Since we are interested in all the memory swapped by our process, the first thing to do is to grep for the Swap field across the whole file:
$ cat smaps | grep 'Swap:'
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 12 kB
Swap: 156 kB
Swap: 8 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 4 kB
Swap: 0 kB
Swap: 0 kB
Swap: 4 kB
Swap: 0 kB
Swap: 0 kB
Swap: 4 kB
Swap: 4 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
If everything is 0 kB, or if there are sporadic 4k entries, everything is perfectly normal. Actually in our example instance (the one of a real web site running Redis and serving hundreds of users every second) there are a few entries that show more swapped pages. To investigate if this is a serious problem or not we change our command in order to also print the size of the memory map:
$ cat smaps | egrep '^(Swap|Size)'
Size: 316 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 8 kB
Swap: 0 kB
Size: 40 kB
Swap: 0 kB
Size: 132 kB
Swap: 0 kB
Size: 720896 kB
Swap: 12 kB
Size: 4096 kB
Swap: 156 kB
Size: 4096 kB
Swap: 8 kB
Size: 4096 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 1272 kB
Swap: 0 kB
Size: 8 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 16 kB
Swap: 0 kB
Size: 84 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 8 kB
Swap: 4 kB
Size: 8 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 4 kB
Size: 144 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 4 kB
Size: 12 kB
Swap: 4 kB
Size: 108 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 272 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
As you can see from the output, there is a map of 720896 kB (with just 12 kB swapped) and 156 kB more swapped in another map: basically a very small amount of our memory is swapped so this is not going to create any problem at all.
If instead a non trivial amount of the process memory is swapped on disk your latency problems are likely related to swapping. If this is the case with your Redis instance you can further verify it using the vmstat command:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 3980 697932 147180 1406456 0 0 2 2 2 0 4 4 91 0
0 0 3980 697428 147180 1406580 0 0 0 0 19088 16104 9 6 84 0
0 0 3980 697296 147180 1406616 0 0 0 28 18936 16193 7 6 87 0
0 0 3980 697048 147180 1406640 0 0 0 0 18613 15987 6 6 88 0
2 0 3980 696924 147180 1406656 0 0 0 0 18744 16299 6 5 88 0
0 0 3980 697048 147180 1406688 0 0 0 4 18520 15974 6 6 88 0
^C
The interesting part of the output for our needs are the two columns si and so, that counts the amount of memory swapped from/to the swap file. If you see non zero counts in those two columns then there is swapping activity in your system.
Finally, the iostat command can be used to check the global I/O activity of the system.
$ iostat -xk 1
avg-cpu: %user %nice %system %iowait %steal %idle
13.55 0.04 2.92 0.53 0.00 82.95
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.77 0.00 0.01 0.00 0.40 0.00 73.65 0.00 3.62 2.58 0.00
sdb 1.27 4.75 0.82 3.54 38.00 32.32 32.19 0.11 24.80 4.24 1.85
If your latency problem is due to Redis memory being swapped on disk you need to lower the memory pressure in your system, either adding more RAM if Redis is using more memory than the available, or avoiding running other memory hungry processes in the same system.
Latency due to AOF and disk I/O
Another source of latency is due to the Append Only File support on Redis. The AOF basically uses two system calls to accomplish its work. One is write(2) that is used in order to write data to the append only file, and the other one is fdatasync(2) that is used in order to flush the kernel file buffer on disk in order to ensure the durability level specified by the user.
Both the write(2) and fdatasync(2) calls can be source of latency. For instance write(2) can block both when there is a system wide sync in progress, or when the output buffers are full and the kernel requires to flush on disk in order to accept new writes.
The fdatasync(2) call is a worse source of latency as with many combinations of kernels and file systems used it can take from a few milliseconds to a few seconds to complete, especially in the case of some other process doing I/O. For this reason when possible Redis does the fdatasync(2) call in a different thread since Redis 2.4.
We’ll see how configuration can affect the amount and source of latency when using the AOF file.
The AOF can be configured to perform a fsync on disk in three different ways using the appendfsync configuration option (this setting can be modified at runtime using the CONFIG SET command).
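For example, to switch policy at runtime (pick the value that matches your durability needs):

$ redis-cli CONFIG SET appendfsync everysec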
When appendfsync is set to the value of no Redis performs no fsync. In this configuration the only source of latency can be write(2). When this happens usually there is no solution since simply the disk can't cope with the speed at which Redis is receiving data, however this is uncommon if the disk is not seriously slowed down by other processes doing I/O.

When appendfsync is set to the value of everysec Redis performs a fsync every second. It uses a different thread, and if the fsync is still in progress Redis uses a buffer to delay the write(2) call up to two seconds (since write would block on Linux if a fsync is in progress against the same file). However if the fsync is taking too long Redis will eventually perform the write(2) call even if the fsync is still in progress, and this can be a source of latency.

When appendfsync is set to the value of always a fsync is performed at every write operation before replying back to the client with an OK code (actually Redis will try to cluster many commands executed at the same time into a single fsync). In this mode performance is very low in general and it is strongly recommended to use a fast disk and a file system implementation that can perform the fsync in short time.
Most Redis users will use either the no or everysec setting for the appendfsync configuration directive. The suggestion for minimum latency is to avoid other processes doing I/O in the same system. Using an SSD disk can help as well, but usually even non SSD disks perform well with the append only file if the disk is spare as Redis writes to the append only file without performing any seek.
If you want to investigate your latency issues related to the append only file you can use the strace command under Linux:
sudo strace -p $(pidof redis-server) -T -e trace=fdatasync
The above command will show all the fdatasync(2) system calls performed by Redis in the main thread. With the above command you’ll not see the fdatasync system calls performed by the background thread when the appendfsync config option is set to everysec. In order to do so just add the -f switch to strace.
If you wish you can also see both fdatasync and write system calls with the following command:
sudo strace -p $(pidof redis-server) -T -e trace=fdatasync,write
However since write(2) is also used in order to write data to the client sockets this will likely show too many things unrelated to disk I/O. Apparently there is no way to tell strace to just show slow system calls so I use the following command:
sudo strace -f -p $(pidof redis-server) -T -e trace=fdatasync,write 2>&1 | grep -v '0.0' | grep -v unfinished
Latency generated by expires
Redis evicts expired keys in two ways:
- One lazy way expires a key when it is requested by a command, but it is found to be already expired.
- One active way expires a few keys every 100 milliseconds.
The active expiring is designed to be adaptive. An expire cycle is started every 100 milliseconds (10 times per second), and will do the following:
- Sample ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP keys, evicting all the keys already expired.
- If more than 25% of the keys were found expired, repeat.
Given that ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP is set to 20 by default, and the process is performed ten times per second, usually just 200 keys per second are actively expired. This is enough to clean the DB fast enough even when already expired keys are not accessed for a long time, so that the lazy algorithm does not help. At the same time expiring just 200 keys per second has no effect on the latency of a Redis instance.
However the algorithm is adaptive and will loop if it finds more than 25% of keys already expired in the set of sampled keys. Given that we run the algorithm ten times per second, the unlucky case is when more than 25% of the keys in our random sample are expiring within the same second.
Basically this means that if the database has many, many keys expiring in the same second, and these make up at least 25% of the current population of keys with an expire set, Redis can block in order to get the percentage of keys already expired below 25%.
This approach is needed in order to avoid using too much memory for keys that are already expired, and usually it is absolutely harmless since it's strange that a big number of keys are going to expire in the same exact second, but it is not impossible that the user used EXPIREAT extensively with the same Unix time.
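As a purely hypothetical illustration of that hazard, a shell loop like the following (made-up key names, and an arbitrary timestamp one minute in the future) schedules a large batch of keys to expire in the same second:

$ ts=$(( $(date +%s) + 60 ))
$ for i in $(seq 1 100000); do redis-cli SET "key:$i" v > /dev/null; redis-cli EXPIREAT "key:$i" "$ts" > /dev/null; done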
In short: be aware that many keys expiring at the same moment can be a source of latency.
Redis software watchdog
Redis 2.6 introduces the Redis Software Watchdog that is a debugging tool designed to track those latency problems that for one reason or the other escaped an analysis using normal tools.
The software watchdog is an experimental feature. While it is designed to be used in production environments care should be taken to backup the database before proceeding as it could possibly have unexpected interactions with the normal execution of the Redis server.
It is important to use it only as a last resort when there is no way to track the issue by other means.
This is how this feature works:
- The user enables the software watchdog using the CONFIG SET command.
- Redis starts monitoring itself constantly.
- If Redis detects that the server is blocked into some operation that is not returning fast enough, and that may be the source of the latency issue, a low level report about where the server is blocked is dumped on the log file.
- The user contacts the developers writing a message in the Redis Google Group, including the watchdog report in the message.
Note that this feature cannot be enabled using the redis.conf file, because it is designed to be enabled only in already running instances and only for debugging purposes.
To enable the feature just use the following:
CONFIG SET watchdog-period 500
The period is specified in milliseconds. In the above example I specified to log latency issues only if the server detects a delay of 500 milliseconds or greater. The minimum configurable period is 200 milliseconds.
When you are done with the software watchdog you can turn it off setting the watchdog-period parameter to 0. Important: remember to do this because keeping the instance with the watchdog turned on for a longer time than needed is generally not a good idea.
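For example:

CONFIG SET watchdog-period 0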
The following is an example of what you’ll see printed in the log file once the software watchdog detects a delay longer than the configured one:
[8547 | signal handler] (1333114359)
--- WATCHDOG TIMER EXPIRED ---
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libpthread.so.0(+0xf8f0) [0x7f16b5f158f0]
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libc.so.6(usleep+0x34) [0x7f16b5c62844]
./redis-server(debugCommand+0x3e1) [0x43ab41]
./redis-server(call+0x5d) [0x415a9d]
./redis-server(processCommand+0x375) [0x415fc5]
./redis-server(processInputBuffer+0x4f) [0x4203cf]
./redis-server(readQueryFromClient+0xa0) [0x4204e0]
./redis-server(aeProcessEvents+0x128) [0x411b48]
./redis-server(aeMain+0x2b) [0x411dbb]
./redis-server(main+0x2b6) [0x418556]
/lib/libc.so.6(__libc_start_main+0xfd) [0x7f16b5ba1c4d]
./redis-server() [0x411099]
------
Note: in the example the DEBUG SLEEP command was used in order to block the server. The stack trace is different if the server blocks in a different context.
If you happen to collect multiple watchdog stack traces you are encouraged to send everything to the Redis Google Group: the more traces we obtain, the simpler it will be to understand what the problem with your instance is.