Analyzing and fixing the enfile error when gen_tcp accepts connections

Recently, for security reasons, we built a proxy on our RDS servers that wraps plain MySQL TCP connections in SSL. During load testing, my colleague 皓庭 noticed that once Tsung had opened a few thousand TCP connections, the Erlang-based SSL proxy kept reporting {error, enfile} from gen_tcp:accept. To track this down, I went through the investigation below.
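For context, the error surfaces in the proxy's accept loop. The following is a minimal sketch of such a loop, not our production code; the module name, socket options, back-off delay and logging are invented for illustration:

-module(acceptor_sketch).
-export([start/1]).

%% Listen on Port and accept connections forever.
start(Port) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                        {reuseaddr, true}]),
    loop(LSock).

loop(LSock) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            %% Hand each connection off to its own process.
            Pid = spawn(fun() -> handle(Sock) end),
            ok = gen_tcp:controlling_process(Sock, Pid),
            loop(LSock);
        {error, enfile} ->
            %% The error we kept seeing under Tsung load: back off
            %% briefly and retry instead of crashing the acceptor.
            error_logger:error_msg("gen_tcp:accept -> enfile~n"),
            timer:sleep(100),
            loop(LSock);
        {error, Reason} ->
            exit({accept_failed, Reason})
    end.

handle(Sock) ->
    %% Placeholder for the real proxy logic.
    gen_tcp:close(Sock).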

First, consult man accept to pin down what causes ENFILE, since gen_tcp must ultimately go through the accept system call:

EMFILE The per-process limit of open file descriptors has been reached.
ENFILE The system limit on the total number of open files has been reached.

So according to the documentation, this happens when the system runs out of file handles. Let's check that first:

$ uname -r
2.6.18-164.el5
$ cat /proc/sys/fs/file-nr
2040 0 2417338
$ ulimit -n
65535
We had already tuned the system's file-handle limits (see 老生常谈: ulimit问题及其影响 for the details), and these numbers all look perfectly normal.
Let's start from the net/socket.c code:

static int sock_alloc_fd(struct file **filep)
{
    int fd;

    fd = get_unused_fd();
    if (likely(fd >= 0)) {
        struct file *file = get_empty_filp();

        *filep = file;
        if (unlikely(!file)) {
            put_unused_fd(fd);
            return -ENFILE;
        }
    } else
        *filep = NULL;
    return fd;
}

static int __sock_create(int family, int type, int protocol, struct socket **res, int kern)
{
    ...
    /*
     * Allocate the socket and allow the family to set things up. if
     * the protocol is 0, the family is instructed to select an appropriate
     * default.
     */

    if (!(sock = sock_alloc())) {
        if (net_ratelimit())
            printk(KERN_WARNING "socket: no more sockets\n");
        err = -ENFILE;      /* Not exactly a match, but its the
                               closest posix thing */
        goto out;
    }
    ...
}

asmlinkage long sys_accept(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen)
{
    struct socket *sock, *newsock;
    struct file *newfile;
    int err, len, newfd, fput_needed;
    char address[MAX_SOCK_ADDR];

    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (!sock)
        goto out;

    err = -ENFILE;      /* reported when the new socket cannot be allocated */
    if (!(newsock = sock_alloc()))
        goto out_put;
    ...
}

From this code, ENFILE comes back whenever a socket handle cannot be allocated. Still, let's stay skeptical and write a stap script to double-check:

$ cat enfile.stp
probe kernel.function("kmem_cache_alloc").return,
      kernel.function("get_empty_filp").return {
    if ($return == 0) { print_backtrace(); exit(); }
}
probe kernel.function("sock_alloc_fd").return {
    if ($return < 0) { print_backtrace(); exit(); }
}
probe syscall.accept.return {
    /* -23 is -ENFILE */
    if ($return == -23) { print_backtrace(); exit(); }
}
probe begin {
    println(":~");
}
$ sudo stap enfile.stp
:~
Even while gen_tcp:accept was reporting {error, enfile}, stap did not flag anything abnormal, so we can essentially rule out the operating system and go back to the gen_tcp implementation.
gen_tcp is implemented as a port; the code lives in erts/emulator/drivers/common/inet_drv.c. Let's look for the places where ENFILE can be produced:

/* Copy a descriptor, by creating a new port with same settings
 * as the descriptor desc.
 * return NULL on error (ENFILE no ports avail)
 */
static tcp_descriptor* tcp_inet_copy(tcp_descriptor* desc, SOCKET s,
                                     ErlDrvTermData owner, int* err)
{
    ...
    /* The new port will be linked and connected to the original caller */
    port = driver_create_port(port, owner, "tcp_inet", (ErlDrvData) copy_desc);
    if ((long)port == -1) {
        *err = ENFILE;
        FREE(copy_desc);
        return NULL;
    }
    ...
}
When driver_create_port fails, gen_tcp returns ENFILE, so it looks like we have found the right place this time. Let's continue into the implementation of driver_create_port, in erts/emulator/beam/io.c:

/*
 * Driver function to create new instances of a driver
 * Historical reason: to be used with inet_drv for creating
 * accept sockets inorder to avoid a global table.
 */
ErlDrvPort
driver_create_port(ErlDrvPort creator_port_ix, /* Creating port */
                   ErlDrvTermData pid,         /* Owner/Caller */
                   char* name,                 /* Driver name */
                   ErlDrvData drv_data)        /* Driver data */
{
    ...
    rp = erts_pid2proc(NULL, 0, pid, ERTS_PROC_LOCK_LINK);
    if (!rp) {
        erts_smp_mtx_unlock(&erts_driver_list_lock);
        return (ErlDrvTermData) -1;   /* pid does not exist */
    }
    if ((port_num = get_free_port()) < 0) {
        errno = ENFILE;
        erts_smp_proc_unlock(rp, ERTS_PROC_LOCK_LINK);
        erts_smp_mtx_unlock(&erts_driver_list_lock);
        return (ErlDrvTermData) -1;
    }

    port_id = make_internal_port(port_num);
    port = &erts_port[port_num & erts_port_tab_index_mask];
    ...
}
So ENFILE is returned whenever get_free_port() comes back negative.
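As an aside, you can watch how many ports a live node has consumed from the Erlang shell, which is handy when you suspect you are closing in on this limit (a hypothetical session; the count is illustrative):

1> length(erlang:ports()).
3969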
Now let's see how the total number of ports is determined:


/* initialize the port array */
void init_io(void)
{
    ...
    if (erts_sys_getenv("ERL_MAX_PORTS", maxports, &maxportssize) == 0)
        erts_max_ports = atoi(maxports);
    else
        erts_max_ports = sys_max_files();

    if (erts_max_ports > ERTS_MAX_PORTS)
        erts_max_ports = ERTS_MAX_PORTS;
    if (erts_max_ports < 1024)
        erts_max_ports = 1024;

    if (erts_use_r9_pids_ports) {
        ports_bits = ERTS_R9_PORTS_BITS;
        if (erts_max_ports > ERTS_MAX_R9_PORTS)
            erts_max_ports = ERTS_MAX_R9_PORTS;
    }

    port_extra_shift = erts_fit_in_bits(erts_max_ports - 1);
    port_num_mask = (1 << ports_bits) - 1;
    ...
}
Step 1: if the ERL_MAX_PORTS environment variable is set, that value is used; otherwise the limit defaults to the same value as ulimit -n.
Step 2: the value is then clamped so that it is no larger than ERTS_MAX_PORTS and no smaller than 1024.
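In other words, the effective limit behaves roughly like the following Erlang restatement of the C logic above (a sketch that ignores the R9-compatibility branch; the function and argument names are mine):

%% EnvValue mimics os:getenv("ERL_MAX_PORTS"), which returns false when unset;
%% UlimitN stands in for `ulimit -n`, ErtsMaxPorts for the compile-time ceiling.
max_ports(EnvValue, UlimitN, ErtsMaxPorts) ->
    Requested = case EnvValue of
                    false -> UlimitN;
                    Str   -> list_to_integer(Str)
                end,
    %% clamp into the range [1024, ErtsMaxPorts]
    max(1024, min(Requested, ErtsMaxPorts)).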

Good: now we basically understand the cause of the problem: erts_max_ports was set too small.

Let's verify this once more by attaching gdb to our running process:

(gdb) p erts_max_ports
$1 = 4096
So it was the port limit that caused everything above. (It is only 4096 here, presumably because the VM had been started under a smaller ulimit -n than the one we measured.) This may look roundabout, but the Erlang designers regard port exhaustion the way the operating system regards file-handle exhaustion: ports are the VM's counterpart of OS I/O resources, and once the limit is reached you get an ENFILE error!

The solution: start the VM with erl -env ERL_MAX_PORTS NNNN and make the value comfortably large. (Note that the environment variable is ERL_MAX_PORTS, exactly as init_io reads it above.)
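For example, to allow 65536 ports (an illustrative number; size it to the connection count you expect). The erlang:system_info(port_limit) call below only exists on newer Erlang/OTP releases; on an older release like the one here, re-check erts_max_ports in gdb as above:

$ erl -env ERL_MAX_PORTS 65536
1> erlang:system_info(port_limit).
65536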

Incidentally, let me underline a few key Erlang server parameters, taken from http://www.ejabberd.im/tuning; they are very helpful when setting up a server.

This page lists several tricks to tune your ejabberd and Erlang installation for maximum performance gains. Note that some of the described options are experimental.

Erlang Ports Limit: ERL_MAX_PORTS
Erlang consumes one port for every connection, either from a client or from another Jabber server. The option ERL_MAX_PORTS limits the number of concurrent connections and can be specified when starting ejabberd:

erl -s ejabberd -env ERL_MAX_PORTS 5000 ...

Maximum Number of Erlang Processes: +P
Erlang consumes a lot of lightweight processes. If there is so much activity on ejabberd that the maximum number of processes is reached, people will experience higher latency. As these processes are implemented in Erlang, and are therefore not related to operating system processes, you do not have to worry about allowing a huge number of them.

erl -s ejabberd +P 250000 ...

ERL_FULLSWEEP_AFTER: Maximum number of collections before a forced fullsweep
The ERL_FULLSWEEP_AFTER option shrinks the size of the Erlang process after RAM intensive operations. Note that this option may degrade performance. Hence this option is only interesting on machines that host other services (webserver, mail) on which ejabberd does not receive constant load.

erl -s ejabberd -env ERL_FULLSWEEP_AFTER 0 ...

Kernel Polling: +K true

The kernel polling option requires that you have support for it in your kernel. By default, Erlang currently supports kernel polling under FreeBSD, Mac OS X, and Solaris. If you use Linux, check this newspost. Additionally, you need to enable this feature while compiling Erlang.

From the Erlang documentation -> Basic Applications -> erts -> erl -> Emulator Flags:

+K true|false

Enables or disables the kernel poll functionality if the emulator has kernel poll support. By default the kernel poll functionality is disabled. If the emulator doesn't have kernel poll support and the +K flag is passed to the emulator, a warning is issued at startup.

If you meet all requirements, you can enable it in this way:

erl -s ejabberd +K true ...

Mnesia Tables to Disk
By default, ejabberd uses Mnesia as its database. In Mnesia you can configure each table in the database to be stored on RAM, on RAM and on disk, or only on disk. You can configure this in the web interface: Nodes -> 'mynode' -> DB Management. Modification of this option will consume some memory and CPU time.
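Besides the web interface, the same storage change can be made from an Erlang shell with the standard Mnesia API (a sketch; my_table is a placeholder table name):

%% Move a table to disc-only storage on the local node.
mnesia:change_table_copy_type(my_table, node(), disc_only_copies).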
Number of Concurrent ETS and Mnesia Tables: ERL_MAX_ETS_TABLES
The number of concurrent ETS and Mnesia tables is limited. When the limit is reached, errors will appear in the logs:

** Too many db tables **

You can safely increase this limit when starting ejabberd. It impacts memory consumption but the difference will be quite small.

erl -s ejabberd -env ERL_MAX_ETS_TABLES 20000 ...
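Newer Erlang/OTP releases also let you read the effective limit back at runtime (a hypothetical shell session; older releases do not expose ets_limit):

1> erlang:system_info(ets_limit).
20000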

To sum up: many problems are quite convoluted, so you have to investigate and verify from multiple angles.



