Why I don't believe Ruby FCGI can beat LSAPI (Benchmark Ruby LSAPI vs FCGI)
I simply couldn't believe it when I saw a benchmark result showing FCGI beating LSAPI. We have spent quite a lot of time optimizing the LSAPI protocol and its implementation, and according to our benchmarks it is much faster than the FastCGI protocol. So how much faster can LSAPI be? Here is our benchmark for LSWS Enterprise.
This time we tested only Ruby FCGI and Ruby LSAPI, using simple CGI scripts with no Rails framework involved.
Our test environment is the same as the one used in the previous post, except that we booted the test server into a non-SMP kernel so that only one CPU is used. Why? Because LSAPI is so fast that our simple test script cannot max out all the CPU power: about 20% of the CPU was idle during the LSAPI test, while the FCGI test used almost all of it, leaving only 0.5-1% idle. With a non-SMP kernel only one CPU is used, and it is maxed out during both tests, so the results should make more sense.
Test scripts:
testlsapi.rb
#!/usr/local/bin/ruby
require 'lsapi'
# Serve one request per loop iteration until LSAPI.accept returns nil.
while LSAPI.accept != nil
  print "HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\nHello, World!\n"
end
testfcgi.rb
#!/usr/local/bin/ruby
require "fcgi"
# Handle each FastCGI request in turn.
FCGI.each {|request|
  out = request.out
  out.print "Content-Type: text/html\r\n\r\nHello, World!\n"
  request.finish
}
Both FCGI and LSAPI have been configured to start 10 instances. Here are the results:
Ruby LSAPI:
$ ab -n 100000 -c 100 http://192.168.0.60:8080/testlsapi
...
Server Software: LiteSpeed
Server Hostname: 192.168.0.60
Server Port: 8080
Document Path: /testlsapi
Document Length: 14 bytes
Concurrency Level: 100
Time taken for tests: 13.600 seconds
Complete requests: 100000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 15300459 bytes
HTML transferred: 1400042 bytes
Requests per second: 7352.94 [#/sec] (mean)
Time per request: 13.60 [ms] (mean)
Time per request: 0.14 [ms] (mean, across all concurrent requests)
Transfer rate: 1125.03 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 14
Processing: 6 13 2.9 13 280
Waiting: 1 13 2.9 12 280
Total: 6 13 2.9 13 283
Percentage of the requests served within a certain time (ms)
50% 13
66% 13
75% 13
80% 14
90% 15
95% 17
98% 18
99% 19
100% 283 (last request)
FCGI:
$ ab -n 100000 -c 100 http://192.168.0.60:8080/testfcgi
...
Server Software: LiteSpeed
Server Hostname: 192.168.0.60
Server Port: 8080
Document Path: /testfcgi
Document Length: 14 bytes
Concurrency Level: 100
Time taken for tests: 20.069 seconds
Complete requests: 100000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 15300153 bytes
HTML transferred: 1400014 bytes
Requests per second: 4982.81 [#/sec] (mean)
Time per request: 20.07 [ms] (mean)
Time per request: 0.20 [ms] (mean, across all concurrent requests)
Transfer rate: 762.38 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 3
Processing: 16 19 2.7 18 195
Waiting: 16 19 2.7 18 195
Total: 16 19 2.8 18 195
Percentage of the requests served within a certain time (ms)
50% 18
66% 18
75% 19
80% 20
90% 23
95% 27
98% 27
99% 27
100% 195 (last request)
Ruby LSAPI is about 50% faster than Ruby FCGI for the simple "Hello, World" test.
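For anyone who wants to check the "about 50%" figure, it comes straight from the two "Requests per second" lines above. Here is a small Ruby sketch of that arithmetic; the helper name percent_faster is made up for illustration and is not part of any library.

```ruby
# Requests per second as reported by ab in the two runs above.
LSAPI_RPS = 7352.94
FCGI_RPS  = 4982.81

# Hypothetical helper: how much faster `fast` is than `slow`, in percent.
def percent_faster(fast, slow)
  (fast / slow - 1.0) * 100.0
end

puts format("LSAPI is %.1f%% faster than FCGI", percent_faster(LSAPI_RPS, FCGI_RPS))
```

The exact ratio comes out a little under 48%, which I round to "about 50%".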
How about other web servers' FCGI engines? OK, let's try the same test with nginx.
nginx configuration:
...
upstream fcgi {
    server unix:/tmp/fcgi1.sock;
    server unix:/tmp/fcgi2.sock;
    server unix:/tmp/fcgi3.sock;
    server unix:/tmp/fcgi4.sock;
    server unix:/tmp/fcgi5.sock;
    server unix:/tmp/fcgi6.sock;
    server unix:/tmp/fcgi7.sock;
    server unix:/tmp/fcgi8.sock;
    server unix:/tmp/fcgi9.sock;
    server unix:/tmp/fcgi0.sock;
}
...
location /testfcgi {
    fastcgi_pass fcgi;
    include conf/fastcgi_params;
}
...
Result:
$ ab -n 100000 -c 100 http://192.168.0.60:80/testfcgi
...
Server Software: nginx/0.4.0
Server Hostname: 192.168.0.60
Server Port: 80
Document Path: /testfcgi
Document Length: 14 bytes
Concurrency Level: 100
Time taken for tests: 21.317 seconds
Complete requests: 100000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 13500000 bytes
HTML transferred: 1400000 bytes
Requests per second: 4691.09 [#/sec] (mean)
Time per request: 21.32 [ms] (mean)
Time per request: 0.21 [ms] (mean, across all concurrent requests)
Transfer rate: 633.30 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 6
Processing: 6 21 3.3 20 47
Waiting: 2 20 3.3 19 47
Total: 6 21 3.3 20 47
Percentage of the requests served within a certain time (ms)
50% 20
66% 20
75% 20
80% 20
90% 26
95% 31
98% 31
99% 31
100% 47 (last request)
Don't believe my test results? Try it yourself!
My Rails Benchmark results: LSWS + LSAPI vs nginx + FCGI
After reading Scoundrel's blog post "Looking for Optimal Solution: Benchmark Results Summary and Findings", I was pretty surprised that LSWS + LSAPI fell behind. The best performer in his tests was nginx + FCGI. "All right, let me spend some time benchmarking them," I said to myself.
Here is our test environment:
Server: Dell PowerEdge SC1425
CPU: Dual Intel Xeon 3.0GHz/800MHz FSB/2MB L2 Cache
Memory: 1024MB DDR2 400MHz(2X512MB) ECC DIMM
Hard Drive: 80GB 7200RPM SATA drive
NIC: on board Intel PRO/1000 Gigabit Adapter
OS: CentOS 4.2
Client: Custom Built
Motherboard: MSI K7D
CPU: Dual Athlon MP 2000+/266FSB/256KB L2 Cache
Memory: 1GB PC2100
Hard Drive: 36GB 1000RPM SCSI drive
NIC: Intel PRO/1000 Gigabit
OS: RedHat 8.0/Kernel 2.4.22smp
Benchmark Tool: ApacheBench (ab) 1.3.33
We found that ab in Apache 2.x is not as fast as the one in the Apache 1.3.x release.
On the server we installed the latest software packages:
Ruby: 1.8.5
Rails: 1.1.6
Ruby-LSAPI: 1.6
Ruby FCGI: 0.8.7
nginx: 0.4.0
LSWS Enterprise: 2.2
We created a simple "Hello world" application as Scoundrel did; the URL looks like
http://192.168.0.60:
Got:
Server Software: LiteSpeed
Server Hostname: 192.168.0.60
Server Port: 8080
Document Path: /hello
Document Length: 47 bytes
Concurrency Level: 100
Time taken for tests: 26.667 seconds
Complete requests: 10000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 2650360 bytes
HTML transferred: 470000 bytes
Requests per second: 375.00 [#/sec] (mean)
Time per request: 266.67 [ms] (mean)
Time per request: 2.67 [ms] (mean, across all concurrent requests)
Transfer rate: 99.39 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 35 265 90.9 246 1730
Waiting: 31 265 90.9 246 1730
Total: 35 265 90.9 246 1730
Percentage of the requests served within a certain time (ms)
50% 246
66% 257
75% 268
80% 275
90% 300
95% 392
98% 658
99% 696
100% 1730 (last request)
I tried a couple of times and the results were pretty consistent, all around 370 rps.
OK, it is nginx's turn. Compile, install and start it. Good, the index page is accessible; it works. Next is to figure out how to configure Rails FCGI. After digging around the internet for quite a while, I finally made it work.
First, I need to start Rails' dispatch.fcgi manually with a FCGI spawning tool. I found that only "cgi-fcgi", which comes with the FCGI SDK, can spawn multiple FCGI instances sharing the same listening socket, which nginx requires for this test. I used the command
cgi-fcgi -start -connect 127.0.0.1:4001 ./dispatch.fcgi 4
to start 4 instances of dispatch.fcgi, all listening on the same server socket. OK, dispatch.fcgi is started. Before starting dispatch.fcgi, I remembered to run "export RAILS_ENV=production" to make sure dispatch.fcgi runs in production mode.
What about the nginx configuration? After some digging, I ended up adding the following to the default configuration to make it work.
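If you prefer to script the two steps above (setting RAILS_ENV and calling cgi-fcgi) so that the environment variable is never forgotten, a rough Ruby sketch could look like this. The helper name fcgi_spawn_command and the "--run" switch are my own inventions for illustration, and passing an environment hash to system assumes a newer Ruby than the 1.8.5 used in this test.

```ruby
# Build the environment and argument list for spawning dispatch.fcgi
# through cgi-fcgi, mirroring the command line used above.
# (fcgi_spawn_command is a hypothetical helper, not part of any library.)
def fcgi_spawn_command(address, script, instances)
  env  = { "RAILS_ENV" => "production" } # same effect as `export RAILS_ENV=production`
  argv = ["cgi-fcgi", "-start", "-connect", address, script, instances.to_s]
  [env, argv]
end

env, argv = fcgi_spawn_command("127.0.0.1:4001", "./dispatch.fcgi", 4)
puts "RAILS_ENV=#{env['RAILS_ENV']} #{argv.join(' ')}"

# Only spawn when explicitly asked to, so the sketch is safe to run anywhere.
system(env, *argv) if ARGV.include?("--run")
```

Without "--run" it only prints the command it would execute, which is handy for checking before touching a live box.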
location /hello {
    fastcgi_pass 127.0.0.1:4001;
    include conf/fastcgi_params;
}
Clear the session cache directory, then fire up: ab -n -c http://192.168.0.60:8080/hello
Got:
Server Software: nginx/0.4.0
Server Hostname: 192.168.0.60
Server Port: 80
Document Path: /hello
Document Length: 47 bytes
Concurrency Level: 100
Time taken for tests: 35.416 seconds
Complete requests: 10000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 2590000 bytes
HTML transferred: 470000 bytes
Requests per second: 282.36 [#/sec] (mean)
Time per request: 354.16 [ms] (mean)
Time per request: 3.54 [ms] (mean, across all concurrent requests)
Transfer rate: 73.13 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 8 203 1677.3 20 32021
Waiting: 8 203 1677.2 20 32021
Total: 8 203 1677.4 20 32023
Percentage of the requests served within a certain time (ms)
50% 20
66% 23
75% 26
80% 27
90% 31
95% 43
98% 1170
99% 5857
100% 32023 (last request)
I tried a couple of times; the results were not very consistent, falling in the range of 250-330 rps.
OK, let's try a UNIX domain socket, hoping for better performance. I fired up another group of dispatch.fcgi processes listening at /tmp/rails.sock and changed the nginx configuration to
location /hello {
    fastcgi_pass unix:/tmp/rails.sock;
    include conf/fastcgi_params;
}
Restart nginx and run 'ab':
Server Software: nginx/0.4.0
Server Hostname: 192.168.0.60
Server Port: 80
Document Path: /hello
Document Length: 383 bytes
Concurrency Level: 100
Time taken for tests: 17.883 seconds
Complete requests: 10000
Failed requests: 5835
(Connect: 0, Length: 5835, Exceptions: 0)
Broken pipe errors: 0
Non-2xx responses: 4165
Total transferred: 3735375 bytes
HTML transferred: 1869440 bytes
Requests per second: 559.19 [#/sec] (mean)
Time per request: 178.83 [ms] (mean)
Time per request: 1.79 [ms] (mean, across all concurrent requests)
Transfer rate: 208.88 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 10
Processing: 5 177 135.9 257 1632
Waiting: 1 177 136.0 257 1631
Total: 5 177 135.8 258 1632
Percentage of the requests served within a certain time (ms)
50% 258
66% 280
75% 295
80% 302
90% 327
95% 345
98% 372
99% 382
100% 1632 (last request)
It looks good at first glance, but when you take a closer look you will find many failed requests: ab reports 5835 failed requests and 4165 non-2xx responses out of 10000. I tried a couple of times and the results were similar.
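A high "Requests per second" number is easy to misread when the failure lines are buried in the report, so it can help to pull those fields out mechanically. The Ruby sketch below is only an illustration: the helper name summarize_ab and the "trustworthy" rule are my own assumptions, and the sample text is an excerpt of the run above.

```ruby
# Pull the numbers that matter out of ab's text report.
# A field that does not appear in the report comes back as nil.
def summarize_ab(report)
  {
    rps:      report[/^Requests per second:\s+([\d.]+)/, 1]&.to_f,
    complete: report[/^Complete requests:\s+(\d+)/, 1]&.to_i,
    failed:   report[/^Failed requests:\s+(\d+)/, 1]&.to_i,
    non_2xx:  report[/^Non-2xx responses:\s+(\d+)/, 1]&.to_i
  }
end

# Excerpt of the UNIX domain socket run above.
SAMPLE = <<~REPORT
  Complete requests: 10000
  Failed requests: 5835
  Non-2xx responses: 4165
  Requests per second: 559.19 [#/sec] (mean)
REPORT

summary = summarize_ab(SAMPLE)
# Assumed rule of thumb: only trust the rps figure when nothing failed.
trustworthy = summary[:failed].to_i.zero? && summary[:non_2xx].to_i.zero?
puts "#{summary[:rps]} rps, trustworthy: #{trustworthy}"
```

For this run the rps figure is flagged as not trustworthy, which matches what a careful read of the report shows.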
My conclusion is that using a UNIX domain socket with nginx is not reliable at all. I think it is because nginx does not pipeline requests the way LSWS does, but instead attempts to make 100 concurrent connections to the upstream, while the upstream is only capable of handling 4 concurrent connections when a UNIX domain socket is used. Using a TCP socket is more reliable because the TCP server socket backlog can buffer a few pending connections, but I bet problems will occur when the concurrency level is higher than the backlog.
You may think I am biased. Maybe, because I know our product inside out and LSWS has certainly been properly tuned, whereas for nginx I could only use the default settings without much tuning.
All right, so much for the benchmarking. It is not possible for everyone who runs benchmark tests to have exactly the same test environment and configuration, so results may vary dramatically. It is better to do your own benchmark before making decisions; I only trust my own results.
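To illustrate what I mean by the backlog buffering pending connections, here is a small Ruby sketch on the loopback interface. It is not how nginx or dispatch.fcgi is implemented; the backlog of 5 and the 3 clients are arbitrary values chosen for the demo, and what happens once the backlog overflows depends on the operating system and is not shown.

```ruby
require "socket"

# Listen on an ephemeral loopback port; nothing calls accept yet.
server = TCPServer.new("127.0.0.1", 0)
server.listen(5) # backlog of 5 pending connections (illustrative value)
port = server.addr[1]

# Three clients connect while the server is "busy". The kernel completes the
# handshakes and parks the connections in the listen backlog.
clients = Array.new(3) { TCPSocket.new("127.0.0.1", port) }
puts "#{clients.size} clients connected before any accept"

# The server later drains the queue one connection at a time.
accepted = Array.new(3) { server.accept }
puts "#{accepted.size} connections accepted from the backlog"

(clients + accepted + [server]).each(&:close)
```

The clients connect successfully even though the server has not accepted anything yet, which is the buffering effect described above.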
Update:
I benchmarked nginx again using the configuration posted in Alexey Kovyrin's comments below, starting 4 dispatch.fcgi processes listening on 4 UNIX domain sockets.
Here are the results:
$ ab -n 10000 -c 100 http://192.168.0.60/hello
This is ApacheBench, Version 1.3d <$Revision: 1.73 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.0.60 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests
Server Software: nginx/0.4.0
Server Hostname: 192.168.0.60
Server Port: 80
Document Path: /hello
Document Length: 47 bytes
Concurrency Level: 100
Time taken for tests: 31.128 seconds
Complete requests: 10000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 2590000 bytes
HTML transferred: 470000 bytes
Requests per second: 321.25 [#/sec] (mean)
Time per request: 311.28 [ms] (mean)
Time per request: 3.11 [ms] (mean, across all concurrent requests)
Transfer rate: 83.20 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 26
Processing: 5 309 148.4 284 1436
Waiting: 5 309 148.4 284 1436
Total: 5 309 148.4 284 1436
Percentage of the requests served within a certain time (ms)
50% 284
66% 300
75% 313
80% 331
90% 425
95% 456
98% 658
99% 1195
100% 1436 (last request)
$ ab -n 10000 -c 100 http://192.168.0.60/hello
Server Software: nginx/0.4.0
Server Hostname: 192.168.0.60
Server Port: 80
Document Path: /hello
Document Length: 47 bytes
Concurrency Level: 100
Time taken for tests: 31.915 seconds
Complete requests: 10000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 2590000 bytes
HTML transferred: 470000 bytes
Requests per second: 313.33 [#/sec] (mean)
Time per request: 319.15 [ms] (mean)
Time per request: 3.19 [ms] (mean, across all concurrent requests)
Transfer rate: 81.15 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 5 317 155.1 284 1449
Waiting: 5 317 155.1 284 1449
Total: 5 317 155.0 284 1449
Percentage of the requests served within a certain time (ms)
50% 284
66% 298
75% 312
80% 341
90% 424
95% 465
98% 819
99% 1241
100% 1449 (last request)
$ ab -n 10000 -c 100 http://192.168.0.60/hello
Server Software: nginx/0.4.0
Server Hostname: 192.168.0.60
Server Port: 80
Document Path: /hello
Document Length: 47 bytes
Concurrency Level: 100
Time taken for tests: 31.057 seconds
Complete requests: 10000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 2590000 bytes
HTML transferred: 470000 bytes
Requests per second: 321.99 [#/sec] (mean)
Time per request: 310.57 [ms] (mean)
Time per request: 3.11 [ms] (mean, across all concurrent requests)
Transfer rate: 83.40 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 8 305 62.2 288 553
Waiting: 5 305 62.2 288 552
Total: 8 305 62.1 288 553
Percentage of the requests served within a certain time (ms)
50% 288
66% 306
75% 317
80% 335
90% 421
95% 437
98% 453
99% 477
100% 553 (last request)
The results are much better than in the previous test with a UNIX domain socket; at least there are no failed requests. OK, it should be good for production use this way. The result is consistent, at about 320 rps. The best score I got using the TCP socket is about the same, and it is still lower than the 370 rps I got with LSWS + LSAPI.
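To put a number on "about 320 rps" versus "370 rps", the three nginx runs above can be averaged and compared with the LSWS + LSAPI figure. This is just the arithmetic in Ruby; 370.0 is the rounded figure quoted in the text, not a separate measurement.

```ruby
# Requests per second from the three nginx runs above (4 UNIX domain sockets).
NGINX_RUNS = [321.25, 313.33, 321.99]
# Approximate LSWS + LSAPI figure quoted in the text.
LSWS_LSAPI_RPS = 370.0

nginx_mean = NGINX_RUNS.sum / NGINX_RUNS.size
advantage  = (LSWS_LSAPI_RPS / nginx_mean - 1.0) * 100.0

puts format("nginx mean: %.1f rps, LSWS + LSAPI ahead by about %.0f%%", nginx_mean, advantage)
```

The mean of the three runs is just under 319 rps, which puts LSWS + LSAPI roughly 16% ahead in this setup.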
