This is the scariest commit I've done in a long time. This is the astobj2-ification of chan_sip. I've tested a number of scenarios like crazy. It used to have 4x the call setup/teardown performance of trunk, but now it's roughly at parity; I will attempt to find the bottlenecks and get it back to the 4x mark. The changes made were somewhat invasive, but the value of these upgrades to the community outweighs waiting any longer for more testing. Every change being made to chan_sip was lousing this code up when we tried to merge.

Peers, users, and dialogs are now all astobj2 objects, indexed via hash tables. Refcounting is used to track objects and free them at the bitter end of their lives. One natural advantage of all the hash-table work is that loading large sip.conf files full of thousands of peers now goes much faster.

Please file issues on bugs.digium.com, and PLEASE, please, please be patient. One more please: PLEASE help thrash this code and test it.

git-svn-id: http://svn.digium.com/svn/asterisk/trunk@114190 f38db490-d61c-443f-a65b-d21fe96a405b
Date: 2008-04-16 23:53:27 +00:00
parent d9fc402428
commit 993e45a63b
4 changed files with 1684 additions and 640 deletions
--- a/7
+++ b/7
@@ -35,6 +35,13 @@ SIP Changes
 * The ATTENDED_TRANSFER_COMPLETE_SOUND can now be set using setvar to cause a given
   audio file to be played upon completion of an attended transfer.
 * Added DNS manager support to registrations for peers referencing peer entries.
+ * Performance improvements via hash tables (astobj2) and doubly-linked lists:
+   load/reload of large numbers of peers/users is ~40x faster (for large lists
+   of peers). Initially, we saw a 4x improvement in call setup/destruction, but
+   at the time of merging this gain has disappeared; further research will be
+   done to try to restore this performance improvement. Astobj2 refcounting is
+   now used for users, peers, and dialogs. Users are encouraged to assist in
+   regression testing and problem reporting!

 IAX Changes
 -----------
--- a/channels/chan_sip.c
+++ b/channels/chan_sip.c
--- a/configs/sip.conf.sample
+++ b/configs/sip.conf.sample
@@ -67,7 +67,21 @@ context=default			; Default context for incoming calls
 ;match_auth_username=yes	; if available, match user entry using the
 				; 'username' field from the authentication line
 				; instead of the From: field.
-				
+;;
+;; Hash table sizes. For maximum efficiency, set each of the following
+;; values slightly larger than the maximum number of users, peers, or
+;; dialogs you expect. Too large, and space is wasted; too small, and
+;; lookups will run slower. 563 is probably far too big for small (home)
+;; installations, but it should cover most small/medium sites.
+;; It is recommended that the sizes be prime numbers.
+;; All tables default to 563, except when compiled in LOW_MEMORY mode,
+;; in which case they default to 17 (intended for small-memory systems).
+;; You can override these defaults by uncommenting the following lines
+;; and changing the values.
+;hash_users=563
+;hash_peers=563
+;hash_dialogs=563
+
 allowoverlap=no			; Disable overlap dialing support. (Default is yes)
 ;allowtransfer=no		; Disable all transfers (unless enabled in peers or users)
 				; Default is enabled
@@ -126,7 +140,7 @@ srvlookup=yes			; Enable DNS SRV lookups on outbound calls
 				; Disabling DNS SRV lookups disables the 
 				; ability to place SIP calls based on domain 
 				; names to some other SIP users on the Internet
-				
+
 ;domain=mydomain.tld		; Set default domain for this host
 				; If configured, Asterisk will only allow
 				; INVITE and REFER to non-local domains
--- a/doc/chan_sip-perf-testing.txt
+++ b/doc/chan_sip-perf-testing.txt
@@ -0,0 +1,110 @@
+Measuring the SIP channel driver's Performance
+==============================================
+
+This file documents the methods I used to measure
+the performance of the SIP channel driver, in 
+terms of maximum simultaneous calls and how quickly
+it could handle incoming calls.
+
+Knowing these limitations can be valuable to those
+implementing PBXs in 'large' environments. Will your
+installation handle the expected call volume?
+
+These numbers may be meaningless when quoted for other
+installations. Seemingly minor differences, such as the amount
+of RAM in a system, the Ethernet speed, the amount of
+CPU cache, the CPU clock speed, and whether or not
+you log CDRs, can affect the numbers greatly.
+
+In my setup, I had a dedicated test machine running Asterisk,
+and another machine that ran sipp, connected together via
+Ethernet.
+
+The version of sipp that I used was sipp-2.0.1; however, 
+I have reason to believe that other versions would work 
+just as well.
+
+On the Asterisk machine, I included the following in my
+extensions.ael file:
+
+context test11
+{
+        s => {
+                Answer();
+                while (1) {
+                        Background(demo-instruct);
+                }
+                Hangup();
+        }
+        _X. => {
+                Answer();
+                while (1) {
+                        Background(demo-instruct);
+                }
+                Hangup();
+        }
+}
+
+Basically, incoming SIP calls are answered, and
+the demo-instruct sound file is played endlessly
+to the caller. This test relies on the calling
+party to hang up, which lets sipp control
+the length of each call.
+
+The sip.conf file has this entry:
+
+[asterisk02]
+type=friend
+context=test11
+host=192.168.134.240 ;; the address of the host you will be running sipp on
+user=sipp
+canreinvite=no
+disallow=all
+allow=ulaw
+
+Note that this setup is simplistic: there is no authentication beyond the host IP,
+and it uses ulaw, which is an efficient codec that needs little CPU.
+
+
+To measure the impact of incoming call traffic on the Asterisk
+machine, I run vmstat. It reports system-wide CPU usage, which on a
+dedicated test machine is a good proxy for Asterisk's. The most common
+failure mode of Asterisk at high call volumes is that the CPU reaches
+100% utilization and then cannot keep up with the workload, resulting in
+timeouts and other failures that swiftly compound and cascade until gross
+failure ensues. Watch the CPU idle ("id") column as the load increases.
+
+I learned to split the testing into two modes: one for raw call processing
+power, in which there are relatively few simultaneous calls in place,
+and another where we let the number of simultaneous calls quickly
+reach a set maximum, then rerun sipp with a higher limit to find the ceiling.
+
+Call processing power is measured with extremely short duration calls:
+
+    ./sipp -sn uac 192.168.134.252 -s 12 -d 100 -l 256
+
+The above tells sipp to call your Asterisk test machine (192.168.134.252)
+at extension 12; each call lasts just 0.1 second (-d is in milliseconds), with a limit
+of 256 simultaneous calls. The number of simultaneous calls is the incoming call rate
+(calls/sec) times the call length: 1 simultaneous call at 10 calls/sec, and 45 at
+450 calls/sec. Setting the limit to 256 implies you do not intend to test above 2560 calls/sec.
+
+Sipp starts at 10 calls/sec, and you can slowly increase the rate by pressing '*' or '+'.
+Watch the CPU utilization on the Asterisk server. When it approaches 100%, you have found
+your limit.
+
+
+Simultaneous calls can be measured with very long duration calls:
+
+    ./sipp -sn uac 192.168.134.252 -s 12 -d 100000 -l 270
+
+This will place 100-second calls to Asterisk. The number of simultaneous
+calls will increase until the maximum of 270 is reached. If Asterisk survives
+this number and is not at 100% CPU utilization, you can stop sipp and run it again
+with a higher -l argument.
+
+
+By changing one Asterisk parameter at a time, you can get a feel for how much that change
+will affect performance. 
+
+