From aa0f5b38aec14428b4b80e06f90ff781f8bca5f1 Mon Sep 17 00:00:00 2001
From: Rene Mayrhofer
Date: Mon, 22 May 2006 05:12:18 +0000
Subject: Import initial strongswan 2.7.0 version into SVN.

---
 doc/src/performance.html | 576 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 576 insertions(+)
 create mode 100755 doc/src/performance.html

diff --git a/doc/src/performance.html b/doc/src/performance.html
new file mode 100755
index 000000000..9d90acc62
--- /dev/null
+++ b/doc/src/performance.html
@@ -0,0 +1,576 @@

FreeS/WAN performance

Performance of FreeS/WAN

The performance of FreeS/WAN is adequate for most applications.

In normal operation, the main concern is the overhead for encryption, decryption and authentication of the actual IPsec (ESP and/or AH) data packets. Tunnel setup and rekeying occur so much less frequently than packet processing that, in general, their overheads are not worth worrying about.

At startup, however, tunnel setup overheads may be significant. If you reboot a gateway and it needs to establish many tunnels, expect some delay. This and other issues for large gateways are discussed below.

Published material

The University of Wales at Aberystwyth has done quite detailed speed tests and put their results on the web.

Davide Cerri's thesis (in Italian) includes performance results for FreeS/WAN and for TLS. He posted an English summary on the mailing list.

Steve Bellovin used one of AT&T Research's FreeS/WAN gateways as his data source for an analysis of the cache sizes required for key swapping in IPsec. It is available as text, or as PDF slides for a talk on the topic.

See also the NAI work mentioned in the next section.

Estimating CPU overheads

We can come up with a formula that roughly relates CPU speed to the rate of IPsec processing possible. It is far from exact, but should be usable as a first approximation.

An analysis of authentication overheads for high-speed networks, including some tests using FreeS/WAN, is on the NAI Labs site. In particular, see figure 3 in this PDF document. Their estimates of overheads, measured in Pentium II cycles per byte processed, are:

Configuration                IPsec  authentication  encryption  cycles/byte
Linux IP stack alone         no     no              no          5
IPsec without crypto         yes    no              no          11
IPsec, authentication only   yes    SHA-1           no          24
IPsec with encryption        yes    yes             yes         not tested

Overheads for IPsec with encryption were not tested in the NAI work, but Antoon Bosselaers' web page gives the cost of his optimised Triple DES implementation as 928 Pentium cycles per block, or 116 per byte, since a DES block is 8 bytes. Adding that to the 24 above, we get 140 cycles per byte for IPsec with encryption.

At 140 cycles per byte, a 140 MHz machine can handle a megabyte -- 8 megabits -- per second. Speeds for other machines will be proportional to this. To saturate a link with capacity C megabits per second, you need a machine running at C * 140/8 = C * 17.5 MHz.
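Here is the arithmetic from the last two paragraphs, written out with units. f_min is our notation, not the original's, for the minimum clock rate needed for a link of C Mbit/s:

```latex
\begin{align*}
\frac{928\ \text{cycles/block}}{8\ \text{bytes/block}} + 24\ \text{cycles/byte}
  &= 116 + 24 = 140\ \text{cycles/byte} \\
f_{\min} = C\ \frac{\text{Mbit}}{\text{s}} \times \frac{1\ \text{byte}}{8\ \text{bit}}
  \times 140\ \frac{\text{cycles}}{\text{byte}}
  &= \frac{140}{8}\,C\ \frac{\text{Mcycles}}{\text{s}} = 17.5\,C\ \text{MHz}
\end{align*}
```

For example, a 10 Mbit/s Ethernet link gives 175 MHz before any safety margin is added.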

However, that estimate is not precise. It ignores the differences between:

and does not account for some overheads you will almost certainly have:

so we suggest using C * 25 to get an estimate with a bit of a built-in safety factor.
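The same sizing rule can be expressed as a short Python sketch covering both the bare 140-cycles-per-byte figure and the padded C * 25 planning factor. The function and constant names are ours and are chosen only for illustration. The numbers are the ones quoted above.

```python
# Rough CPU sizing for software 3DES IPsec, using the figures quoted above.
CYCLES_PER_BYTE = 140        # 24 (IPsec + SHA-1 auth, NAI) + 116 (3DES, Bosselaers)
PLANNING_MHZ_PER_MBIT = 25   # 140/8 = 17.5, padded as a safety factor


def raw_mhz(link_mbit: float, cycles_per_byte: float = CYCLES_PER_BYTE) -> float:
    """Clock rate in MHz needed to push link_mbit Mbit/s through IPsec, no margin."""
    bytes_per_second = link_mbit * 1e6 / 8           # Mbit/s -> bytes/s
    return bytes_per_second * cycles_per_byte / 1e6  # cycles/s -> MHz


def planning_mhz(link_mbit: float) -> float:
    """Planning estimate in MHz: C * 25, which includes the safety factor."""
    return link_mbit * PLANNING_MHZ_PER_MBIT


if __name__ == "__main__":
    for name, mbit in [("DSL", 1), ("Ethernet", 10), ("T3 or E3", 45), ("OC3", 155)]:
        print(f"{name:9} {mbit:4} Mbit/s  raw {raw_mhz(mbit):7.1f} MHz  "
              f"planning {planning_mhz(mbit):5} MHz")
```

When run as a script, it prints the unpadded and padded figures for a few link speeds. The padded values (25, 250, 1125 and 3875 MHz) match the Estimate column in the table of examples.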

This covers only IP and IPsec processing. If you have other loads on your gateway -- for example if it is also working as a firewall -- then you will need to add your own safety factor atop that.

This estimate matches empirical data reasonably well. For example, Metheringham's tests, described below, show a 733 MHz machine topping out between 32 and 36 Mbit/second, pushing data as fast as it can down a 100 Mbit link. Our formula suggests you need at least an 800 MHz machine to handle a fully loaded 32 Mbit link. The two results are consistent.
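Inverting the rule gives a quick consistency check for Metheringham's 733 MHz box. This sketch assumes that the measured throughput should fall between the padded prediction (25 MHz per Mbit/s) and the unpadded one (17.5 MHz per Mbit/s). That framing is ours, not the original author's.

```python
def max_link_mbit(cpu_mhz: float, mhz_per_mbit: float) -> float:
    """Mbit/s of IPsec traffic a cpu_mhz machine should sustain under the rule."""
    return cpu_mhz / mhz_per_mbit


# Metheringham's crypto boxes were PIII/733s.
pessimistic = max_link_mbit(733, 25)    # padded rule: about 29.3 Mbit/s
optimistic = max_link_mbit(733, 17.5)   # bare 140 cycles/byte: about 41.9 Mbit/s
print(f"predicted: {pessimistic:.1f} to {optimistic:.1f} Mbit/s; reported: 32 to 36 Mbit/s")
```

The predicted range of roughly 29 to 42 Mbit/s brackets the reported 32 to 36 Mbit/s.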

Some examples using this estimation method:

Interface                         Machine speed in MHz
Type                      Mbit/s  Estimate    Minimum IPsec         Minimum with other load
                                  (Mbit*25)   gateway               (e.g. firewall)
DSL                       1       25 MHz      whatever you have     133, or better if you have it
cable modem               3       75 MHz      whatever you have     133, or better if you have it
any link, light load      5       125 MHz     133                   200+, almost any surplus machine
Ethernet                  10      250 MHz     surplus 266 or 300    500+
fast link, moderate load  20      500 MHz     500                   800+, any current off-the-shelf PC
T3 or E3                  45      1125 MHz    1200                  1500+
fast Ethernet             100     2500 MHz    not feasible with 3DES in software on current machines
OC3                       155     3875 MHz    not feasible with 3DES in software on current machines

Such an estimate is far from exact, but should be usable as a minimum requirement for planning. The key observations are:

Higher performance alternatives

AES is a new US government block cipher standard, designed to replace the obsolete DES. If FreeS/WAN using 3DES is not fast enough for your application, the AES patch may help.

To date (March 2002) we have had only one mailing list report of measurements with the patch applied. It indicates that, at least for the tested load on that user's network, AES roughly doubles IPsec throughput. If further testing confirms this, it may prove possible to saturate an OC3 link in software on a high-end box.
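If the single AES report holds generally, the sizing rule simply scales. This sketch assumes a uniform 2x throughput gain, which has not been established. The `speedup` parameter is hypothetical.

```python
MHZ_PER_MBIT_3DES = 25  # planning factor from this page


def planning_mhz(link_mbit: float, speedup: float = 1.0) -> float:
    """CPU MHz for link_mbit Mbit/s if IPsec throughput per MHz is `speedup`
    times the 3DES figure. speedup=2.0 models the one AES report; it is an
    assumption, not a measured constant."""
    return link_mbit * MHZ_PER_MBIT_3DES / speedup


print(planning_mhz(155))               # OC3 with 3DES: 3875.0
print(planning_mhz(155, speedup=2.0))  # OC3 if AES really doubles throughput: 1937.5
```

Under that assumption, an OC3 would need a machine of roughly 1.9 GHz instead of about 3.9 GHz.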

Also, some work is being done toward support for hardware IPsec acceleration, which might extend the range of requirements FreeS/WAN could meet.

Other considerations

CPU speed may be the main issue for IPsec performance, but of course it isn't the only one.

You need good ethernet cards or other network interface hardware to get the best performance. See this ethernet information page and this Linux network driver page.

The current FreeS/WAN kernel code is largely single-threaded. It is SMP safe, and will run just fine on a multiprocessor machine (discussion), but the load within the kernel is not shared effectively. This means that, for example, to saturate a T3 -- which needs about a 1200 MHz machine -- you cannot expect something like a dual 800 MHz machine to do the job.
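The SMP caveat can be expressed in the same terms. This sketch models the single-threaded kernel path by letting only one CPU's clock count toward IPsec work. The helper names are hypothetical, and the model is deliberately crude.

```python
MHZ_PER_MBIT = 25  # planning factor from this page


def ipsec_mhz_available(cpu_mhz: float, n_cpus: int, kernel_path_parallel: bool) -> float:
    """Clock rate that can be applied to KLIPS packet processing.

    With the largely single-threaded kernel code, only one CPU's worth
    counts, regardless of n_cpus."""
    return cpu_mhz * n_cpus if kernel_path_parallel else cpu_mhz


def can_saturate(link_mbit: float, cpu_mhz: float, n_cpus: int,
                 kernel_path_parallel: bool = False) -> bool:
    needed = link_mbit * MHZ_PER_MBIT
    return ipsec_mhz_available(cpu_mhz, n_cpus, kernel_path_parallel) >= needed


# A T3 (45 Mbit/s) needs about 1125 MHz by the C * 25 rule.
print(can_saturate(45, 800, 2, kernel_path_parallel=True))  # naive view: True
print(can_saturate(45, 800, 2))                             # as the text describes: False
print(can_saturate(45, 1200, 1))                            # True
```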

On the other hand, SMP machines do tend to share loads well, so -- provided one CPU is fast enough for the IPsec work -- a multiprocessor machine may be ideal for a gateway with a mixed load.

Many tunnels from a single gateway

FreeS/WAN allows a single gateway machine to build tunnels to many others. There may, however, be some problems with large numbers of tunnels, as indicated in this message from the mailing list:

Subject: Re: Maximum number of ipsec tunnels?
   Date: Tue, 18 Apr 2000
   From: "John S. Denker" <jsd@research.att.com>

Christopher Ferris wrote:

>> What are the maximum number ipsec tunnels FreeS/WAN can handle??

Henry Spencer wrote:

>There is no particular limit.  Some of the setup procedures currently
>scale poorly to large numbers of connections, but there are (clumsy)
>workarounds for that now, and proper fixes are coming.

1) "Large" numbers means anything over 50 or so.  I routinely run boxes
with about 200 tunnels.  Once you get more than 50 or so, you need to worry
about several scalability issues:

a) You need to put a "-" sign in syslogd.conf, and rotate the logs daily
not weekly.

b) Processor load per tunnel is small unless the tunnel is not up, in which
case a new half-key gets generated every 90 seconds, which can add up if
you've got a lot of down tunnels.

c) There's other bits of lore you need when running a large number of
tunnels.  For instance, systematically keeping the .conf file free of
conflicts requires tools that aren't shipped with the standard freeswan
package.

d) The pluto startup behavior is quadratic.  With 200 tunnels, this eats up
several minutes at every restart.   I'm told fixes are coming soon.

2) Other than item (1b), the CPU load depends mainly on the size of the
pipe attached, not on the number of tunnels.
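The following sketch shows how item (1b) "can add up": it computes the rate at which down tunnels trigger half-key generation. It takes the 90-second interval from the message and assumes, on our part, that the retries are spread evenly over time.

```python
HALFKEY_INTERVAL_S = 90  # from the message: a new half-key every 90 s per down tunnel


def halfkeys_per_second(down_tunnels: int, interval_s: float = HALFKEY_INTERVAL_S) -> float:
    """Average half-key generations per second caused by tunnels that are down."""
    return down_tunnels / interval_s


for down in (10, 50, 200):
    print(f"{down:3d} down tunnels -> {halfkeys_per_second(down):.2f} half-keys per second")
```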

It is worth noting that item (1b) applies only to repeated attempts to re-key a data connection (IPsec SA, Phase 2) over an established keying connection (ISAKMP SA, Phase 1). There are two ways to reduce this overhead using settings in ipsec.conf(5):

The overheads for establishing keying connections (ISAKMP SAs, Phase 1) are lower because for these Pluto does not perform expensive operations before receiving a reply from the peer.

A gateway that does a lot of rekeying -- many tunnels and/or low settings for tunnel lifetimes -- will also need a lot of random numbers from the random(4) driver.

Low-end systems

Even a 486 can handle a T1 line, according to this mailing list message:

Subject: Re: linux-ipsec: IPSec Masquerade
   Date: Fri, 15 Jan 1999 11:13:22 -0500
   From: Michael Richardson

. . . A 486/66 has been clocked by Phil Karn to do
10Mb/s encryption.. that uses all the CPU, so half that to get some CPU,
and you have 5Mb/s. 1/3 that for 3DES and you get 1.6Mb/s....

and a piece of mail from project technical lead Henry Spencer:

Oh yes, and a new timing point for Sandy's docs...  A P60 -- yes, a 60MHz
Pentium, talk about antiques -- running a host-to-host tunnel to another
machine shows an FTP throughput (that is, end-to-end results with a real
protocol) of slightly over 5Mbit/s either way.  (The other machine is much
faster, the network is 100Mbps, and the ether cards are good ones... so
the P60 is pretty definitely the bottleneck.)
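Richardson's estimate for the 486 can be checked against a T1 line, which runs at 1.544 Mbit/s. The sketch below follows his steps. The variable names are ours.

```python
T1_MBIT = 1.544  # T1 line rate

full_cpu_mbit = 10.0                 # Phil Karn: 10 Mb/s of encryption uses all the CPU
half_cpu_mbit = full_cpu_mbit / 2    # keep half the CPU free for other work
triple_des_mbit = half_cpu_mbit / 3  # 3DES runs the DES cipher three times per block

print(f"486/66 with 3DES: about {triple_des_mbit:.2f} Mbit/s")
print("fast enough for a T1" if triple_des_mbit >= T1_MBIT else "too slow for a T1")
```

The margin is thin (about 8%), which fits the framing of the 486 as the low end.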

From the above, and from general user experience as reported on the list, it seems clear that a cheap surplus machine -- a reasonable 486, a minimal Pentium box, a Sparc 5, ... -- can easily handle a home office or a small company connection using any of:

If available, we suggest using a Pentium 133 or better. This should ensure that, even under maximum load, IPsec will use less than half the CPU cycles. You then have enough left for other things you may want on your gateway -- firewalling, web caching, DNS and such.

Measuring KLIPS

Here is some additional data from the mailing list.

Subject: FreeSWAN (specically KLIPS) performance measurements
   Date: Thu, 01 Feb 2001
   From: Nigel Metheringham <Nigel.Metheringham@intechnology.co.uk>

I've spent a happy morning attempting performance tests against KLIPS
(this is due to me not being able to work out the CPU usage of KLIPS so
resorting to the crude measurements of maximum throughput to give a
baseline to work out loading of a box).

Measurements were done using a set of 4 boxes arranged in a line, each
connected to the next by 100Mbit duplex ethernet.  The inner 2 had an
ipsec tunnel between them (shared secret, but I was doing measurements
when the tunnel was up and running - keying should not be an issue
here).  The outer pair of boxes were traffic generators or traffic sink.

The crypt boxes are Compaq DL380s - Uniprocessor PIII/733 with 256K
cache.  They have 128M main memory.  Nothing significant was running on
the boxes other than freeswan.  The kernel was a 2.2.19pre7 patched
with freeswan and ext3.

Without an ipsec tunnel in the chain (ie the 2 inner boxes just being
100BaseT routers), throughput (measured with ttcp) was between 10644
and 11320 KB/sec

With an ipsec tunnel in place, throughput was between 3268 and 3402
KB/sec

These measurements are for data pushed across a TCP link, so the
traffic on the wire between the 2 ipsec boxes would have been higher
than this....

vmstat (run during some other tests, so not affecting those figures) on
the encrypting box shows approx 50% system & 50% idle CPU - which I
don't believe at all.  Interactive feel of the box was significantly
sluggish.

I also tried running the kernel profiler (see man readprofile) during
test runs.

A box doing primarily decrypt work showed basically nothing happening -
I assume interrupts were off.
A box doing encrypt work showed the following:-
 Ticks Function                                   Load
 ~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~    ~~~~~~
   956 total                                      0.0010
   532 des_encrypt2                               0.1330
   110 MD5Transform                               0.0443
    97 kmalloc                                    0.1880
    39 des_encrypt3                               0.1336
    23 speedo_interrupt                           0.0298
    14 skb_copy_expand                            0.0250
    13 ipsec_tunnel_start_xmit                    0.0009
    13 Decode                                     0.1625
    11 handle_IRQ_event                           0.1019
    11 .des_ncbc_encrypt_end                      0.0229
    10 speedo_start_xmit                          0.0188
     9 satoa                                      0.0225
     8 kfree                                      0.0118
     8 ip_fragment                                0.0121
     7 ultoa                                      0.0365
     5 speedo_rx                                  0.0071
     5 .des_encrypt2_end                          5.0000
     4 _stext                                     0.0140
     4 ip_fw_check                                0.0035
     2 rj_match                                   0.0034
     2 ipfw_output_check                          0.0200
     2 inet_addr_type                             0.0156
     2 eth_copy_and_sum                           0.0139
     2 dev_get                                    0.0294
     2 addrtoa                                    0.0143
     1 speedo_tx_buffer_gc                        0.0024
     1 speedo_refill_rx_buf                       0.0022
     1 restore_all                                0.0667
     1 number                                     0.0020
     1 net_bh                                     0.0021
     1 neigh_connected_output                     0.0076
     1 MD5Final                                   0.0083
     1 kmem_cache_free                            0.0016
     1 kmem_cache_alloc                           0.0022
     1 __kfree_skb                                0.0060
     1 ipsec_rcv                                  0.0001
     1 ip_rcv                                     0.0014
     1 ip_options_fragment                        0.0071
     1 ip_local_deliver                           0.0023
     1 ipfw_forward_check                         0.0139
     1 ip_forward                                 0.0011
     1 eth_header                                 0.0040
     1 .des_encrypt3_end                          0.0833
     1 des_decrypt3                               0.0034
     1 csum_partial_copy_generic                  0.0045
     1 call_out_firewall                          0.0125

Hope this data is helpful to someone... however the lack of visibility
into the decrypt side makes things less clear
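A little arithmetic summarizes the figures in Metheringham's message. This sketch copies the tick counts from the profile above. Two choices are ours: grouping every DES routine (including the .des_*_end markers) as "DES code", and treating ttcp's KB as 1024 bytes.

```python
# Tick counts copied from the readprofile output above (encrypting box).
TOTAL_TICKS = 956
DES_TICKS = {  # our grouping: every DES routine and its .des_*_end marker
    "des_encrypt2": 532, "des_encrypt3": 39, ".des_ncbc_encrypt_end": 11,
    ".des_encrypt2_end": 5, ".des_encrypt3_end": 1, "des_decrypt3": 1,
}
OTHER_TICKS = {"MD5Transform": 110, "kmalloc": 97}


def share(ticks: int) -> float:
    """Percentage of all profiler ticks."""
    return 100.0 * ticks / TOTAL_TICKS


def kbytes_to_mbit(kb_per_s: float, kb: int = 1024) -> float:
    """Convert a ttcp KB/sec figure to Mbit/s of TCP payload (KB assumed 1024 bytes)."""
    return kb_per_s * kb * 8 / 1e6


print(f"DES code: {share(sum(DES_TICKS.values())):.1f}% of ticks")
for name, ticks in OTHER_TICKS.items():
    print(f"{name}: {share(ticks):.1f}% of ticks")
print(f"with IPsec:    {kbytes_to_mbit(3268):.1f} to {kbytes_to_mbit(3402):.1f} Mbit/s payload")
print(f"without IPsec: {kbytes_to_mbit(10644):.1f} to {kbytes_to_mbit(11320):.1f} Mbit/s payload")
```

The result puts roughly 62% of the encrypting box's sampled ticks in DES code, with MD5 at about 11.5% and kmalloc at about 10%. The payload rates come out at about 27 to 28 Mbit/s with IPsec and 87 to 93 Mbit/s without. As the message notes, traffic on the wire between the IPsec boxes was higher.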

Speed with compression

Another user reported some results for connections with and without IP compression:

Subject: [Users] Speed with compression
   Date: Fri, 29 Jun 2001
   From: John McMonagle <johnm@advocap.org>

Did a couple tests with compression using the new 1.91 freeswan.

Running between 2 sites with cable modems.  Both  using approximately
130 mhz pentium.

Transferred files with ncftp.

Compressed file was a 6mb compressed  installation file.
Non compressed was 18mb /var/lib/rpm/packages.rpm

                            Compressed vpn          regular vpn
Compress file                42.59 kBs               42.08 kBs
regular file                110.84 kBs               41.66 kBs

Load  was about 0 either way.
Ping times were very similar  a bit above 9 ms.

Compression looks attractive to me.

Later in the same thread, project technical lead Henry Spencer added:

> is there a reason not to switch compression on?  I have large gateway boxes
> connecting 3 connections, one of them with a measly DS1 link...

Run some timing tests with and without, with data and loads representative
of what you expect in production.  That's the definitive way to decide.
If compression is a net loss, then obviously, leave it turned off.  If it
doesn't make much difference, leave it off for simplicity and hence
robustness.  If there's a substantial gain, by all means turn it on.

If both ends support compression and can successfully negotiate a
compressed connection (trivially true if both are FreeS/WAN 1.91), then
the crucial question is CPU cycles.

Compression has some overhead, so one question is whether *your* data
compresses well enough to save you more CPU cycles (by reducing the volume
of data going through CPU-intensive encryption/decryption) than it costs
you.  Last time I ran such tests on data that was reasonably compressible
but not deliberately contrived to be so, this generally was not true --
compression cost extra CPU cycles -- so compression was worthwhile only if
the link, not the CPU, was the bottleneck.  However, that was before the
slow-compression bug was fixed.  I haven't had a chance to re-run those
tests yet, but it sounds like I'd probably see a different result.

The bug he refers to was a problem with the compression libraries that had us using C code, rather than assembler, for compression. It was fixed before 1.91.
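Spencer's criterion can be written as a per-byte cost comparison. In this sketch, the 140 cycles/byte IPsec figure comes from earlier on this page and is assumed to scale with the number of bytes actually encrypted. The compression ratio and the compression cost per byte are hypothetical inputs, to be measured on your own data as he recommends.

```python
IPSEC_CYCLES_PER_BYTE = 140.0  # 3DES IPsec estimate from this page


def cycles_per_original_byte(ratio: float, compress_cpb: float,
                             ipsec_cpb: float = IPSEC_CYCLES_PER_BYTE) -> float:
    """CPU cycles spent per byte of original data when compression is on.

    ratio is compressed size / original size, e.g. 0.4 for data that
    shrinks by 60%; compress_cpb is the (measured) cost of compressing."""
    return compress_cpb + ratio * ipsec_cpb


def compression_saves_cpu(ratio: float, compress_cpb: float,
                          ipsec_cpb: float = IPSEC_CYCLES_PER_BYTE) -> bool:
    """True if compressing costs fewer cycles than the encryption it avoids."""
    return cycles_per_original_byte(ratio, compress_cpb, ipsec_cpb) < ipsec_cpb


# Hypothetical inputs, not measurements:
print(compression_saves_cpu(ratio=0.4, compress_cpb=50))  # 50 + 56 = 106 < 140, so True
print(compression_saves_cpu(ratio=0.9, compress_cpb=50))  # 50 + 126 = 176 > 140, so False
```

This covers only the CPU side of the question. When the link rather than the CPU is the bottleneck, as in the cable-modem test above, a smaller ratio raises throughput even if compression costs extra cycles.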

Methods of measuring

If you want to measure the loads FreeS/WAN puts on a system, note that tools such as top or measurements such as load average are more or less useless for this. They are not designed to measure something that does most of its work inside the kernel.

Here is a message from FreeS/WAN kernel programmer Richard Guy Briggs on this:

> I have a batch of boxes doing Freeswan stuff.
> I want to measure the CPU loading of the Freeswan tunnels, but am
> having trouble seeing how I get some figures out...
>
>  - Keying etc is in userspace so will show up on the per-process
>    and load average etc (ie pluto's load)

Correct.

>  - KLIPS is in the kernel space, and does not show up in load average
>    I think also that the KLIPS per-packet processing stuff is running
>    as part of an interrupt handler so it does not show up in the
>    /proc/stat system_cpu or even idle_cpu figures

It is not running in interrupt handler.  It is in the bottom half.
This is somewhere between user context (careful, this is not
userspace!) and hardware interrupt context.

> Is this correct, and is there any means of instrumenting how much the
> cpu is being loaded - I don't like the idea of a system running out of
> steam whilst still showing 100% idle CPU :-)

vmstat seems to do a fairly good job, but use a running tally to get a
good idea.  A one-off call to vmstat gives different numbers than a
running stat.  To do this, put an interval on your vmstat command
line.

and another suggestion from the same thread:

Subject: Re: Measuring the CPU usage of Freeswan
   Date: Mon, 29 Jan 2001
   From: Patrick Michael Kane <modus@pr.es.to>

The only truly accurate way to accurately track FreeSWAN CPU usage is to use
a CPU soaker. You run it on an unloaded system as a benchmark, then start up
FreeSWAN and take the difference to determine how much FreeSWAN is eating.
I believe someone has done this in the past, so you may find something in
the FreeSWAN archives.  If not, someone recently posted a URL to a CPU
soaker benchmark tool on linux-kernel.
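Briggs's advice to watch a running tally rather than a single snapshot can be sketched directly. vmstat derives its CPU columns from the counters in /proc/stat. This Python sketch samples the aggregate cpu line twice and reports the busy share of the interval. The field layout is an assumption about the running kernel. Newer kernels append iowait, irq, softirq, steal and guest columns, and time spent in bottom halves is typically charged to softirq. 2.2-era kernels such as Metheringham's report only user, nice, system and idle. The code accepts both layouts.

```python
import time

# Columns after "cpu" in /proc/stat: user nice system idle [iowait irq softirq
# steal guest guest_nice]. guest and guest_nice are already counted inside
# user and nice, so only the first 8 columns are summed.
SUMMED_FIELDS = 8
IDLE, IOWAIT = 3, 4


def read_cpu_times(path: str = "/proc/stat") -> list[int]:
    """Return the aggregate jiffy counters from the 'cpu ' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise ValueError(f"no aggregate cpu line in {path}")


def busy_fraction(before: list[int], after: list[int]) -> float:
    """Share of the interval spent doing anything other than idling."""
    deltas = [b - a for a, b in zip(before[:SUMMED_FIELDS], after[:SUMMED_FIELDS])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[IDLE] + (deltas[IOWAIT] if len(deltas) > IOWAIT else 0)
    return (total - idle) / total


if __name__ == "__main__":
    interval_s = 1                      # like "vmstat 1"; longer intervals smooth more
    first = read_cpu_times()
    time.sleep(interval_s)
    print(f"CPU busy over {interval_s}s: {100 * busy_fraction(first, read_cpu_times()):.1f}%")
```

Kane's CPU-soaker approach measures the same thing from the other side: it measures the capacity left over for a user process rather than relying on the kernel's own accounting.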