Deployment
**********


Application server
==================

The author of "websockets" isn't aware of best practices for deploying
network services based on "asyncio", let alone application servers.

You can run a script similar to the server example inside a
supervisor, if you deem that useful.

You can also add a wrapper to daemonize the process. Third-party
libraries provide solutions for that.

If you can share knowledge on this topic, please file an issue.
Thanks!


Graceful shutdown
=================

You may want to close connections gracefully when shutting down the
server, perhaps after executing some cleanup logic. There are two ways
to achieve this with the object returned by "serve()":

* using it as an asynchronous context manager, or

* calling its "close()" method, then waiting for its "wait_closed()"
  method to complete.
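
The second approach can be sketched as follows. This is a minimal
illustration rather than code from "websockets": "serve_until()" is a
hypothetical helper, and it assumes only that awaiting "serve()" yields
an object with the "close()" and "wait_closed()" methods described
above.

```python
import asyncio


async def serve_until(stop, start_server):
    """Run a server until ``stop`` completes, then shut it down.

    ``start_server`` is an awaitable resolving to a server object that
    has ``close()`` and ``wait_closed()`` (hypothetical helper).
    """
    server = await start_server
    try:
        await stop
        # Application-specific cleanup logic could run here.
    finally:
        server.close()              # request shutdown
        await server.wait_closed()  # wait until shutdown completes
```

With "websockets", this might be called as
"await serve_until(stop, websockets.serve(handler, "localhost", 8765))",
where "handler" is a connection handler and "stop" is a future resolved
when shutdown is requested.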

On Unix systems, shutdown is usually triggered by sending a signal.

Here's a full example (Unix-only):

   #!/usr/bin/env python

   import asyncio
   import signal
   import websockets

   async def echo(websocket, path):
       async for message in websocket:
           await websocket.send(message)

   async def echo_server():
       loop = asyncio.get_running_loop()

       # The stop condition is set when receiving SIGTERM.
       stop = loop.create_future()
       loop.add_signal_handler(signal.SIGTERM, stop.set_result, None)

       # Run the server until the stop condition is met.
       async with websockets.serve(echo, "localhost", 8765):
           await stop

   asyncio.run(echo_server())

It's more difficult to achieve the same effect on Windows. Some third-
party projects try to help with this problem.

If your server doesn't run in the main thread, look at
"call_soon_threadsafe()".

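As a rough sketch of that hint: when the event loop runs in a worker
thread, another thread can resolve the stop future by scheduling the
call with "call_soon_threadsafe()". The "run_loop()" and
"request_stop()" names are hypothetical, and a bare future stands in
for the server so the example only needs the standard library.

```python
import asyncio
import threading


def run_loop(loop, stop):
    """Run the event loop in a worker thread until ``stop`` is resolved."""
    # A real server would await ``stop`` inside ``async with serve(...)``.
    loop.run_until_complete(stop)
    loop.close()


def request_stop(loop, stop):
    """Resolve ``stop`` from any thread."""
    # Futures aren't thread-safe, so set_result() is scheduled to run
    # in the thread that owns the event loop.
    loop.call_soon_threadsafe(stop.set_result, None)


loop = asyncio.new_event_loop()
stop = loop.create_future()
worker = threading.Thread(target=run_loop, args=(loop, stop))
worker.start()

request_stop(loop, stop)  # e.g. called from the main thread on shutdown
worker.join()
```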

Memory usage
============

In most cases, memory usage of a WebSocket server is proportional to
the number of open connections. When a server handles thousands of
connections, memory usage can become a bottleneck.

Memory usage of a single connection is the sum of:

1. the baseline amount of memory "websockets" requires for each
   connection,

2. the amount of data held in buffers before the application
   processes it,

3. any additional memory allocated by the application itself.
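
As a back-of-the-envelope illustration of this sum, the sketch below
multiplies a per-connection total by the number of connections.
"estimate_memory()" is a hypothetical helper; the 325 KiB and 55 KiB
baselines come from the table in the next section, while the
connection count is made up for the example.

```python
KIB = 1024


def estimate_memory(connections, baseline, buffered=0, application=0):
    """Return the estimated total memory, in bytes, for all connections."""
    return connections * (baseline + buffered + application)


# Baseline only: buffers and application memory are left at zero here.
default = estimate_memory(10_000, baseline=325 * KIB)
reduced = estimate_memory(10_000, baseline=55 * KIB)

print(f"default compression: {default / KIB**3:.2f} GiB")             # 3.10 GiB
print(f"window bits 11, memory level 4: {reduced / KIB**3:.2f} GiB")  # 0.52 GiB
```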


Baseline
--------

Compression settings are the main factor affecting the baseline amount
of memory used by each connection.

By default "websockets" maximizes compression rate at the expense of
memory usage. If memory usage is an issue, lowering compression
settings can help:

* Context Takeover is necessary to get good performance for almost
  all applications. It should remain enabled.

* Window Bits is a trade-off between memory usage and compression
  rate. It defaults to 15 and can be lowered. The default value isn't
  optimal for small, repetitive messages which are typical of
  WebSocket servers.

* Memory Level is a trade-off between memory usage and compression
  speed. It defaults to 8 and can be lowered. A lower memory level can
  actually increase speed thanks to memory locality, even if the CPU
  does more work!

See this example for how to configure compression settings.
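
As a sketch of such a configuration, assuming the
"ServerPerMessageDeflateFactory" class from
"websockets.extensions.permessage_deflate" and using the values
mentioned at the end of this section. "serve_with_compression()" is a
hypothetical helper name.

```python
# Illustrative values: Window Bits = 11, Memory Level = 4.
WINDOW_BITS = 11
# Keyword arguments forwarded to zlib.compressobj(); "memLevel" is zlib's name.
COMPRESS_SETTINGS = {"memLevel": 4}


def serve_with_compression(handler, host="localhost", port=8765):
    """Return a ``serve()`` object with reduced compression settings."""
    # Imported here so the settings above can be reused without the library.
    import websockets
    from websockets.extensions import permessage_deflate

    return websockets.serve(
        handler,
        host,
        port,
        extensions=[
            permessage_deflate.ServerPerMessageDeflateFactory(
                server_max_window_bits=WINDOW_BITS,
                client_max_window_bits=WINDOW_BITS,
                compress_settings=COMPRESS_SETTINGS,
            ),
        ],
    )
```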

Here's how various compression settings affect memory usage of a
single connection on a 64-bit system, as well as a benchmark of
compressed size and compression time for a corpus of small JSON
documents.

+---------------+---------------+----------------+----------------+--------------------+--------------------+
| Compression   | Window Bits   | Memory Level   | Memory usage   | Size vs. default   | Time vs. default   |
|===============|===============|================|================|====================|====================|
| *default*     | 15            | 8              | 325 KiB        | +0%                | +0%                |
+---------------+---------------+----------------+----------------+--------------------+--------------------+
|               | 14            | 7              | 181 KiB        | +1.5%              | -5.3%              |
+---------------+---------------+----------------+----------------+--------------------+--------------------+
|               | 13            | 6              | 110 KiB        | +2.8%              | -7.5%              |
+---------------+---------------+----------------+----------------+--------------------+--------------------+
|               | 12            | 5              | 73 KiB         | +4.4%              | -18.9%             |
+---------------+---------------+----------------+----------------+--------------------+--------------------+
|               | 11            | 4              | 55 KiB         | +8.5%              | -18.8%             |
+---------------+---------------+----------------+----------------+--------------------+--------------------+
| *disabled*    | N/A           | N/A            | 22 KiB         | N/A                | N/A                |
+---------------+---------------+----------------+----------------+--------------------+--------------------+

*Don't assume this example is representative! Compressed size and
compression time depend heavily on the kind of messages exchanged by
the application!*

You can run the same benchmark for your application by creating a list
of typical messages and passing it to the "_benchmark" function.
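
If you only want a quick comparison of compressed size and compression
time, a similar measurement can be approximated with the standard
"zlib" module. The sketch below is not the "_benchmark" function
mentioned above and doesn't measure memory usage;
"compressed_size_and_time()" and the sample corpus are hypothetical.

```python
import json
import time
import zlib


def compressed_size_and_time(messages, wbits, mem_level):
    """Compress messages with one shared context; return (bytes, seconds)."""
    # Negative wbits selects a raw deflate stream without zlib headers.
    encoder = zlib.compressobj(wbits=-wbits, memLevel=mem_level)
    size = 0
    start = time.perf_counter()
    for message in messages:
        # Z_SYNC_FLUSH emits each message while keeping the shared context.
        size += len(encoder.compress(message) + encoder.flush(zlib.Z_SYNC_FLUSH))
    return size, time.perf_counter() - start


# Hypothetical corpus: replace with messages typical of your application.
messages = [
    json.dumps({"id": i, "event": "price", "value": i * 0.5}).encode()
    for i in range(1000)
]

base_size, base_time = compressed_size_and_time(messages, 15, 8)
for wbits, mem_level in [(14, 7), (13, 6), (12, 5), (11, 4)]:
    size, elapsed = compressed_size_and_time(messages, wbits, mem_level)
    print(
        f"wbits={wbits} memLevel={mem_level} "
        f"size {size / base_size - 1:+.1%} time {elapsed / base_time - 1:+.1%}"
    )
```

Timings from a single run are noisy, so repeat the measurement several
times before drawing conclusions.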

This blog post by Ilya Grigorik provides more details about how
compression settings affect memory usage and how to optimize them.

This experiment by Peter Thorson suggests Window Bits = 11, Memory
Level = 4 as a sweet spot for optimizing memory usage.


Buffers
-------

Under normal circumstances, buffers are almost always empty.

Under high load, if a server receives more messages than it can
process, bufferbloat can result in excessive memory use.

By default "websockets" has generous limits. It is strongly
recommended to adapt them to your application. When you call
"serve()":

* Set "max_size" (default: 1 MiB, UTF-8 encoded) to the maximum size
  of messages your application expects to receive.

* Set "max_queue" (default: 32) to the maximum number of messages
  your application expects to receive faster than it can process them.
  The queue provides burst tolerance without slowing down the TCP
  connection.

Furthermore, you can lower "read_limit" and "write_limit" (default: 64
KiB) to reduce the size of buffers for incoming and outgoing data.
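
As an illustration, the sketch below gathers these parameters in one
place and computes a rough upper bound on the encoded size of the
messages queued for one connection. The values are made up for an
application exchanging small messages, and "queue_bound()" and
"serve_with_limits()" are hypothetical helpers.

```python
KIB = 1024

# Hypothetical limits; tune them to your application's traffic.
LIMITS = {
    "max_size": 64 * KIB,     # largest incoming message (default: 1 MiB)
    "max_queue": 8,           # queued incoming messages (default: 32)
    "read_limit": 16 * KIB,   # buffer for incoming data (default: 64 KiB)
    "write_limit": 16 * KIB,  # buffer for outgoing data (default: 64 KiB)
}


def queue_bound(max_size, max_queue):
    """Rough upper bound, in encoded bytes, on one connection's queue."""
    return max_size * max_queue


def serve_with_limits(handler, host="localhost", port=8765):
    """Return a ``serve()`` object configured with the limits above."""
    # Imported here so the arithmetic above works without the library.
    import websockets

    return websockets.serve(handler, host, port, **LIMITS)


print(queue_bound(LIMITS["max_size"], LIMITS["max_queue"]) // KIB, "KiB")  # 512 KiB
print(queue_bound(1024 * KIB, 32) // KIB**2, "MiB")  # defaults: 32 MiB
```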

The design document provides more details about buffers.


Port sharing
============

A WebSocket connection starts with an HTTP/1.1 handshake. It can be
tempting to serve both HTTP and WebSocket on the same port.

The author of "websockets" doesn't think that's a good idea, due to
the widely different operational characteristics of HTTP and
WebSocket.

"websockets" provides minimal support for responding to HTTP requests
with the "process_request()" hook. Typical use cases include health
checks. Here's an example:

   #!/usr/bin/env python

   # WS echo server with HTTP endpoint at /health/

   import asyncio
   import http
   import websockets

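   async def health_check(path, request_headers):
       # Respond to health checks with a plain HTTP response.
       if path == "/health/":
           return http.HTTPStatus.OK, [], b"OK\n"
       # Returning None lets the WebSocket handshake continue.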

   async def echo(websocket, path):
       async for message in websocket:
           await websocket.send(message)

   async def main():
       async with websockets.serve(
           echo, "localhost", 8765, process_request=health_check
       ):
           await asyncio.Future()  # run forever

   asyncio.run(main())
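
A monitoring script could then probe the "/health/" endpoint over plain
HTTP. The sketch below uses only the standard library; "is_healthy()"
is a hypothetical helper, and the URL assumes the server above is
listening on "localhost:8765".

```python
import urllib.request


def is_healthy(url="http://localhost:8765/health/", timeout=2.0):
    """Return True if the health endpoint answers 200 with the body "OK"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and response.read() == b"OK\n"
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```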
