Implementing ssh and scp serving with libwebsockets

The many layers of ssh

Recently I wrote a protocol plugin for libwebsockets that implemented an ssh server: this is cross-platform but in the first case runs on ESP32. I wasn’t expecting it to be simple, but since I only planned to implement the best crypto rather than all options, it seemed like it should be manageable.

It did prove manageable, but getting something able to come up on a virtual pty and act like a normal ssh session required a pretty hairy amount of implementation, even though I could rely on BSD-licensed bits of mbedtls and OpenSSH for the crypto primitives.

Although I could have described in general terms how SSH works before embarking on this, the gritty details are quite interesting and involve a lot of stuff I had no idea about. As a special bonus I’ll describe the scp protocol, whose actual workings it turns out I really had no idea about.

SSH Formal Definition

SSH is described in a bunch of RFCs; these are the main ones:

RFC4250 SSH Assigned Numbers
RFC4251 Architecture
RFC4252 Authentication
RFC4253 Transport Layer
RFC4254 Connection Protocol
Key exchange protocol – references

The protocol seems to me very well designed, and it is interesting that features found in SSH, like transmit windows and connection muxing, appeared much later in HTTP/2.

Although it is largely well documented, to resolve some ambiguities I had to study the openssh sources and / or watch what the openssh client wanted to do to figure out the whole flow.


The negotiation proceeds through specific stages

  • 1: Version exchange (unencrypted)
  • 2: Crypto suite negotiation (unencrypted)
  • 3: Key exchange
  • 4: User authentication
  • 5: Channel requests

Step 1: Version exchange

The first move on each side is to send a short string confirming that it can talk a version of SSH that the peer can communicate with. The string must begin with SSH-2.0; what follows is an opaque application / version string with no special format. For example on the OpenSSH server on my machine, it’s


These strings are kept by each side along with a lot of other information sent and received later in the negotiation for use in a ‘shared secret’ hash used later.
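As a rough illustration of the check described above, here is a minimal Python sketch that validates a peer's identification line and strips the CR and LF, since the bare string is what gets hashed later. The function name and the example string are hypothetical, not taken from any real server.

```python
def parse_ident(line: bytes) -> str:
    """Validate a peer identification line and return it without CR/LF."""
    text = line.rstrip(b"\r\n").decode("ascii")
    if not text.startswith("SSH-2.0-"):
        raise ValueError("peer does not speak SSH 2.0")
    return text


# Hypothetical identification string, made up for this example.
v_s = parse_ident(b"SSH-2.0-example_0.1\r\n")
```

The returned string is the form each side would keep for the later hash.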

Step 2: Crypto suite Negotiation

The next move is for both sides to issue lists of the crypto they support and are willing to use. The packet looks like this, unencrypted:

Type Meaning
byte[16] cookie (random bytes)
name-list kex_algorithms
name-list server_host_key_algorithms
name-list encryption_algorithms_client_to_server
name-list encryption_algorithms_server_to_client
name-list mac_algorithms_client_to_server
name-list mac_algorithms_server_to_client
name-list compression_algorithms_client_to_server
name-list compression_algorithms_server_to_client
name-list languages_client_to_server
name-list languages_server_to_client
boolean first_kex_packet_follows
uint32 0 (reserved for future extension)

The crypto algorithms are identified by well-known strings, and each side is defined to list them in order of preference.
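To make the packet layout concrete, here is a small Python sketch that serializes the fields in the table, preceded by the SSH_MSG_KEXINIT message byte (20 in RFC 4253). A name-list is a uint32 length followed by comma-separated ASCII names. The helper names are my own, and any algorithm names passed in are up to the caller.

```python
import os
import struct

SSH_MSG_KEXINIT = 20  # message number assigned in RFC 4253


def name_list(names):
    """Encode a name-list: uint32 length, then comma-separated ASCII names."""
    body = ",".join(names).encode("ascii")
    return struct.pack(">I", len(body)) + body


def build_kexinit(lists, cookie=None):
    """Build a KEXINIT payload from ten name-lists, given in table order."""
    if len(lists) != 10:
        raise ValueError("KEXINIT carries exactly ten name-lists")
    if cookie is None:
        cookie = os.urandom(16)      # 16 random bytes
    out = bytes([SSH_MSG_KEXINIT]) + cookie
    for names in lists:
        out += name_list(names)
    out += b"\x00"                   # first_kex_packet_follows = FALSE
    out += struct.pack(">I", 0)      # reserved for future extension
    return out
```

The whole payload is worth keeping around after sending, because both sides' KEXINIT payloads are hashed later in the key exchange.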

Because it’s unencrypted, it’s possible for an intermediary to mess with this part of the negotiation. The “man-in-the-middle” can’t downgrade the negotiation to crypto that both sides are not already willing to use, but it can downgrade the negotiation to the crappiest crypto each side is willing to use, by removing or corrupting the better options from this packet.

So there is a lesson here already: disable crappy crypto in all your ssh servers. For openssh, you can specify which KEX, ciphers and MACs are allowed, by editing /etc/ssh/sshd_config to include this:

    MACs hmac-sha2-512

For safety, when doing this to a remote server, leave a second logged-in ssh session to the server active when you edit the config file and restart the ssh server, so you can recover if there are problems. Existing ssh sessions do not get closed when sshd restarts or dies.

In my ssh server implementation, only one set of crypto is supported:

Function Crypto
Server host key ssh-rsa
MAC (implicit in chacha20)
Compression none

These are all currently considered safe choices, with suitable key sizes (I support 4Kbit RSA keys).

Both sides issue their lists, and for each category the result is the first entry on the client’s list that also appears on the server’s list (or the negotiation fails if any category has no match).
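The selection rule is simple enough to sketch in a few lines of Python. For the cipher, MAC and compression lists, RFC 4253 picks the first name on the client's list that the server also offers; the function name here is my own.

```python
def choose(client_list, server_list):
    """Return the first name on the client's list that the server also offers."""
    for name in client_list:
        if name in server_list:
            return name
    raise ValueError("no algorithm in common")
```

With made-up names, `choose(["strong-mac", "weak-mac"], ["weak-mac", "strong-mac"])` gives `"strong-mac"`. If an intermediary strips `"strong-mac"` from either list, the same call quietly falls back to `"weak-mac"`, which is exactly the downgrade described earlier and why it pays to offer only options you trust.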

Assuming there is some common ground for each part, we can move on to the Key Exchange part. The key exchange messages themselves still travel in the clear: each side sends an SSH_MSG_NEWKEYS packet at the end of the key exchange to mark the point at which communication in that direction switches to the selected cipher, and only from that point is it encrypted.

Step 3: Key Exchange (KEX)

Once the sides have explained their capabilities and arrived at a mutually usable suite of crypto, the next move is to set up some “ephemeral keys” with which to perform the rest of the crypto key exchange.

The choice of KEX is intimately connected to historic doubts about the “NIST curves” required for use with RFC5656, the “official” Elliptic Curve Crypto KEX method. The affected curves are any with “nist” in the name: the affected KEX protocol names begin “ecdh-sha2-nistp”, and the matching host key algorithm names begin “ecdsa-sha2-nistp”. These are widely considered to be unsafe.

The general assumption became that parts of RFC5656 are unsafe, because unexplained magic in the ECC computation and curve selection may effectively backdoor ssh communication that uses it. In response, an alternative ECC KEX protocol was very rapidly produced: it roughly follows RFC5656 but uses a different curve, curve25519, eliminating the unexplained magic and slightly streamlining the implementation. It is widely considered a safe choice.

It’s this KEX method my implementation supports. The flow is:

  • Both sides generate their own ephemeral 256-bit curve25519 key pair (public and private key).

  • The client sends SSH_MSG_KEX_ECDH_INIT along with his ephemeral public key.

  • The server computes a “shared secret” using ECC

  • The server generates a hash from the concatenation of various elements available to both sides from the earlier negotiation, including the “shared secret”, and signs the hash with its private host key

  • The server returns SSH_MSG_KEX_ECDH_REPLY along with the server’s ephemeral public key, its non-ephemeral ‘server key’ and the signature

  • The client also computes the “shared secret”, generates the same concatenated set of elements as the server did and hashes it: this is used to validate the server’s signature on the hash. If all is well the client accepts the connection.

The actual information in the data hashed by both sides to form the “exchange hash” consists of:

      string   V_C, client's identification string (CR and LF excluded)
      string   V_S, server's identification string (CR and LF excluded)
      string   I_C, payload of the client's SSH_MSG_KEXINIT
      string   I_S, payload of the server's SSH_MSG_KEXINIT
      string   K_S, server's public host key
      string   Q_C, client's ephemeral public key octet string
      string   Q_S, server's ephemeral public key octet string
      mpint    K,   shared secret
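The listing above maps fairly directly onto code. The Python sketch below encodes each `string` as a uint32 length plus bytes, encodes K as an `mpint` (big-endian, with a leading zero byte when the top bit would otherwise be set), and hashes the lot. SHA-256 is an assumption on my part, since the hash function is not named here, and the helper names are my own.

```python
import hashlib
import struct


def ssh_string(data: bytes) -> bytes:
    """Encode an SSH string: uint32 length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data


def ssh_mpint(n: int) -> bytes:
    """Encode a non-negative integer as an SSH mpint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    if n == 0:
        return struct.pack(">I", 0)
    # One extra byte is needed exactly when the top bit would be set.
    return ssh_string(n.to_bytes(n.bit_length() // 8 + 1, "big"))


def exchange_hash(v_c, v_s, i_c, i_s, k_s, q_c, q_s, k: int) -> bytes:
    """Hash the eight fields in the order listed (SHA-256 assumed)."""
    h = hashlib.sha256()
    for field in (v_c, v_s, i_c, i_s, k_s, q_c, q_s):
        h.update(ssh_string(field))
    h.update(ssh_mpint(k))
    return h.digest()
```

Both sides can run the same function over their own copies of the fields; if anything was tampered with in transit, the two digests will differ and the signature check fails.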

After both sides accept the KEX, both sides:

  • have the peer’s public key
  • know the peer has the private key matching the public key they sent
  • have the exchange hash (which hashed the “shared secret” that was never explicitly sent)

The client is also able to apply checks to the server’s public key, eg, to see if it matches the key it was given last time it connected to the same hostname.

Further hashes built on the exchange hash are then used by both sides to initialize the actual crypto algorithm, which is different from the one used to get us this far. In our case, we only support the chacha20-based cipher. The list of initializations using hashes on the exchange hash is:

  • Initial IV client to server: HASH(K || H || “A” || session_id) (Here K is encoded as mpint and “A” as byte and session_id as raw data. “A” means the single character A, ASCII 65).

  • Initial IV server to client: HASH(K || H || “B” || session_id)

  • Encryption key client to server: HASH(K || H || “C” || session_id)

  • Encryption key server to client: HASH(K || H || “D” || session_id)

  • Integrity key client to server: HASH(K || H || “E” || session_id)

  • Integrity key server to client: HASH(K || H || “F” || session_id)
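The six derivations differ only in the single letter mixed in, so they can be sketched as one helper plus a table. As before, SHA-256 standing in for HASH is an assumption, the names are my own, and `k_mpint` is expected to be K already encoded as an mpint. For the first key exchange on a connection, session_id is the exchange hash H itself.

```python
import hashlib

# Letter mixed into each derivation, in the order listed above.
LABELS = {
    "iv_c2s": b"A",
    "iv_s2c": b"B",
    "enc_c2s": b"C",
    "enc_s2c": b"D",
    "mac_c2s": b"E",
    "mac_s2c": b"F",
}


def derive(k_mpint: bytes, h: bytes, letter: bytes, session_id: bytes) -> bytes:
    """HASH(K || H || letter || session_id), with SHA-256 assumed as HASH."""
    return hashlib.sha256(k_mpint + h + letter + session_id).digest()


def derive_all(k_mpint: bytes, h: bytes, session_id: bytes) -> dict:
    """Return all six values keyed by a descriptive name."""
    return {name: derive(k_mpint, h, letter, session_id)
            for name, letter in LABELS.items()}
```

When a cipher needs more key bytes than one digest provides, RFC 4253 describes how to extend the output; that step is left out of this sketch.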

At this point, the negotiated crypto algorithm is initialized, the KEX algorithm is done and the KEX instantiation can be destroyed.

Finally, after all this effort, each side sends a SSH_MSG_NEWKEYS indicating to the peer that the sender is implementing the crypto algorithm and keys from now on, ie, is transitioning to an encrypted channel.

Step 4: User authentication

The KEX got us to the point we can talk in an encrypted channel. But it did nothing about authenticating the client to the server. A malicious client can get this far, same as any browser will set up a TLS channel before authentication with the website.

Next, the client sends SSH_MSG_USERAUTH_REQUEST… this contains a method name field which may be publickey, password, hostbased or none. In my implementation only publickey is supported, and only the key algorithm ssh-rsa… these are the most common keys in use today and key size may be 4096 bits. The request also indicates the user name on the server that the client is trying to authenticate as with its key, and which service the client wants from the server.

“ssh-rsa” and the client’s public key are sent along with the packet. If the server sees nothing wrong so far, he will respond with SSH_MSG_USERAUTH_PK_OK and echo back the public key type and the public key blob itself… it does this to make it unambiguous as to which SSH_MSG_USERAUTH_REQUEST it is responding to, since the client may pipeline several.

The client then collates a bunch of concatenated data which both sides have access to:

  string    session identifier
  byte      SSH_MSG_USERAUTH_REQUEST
  string    user name
  string    service name
  string    "publickey"
  boolean   TRUE
  string    public key algorithm name
  string    public key to be used for authentication

and signs the hash of it with its private RSA key. Lastly it sends the SSH_MSG_USERAUTH_REQUEST again, this time with the computed signature attached.
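Here is a hedged Python sketch of how the data to be signed can be assembled. Note that RFC 4252 also places the SSH_MSG_USERAUTH_REQUEST message byte (50) between the session identifier and the user name, so the sketch includes it. The function and parameter names are my own, and the signing itself is left out because it needs an RSA library.

```python
import struct

SSH_MSG_USERAUTH_REQUEST = 50  # message number assigned in RFC 4252


def ssh_string(data: bytes) -> bytes:
    """Encode an SSH string: uint32 length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data


def userauth_signed_data(session_id: bytes, user: str, service: str,
                         algo: str, pubkey_blob: bytes) -> bytes:
    """Concatenate the fields that the client signs for publickey auth."""
    return b"".join([
        ssh_string(session_id),
        bytes([SSH_MSG_USERAUTH_REQUEST]),
        ssh_string(user.encode("utf-8")),
        ssh_string(service.encode("ascii")),
        ssh_string(b"publickey"),
        b"\x01",                          # boolean TRUE: a signature follows
        ssh_string(algo.encode("ascii")),
        ssh_string(pubkey_blob),
    ])
```

The server builds the same byte string from its own copy of the session identifier and the fields in the request, then checks the client's signature over it with the public key.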

The server can use the client’s public RSA key to confirm it has the matching private key and the signature checks out. If so, it responds with SSH_MSG_USERAUTH_SUCCESS and the authentication is completed.

At this point the server may send SSH_MSG_USERAUTH_BANNER with some “motd” type text. Logging into my ESP32 device over ssh gives this banner:

|\---/|  Secure Wireless Serial Interface: ID 05D769
| o_o |  SSH Terminal Server
 \_^_/   Copyright (C) 2017 Crash Barrier Ltd

Step 5: Channel requests

Now the link is encrypted and the client using the link has been authenticated, the client is allowed to ask for a wider range of things from the server.

ssh is a very flexible protocol, but the most typical request is for a “terminal” via an ssh client. First the client must acquire a “channel”, using SSH_MSG_CHANNEL_OPEN. In ssh, one authenticated link may have multiple channels of different types operating within it, with unambiguous multiplexing because each channel has a channel index number assigned at open time. The channels also have a “tx window” budget associated with them: they are given a certain amount they can send when they are opened, and the remote peer must allow them more using an explicit SSH_MSG_CHANNEL_WINDOW_ADJUST message telling them how much more they may transmit.
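The “tx window” bookkeeping can be sketched as a tiny Python class. This is only an illustration of the accounting with names of my own choosing, not how any particular implementation stores it.

```python
class TxWindow:
    """Track how many bytes we may still send on one channel."""

    def __init__(self, initial: int):
        self.remaining = initial      # budget granted at channel open

    def take(self, want: int) -> int:
        """Return how many bytes may be sent now and charge them to the window."""
        n = min(want, self.remaining)
        self.remaining -= n
        return n

    def adjust(self, bytes_to_add: int) -> None:
        """Apply an SSH_MSG_CHANNEL_WINDOW_ADJUST received from the peer."""
        self.remaining += bytes_to_add
```

A sender calls `take()` before each write and holds back whatever does not fit until the peer's next adjust message arrives, which is what gives the receiver control over the flow.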

Both the multiplexing and tx window concept turned up many years later in the definition of HTTP/2. This is notable because in a not very alternate universe we would not have a web based on TLS + HTTP but we could have had HTTP/2 features many years earlier with a web built on ssh protocol.

The “type” of the channel decides on the meaning of the data sent on the channel; different types of channel send completely different protocol data inside. Defined channel requests are:

  • pty-req: pseudo-tty
  • x11-req: x11 tunnel
  • env: environment variables
  • shell: spawn a server shell with stdin/out/err wired to ssh
  • exec: execute server process with stdin/out/err wired to ssh
  • subsystem: run a defined subsystem, eg, sftp
  • window-change: size of the client window has changed
  • xon-xoff: soft flow control
  • signal: send a signal to server, eg, SIGINT
  • exit-status: retrieve exit status of previous “exec” command
  • exit-signal: find out if previous “exec” command died on a signal

For ssh being used as a terminal, the client must ask for a pty-req type of channel, where pty is a Pseudo-TtY or logical terminal emulation channel. When established, this channel passes a complex terminal emulation protocol.

The ssh client also then passes env requests to configure a few environment variables, and then a shell request to wire the ssh channel up to a server shell.

In my case I handle these commands but the ssh connection is actually backed by a UART. So there is no actual shell spawned, and the environment vars are ignored. Instead the UART ringbuffers are wired up to the ssh channel and the remote ssh client sends and receives on that instead.
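To give a feel for that kind of handling, here is a purely hypothetical Python sketch of a request dispatcher for a device with no real shell. The function, the `channel` dict and its keys are all invented for illustration and are not the actual implementation. The exec branch anticipates the scp case covered later in this article, and refusing everything else is an assumption of the sketch.

```python
def handle_channel_request(kind: str, arg: str, channel: dict) -> bool:
    """Return True to accept a channel request, False to refuse it."""
    if kind in ("pty-req", "env"):
        return True                       # accepted, but nothing to set up
    if kind == "shell":
        channel["mode"] = "uart"          # bridge channel data to the UART
        return True
    if kind == "exec" and arg.startswith("scp -t "):
        channel["mode"] = "scp"           # remember we are receiving a file
        channel["target"] = arg[len("scp -t "):]
        return True
    return False                          # assumption: refuse the rest
```

The point is that the request names are just strings to match on, so a constrained server can accept the few it can honour and decline the others.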


After all this was working for ssh client connections, I also wanted to support simple file transfers over scp, since that is the most “natural” way to communicate with the remote side for sending files.

There’s very little documentation of how that is supposed to work.

Running scp abc root@mydevice:/def opens a channel and requests an exec of scp -t /def on it.

On a real server, it would run scp, but the -t flag is not documented. On ESP32, there is no shell or scp process that can run. After we accept the request and set a flag on the channel to say it is in “scp mode”, scp sends us some textual “headers” down the channel to set up the transfer; looking at the openssh scp sources I found the format is (mmmm is an octal file mode like 0755)

  • “Dmmmm 0 dirname” - start of copy directory level
  • “E” - end of copy directory level
  • “Cmmmm length filename” - start copy file
  • “Tmtime 0 atime 0” - modification and access times for file

For a simple scp abc root@mydevice:/def, scp sends only the C command, a terminating \x0a and then the payload of the file abc. Then it sends SSH_MSG_CHANNEL_EOF to which we respond with SSH_MSG_CHANNEL_CLOSE to end the connection cleanly.
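Parsing the C header is straightforward. The Python sketch below, with a function name of my own, splits the line into the octal mode, the payload length and the filename; the length tells the receiver how many bytes of file data follow the newline.

```python
def parse_scp_c_header(line: bytes):
    """Parse b'Cmmmm length filename\\n' into (mode, length, filename)."""
    if not line.startswith(b"C") or not line.endswith(b"\n"):
        raise ValueError("not an scp C header")
    # Split on the first two spaces only, so filenames may contain spaces.
    mode, length, name = line[1:-1].decode("utf-8").split(" ", 2)
    return int(mode, 8), int(length), name
```

For example, `parse_scp_c_header(b"C0644 12 abc\n")` returns `(0o644, 12, "abc")`, after which the next 12 bytes on the channel are the contents of abc.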

The implementation is complicated a bit by having to deal with RX flow control due to the small UART ringbuffers, but lws helps a lot here.

What did we learn this time

  • SSH protocol was way ahead of its time

  • SSH crypto and functionality instead of http + ssl tunnel would have gotten us http/2 from the start

  • It’s possible to implement selected “best of breed” crypto suite elements in a very constrained device

  • Libwebsockets + bytewise state machines can implement everything needed (in my case this also includes in-browser JS terminal backed by wss)

  • Implementing this as a lws protocol handler means it can easily coexist in a single event loop; on very small targets like ESP32 this means it can be implemented painlessly.

  • Although lws already supports “natural” (for developers and users) protocols like TLS + https and wss (secure websockets), this is the first time to my knowledge something as “natural” as ssh has been implemented on a constrained target like ESP32. Using a wireless device via ssh and scp from your terminal using normal ssh keys and with the same level of security expected from a server ssh connection is very convenient.