Implementing ssh and scp serving with libwebsockets
The many layers of ssh
Recently I wrote a protocol plugin for libwebsockets that implemented an ssh server: this is cross-platform but in the first case runs on ESP32. I wasn’t expecting it to be simple, but since I only planned to implement the best crypto rather than all options, it seemed like it should be manageable.
It did prove manageable, but getting something able to come up on a virtual pty and act like a normal ssh session required a pretty hairy amount of implementation, even though I could rely on BSD-licensed bits of mbedtls and OpenSSH for the crypto primitives.
Although I generally could have described how SSH works before embarking on this, the gritty details are quite interesting and involve a lot of stuff I had no idea about. And as a special bonus I’ll describe the scp protocol, which it turns out I really had no idea about how it actually works.
SSH Formal Definition
SSH is described in a bunch of RFCs; these are the main ones:
|SSH Assigned Numbers
|Key exchange protocol
|https://email@example.com – references https://tools.ietf.org/html/rfc5656
The protocol seems to me to be very well designed, and it is interesting that features found in SSH, like transmit windows and connection muxing, appeared much later in HTTP/2.
Although it is largely well documented, to resolve some ambiguities I had to study the openssh sources and / or watch what the openssh client wanted to do to figure out the whole flow.
The negotiation proceeds through specific stages
- 1: Version exchange (unencrypted)
- 2: Crypto suite negotiation (unencrypted)
- 3: Key exchange
- 4: User authentication
- 5: Channel requests
Step 1: Version exchange
The first move on each side is to send a short string confirming that it
can talk a version of SSH that the peer can communicate with. The string must
begin with SSH-2.0-; what follows is an opaque application / version string with
no special format. For example on the OpenSSH server on my machine, it’s
These strings are kept by each side along with a lot of other information sent and received later in the negotiation for use in a ‘shared secret’ hash used later.
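As a rough illustration of the version exchange, here is a minimal Python sketch that splits an identification line into its protocol version and the opaque software part. The function name and the sample string in the usage note are made up for the example; they are not taken from the lws plugin or from any real server.

```python
def parse_ident(line: bytes):
    """Split an SSH identification line into (protocol version, software string)."""
    # The CR and LF are not part of the string that gets hashed later.
    text = line.rstrip(b"\r\n").decode("ascii")
    if not text.startswith("SSH-"):
        raise ValueError("not an SSH identification string")
    # Format is SSH-<protoversion>-<softwareversion and optional comments>
    _, proto, software = text.split("-", 2)
    if proto != "2.0":
        raise ValueError("unsupported protocol version " + proto)
    return proto, software
```

For instance, `parse_ident(b"SSH-2.0-example_1.0\r\n")` gives back `("2.0", "example_1.0")`. This sketch only accepts 2.0; a real implementation would keep the whole line (minus CR and LF) around for the exchange hash.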
Step 2: Crypto suite Negotiation
The next move is both sides issue lists of what crypto they support and are willing to use. The packet is like this, unencrypted:
|cookie (random bytes)
|0 (reserved for future extension)
The crypto algorithms are well-known strings, like
firstname.lastname@example.org. They are defined to be listed in order of
preference by each side.
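To make the packet layout concrete, here is a hedged Python sketch of building an SSH_MSG_KEXINIT payload following the field order in RFC 4253: message number 20, a 16-byte cookie, ten comma-separated name-lists, a boolean and the reserved zero. The helper names `name_list` and `build_kexinit` are invented for this example, and the binary packet framing around the payload is left out.

```python
import os
import struct

SSH_MSG_KEXINIT = 20

def name_list(names):
    """Encode a list of algorithm names as an SSH name-list (uint32 length + CSV)."""
    data = ",".join(names).encode("ascii")
    return struct.pack(">I", len(data)) + data

def build_kexinit(kex, host_key, ciphers, macs, compression=("none",), cookie=None):
    """Build the unencrypted KEXINIT payload; lists are in order of preference."""
    cookie = os.urandom(16) if cookie is None else cookie
    payload = bytes([SSH_MSG_KEXINIT]) + cookie
    payload += name_list(kex)
    payload += name_list(host_key)
    # The next four kinds of list are sent twice: client-to-server, then server-to-client.
    payload += name_list(ciphers) * 2
    payload += name_list(macs) * 2
    payload += name_list(compression) * 2
    payload += name_list([]) * 2           # languages, normally empty
    payload += b"\x00"                     # first_kex_packet_follows = false
    payload += struct.pack(">I", 0)        # reserved for future extension
    return payload
```

The same lists are used for both directions here purely to keep the sketch short; the protocol allows them to differ.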
Because it’s unencrypted, it’s possible for an intermediary to mess with this part of the negotiation. The “man-in-the-middle” can’t downgrade the negotiation to crypto that both sides are not already willing to use, but it can downgrade the negotiation to the crappiest crypto each side is willing to use, by removing or corrupting the better options in this packet.
So there is a lesson here already: disable crappy crypto in all your ssh servers. For openssh, you can specify which KEX, ciphers and MACs are allowed, by editing
/etc/ssh/sshd_config to include this:
For safety, when doing this to a remote server, leave a second logged-in ssh session to the server active when you edit the config file and restart the ssh server, so you can recover if there are problems. Existing ssh sessions do not get closed when sshd restarts or dies.
In my ssh server implementation, only one set of crypto is supported:
|Server host key
(implicit in chacha20)
These are all currently considered safe choices, with suitable key sizes (I support 4Kbit RSA keys).
Both sides issue their lists, and for each part both sides arrive at the same choice: the first entry in the client’s list that also appears in the server’s list (or the negotiation fails if there is no match for some part).
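The selection rule can be sketched in a few lines of Python. This assumes the simple rule RFC 4253 gives for ciphers, MACs and compression (the first name in the client's list that the server also offers); the rule for the KEX method itself has extra conditions that are ignored here. The function name is hypothetical.

```python
def negotiate(client_prefs, server_prefs):
    """Return the first algorithm in the client's list that the server also lists."""
    for name in client_prefs:
        if name in server_prefs:
            return name
    raise ValueError("no common algorithm, negotiation fails")
```

Since both sides run the same rule over the same two lists, they reach the same answer without any further round trip. It also shows why an intermediary stripping entries from either list changes the outcome.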
Assuming there is some common ground for each part, we can move on to the Key Exchange part. Nothing is encrypted yet: each side later sends an
SSH_MSG_NEWKEYS packet, at the end of the key exchange, to mark the point that communication in that direction switches to the selected
cipher. From the point each side sends
SSH_MSG_NEWKEYS, communication in that direction is encrypted.
Step 3: Key Exchange (KEX)
Once the sides have explained their capabilities and arrived at a mutually usable suite of crypto, the next move is to set up some “ephemeral keys” with which to perform the rest of the crypto key exchange.
The choice of KEX is intimately connected to longstanding doubts about the “NIST curves” required for use with RFC5656, the “official” Elliptic Curve Crypto KEX method. The affected curves are any with the name “nist” in them; the affected KEX method names begin “ecdh-sha2-nist”, and the matching host key algorithm names begin “ecdsa-sha2-nist”. Many people consider these unsafe.
In response to the widespread assumption that parts of RFC5656 are unsafe, because unexplained magic in the ECC computation and curve selection might effectively backdoor ssh communication using it, an alternative ECC KEX was very rapidly produced. It roughly follows RFC5656 but uses a different curve, eliminates the unexplained magic and slightly streamlines the implementation. This is the
curve25519-sha256@libssh.org KEX protocol, which is widely considered a safe choice.
It’s this KEX method my implementation supports. The flow is:
Both sides generate their own ephemeral 256-bit curve25519 key pair (public and private).
The client sends
SSH_MSG_KEX_ECDH_INIT along with its ephemeral public key.
The server computes a “shared secret” using ECC, from its own ephemeral private key and the client’s ephemeral public key
The server generates a hash from the concatenation of various elements available to both sides from the earlier negotiation, plus the “shared secret”, and signs the hash with its private host key
The server returns
SSH_MSG_KEX_ECDH_REPLY along with the server’s ephemeral public key, its non-ephemeral ‘server key’ (the public host key) and the signature
The client also computes the “shared secret”, generates the same concatenated set of elements as the server did and hashes it: this is used to validate the server’s signature on the hash. If all is well the client accepts the connection.
The actual information in the data hashed by both sides to form the “exchange hash” consists of:
string V_C, client's identification string (CR and LF excluded)
string V_S, server's identification string (CR and LF excluded)
string I_C, payload of the client's SSH_MSG_KEXINIT
string I_S, payload of the server's SSH_MSG_KEXINIT
string K_S, server's public host key
string Q_C, client's ephemeral public key octet string
string Q_S, server's ephemeral public key octet string
mpint K, shared secret
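The field list above translates fairly directly into code. The following Python sketch assumes SHA-256 as the hash (as the curve25519-sha256 name suggests) and uses the standard SSH encodings: a "string" is a 32-bit big-endian length followed by the bytes, and an "mpint" is the same with a big-endian two's complement integer as the body. The helper names are hypothetical and only non-negative integers are handled.

```python
import hashlib
import struct

def ssh_string(data: bytes) -> bytes:
    """SSH 'string': uint32 length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data

def ssh_mpint(n: int) -> bytes:
    """SSH 'mpint' for n >= 0: minimal big-endian bytes with the top bit clear."""
    if n == 0:
        return ssh_string(b"")
    # bit_length() // 8 + 1 bytes adds a leading zero byte only when it is needed
    return ssh_string(n.to_bytes(n.bit_length() // 8 + 1, "big"))

def exchange_hash(v_c, v_s, i_c, i_s, k_s, q_c, q_s, k: int) -> bytes:
    """Hash the concatenated fields in the order listed above."""
    blob = b"".join(ssh_string(x) for x in (v_c, v_s, i_c, i_s, k_s, q_c, q_s))
    blob += ssh_mpint(k)
    return hashlib.sha256(blob).digest()
```

Because the identification strings and both KEXINIT payloads are inputs, tampering with the earlier unencrypted stages makes the two sides compute different hashes, and the server's signature then fails to verify.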
After both sides accept the KEX, both sides:
- have the peer’s public key
- know the peer has the private key matching the public key they sent
- have the exchange hash (which hashed the “shared secret” that was never explicitly sent)
The client is also able to apply checks to the server’s public key, eg, to see if it matches the key it was given last time it connected to the same hostname.
Further hashes, built by concatenating onto the exchange hash, are then used by both sides to initialize the actual crypto algorithm, which is different from the
curve25519-sha256@libssh.org KEX used to get us this far. In our case, we only support
chacha20-poly1305@openssh.com. The list of initializations using hashes on the exchange hash is
Initial IV client to server: HASH(K || H || “A” || session_id) (Here K is encoded as mpint and “A” as byte and session_id as raw data. “A” means the single character A, ASCII 65).
Initial IV server to client: HASH(K || H || “B” || session_id)
Encryption key client to server: HASH(K || H || “C” || session_id)
Encryption key server to client: HASH(K || H || “D” || session_id)
Integrity key client to server: HASH(K || H || “E” || session_id)
Integrity key server to client: HASH(K || H || “F” || session_id)
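As a sketch of how these six values might be computed, the Python below assumes SHA-256 as HASH and takes K already encoded as an mpint. When more key material is needed than one digest provides, it extends the output the way RFC 4253 section 7.2 describes (hash K, H and everything produced so far, then append); chacha20-poly1305@openssh.com asks for 64 bytes of key per direction, so the extension matters in that case. The function name is invented for the example.

```python
import hashlib

def derive_key(k_mpint: bytes, h: bytes, letter: bytes, session_id: bytes, length: int) -> bytes:
    """HASH(K || H || letter || session_id), extended until 'length' bytes exist."""
    out = hashlib.sha256(k_mpint + h + letter + session_id).digest()
    while len(out) < length:
        # K2 = HASH(K || H || K1), K3 = HASH(K || H || K1 || K2), ...
        out += hashlib.sha256(k_mpint + h + out).digest()
    return out[:length]
```

For the first key exchange on a connection, session_id is simply the exchange hash H itself, so a call for the client-to-server encryption key would look like `derive_key(k, h, b"C", h, 64)`.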
At this point, the negotiated crypto algorithm is initialized, the KEX algorithm is done and the KEX instantiation can be destroyed.
Finally, after all this effort, each side sends a
SSH_MSG_NEWKEYS indicating to the peer that the sender is implementing the crypto algorithm and keys from now on, ie, is transitioning to an encrypted channel.
Step 4: User authentication
The KEX got us to the point we can talk in an encrypted channel. But it did nothing about authenticating the client to the server. A malicious client can get this far, same as any browser will set up a TLS channel before authentication with the website.
The next step is the client sends
SSH_MSG_USERAUTH_REQUEST… this contains a
method name field which may be
none. In my implementation only
publickey is supported, and only the key algorithm
ssh-rsa… these are the most common keys in use today and key size may be 4096 bits. It also indicates the user name on the server it is trying to authenticate with the client key, and which service the client wants from the server.
The algorithm name “ssh-rsa” and the client’s public key are sent along with the packet. If the server sees nothing wrong so far, it responds with SSH_MSG_USERAUTH_PK_OK and echoes back the public key type and the public key blob itself… it does this to make it unambiguous which SSH_MSG_USERAUTH_REQUEST it is responding to, since the client may pipeline several.
The client then collates a bunch of concatenated data which both sides have access to
string session identifier
string user name
string service name
string public key algorithm name
string public key to be used for authentication
and signs the hash of it with its private RSA key. Lastly it sends the
SSH_MSG_USERAUTH_REQUEST again, this time with the computed signature attached.
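Here is a hedged Python sketch of assembling the data that gets signed. The list above names the main fields; RFC 4252 additionally puts the message number (50), the method name "publickey" and a TRUE boolean in between, and those are included here on that basis. The function names are hypothetical, and the RSA signing step itself is left out since it needs a crypto library.

```python
import struct

SSH_MSG_USERAUTH_REQUEST = 50

def ssh_string(data: bytes) -> bytes:
    """SSH 'string': uint32 length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data

def userauth_signed_data(session_id: bytes, user: str, service: str,
                         algo: str, pubkey_blob: bytes) -> bytes:
    """Concatenate the fields both sides know, in the order they are signed."""
    return (
        ssh_string(session_id)
        + bytes([SSH_MSG_USERAUTH_REQUEST])
        + ssh_string(user.encode("utf-8"))
        + ssh_string(service.encode("ascii"))
        + ssh_string(b"publickey")          # authentication method name
        + b"\x01"                           # TRUE: a signature accompanies this request
        + ssh_string(algo.encode("ascii"))  # e.g. "ssh-rsa"
        + ssh_string(pubkey_blob)
    )
```

Including the session identifier ties the signature to this particular connection, so a captured signature is of no use for logging in over a different one.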
The server can use the client’s public RSA key to confirm it has the matching private key and the signature checks out. If so, it responds with
SSH_MSG_USERAUTH_SUCCESS and the authentication is completed.
At this point the server may send
SSH_MSG_USERAUTH_BANNER with some “motd” type text. Logging into my ESP32 device over ssh gives this banner:
|\---/| Secure Wireless Serial Interface: ID 05D769
| o_o | SSH Terminal Server
\_^_/ Copyright (C) 2017 Crash Barrier Ltd
Step 5: Channel requests
Now the link is encrypted and the client using the link has been authenticated, the client is allowed to ask for a wider range of things from the server.
ssh is a very flexible protocol, but the most typical request is for a “terminal” via an ssh client. First the client must acquire a “channel”, using
SSH_MSG_CHANNEL_OPEN. In ssh, one authenticated link may have multiple channels of different types operating within it with unambiguous multiplexing due to each channel having a channel index number assigned at open time. The channels also have a “tx window” budget associated with them, they are given a certain amount they can send when they are opened, and the remote peer must allow them more using an explicit
SSH_MSG_CHANNEL_WINDOW_ADJUST message telling them how much more they may transmit.
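The tx window bookkeeping is simple enough to sketch. This Python class is purely illustrative (the name and methods are invented, and it is not how the lws plugin is structured): the sender may only transmit what is left of its budget, and the budget only grows when the peer sends an adjust message.

```python
class TxWindow:
    """Tracks how many bytes we may still send on one channel."""

    def __init__(self, initial: int):
        # Budget granted by the peer when the channel was opened
        self.remaining = initial

    def take(self, want: int) -> int:
        """Return how many of 'want' bytes may be sent now, and consume them."""
        allowed = min(want, self.remaining)
        self.remaining -= allowed
        return allowed

    def adjust(self, bytes_to_add: int) -> None:
        """Apply an SSH_MSG_CHANNEL_WINDOW_ADJUST received from the peer."""
        self.remaining += bytes_to_add
```

A receiver with small buffers can hold back its adjust messages until it has drained data, which is one way to get per-channel flow control without stalling the other channels on the same link.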
Both the multiplexing and tx window concept turned up many years later in the definition of HTTP/2. This is notable because in a not very alternate universe we would not have a web based on TLS + HTTP but we could have had HTTP/2 features many years earlier with a web built on ssh protocol.
The “type” of the channel decides on the meaning of the data sent on the channel; different types of channel send completely different protocol data inside. Defined channel requests are:
- pty-req: pseudo-tty
- x11-req: x11 tunnel
- env: environment variables
- shell: spawn a server shell with stdin/out/err wired to ssh
- exec: execute server process with stdin/out/err wired to ssh
- subsystem: run a defined subsystem, eg, sftp
- window-change: size of the client window has changed
- xon-xoff: soft flow control
- signal: send a signal to server, eg, SIGINT
- exit-status: retrieve exit status of previous “exec” command
- exit-signal: find out if previous “exec” command died on a signal
For ssh being used as a terminal, the client must ask for a
pty-req type of channel, where pty is a Pseudo-TtY or logical terminal emulation channel. When established, this channel passes a complex terminal emulation protocol.
The ssh client also then passes
env requests to configure a few environment variables, and then a
shell request to wire the ssh channel up to a server shell.
In my case I handle these commands but the ssh connection is actually backed by a UART. So there is no actual shell spawned, and the environment vars are ignored. Instead the UART ringbuffers are wired up to the ssh channel and the remote ssh client sends and receives on that instead.
After all this was working for ssh client connections, I also wanted to support simple file transfers over scp, since that is the most “natural” way to communicate with the remote side for sending files.
There’s very little documentation of how that is supposed to work.
scp abc root@mydevice:/def opens a channel and requests to
exec on it
scp -t /def.
On a real server, it would run
scp, but the
-t flag is not documented. On ESP32, there is no shell or scp process that can run. After accepting the request and setting a flag on the channel to say it is in “scp mode”, scp sent us some textual “headers” down the channel to set up the transfer; looking at the openssh scp sources I found the format is (mmmm is an octal file mode like 0755)
- “Dmmmm 0 dirname” - start of copy directory level
- “E” - end of copy directory level
- “Cmmmm length filename” - start copy file
- “Tmtime 0 atime 0” - modification and access times for file
For a simple
scp abc root@mydevice:/def, scp sends only the C command, a terminating
\x0a and then the payload of the file
abc. Then it sends
SSH_MSG_CHANNEL_EOF to which we respond with
SSH_MSG_CHANNEL_CLOSE to end the connection cleanly.
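The header lines listed above can be parsed along these lines. This Python sketch follows only the formats given in the list; the function name and the dictionary layout are made up for the example, and acknowledgements and error handling are left out.

```python
def parse_scp_header(line: bytes) -> dict:
    """Parse one scp control line of the C, D, E or T kind."""
    text = line.rstrip(b"\n").decode("utf-8")
    kind = text[:1]
    if kind in ("C", "D"):
        # "Cmmmm length filename" or "Dmmmm 0 dirname"; the name may contain spaces
        mode, length, name = text[1:].split(" ", 2)
        return {"type": kind, "mode": int(mode, 8), "length": int(length), "name": name}
    if kind == "E":
        return {"type": "E"}
    if kind == "T":
        # "Tmtime 0 atime 0"
        mtime, _, atime, _ = text[1:].split(" ")
        return {"type": "T", "mtime": int(mtime), "atime": int(atime)}
    raise ValueError("unknown scp header: " + repr(text))
```

For the simple single-file case, the length from the C line tells the receiver how many payload bytes to expect next on the channel.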
The implementation is complicated a bit by having to deal with RX flow control due to the small UART ringbuffers, but lws helps a lot here.
What did we learn this time
SSH protocol was way ahead of its time
SSH crypto and functionality instead of http + ssl tunnel would have gotten us http/2 from the start
It’s possible to implement selected “best of breed” crypto suite elements in a very constrained device
Libwebsockets + bytewise state machines can implement everything needed (in my case this also includes in-browser JS terminal backed by wss)
Implementing this as a lws protocol handler means it can easily coexist in a single event loop; on very small targets like ESP32 this means it can be implemented painlessly.
Although lws already supports “natural” (for developers and users) protocols like TLS + https and wss (secure websockets), this is the first time to my knowledge something as “natural” as
ssh has been implemented on a constrained target like ESP32. Using a wireless device via
scp from your terminal using normal ssh keys and with the same level of security expected from a server ssh connection is very convenient.