Why are ssh-agent and ssh-add not working together in one bash script?

Question

I wrote a script that should first start ssh-agent in the background and set the appropriate environment variables for the current shell instance. In the second part of the script I would also like to add my private SSH key so that I can connect to my server.

Currently, the two commands in the script do not work together. Can someone help me understand what I am doing wrong?

```
#!/bin/bash
exec ssh-agent bash
sleep 5s
ssh-add /media/MyUSB/.ssh/id_00123 &
```

In addition, while utilizing the built-in debugger on bash I can see that only the first section of the script is working (i.e. exec ssh-agent bash).

Kamil Maciorowski · Answer 1 · 2022-08-14T20:31:18.557

So many interesting aspects in such a small script.

Understanding `ssh-agent`

Let's start from what ssh-agent is designed to do. You run ssh-agent when you want a process that sits there, listens on some socket (it's a type of file for two-way interprocess communication) and serves requests from programs like ssh-add or ssh that connect to the socket. Programs will talk to the agent and store, manipulate or use private keys.

Any program that wants to use an agent needs to know the path to the socket the agent listens on. If a program knows the path then it can use the socket to communicate with the agent.

A design decision was once made: any program that wants to know the path to the socket of an authentication agent should check the SSH_AUTH_SOCK variable in its own environment; the value of the variable is the path. It was a decision (I mean things could be designed in some other way, e.g. programs might be designed to accept this path via command-line arguments each time), but it was a very good decision.

It was a very good decision because the environment is by default inherited. This means you need to set the SSH_AUTH_SOCK environment variable for one process (e.g. a shell) and all its descendants will inherit it (unless some of them deliberately choose to alter their environment or to create a child with altered environment). For comparison: passing the path as a command-line argument each time you want to run something that should talk to the agent requires extra typing; and you would want to store the path somewhere, so probably in a variable anyway. So here you go, the name of the variable is standardized and interested programs check it automatically.

Another option was to store the path in a text file in a fixed location, or even to create a socket in a fixed location in the first place. But sometimes you want some programs to use one agent (one socket) and some other programs to use another agent (another socket). Making two programs see different files in the same location is hard. Making two programs see different environment variables is easy.
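
To make the inheritance argument concrete, here is a minimal sketch that involves no agent at all. The two socket paths are hypothetical placeholders, and the variable is set only inside a subshell so your real SSH_AUTH_SOCK stays untouched. One child inherits the value, another child is started with an altered environment:

```shell
# Hypothetical socket paths, used only to illustrate inheritance.
result=$(
    export SSH_AUTH_SOCK=/tmp/agent-A.sock

    # A child inherits the variable without any extra effort.
    inherited=$(bash -c 'printf %s "$SSH_AUTH_SOCK"')

    # Another child is deliberately given a different value.
    overridden=$(SSH_AUTH_SOCK=/tmp/agent-B.sock bash -c 'printf %s "$SSH_AUTH_SOCK"')

    # The per-command assignment did not change the parent's own value.
    printf '%s %s %s' "$inherited" "$overridden" "$SSH_AUTH_SOCK"
)
echo "$result"
```

Each child sees its own value, which is what makes it easy to point different programs at different agents.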

So, interested programs should check SSH_AUTH_SOCK in their environment. How can we or anything set this variable to the right value in the environment of a process? Without a debugger there are two ways:

Either the parent knows the value and when it spawns a child it sets up SSH_AUTH_SOCK with the right value in the environment for the child (an act of inheriting unchanged SSH_AUTH_SOCK from the parent may be interpreted as "the parent setting this up by doing nothing");
or the process learns the value in some other way and modifies its own environment.

Therefore ssh-agent supports two methods of starting it:

```
ssh-agent command …
```
Here ssh-agent creates a socket and prepares itself to serve future programs connecting to the socket. Then it runs command … as its child with SSH_AUTH_SOCK with the right value in the environment for the child. The child (or any descendant that inherits the variable) can easily find the socket, but other processes not so easily. When the command terminates, so does ssh-agent (even if there are grandchildren).
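
A small sketch of this first method, assuming ssh-agent is installed. The command here is just a short sh one-liner that reports the socket path it was given. The variable names before and child_sock are only for illustration:

```shell
# Remember what the calling shell has (possibly nothing).
before=${SSH_AUTH_SOCK-}

# Run one command under a dedicated agent; the agent goes away when the command ends.
child_sock=$(ssh-agent sh -c 'printf %s "$SSH_AUTH_SOCK"')

echo "child saw:        $child_sock"
echo "caller still has: ${SSH_AUTH_SOCK-}"
```

The child gets a fresh socket path, while the environment of the calling shell is exactly what it was before.
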
```
ssh-agent   # but don't use it exactly this way
```
Here ssh-agent forks to the background, i.e. it creates a child copy of itself and does not wait for it to exit. The child detaches from standard streams of the parent and from the terminal, it will not exit by itself. The child will be the real agent that will stay. The parent will exit by itself, but before it happens, shell code is printed. The shell code, when evaluated by a shell, makes the shell modify its own environment, so SSH_AUTH_SOCK with the right value is put there. But the shell has to evaluate the output, not just run ssh-agent, so the right way is like:
```
eval "$(ssh-agent)"
```
After this the shell that ran eval has the right variable (in fact: variables) in its environment and from now on commands like ssh-add run from this shell will find the agent because they will inherit the variable. Exiting the shell does not terminate the agent, so at some point before exiting the shell you may want to invoke ssh-agent -k (or, if you also want to unset the variables: eval "$(ssh-agent -k)"). An agent for which there is no process holding the right value of SSH_AUTH_SOCK is virtually useless.
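
The whole life cycle of this second method can be sketched as follows, assuming ssh-agent and ssh-add are installed. The -s option only asks for Bourne-style output explicitly, and agent_sock is a helper variable introduced for the message:

```shell
# Method 2: let the current shell evaluate the code that ssh-agent prints.
eval "$(ssh-agent -s)" >/dev/null   # sets SSH_AUTH_SOCK and SSH_AGENT_PID in this shell
agent_sock=${SSH_AUTH_SOCK-}
echo "agent ${SSH_AGENT_PID-} listens on $agent_sock"

# Commands started from this shell inherit the variable and find the agent.
# Exit status 1 from ssh-add -l only means the agent holds no keys yet.
ssh-add -l || true

# Exiting the shell would not stop the agent, so stop it explicitly
# and unset the variables in one go.
eval "$(ssh-agent -k)" >/dev/null
echo "SSH_AUTH_SOCK is now '${SSH_AUTH_SOCK-}'"
```

Try this in a throwaway shell: if an agent is already available there, the first line replaces its variables with those of the new agent.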

What is wrong with your script

And now – finally – to your script. This is your script:

```
#!/bin/bash
exec ssh-agent bash
sleep 5s
ssh-add /media/MyUSB/.ssh/id_00123 &
```

The first thing the script does is exec ssh-agent bash. exec tells the shell interpreting the script to replace itself with the command, which is ssh-agent bash. The shell does it and becomes ssh-agent that starts a new bash (this is method 1 from above). This bash holds the right value of SSH_AUTH_SOCK, it is interactive, it prints a prompt and allows you to run commands (including commands that need SSH_AUTH_SOCK). If your original interactive shell was bash then you may miss the fact that you're now in a separate bash. You may interpret the existence of SSH_AUTH_SOCK as confirmation that ssh-agent has modified the environment of your original shell. No, you're still in the middle of your script.

Well, not exactly in the middle. If you exit this bash, then sleep and the rest won't be executed, because the shell interpreting the script has replaced itself with ssh-agent. In some sense you're one exit before the end of the script.

If your method to run the script was like ./myscript, then exit will put you back in the original shell. If your method was like . ./myscript or source myscript then exit will act as if you exited the original shell, because the original shell was the shell interpreting the script and has replaced itself with ssh-agent that is about to exit upon your exit from the current shell; this can strengthen the impression you were in (and now you are exiting) the original shell.
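
The way exec cuts the script short can be seen without any agent. In this sketch true stands in for ssh-agent bash: the inner shell replaces itself with true, so the line after exec is never reached.

```shell
# The inner bash plays the role of the shell interpreting your script.
out=$(bash -c 'echo "before exec"; exec true; echo "never printed"')
echo "$out"   # prints "before exec"
```

Your sleep and ssh-add lines are in the same position as the second echo here.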

The fix

In the question you have explicitly stated your goal:

> […] appropriate environment variables for the current shell instance. […]

To modify the environment of the current shell, the script has to use method 2 from above. The current shell has to be the interpreting shell, i.e. the script has to be sourced. The shell mustn't exec to anything, as you don't want the shell to be replaced by anything. Example fix:

```
#!/usr/bin/false
[ -n "$SSH_AUTH_SOCK" ] || eval "$(ssh-agent)"
ssh-add /media/MyUSB/.ssh/id_00123
```

More has been improved:

- #!/usr/bin/false as the shebang ensures the script will do nothing and fail if you (inadvertently) run it instead of sourcing it. Other strategies are here: Strategy for forgetting to run a script with source. Without a shebang or with a shebang pointing to bash, sh or another compatible shell, the script executed (not sourced) would start a new agent, add the key to it and exit. All this without affecting the environment of your current shell, so the agent would sit there in vain, almost inaccessible. You would need to put some effort in finding and killing it, or some effort in finding its socket and manually setting SSH_AUTH_SOCK in the environment of your shell; or you would just let it be. false as the shebang prevents this inconvenient scenario.
- [ -n "$SSH_AUTH_SOCK" ] checks if $SSH_AUTH_SOCK expands to a non-empty string. An empty string indicates there is no agent available, while a non-empty string indicates there is probably an agent. The script starts a new ssh-agent only if the string is empty. This is a basic precaution against a scenario where you (inadvertently) source the script for the second time, create a new authentication agent and lose variables associated with the previous agent that will keep running uselessly.
- There's no need to sleep. ssh-agent in our script exits when the agent (i.e. its child in the background) is ready. You can ssh-add right away.
- ssh-add is here as a synchronous command. Running it asynchronously (with &, like you tried to do) is probably not going to save you a lot of time. You can try. But you will most likely source the script from an interactive shell with job control enabled and thus & (if you put it there) will pollute your terminal with a message like [1]+ Done ….
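
The effect of the [ -n "$SSH_AUTH_SOCK" ] guard when the script is sourced twice can be simulated without starting real agents. In this sketch start_agent is a hypothetical stand-in for eval "$(ssh-agent)", the socket path is made up, and demo_guard works on a local copy of the variable so your real SSH_AUTH_SOCK is left alone:

```shell
demo_guard() {
    # Local, empty copy: pretend no agent is available yet.
    local SSH_AUTH_SOCK=
    local started=0

    # Hypothetical stand-in for: eval "$(ssh-agent)"
    start_agent() {
        started=$((started + 1))
        SSH_AUTH_SOCK=/tmp/hypothetical-agent.sock
    }

    [ -n "$SSH_AUTH_SOCK" ] || start_agent   # first sourcing: empty, so an agent is started
    [ -n "$SSH_AUTH_SOCK" ] || start_agent   # second sourcing: non-empty, so nothing happens

    echo "$started"
}

starts=$(demo_guard)
echo "agent started $starts time(s)"   # prints "agent started 1 time(s)"
```

Without the guard the counter would reach 2, which corresponds to a second agent and a first one left running uselessly.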
