gamescapades

Hello there. I hope you and your loved ones are safe in these disturbing times.

I've been working from home since early March now. Currently this work mostly involves writing Python code for simulations and data modelling. My laptop is fine for the initial experimentation, but at some point I need more computational power to do 'big runs' of my models. That's when I connect to a machine in my office by typing http://localhost:9999 into my browser, and I get a full JupyterLab session that's running on the office machine.

What? I can access a remote machine by writing localhost? Yes. This is the magic of SSH tunnels.

Maybe you already use TeamViewer or similar software and don't see the point of this exercise. Well if that works for you, great. But the advantage of this method is that much less information has to travel over the interwebs. With TeamViewer you send the entire picture of the desktop so it can often be unresponsive, especially if you're on a bad connection. With SSH tunnels, results can still be slow to arrive, but at least the image won't become huge pixels where you can't see the text anymore :)

In case you don't care for long explanations there's a TL;DR towards the end.

Some prerequisites
Running an app locally
Accessing an app remotely
Multiple tunnels
Summary and more possibilities
TLDR
Troubleshooting

Some prerequisites

If you've never heard of SSH this tutorial might be tricky. But if you've ever used GitHub to store your code you've probably already used SSH. I will assume basic knowledge of SSH here, so if you lack it, you can for example check out this tutorial at DigitalOcean and then come back :)

Another concept you need to be familiar with is that of ports. Most computers and networks today have firewalls, so you cannot access them without opening ports, or doors, into their juicy innards. Web sites are accessed over ports 80 and 443, so if you run a web server those ports need to be open. To access a computer by SSH, the default is port 22. DigitalOcean generally has great tutorials, and here's one about firewalls and opening ports for SSH.

Oh, and our main prerequisite is that you are able to access this remote machine over the interwebs via SSH. We will also look at the case where we cannot access the target machine directly, but have to 'jump' through another machine first. That's what I have to do; I first SSH into a publicly available machine on the university network, and from there, I can then access my office machine with another SSH connection.

I will also switch between using the terms machine and computer, FYI.

Running an app locally

Before launching a site on the interwebs, we usually develop it locally first. Blog applications like Jekyll have a built-in web server that automatically reloads any changes you make. So when I'm writing this blog post, I have a browser window open pointed to http://localhost:4000. Every time I save, Jekyll automatically rebuilds the site and I can check out what my post looks like now and then while writing.

The localhost part means your local computer. It normally resolves to the loopback IP address 127.0.0.1. You will also see 0.0.0.0 in the output of many servers; strictly speaking that means "all network interfaces on this machine" rather than localhost, but the difference is not really important for our purposes today. Just be aware of it if you see those IPs when you're poking around getting these things to work.

And the 4000 is the port Jekyll has attached itself to. So localhost:4000 means you tell your browser to "go connect to my local computer, find door 4000 and there shall be magic sights there".

If you have Python installed, you can test this very easily by opening your terminal and changing to any directory you want, like the one where you store all your cat pictures. Then you can start a web server from there:

cd whatever/folder/where/you/have/cats
python -m http.server

It will most likely tell you it's now serving HTTP on port 8000. Go to http://localhost:8000 and you should see a list of all those cat pictures. This nifty little Python command is very convenient for testing what we will be doing next, in case you don't have or need JupyterLab. You can select which port will be used by specifying it at the end of the command, like so:

python -m http.server 12345
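If you'd rather start the same kind of test server from inside a Python script, here is a minimal sketch using only the standard library. The function names start_server and fetch_status are my own for this example, and passing port 0 (which lets the operating system pick any free port) is just a convenience so the demo doesn't clash with something already running.

```python
import http.client
import http.server
import threading
from functools import partial


def start_server(directory=".", port=0):
    """Serve `directory` over HTTP on the loopback address in a background thread.

    port=0 asks the operating system for any free port.
    """
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(port, path="/"):
    """Request `path` from the local server and return the HTTP status code."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("GET", path)
        return connection.getresponse().status
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]  # the port the OS actually handed out
    print(f"Serving on http://localhost:{port}")
    print("Status for /:", fetch_status(port))
    server.shutdown()
    server.server_close()
```

For everyday testing the one-liner above is still the quickest route; the script version is mostly handy when you want to start and stop the server as part of something bigger.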

Many applications these days use a browser interface, and a very common one in the Python world is JupyterLab. It's not my personal favourite when it comes to code editors, but it's very convenient to have it running on the beefy machine in the office, where I can access it from the comfort of my office desk by just pointing my web browser to that machine. We call it the Beast, because it's got beastly powers. Well, it did a few years ago; it's actually not that powerful compared to what's available today, but let's not hurt its feelings. It will always be the Beast to us.

But what if I'm not on the same network, or there are firewall restrictions, or I'm far away in the comfort of my home?

Accessing an app remotely

The magic of SSH is that we can connect through its port, and bind local ports on the remote machine to local ports on the computer you're sitting at.

This is going to be more tricky to keep track of pretty soon, so let's recap what was just stated. Every computer has its own local ports. They are local to that computer. What SSH can do is connect a local port on some other computer to a local port on your computer. No need to open any more ports than the SSH one!

Okay! Open a terminal window, SSH into the remote machine and start JupyterLab. We need it running or there's nothing to connect to, and especially for testing it's nice to have it running so we can see if someone (like ourselves) connects.

JupyterLab by default runs on port 8888. So what we are going to do is bind port 9999 on my laptop to 8888 on the Beast. This is also called port forwarding. It's like forwarding mail when you've moved temporarily: you give the post office the address of your temporary house, and whatever is sent to your usual address ends up there [1].

Diagram showing localhost:9999 on laptop connected to localhost:8888 on the beast over SSH. Created with mermaid

Now open a new terminal window. Assuming you connect to your remote machine over the default SSH port, all the magic happens in a single line in your terminal:

ssh -f -N -L localhost:9999:localhost:8888 user@remotemachine

Let's go through the options we give ssh one by one.

-f tells ssh to go to the background once the connection is set up. We don't really want to do anything on the remote machine other than set up the port, so this way we can issue more commands in this terminal window or just close it to keep our workspace clean.

The -N option means we don't execute any commands on the "other side" of our connection. Again, we only need to set up the ports so we don't need to execute commands on the remote machine.

And then we have -L, which is the option that binds our ports. We give it the instruction localhost:9999:localhost:8888, which reads as local address, local port, remote address, remote port. The first localhost:9999 lives on the computer you're sitting at. The second localhost:8888 is looked up from the remote machine's point of view, so there localhost means the remote machine itself.
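Since I mostly write Python anyway, here's a small sketch of how the pieces of that one-liner fit together. The helper names forward_spec and tunnel_command are made up for this example; the script only builds and prints the command, it doesn't run it.

```python
import shlex


def forward_spec(local_port, remote_port, remote_host="localhost",
                 bind_address="localhost"):
    """Return the argument for ssh -L: bind_address:local_port:remote_host:remote_port."""
    return f"{bind_address}:{local_port}:{remote_host}:{remote_port}"


def tunnel_command(login, local_port, remote_port, remote_host="localhost",
                   ssh_port=22):
    """Build the ssh command for a background, forwarding-only tunnel."""
    command = ["ssh", "-f", "-N", "-L",
               forward_spec(local_port, remote_port, remote_host)]
    if ssh_port != 22:
        # Only spell out -p when we are not using the default SSH port.
        command += ["-p", str(ssh_port)]
    command.append(login)
    return command


print(shlex.join(tunnel_command("user@remotemachine", 9999, 8888)))
# prints: ssh -f -N -L localhost:9999:localhost:8888 user@remotemachine
```

Keeping the command as a list of arguments means it could be handed straight to subprocess.run if you ever want a script to open the tunnel for you.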

If you now go to localhost:9999 in your computer's browser you should see JupyterLab running on the remote machine! (Or whatever application it is you might be running) If not, is the application actually running on the remote machine? Forgetting that step is the usual error for me :) Otherwise, try going through the steps again from the start.
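If the browser shows nothing, a quick way to narrow things down is to check whether anything is listening on your local port at all. Below is a minimal sketch; port_is_listening is a name I picked for this example. Note that it only tells you whether the local end of the tunnel is up. ssh will accept the connection even if the application on the remote machine isn't running, so a True here doesn't prove JupyterLab is alive.

```python
import socket


def port_is_listening(port, host="localhost", timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    if port_is_listening(9999):
        print("Something is listening on localhost:9999")
    else:
        print("Nothing on localhost:9999, is the tunnel running?")
```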

Crystal clear? Super green? Good. Things will get more complicated now.

Multiple tunnels

Now, let's say our Beast machine cannot be accessed over the interwebs. But it can be accessed on the internal network of our company or university. And there is some other machine at our place of work, a machine that is both open to the public web and the internal network.

So to do anything on the Beast we first have to log in to that other machine; let's call it Unicorn. From there we SSH into the Beast and start our JupyterLab session.

Now, we can only bind/forward ports on our local machine, so we need to do this in two steps. First we create a tunnel to the Beast, and then we can do the same connection we did in the previous section. Let's look at it step by step.

Multiple tunnels: step 1

Diagram showing localhost:12345 on laptop forwarding to beast:22 on Unicorn over SSH. Created with mermaid

The first step is we bind a local port on our home machine that works as a shorthand to connect to the Beast as if we are on Unicorn. What we then have is a local port that automagically connects directly to the Beast through Unicorn.

ssh -f -N -L localhost:12345:beast:22 user@unicorn

We are not actually connecting to the Beast here; we only say that localhost:12345 - meaning port 12345 on our laptop - can be used to connect to the Beast via ssh. If you run the above line, exchanging the values for your own addresses and users, you'll see that you only get asked for the password for user@unicorn. (If you're using SSH keys this is not as clear, it'll just work if everything went well.)

To test that this works, you should now be able to connect to the Beast through the local port we created:

ssh -p 12345 beastuser@localhost

Here's where things might become confusing, because as you see you have to use your Beast user account @localhost. Which is because our local port 12345 is now actually our forwarding address for the Beast! If we again use the analogy of forwarding mail to a new house, the tunnel we have created is just the forwarding address. Connecting through that address is like actually sending mail to the new house.

Got it? Great! (If you don't, play around with different port numbers until it clicks)

Multiple tunnels: step 2

Cool. Almost there now. This final step is very similar to what we did in the "single hop" scenario where we can connect directly to the Beast through SSH. Thanks to the tunnel we set up in step 1, we now have the same situation, only with a slight twist:

ssh -f -N -L localhost:9999:localhost:8888 -p 12345 beastuser@localhost

Everything is using localhost now! You said a "slight" twist! What is going on?!

Calm down. You get this if we just look at it in parts.

So the last part, -p 12345 beastuser@localhost is the same as when we did the test connection in the previous step, right? We connect to the Beast using our new forwarding address localhost:12345.

And by doing that, we can access the local ports on the Beast, in this case 8888 where JupyterLab is running. And just like before, we bind our home computer/laptop port 9999 to the Beast's 8888.

In other words, now we can use JupyterLab from the comfort of our own browser!
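To see both steps side by side, here's a sketch that builds the two commands in the order they need to run. The function names and the dry_run switch are my own inventions for this example, and the default ports are simply the ones used in this post. With dry_run left on, nothing is executed; the commands are only printed.

```python
import shlex
import subprocess


def two_hop_commands(jump_login, target_host, target_user,
                     app_port=8888, local_app_port=9999, relay_port=12345):
    """Return the two ssh commands for tunnelling through a jump machine."""
    # Step 1: local relay_port becomes a shortcut to the target's SSH port.
    step_1 = ["ssh", "-f", "-N", "-L",
              f"localhost:{relay_port}:{target_host}:22", jump_login]
    # Step 2: SSH through that shortcut and forward the application port.
    step_2 = ["ssh", "-f", "-N", "-L",
              f"localhost:{local_app_port}:localhost:{app_port}",
              "-p", str(relay_port), f"{target_user}@localhost"]
    return [step_1, step_2]


def open_two_hop_tunnel(jump_login, target_host, target_user, dry_run=True):
    """Print both commands and, unless dry_run is set, run them in order."""
    commands = two_hop_commands(jump_login, target_host, target_user)
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


open_two_hop_tunnel("user@unicorn", "beast", "beastuser")
```

The order matters: the second command connects through port 12345, so the first tunnel has to be in place before it runs.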

Diagram showing localhost:9999 on laptop forwarding to beast:8888 through Unicorn server. The direct arrow is our end result and the upper arrow is the path we actually take. Created with mermaid

Summary and more possibilities

So now you know how to tunnel like a mole. Pretty nifty, huh?

These basics allow for more uses. Just three examples:

Mount the filesystem of the remote server and browse it as if you're using a local disk using SSHFS (SSH file system)
Use graphical applications on a remote Linux machine using X forwarding
Use a cheap cloud server in another country and tunnel through it to watch video streams that are only available locally there

There's also the concept of reverse SSH connections, for cases where the machine you need remote access to is behind a firewall and there are no other servers on the network you can SSH into. I might write a guide for that in the future, depending on how eager I am to increase my post/design ratio.

Hope you find all this useful. I do almost every day!

TLDR

App running on port 8888 on a remote machine can be accessed on your local machine's port 9999 with:

ssh -f -N -L localhost:9999:localhost:8888 user@remotemachine

In case you need to multihop through a "jump server", you need an extra step first:

ssh -f -N -L localhost:12345:remotemachine:22 user@jumpserver

Followed by:

ssh -f -N -L localhost:9999:localhost:8888 -p 12345 remotemachineuser@localhost

Troubleshooting

If your internet connection drops, the ports and bindings are sometimes not cleaned up properly. You'll then be unable to access your application through the browser, but when you try to establish the SSH tunnel you get errors saying the port is in use.

On Mac and Linux you can issue the following command to check currently running ssh processes:

ps aux | grep ssh

If you have no active ssh sessions this will most likely output only two lines, one saying something about ssh-agent which is a background process keeping track of your ssh-keys if you need them. The other would be saying something about grep because it shows the command we just issued.

But if you do have active connections they will be listed here, with their process ID in the second column from the left. You can then kill a process manually by passing its process ID (not the port number) to kill. For example, if the process ID happens to be 12345:

kill 12345

If you have 'multi-hopped', you can save yourself a few keyboard taps by killing the process of the first port binding in the chain; the others should then drop automatically.
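If you find yourself squinting at that ps output a lot, the filtering can be scripted. This is only a sketch: the function names are hypothetical, and it assumes the usual eleven-column layout of ps aux on Mac and Linux, with the process ID in column two and the command in the last column.

```python
import subprocess


def find_tunnel_processes(ps_output):
    """Return (pid, command) for every ssh process that was started with -L."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # the 11th field is the full command
        if len(fields) < 11:
            continue
        pid, command = fields[1], fields[10].strip()
        words = command.split()
        # Match "ssh" or "/usr/bin/ssh", but not ssh-agent, sshd or grep.
        if words and words[0].endswith("ssh") and "-L" in words:
            matches.append((int(pid), command))
    return matches


def running_tunnels():
    """Run ps aux and return the ssh port-forwarding processes it lists."""
    result = subprocess.run(["ps", "aux"], capture_output=True,
                            text=True, check=True)
    return find_tunnel_processes(result.stdout)
```

Calling running_tunnels() gives you a list of process IDs and commands, so you can pick the right one to pass to kill.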

[1] Yes, nobody sends mail anymore, but I think analogies tied to the real world are easier to handle than the abstractness of digital things.

Access apps like Jupyter notebook remotely using SSH tunnels
