🐙 Evan Azevedo

Search

Search IconIcon to open search

48 - Failing to use Dask

Last updated Nov 18, 2022 Edit Source

Today I “failed upwards” with using Dask for speeding up my compute. I was making my calls like this:

1
2
3
4
5
from dask.distributed import Client

client = Client()
dd = some_large_dask_task()
dd.compute()

Naturally, when the computation didn’t go as planned I pressed ^C in the terminal, but I noticed in the Dask client server at https://127.0.0.1:8787/status that the workers were not shut down, and a couple were actually running! So I panicked and tried to close them using htop. When that didn’t work I just restarted my whole laptop.

So that continued for a couple of times, until I realized a couple of crucial things:

  1. I couldn’t do much to stop the threads after they had started. However, new ones would not start as long as I called client.close(). So I just needed to close the client and be patient.
  2. Sometimes I would start a client with 8 workers, then I wanted to try it with 4 workers. However, calling client = Client(n_workers=4) would just spawn a new Scheduler and client. So in order to do that I needed to connect to the previous client (which I could do with client = Client('tcp://127.0.0.1:{port})) and then execute client.shutdown().
  3. Observation 2 led to a lot of strange output in the terminal. I could make this simpler by instead starting the client by the following:
1
2
3
4
5
6
from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
client = Client(cluster)
# ... some Dask code
cluster.close(); client.close()
  1. If I knew that I didn’t want to keep the client open, for example if I wanted to run the above in a script, then it would be prudent to use context managers like so:
1
2
with LocalCluster() as cluster, Client(cluster) as client:
# Do some dask process

I'm a freelance software developer located in Denver, Colorado. If you're interested in working together or would just like to say hi you can reach me at me@ this domain.