Data Scientist
Edinburgh, Scotland
Managing your data science project environments with Conda (+pip)
Dr. David R. Pugh is a staff scientist with the King Abdullah University of Science and Technology (KAUST) Research Computing Core Labs, where he provides data science training and consulting services to KAUST students, faculty, and research scientists. David is a certified Software and Data Carpentry instructor with extensive teaching experience, having taught Software and Data Carpentry workshops in Japan, Saudi Arabia, and the UK.
Managing your data science project environments with Conda (+pip)
Description
This workshop is a Software Carpentry-style introduction to Conda (+pip) for (data) scientists. Conda is an open-source package and environment management system that runs on Windows, macOS, and Linux. Although Conda was created for Python packages, it can package and distribute software for any language (which makes Conda well suited to managing environments for data science and machine learning projects!). Pip is the de facto standard package-management system for installing and managing software packages written in Python: most published Python packages are available from the Python Package Index (PyPI) and can be installed with pip. Conda and pip work well as a team, and this workshop will cover when and how to use pip to install packages into Conda environments.
This workshop motivates the use of Conda (+pip) as a development tool for building and sharing project-specific software environments that facilitate reproducible (data) science workflows. Particular attention is given to using Conda to create reproducible environments with NVIDIA GPU dependencies (including environments for Horovod, TensorFlow, PyTorch, and NVIDIA RAPIDS).
Participants should have a basic familiarity with Python programming and Bash shell concepts (e.g., basic commands and environment variables).