From October 9 to 11, I attended the PyCon DE & PyData conference in Berlin. Organized by volunteers from the Python community, this year's joint conference offered three days full of interesting talks and tutorials around the Python universe, with topics ranging from code debugging and testing, machine learning applications, and visualization to data science project management and DevOps. Here are a few takeaways from the conference:
- Tools such as Apache Airflow for scheduling and monitoring workflows were an important topic in the area of deployment. To illustrate the idea: Airflow uses DAGs (directed acyclic graphs) to order tasks such as data loading, data storage, or email processing and to specify the dependencies between them. By providing a set of operators that perform the tasks, Airflow is a flexible tool for deploying diverse workflows.
- Standardize processes as much as possible, but automate wisely. If the time spent refining an automation exceeds the time it saves, particularly for a task that is not executed regularly, then keeping the less efficient manual task might make more sense than automating it.
- While a number of libraries make implementing machine learning algorithms quite convenient, the pre-processing step of feature engineering, such as dimensionality reduction, transformation, or feature creation, is still mostly done manually. Each focusing on a different data type (e.g., relational or time-series data), libraries such as featuretools and tsfresh are useful for automating feature engineering.
- Use source control and experiment versioning to ensure reproducibility. Good documentation belongs in every data science project, both to help our future selves and our team members.
- Debugging is an essential part of programming. Although writing clean code might be the easiest way to avoid debugging, the standard library and most IDEs offer helpful debugging tools. Moreover, writing minimal, reproducible tests helps to develop correct and maintainable code. Tests might cover either pure code logic or data-specific definitions, with workflows and tests ranging from assert statements to visualizations.
- The Python open-source community needs us! Many libraries that we use both in our daily work and in spare-time projects depend on volunteers who spend their time further developing the Python ecosystem.
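
To make the DAG idea from the Airflow takeaway a bit more concrete, here is a minimal sketch that uses only Python's standard library (`graphlib`, available from Python 3.9) rather than Airflow itself. The task names and the helper `execution_order` are made up for illustration; a real Airflow DAG would be declared with Airflow's own operators and scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "load_data": set(),
    "store_data": {"load_data"},
    "send_email": {"store_data"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because the graph must be acyclic, a valid execution order always exists; a scheduler such as Airflow builds on the same property to decide which task may run next.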
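
The "automate wisely" rule of thumb can also be written down as a small back-of-the-envelope check. This is only a sketch: the function name, its parameters, and the numbers in the examples are assumptions chosen for illustration, not figures from the conference.

```python
def automation_pays_off(build_hours, minutes_saved_per_run, runs_per_year, horizon_years=1.0):
    """Return True if the time saved over the horizon exceeds the time spent automating."""
    saved_hours = minutes_saved_per_run / 60 * runs_per_year * horizon_years
    return saved_hours > build_hours


# Illustrative numbers only: a monthly task vs. a task run every working day.
rarely_run = automation_pays_off(build_hours=8, minutes_saved_per_run=5, runs_per_year=12)
daily_run = automation_pays_off(build_hours=8, minutes_saved_per_run=10, runs_per_year=250)
print(rarely_run, daily_run)
```

With these made-up inputs, the monthly task saves about one hour per year against eight hours of effort, while the daily task saves far more than it costs.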
To sum up, the conference was a great experience. Many more insights could be listed here: I spoke to programmers who use Python in diverse contexts, from research projects to applications built for companies. I learned more about how important the open-source community is for the further development of Python as a flexible and multifaceted programming language. I also exchanged experiences with participants of the PyLadies lunch meeting. The Python event in Berlin provided a great learning platform that benefited me as a data scientist at Ginkgo Analytics. I travelled back to Hamburg with a bunch of new impressions and ideas.