Apr 29, 2024

PyCon & PyData 2024: Key Learnings and Takeaways

Predict42 data scientists at PyCon Berlin! We learned about project management tools, data valuation (pyDVL), LLM integration (Haystack/Langchain), functional programming, and new tools (Streamlit, Polars) to boost our work. Stay tuned for exciting advancements! #PyCon #DataScience


Three members of the Predict42 data science team attended PyCon & PyData 2024 in Berlin from April 22-24. This blog post summarizes the key takeaways and potential applications for our work at Predict42.


Project Structure and Development Practices

Modern Tooling: We learned about valuable tools for Python development, including:

  • pyproject.toml for streamlined project setup.
  • Project management tools like Hatch (alternatives to be evaluated).
  • Static type checking with mypy.
  • Clear and informative README.md files.

Testing: Unit tests and linting were emphasized as essential for robust development. We should apply both practices more comprehensively.
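
A minimal sketch of what this can look like, combining type hints (which mypy checks statically) with small unit tests. The `normalize` function is a hypothetical example; the test functions follow pytest's naming convention so pytest would discover them, but they only use plain `assert` and can also be called directly.

```python
def normalize(scores: list[float]) -> list[float]:
    """Scale scores so they sum to 1 (hypothetical example function)."""
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not sum to zero")
    return [s / total for s in scores]


def test_normalize_sums_to_one() -> None:
    # Exact comparison is safe here: 1/4 and 3/4 are representable floats
    assert normalize([1.0, 3.0]) == [0.25, 0.75]


def test_normalize_rejects_zero_total() -> None:
    # The function must refuse input it cannot normalize
    try:
        normalize([0.0, 0.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

With pytest installed, `pytest.raises(ValueError)` is the more idiomatic way to write the second test, and mypy would flag a call such as `normalize("0.5")` before the code ever runs.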

Data Engineering

Data Valuation: We were introduced to pyDVL, a data valuation library for identifying training data points that contribute little to, or even harm, model performance. This can help improve model accuracy and uncover mislabeled data.
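
To make the idea concrete, the standard-library sketch below hand-rolls the simplest valuation scheme, leave-one-out: a point's value is the model's validation accuracy with all training data minus the accuracy without that point. It does not use or reproduce pyDVL's API (pyDVL offers more robust estimators, such as Shapley-based ones); the 1-nearest-neighbour model and the toy data are assumptions chosen so the effect is easy to follow.

```python
Point = tuple[float, int]  # (feature value, class label)


def predict_1nn(train: list[Point], x: float) -> int:
    # 1-nearest-neighbour: return the label of the closest training point
    return min(train, key=lambda p: abs(p[0] - x))[1]


def accuracy(train: list[Point], valid: list[Point]) -> float:
    # Utility of a training set = accuracy on the validation set
    if not train:
        return 0.0
    hits = sum(predict_1nn(train, x) == y for x, y in valid)
    return hits / len(valid)


def leave_one_out_values(train: list[Point], valid: list[Point]) -> list[float]:
    # Value of point i = utility with all points minus utility without point i
    full = accuracy(train, valid)
    return [
        full - accuracy(train[:i] + train[i + 1:], valid)
        for i in range(len(train))
    ]


# Toy data: class 0 below x = 2.5, class 1 above; (1.5, 1) looks mislabeled
train = [(0.0, 0), (1.0, 0), (1.5, 1), (3.0, 1), (4.0, 1)]
valid = [(1.4, 0), (1.6, 0), (3.4, 1)]
values = leave_one_out_values(train, valid)
```

Here the suspicious point (1.5, 1) receives a negative value, because removing it raises validation accuracy; every other point gets a value of zero, making the mislabeled candidate stand out.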

Data Pipelines: Apache Airflow was presented as a tool for orchestrating, scheduling and monitoring data processing pipelines. This could help streamline our workflows.
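
Airflow models a pipeline as a directed acyclic graph (DAG) of tasks whose dependencies determine the execution order. The standard-library sketch below illustrates only that core idea with `graphlib.TopologicalSorter`; it is not Airflow code, and the task names, the `run_pipeline` helper and the status strings (loosely modelled on Airflow's task states) are hypothetical.

```python
from collections.abc import Callable
from graphlib import TopologicalSorter


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    upstream: dict[str, set[str]],
) -> dict[str, str]:
    """Run tasks in dependency order and record a status per task (hypothetical helper)."""
    status: dict[str, str] = {}
    # static_order() yields every task after all of its upstream tasks
    for name in TopologicalSorter(upstream).static_order():
        # Skip a task if any of its upstream tasks did not succeed
        if any(status.get(dep) != "success" for dep in upstream.get(name, set())):
            status[name] = "upstream_failed"
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


# A simple extract -> transform -> load chain; each task just logs its name
log: list[str] = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
upstream = {"transform": {"extract"}, "load": {"transform"}}
status = run_pipeline(tasks, upstream)
```

The returned status map is the "monitoring" part in miniature: if `transform` raised an exception, it would be marked as failed and `load` would be skipped as upstream_failed instead of running on incomplete data.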

Machine Learning

LLM Integration: Several presentations showcased LLM integration for various tasks. We can leverage these approaches for our Migo Instructor project, potentially building custom logic for better control and error handling. We should compare Haystack and Langchain for LLM integration, considering our specific needs (PDS Ticket).

See also: https://medium.com/henkel-data-and-analytics
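
As an illustration of such custom control logic for LLM integration, the sketch below wraps an arbitrary LLM call, checks that the reply is a JSON object containing the expected keys, and retries otherwise. It deliberately avoids the Haystack and Langchain APIs: `call_llm` is a hypothetical stand-in for whichever client the Migo Instructor project ends up using, and the retry policy is an assumption.

```python
import json
from collections.abc import Callable
from typing import Any


def ask_llm_json(
    call_llm: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Call a (hypothetical) LLM client and return its reply as a validated JSON object."""
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        # Reject replies that are not valid JSON
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        # Reject JSON that is not an object (e.g. a bare string or list)
        if not isinstance(data, dict):
            last_error = "reply is not a JSON object"
            continue
        # Reject objects that lack any of the fields we depend on
        missing = required_keys - set(data)
        if missing:
            last_error = f"missing keys: {sorted(missing)}"
            continue
        return data
    raise ValueError(f"LLM reply rejected after {max_attempts} attempts: {last_error}")


# Usage with a fake client that first answers in prose, then in JSON
replies = iter(["Sure! Here is the answer.", '{"answer": "42"}'])
result = ask_llm_json(lambda prompt: next(replies), "What is the answer?", {"answer"})
```

Keeping validation and retries in our own code, rather than inside a framework, makes failures explicit and easy to log, which is the kind of control and error handling mentioned above.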

Other Interesting Topics:

  • Property-Based Testing: Hypothesis offers an intriguing approach for test case generation in specific scenarios, although it might not be a perfect fit for us currently.
  • Functional Programming Principles: While Python is not a purely functional language, we can leverage functional programming concepts like iterators, generators, and comprehensions for cleaner and potentially more efficient code. We recommend downloading Mike Müller's Functional Python material.
  • Unexpected Data Problems: A structured approach to tackling unexpected data issues was presented, including data profiling, consulting domain experts, and adjusting the model design when necessary.
  • Causal Machine Learning: This field goes beyond prediction to explore why things happen. We should explore Double ML for future regression problems where understanding causality is important.
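
A small sketch of the functional-programming ideas above, using only the standard library: generator expressions keep the pipeline lazy, `itertools.accumulate` replaces a hand-written running-total loop, and a list comprehension builds a derived result. The function names are illustrative, not taken from the talk.

```python
from collections.abc import Iterable, Iterator
from itertools import accumulate


def clean_values(lines: Iterable[str]) -> Iterator[float]:
    """Lazily strip and parse numeric lines, skipping blank ones."""
    stripped = (line.strip() for line in lines)  # generator expression: nothing runs yet
    return (float(s) for s in stripped if s)     # still lazy; values are produced on demand


def running_totals(lines: Iterable[str]) -> list[float]:
    # accumulate replaces a manual "total += value" loop
    return list(accumulate(clean_values(lines)))


def squares(lines: Iterable[str]) -> list[float]:
    # List comprehension over the same lazy pipeline
    return [v * v for v in clean_values(lines)]
```

Because the parsing stages are lazy, passing an open file object (for example inside `with open(path) as f:`) to `running_totals` reads the file line by line instead of loading it into memory first.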
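
The core of Double ML is the partialling-out idea: predict both the outcome and the treatment from the confounders, then regress the outcome residuals on the treatment residuals, using cross-fitting so the nuisance models are never evaluated on the data they were trained on. The pure-Python sketch below uses simple linear regressions as nuisance models and simulated data with a true effect of 2.0; it is not the DoubleML package's API, which allows flexible machine learning models in place of these linear fits.

```python
import random
from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dml_partialling_out(
    x: list[float], d: list[float], y: list[float], n_folds: int = 2
) -> float:
    """Estimate the effect of treatment d on outcome y, adjusting for confounder x."""
    folds = [list(range(k, len(y), n_folds)) for k in range(n_folds)]
    res_d: list[float] = []
    res_y: list[float] = []
    for k, test_idx in enumerate(folds):
        # Cross-fitting: fit nuisance models on the other folds only
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        x_tr = [x[i] for i in train_idx]
        a_d, b_d = fit_line(x_tr, [d[i] for i in train_idx])  # model for E[d | x]
        a_y, b_y = fit_line(x_tr, [y[i] for i in train_idx])  # model for E[y | x]
        # Residuals on the held-out fold
        for i in test_idx:
            res_d.append(d[i] - (a_d + b_d * x[i]))
            res_y.append(y[i] - (a_y + b_y * x[i]))
    # Final stage: regress outcome residuals on treatment residuals (no intercept)
    return sum(rd * ry for rd, ry in zip(res_d, res_y)) / sum(rd * rd for rd in res_d)


# Simulated data: x confounds both d and y; the true effect of d on y is 2.0
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [2.0 * di + 3.0 * xi + rng.gauss(0.0, 1.0) for di, xi in zip(d, x)]
theta = dml_partialling_out(x, d, y)
```

On this data a naive regression of y on d alone is biased upward (its population slope is 3.2) because x drives both d and y, whereas the partialled-out estimate lands close to the true effect of 2.0.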

Data Science Tools

  • Streamlit: This framework was highlighted for its ease of use in building data science applications. We could use Streamlit-feedback to collect user feedback directly within our dashboards.
  • Polars: For data manipulation tasks currently handled with Pandas, Polars offers a potentially faster alternative, especially for complex operations. We should investigate its limitations and potential benefits for our use cases.

Our guys with the Streamlit team:

Backend Services/Asynchronous Programming:

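Using asyncio for tasks that spend most of their time waiting on I/O (e.g., database queries or network requests) lets a single process handle many of them concurrently, which can significantly improve the throughput of our backend services.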
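
A minimal sketch of why this helps: three simulated I/O calls run concurrently with `asyncio.gather`, so the total wait is roughly that of the slowest call rather than the sum of all three. `fetch_record` is hypothetical and uses `asyncio.sleep` as a stand-in for a real database or HTTP request; CPU-bound work would not benefit in the same way.

```python
import asyncio
import time


async def fetch_record(record_id: int, delay: float = 0.2) -> dict[str, int]:
    # Simulated I/O: while this coroutine sleeps, the event loop runs the others
    await asyncio.sleep(delay)
    return {"id": record_id}


async def fetch_all(ids: list[int]) -> list[dict[str, int]]:
    # gather runs all coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fetch_record(i) for i in ids))


def main() -> tuple[list[dict[str, int]], float]:
    start = time.perf_counter()
    results = asyncio.run(fetch_all([1, 2, 3]))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = main()
    # elapsed should be close to 0.2 s, not the 0.6 s a sequential loop would take
    print(results, f"{elapsed:.2f}s")
```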

Action Items

  • Implement unit tests and linting practices more comprehensively.
  • Evaluate tools like pyDVL, Airflow, Haystack, Langchain, Streamlit-feedback, Polars, and Double ML for potential integration into our workflows.
  • Investigate and potentially adopt Asyncio for our backend services.
  • Continuously monitor developments in functional programming and property-based testing for potential future use cases.

By applying these learnings and exploring the mentioned tools, Predict42 can stay at the forefront of data science and machine learning advancements.

Excited to try? Book a demo!

Just complete a simple form and book an appointment online.