Jan 31, 2024

Embeddings and Streamlit: School internship at Predict42

AI and data science are among the most relevant and fascinating topics of our time. To explore them in more depth, I did a two-week student internship at Predict 42, a young startup. I chose it because I wanted to understand how artificial intelligence and data science work in practice, given how quickly the technology is changing many industries.

Student internship at a startup: AI-powered customer feedback analysis

AI and data science are extremely relevant and fascinating topics today. That's why I decided to do my 2-week student internship at the startup Predict 42 to learn more about them.

What experiences did I gain?

Over the past two weeks, I had the opportunity to gain in-depth insights into the company and to learn about the diverse tasks of the employees. During this time, I worked on various projects, including labeling data for the training of an embedding learning algorithm and preparing a presentation on LLMs. Particularly fascinating was the task of developing an app using the Python package Streamlit.

About Streamlit:

Streamlit is a Python package that simplifies the process of creating interactive web applications for data science and machine learning. With its straightforward syntax and minimal code requirements, Streamlit enables users to quickly turn data scripts into shareable web apps. It seamlessly integrates with popular data science libraries like Pandas and Matplotlib, allowing developers to focus on the core functionality of their applications rather than dealing with the intricacies of web development. Streamlit's intuitive design and real-time updates make it an excellent choice for rapidly prototyping and deploying data-driven applications with ease.
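
To give an idea of how little code such an app needs, here is a minimal sketch. It is not the app from the internship: the example feedback, the helper filter_feedback, and the plain keyword filter are made up for illustration. The Streamlit calls used are st.title, st.text_input, and st.write.

```python
try:
    import streamlit as st
except ImportError:
    st = None  # keeps filter_feedback usable where Streamlit is not installed


def filter_feedback(feedback, keyword):
    """Return the feedback entries that contain the keyword (case-insensitive)."""
    needle = keyword.strip().lower()
    if not needle:
        # An empty search box shows everything.
        return list(feedback)
    return [text for text in feedback if needle in text.lower()]


# Made-up example data, not real customer feedback.
FEEDBACK = [
    "The delivery was fast and the packaging was great.",
    "Support took three days to answer my email.",
    "Great product, but the delivery was late.",
]

if st is not None:
    # Streamlit reruns this script from top to bottom on every user interaction.
    st.title("Feedback explorer")
    keyword = st.text_input("Filter feedback by keyword", value="")
    matches = filter_feedback(FEEDBACK, keyword)
    st.write(f"{len(matches)} matching entries")
    for text in matches:
        st.write("- " + text)
```

Saved as app.py (a file name chosen here), the sketch can be started with "streamlit run app.py" and opens in the browser.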

What does the app do?

The app combines two existing apps with one that I built myself. It can display feedback that has already been clustered, find feedback that semantically matches a keyword, and scale images according to the importance of their topic and arrange them in a kind of image cloud.
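
Finding semantically matching feedback usually comes down to comparing vectors. The following sketch assumes that the keyword and every piece of feedback have already been turned into embeddings, and it ranks the feedback by cosine similarity. Cosine similarity is a common choice, but whether the app uses exactly this measure is an assumption. The three-dimensional vectors and the function names are invented to keep the example readable.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vector, embedded_feedback, top_k=2):
    """Return the top_k feedback texts whose vectors are closest to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in embedded_feedback
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy data: invented texts with invented 3-dimensional "embeddings".
EMBEDDED_FEEDBACK = [
    ("The parcel arrived two days late.", [0.9, 0.1, 0.0]),
    ("Shipping was quick.", [0.8, 0.2, 0.1]),
    ("The app crashes on login.", [0.0, 0.1, 0.9]),
]
QUERY = [1.0, 0.0, 0.0]  # stands in for the embedding of a keyword such as "delivery"

print(most_similar(QUERY, EMBEDDED_FEEDBACK))
```

Note that neither of the two top results has to contain the keyword itself. That is the difference between a semantic search and a plain text search.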

The clustered feedback is easily accessible

But how does it work?

The existing feedback is first represented using embeddings. Each piece of feedback is split into individual sentences, and a machine learning model transforms each sentence into a 768-dimensional vector. A clustering algorithm then groups these vectors, so that sentences with similar meaning end up in the same cluster. In this way, all feedback is sorted and made usable for the app.
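
The three steps (split, embed, cluster) can be sketched in a few lines of plain Python. Two things here are stand-ins and not what the real pipeline uses: toy_embed only counts words from two tiny word lists and returns a 2-dimensional vector instead of a learned 768-dimensional one, and k-means is picked as an example because the clustering algorithm is not named here. All function names and example sentences are made up.

```python
import math
import re


def split_sentences(feedback):
    """Split a piece of feedback after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", feedback.strip())
    return [part for part in parts if part]


def toy_embed(sentence):
    """Stand-in for a real embedding model: counts topic words, returns 2 numbers."""
    words = re.findall(r"[a-z]+", sentence.lower())
    delivery = sum(word in {"delivery", "shipping", "parcel", "late"} for word in words)
    app = sum(word in {"app", "login", "crashes", "update"} for word in words)
    return [float(delivery), float(app)]


def kmeans(vectors, k, iterations=10):
    """Very small k-means; the first k vectors serve as starting centroids."""
    centroids = [list(vector) for vector in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(vector, centroids[c]))
            for vector in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Made-up feedback for illustration.
feedback_items = [
    "The delivery was late. The app crashes on login.",
    "Shipping took too long!",
    "After the update the app is slow.",
]

sentences = [s for item in feedback_items for s in split_sentences(item)]
vectors = [toy_embed(sentence) for sentence in sentences]
labels = kmeans(vectors, k=2)

for label, sentence in zip(labels, sentences):
    print(label, sentence)
```

In this toy run the two delivery sentences share one cluster and the two app sentences share the other, even though the first piece of feedback contributes one sentence to each. That is why the feedback is split into sentences before it is embedded.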


I really enjoyed the internship at the startup, since I learned a lot and was able to improve my skills in the areas of AI and programming. In addition, I am extremely grateful for the opportunity to have completed my internship at this exciting startup.

-- Benedict Sittig (in the middle of the thumbnail)
