Hey everyone! I am super excited to share a quick notebook on using Generative Feedback Loops to chunk code files and better structure how they are indexed in the Weaviate Vector Database! Chunking is one of the key topics in Vector Search: long documents need to be broken into smaller parts that we can encode with a pre-trained embedding model and store in a vector index, such as HNSW-PQ. Most solutions use some form of rolling token window, for example taking every 300 tokens as a chunk with, say, 50 tokens of overlap between consecutive windows. Unfortunately, this doesn't work well for code in particular, because we don't want a chunk to cut off in the middle of a function or class definition. This tutorial therefore uses Generative Feedback Loops to find the best places to split the code file and to write a natural language description of what the code in each chunk does.
I hope this inspires your interest in Generative Feedback Loops, with more coming soon! Also, please let us know if you have any issues running the code in the notebook. We're more than happy to help!
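To make the rolling-window problem concrete, here is a minimal Python sketch. It is not the code from the notebook: the function names `rolling_window_chunks` and `definition_chunks` are made up for illustration, whitespace-separated words stand in for real tokenizer tokens, and the `ast`-based split is only a simple stand-in for boundary-aware chunking (the notebook itself uses an LLM to pick the boundaries and describe each chunk).

```python
import ast


def rolling_window_chunks(tokens, window=300, overlap=50):
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        # Stop once a window has reached the end of the input.
        if start + window >= len(tokens):
            break
    return chunks


def definition_chunks(source):
    """Split Python source at top-level statement boundaries (stand-in for LLM chunking)."""
    lines = source.splitlines()
    tree = ast.parse(source)
    # lineno / end_lineno are 1-indexed and inclusive (Python 3.8+).
    return ["\n".join(lines[node.lineno - 1:node.end_lineno]) for node in tree.body]


sample = '''def add(a, b):
    return a + b

def sub(a, b):
    return a - b
'''

# Assumption: whitespace-separated words approximate tokens; a tiny window keeps the demo readable.
tokens = sample.split()
for i, chunk in enumerate(rolling_window_chunks(tokens, window=6, overlap=2)):
    print(f"window {i}: {' '.join(chunk)}")

for i, chunk in enumerate(definition_chunks(sample)):
    print(f"definition {i}:\n{chunk}")
```

On this toy input, the fixed windows straddle the boundary between the two functions, while the definition-based split keeps each function whole. That is the behavior we want from the LLM-chosen boundaries too.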
Code in Weaviate Recipes - https://github.com/weaviate/recipes
The dataset used is the DSPy repository, which can be found here - https://github.com/stanfordnlp/dspy/t...
Here are some experiments we've done on OPRO prompt optimization to achieve JSON outputs without Structured Decoding - https://github.com/weaviate/structure...
Chapters:
0:00 Introduction
3:07 Where to find the code
5:15 What is Semantic Chunking?
6:22 Code deep dive
Video: Chunking with Generative Feedback Loops, uploaded by Connor Shorten on 12 August 2024. On this site it has been viewed 1,335 times and liked by 61 people.