Playing with ColBERTv2 Embeddings and Retrieval

There are a lot of embedding models out there for LLMs. ColBERTv2 is a neat one. Here are some thoughts and code examples.

ColBERTv2

The way you shove data into an embedding model matters, and ColBERT is no exception. I started off by just giving it an HTML file containing an entire website (vimbook’s print-site one-pager). That file had a bunch of junk that wasn’t needed, which occasionally affected the
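Here is a rough sketch of that kind of cleanup, assuming you want plain-text passages rather than raw HTML going into the index. `html_to_passages` is a hypothetical helper, not part of ColBERT or RAGatouille. It drops `<script>` and `<style>` blocks, strips the remaining tags with regexes, and packs the words into fixed-size passages (100 words by default, an arbitrary choice). It assumes lowercase tag names and leaves entities like `&amp;` undecoded, so a real HTML parser would be more robust.

```shell
# Hypothetical helper (not part of ColBERT or RAGatouille): read HTML on
# stdin, print plain-text passages of at most N words, one per line.
# Usage: html_to_passages [N] < page.html   (N defaults to 100, arbitrary)
html_to_passages() {
  # Flatten to one line, then break before every "<" so each line
  # starts with at most one tag followed by its text.
  tr '\n' ' ' |
    awk '{ gsub(/</, "\n<"); print }' |
    awk -v n="${1:-100}" '
      # Assumes lowercase tag names; skip everything inside script/style.
      /^<(script|style)/   { skip = 1 }
      /^<\/(script|style)/ { skip = 0 }
      {
        sub(/^<[^>]*>/, "")      # drop the leading tag, keep trailing text
        if (skip) next
        # Pack words into fixed-size passages (may cross paragraph breaks).
        for (i = 1; i <= NF; i++) {
          buf = (count ? buf " " : "") $i
          count++
          if (count == n) { print buf; buf = ""; count = 0 }
        }
      }
      END { if (count) print buf }
    '
}
```

For example, `html_to_passages 150 < page.html > passages.txt` writes one passage per line (`page.html` is a placeholder). Fixed-size word windows ignore document structure, so a passage can straddle two sections; splitting on headings instead is a reasonable alternative.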

sqlite-utils insert-files https://github.com/bclavie/RAGatouille

Multiline script example:

# enable multilib - see link below
paru # make sure things are up to date generally
paru -S android-tools android-sdk-build-tools # includes adb and other goodies
reboot

Image example: Source selection