Humopery - vecsearch.git/summary
 
description: CLI to query and update a search engine based on a BERT transformer model and a vector store
last change: Sat, 30 Nov 2024 15:01:35 +0000 (09:01 -0600)
readme

A tool to manage a basic vector-based search index with candle and pgvector. It can create the index, add documents and run searches.

Borrowed heavily from:

§Initialize the database

The PostgreSQL database must have the pgvector extension installed, and the user must have superuser privileges.

If the target database doesn't exist yet, the user must also have the CREATEDB privilege.
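The privilege requirements above correspond to the kinds of statements an idempotent init step issues: CREATE DATABASE requires CREATEDB, and CREATE EXTENSION generally requires superuser unless the extension is marked trusted. The following is a minimal sketch under stated assumptions, not vecsearch's actual schema. The table name `documents`, its columns, and the `init_statements`/`quote_ident` helpers are hypothetical. The `content` primary key is only suggested by the shortlog entry "data is keyed by content". The dimension 384 is the output size of sentence-transformers/all-MiniLM-L6-v2, the default model in candle's BERT example; the model vecsearch actually uses isn't stated here.

```rust
/// Quote a PostgreSQL identifier: wrap it in double quotes and double any embedded quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Hypothetical sketch of the statements an idempotent init step could issue.
/// The table name `documents`, its columns, and `dim` are assumptions, not vecsearch's schema.
/// `db_exists` would come from a lookup such as
/// `SELECT 1 FROM pg_database WHERE datname = $1`.
fn init_statements(dbname: &str, db_exists: bool, dim: usize) -> (Vec<String>, Vec<String>) {
    // Run while connected to a maintenance database (e.g. `postgres`); needs CREATEDB.
    let mut server = Vec::new();
    if !db_exists {
        // CREATE DATABASE does not accept bind parameters, so the name is quoted instead.
        server.push(format!("CREATE DATABASE {}", quote_ident(dbname)));
    }
    // Run while connected to the target database; CREATE EXTENSION may need superuser.
    let target = vec![
        "CREATE EXTENSION IF NOT EXISTS vector".to_string(),
        format!("CREATE TABLE IF NOT EXISTS documents (content text PRIMARY KEY, embedding vector({dim}))"),
    ];
    (server, target)
}

fn main() {
    let (server, target) = init_statements("vsearch", false, 384);
    for stmt in server.iter().chain(target.iter()) {
        println!("{stmt};");
    }
}
```

The `IF NOT EXISTS` clauses make a rerun harmless, which fits the "maybe creating ..." wording in the tool's output.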

$ ./vecsearch init-database --help

Initialize the database when the database or table doesn't exist already
 
Usage: vecsearch init-database [OPTIONS] --dbpassword <DBPASSWORD>
 
Options:
      --dbname <DBNAME>          [default: vsearch]
      --dbhost <DBHOST>          [default: localhost]
      --dbuser <DBUSER>          [default: vsmigrator]
      --dbpassword <DBPASSWORD>  [env: DBPASSWORD=]
  -h, --help                     Print help

For example:

$ export DBPASSWORD=$(gpg -d pw-vsmigrator.gpg)
$ vecsearch init-database
maybe creating the database
database vsearch exists already
maybe creating database objects
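The `[env: DBPASSWORD=]` annotation in the help output means `--dbpassword` can be supplied through the environment, which keeps the password out of the shell history and the process list. The following is a std-only sketch of that configuration using the defaults from the help text. The `DbArgs` struct and its methods are hypothetical, and so is the libpq-style key/value connection string. How vecsearch actually parses its arguments and connects isn't shown here.

```rust
use std::env;

/// Hypothetical connection settings mirroring the defaults printed by `--help`.
struct DbArgs {
    dbname: String,
    dbhost: String,
    dbuser: String,
    dbpassword: String,
}

impl DbArgs {
    /// Use the documented defaults and take the password from DBPASSWORD.
    fn from_env() -> Result<Self, env::VarError> {
        Ok(DbArgs {
            dbname: "vsearch".into(),
            dbhost: "localhost".into(),
            dbuser: "vsmigrator".into(),
            dbpassword: env::var("DBPASSWORD")?,
        })
    }

    /// Render a libpq key/value connection string with every value single-quoted.
    fn conninfo(&self) -> String {
        // libpq rule: inside quotes, escape backslash and single quote with a backslash.
        // Backslashes are escaped first so the quote escapes are not doubled.
        let q = |v: &str| format!("'{}'", v.replace('\\', "\\\\").replace('\'', "\\'"));
        format!(
            "host={} dbname={} user={} password={}",
            q(&self.dbhost),
            q(&self.dbname),
            q(&self.dbuser),
            q(&self.dbpassword)
        )
    }
}

fn main() {
    match DbArgs::from_env() {
        // Never print conninfo() itself: it contains the password.
        Ok(args) => println!("would connect to {} on {} as {}", args.dbname, args.dbhost, args.dbuser),
        Err(_) => eprintln!("DBPASSWORD is not set"),
    }
}
```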

§Initialize the model

Download the model files. This step is optional, because the index and search commands download the model files lazily if they are not already present.

$ ./vecsearch init-model
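"Downloaded lazily" means a file is fetched only when it is first needed and is reused from a local cache afterwards. Rust programs built on candle commonly fetch model files with the hf-hub crate, which maintains its own cache, but which mechanism vecsearch uses isn't stated here. The following is a std-only sketch of the lazy pattern. `ensure_cached` and its `fetch` closure are hypothetical stand-ins for a real downloader.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the cached path of `name`, calling `fetch` only if the file is not cached yet.
/// `fetch` is a hypothetical hook standing in for a real download.
fn ensure_cached<F>(cache_dir: &Path, name: &str, fetch: F) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let path = cache_dir.join(name);
    if !path.exists() {
        fs::create_dir_all(cache_dir)?;
        // Write to a temporary name first so an interrupted fetch never looks complete.
        let tmp = cache_dir.join(format!("{name}.part"));
        fetch(&tmp)?;
        fs::rename(&tmp, &path)?;
    }
    Ok(path)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("vecsearch-model-sketch");
    for _ in 0..2 {
        // On the second pass the file is already cached, so the fetch closure is skipped.
        let p = ensure_cached(&dir, "config.json", |dest| {
            println!("fetching into {}", dest.display());
            fs::write(dest, b"{}")
        })?;
        println!("using {}", p.display());
    }
    Ok(())
}
```

Running an explicit init step first simply warms this cache, so later index and search runs skip the download.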

§Add documents

A document is a regular file.

The user for this operation requires only write access to the table (not superuser).

Passing several files in one invocation is more efficient than indexing one file per invocation.

$ export DBPASSWORD=$(gpg -d pw-vsmigrator.gpg)
$ vecsearch index --file testdata/0 --file testdata/1
indexing file(s)
Loaded and encoded 58.565µs
Took 15.628167ms
Loaded and encoded 55.513µs
Took 8.018493ms
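The log shows one "Loaded and encoded"/"Took" pair per file. The efficiency note above is consistent with the model being loaded once per invocation and then reused for every file. The following is a minimal sketch of that loop. Pooling is an assumption: candle's BERT example mean-pools the per-token embeddings and can L2-normalize the result, but whether vecsearch does exactly this isn't stated. `index_all` and the toy encoder are hypothetical. The toy encoder stands in for a BERT forward pass and is not a real model.

```rust
/// Average per-token embeddings (one row per token) into a single document vector.
fn mean_pool(tokens: &[Vec<f32>]) -> Vec<f32> {
    let dim = tokens.first().map_or(0, |t| t.len());
    let mut out = vec![0.0f32; dim];
    for t in tokens {
        for (o, x) in out.iter_mut().zip(t) {
            *o += *x;
        }
    }
    let n = tokens.len().max(1) as f32;
    out.iter_mut().for_each(|o| *o /= n);
    out
}

/// Scale to unit length so cosine similarity reduces to a dot product.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
}

/// Encode every document with one already-loaded encoder, so the load cost is paid once.
/// `encode_tokens` is a hypothetical stand-in for a BERT forward pass.
fn index_all<E>(docs: &[&str], mut encode_tokens: E) -> Vec<(String, Vec<f32>)>
where
    E: FnMut(&str) -> Vec<Vec<f32>>,
{
    docs.iter()
        .map(|&doc| {
            let mut v = mean_pool(&encode_tokens(doc));
            l2_normalize(&mut v);
            (doc.to_string(), v)
        })
        .collect()
}

fn main() {
    // Toy encoder: each whitespace token becomes [token length, 1.0]. Not a real model.
    let toy = |text: &str| -> Vec<Vec<f32>> {
        text.split_whitespace().map(|w| vec![w.len() as f32, 1.0]).collect()
    };
    for (doc, emb) in index_all(&["The cat sits outside", "I love pasta"], toy) {
        println!("{doc}: {emb:?}");
    }
}
```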

§Search

Returns the top five matches for the given query. The search is semantic, so relevant documents are returned for queries such as “meow” or “canine” even when no document contains those words.

The user for this operation needs only read access (not write or superuser).

$ export DBPASSWORD=$(gpg -d pw-vsmigrator.gpg)
$ vecsearch search --search feline
searching for document matches
Loaded and encoded 49.306µs
Took 14.452557ms
The cat sits outside
The cat plays in the garden
I love pasta
Do you like pizza?
The new movie is so great
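In pgvector, a top-five semantic search is typically an `ORDER BY embedding <=> $1 LIMIT 5` query, where `<=>` is cosine distance. Which operator vecsearch uses, and whether it builds an index, isn't stated here. The following is a brute-force sketch of the same ranking for illustration. `cosine_distance`, `top_k`, and the three-dimensional embeddings are made up.

```rust
/// Cosine distance as pgvector's `<=>` operator defines it: 1 - cosine similarity.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm(a) * norm(b))
}

/// Brute-force equivalent of `ORDER BY embedding <=> $1 LIMIT k`.
fn top_k<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &'a str)> = docs
        .iter()
        .map(|(text, emb)| (cosine_distance(query, emb), *text))
        .collect();
    // total_cmp gives a total order even if a zero vector produced NaN.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}

fn main() {
    // Made-up 3-dimensional embeddings; a real model produces hundreds of dimensions.
    let docs: Vec<(&str, Vec<f32>)> = vec![
        ("The cat sits outside", vec![0.9, 0.1, 0.0]),
        ("I love pasta", vec![0.0, 0.2, 0.9]),
        ("The cat plays in the garden", vec![0.8, 0.3, 0.1]),
    ];
    let query: [f32; 3] = [1.0, 0.0, 0.0]; // pretend embedding of "feline"
    for text in top_k(&query, &docs, 5) {
        println!("{text}");
    }
}
```

With a LIMIT and no distance threshold, the query returns k rows whenever at least k documents exist. This is consistent with the unrelated pasta and pizza lines at the bottom of the output above.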
shortlog
2024-11-30  Erik Mackdanz  rename cvmigrator to vsmigrator  (main)
2024-11-30  Erik Mackdanz  main not PR
2024-11-30  Erik Mackdanz  add init-model command
2024-11-30  Erik Mackdanz  query_opt
2024-11-29  Erik Mackdanz  DB arg names don't clobber USER
2024-11-29  Erik Mackdanz  env support for most args
2024-11-29  Erik Mackdanz  Can index multiple files together
2024-11-29  Erik Mackdanz  different test data
2024-11-29  Erik Mackdanz  remove Cargo.lock
2024-11-29  Erik Mackdanz  consistent project name
2024-11-28  Erik Mackdanz  docs
2024-11-28  Erik Mackdanz  Add crate docstring
2024-11-28  Erik Mackdanz  don't need to create extension for searches
2024-11-28  Erik Mackdanz  remove --prompt
2024-11-28  Erik Mackdanz  implement search
2024-11-28  Erik Mackdanz  data is keyed by content
...
heads
13 months ago main