Retrieve Data

Similarity Search

Note: For context and intermediate steps, see the previous tutorial, Store & Index Data.

Once all the embeddings have been stored and indexed, we are ready to perform similarity search.

Given a query, we encode it into a vector representation and compare it with the vectors stored in our vector database:

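from time import perf_counter

# Encode the query text with the same model used to build the index
query = embeddings_model.encode(
    sentences='Dinosaur toy'
)

# Time the retrieval of the 5 nearest neighbours
start = perf_counter()
results = index.search(target=query, k=5)
end = perf_counter()
print(f'It took {end - start} seconds to retrieve results')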

Output

It took 0.07517380100034643 seconds to retrieve results
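A single timing like the one above can vary from run to run. As a minimal sketch, the same perf_counter pattern can be repeated and summarised with a median. The names time_call and fake_search are hypothetical and only stand in for the real index.search call:

```python
from statistics import median
from time import perf_counter


def time_call(func, repeats=5):
    """Run func several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        func()
        timings.append(perf_counter() - start)
    return median(timings)


def fake_search():
    # Hypothetical stand-in for index.search(target=query, k=5)
    return sorted(range(10_000), reverse=True)[:5]


median_seconds = time_call(fake_search, repeats=5)
print(f'Median over 5 runs: {median_seconds} seconds')
```

In the tutorial's setting, fake_search would be replaced by a small function or lambda that wraps the real search call.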

The search returns a KnnResult object, which is an iterator of KnnItem objects. Each KnnItem has an oid (object ID) attribute and a distance (Euclidean distance) attribute.

# Display results: ID and distance on one line, product name on the next
for result in results:
    print(f'Product ID: {result.oid}, Distance: {result.distance}')
    print(f'Product Name: {products[result.oid]}')

Output
Product ID: 1523, Distance: 359.94293212890625 
Product Name: Fun Express Large PVC Dynamite Dinosaurs - Toys - 12 Pieces
Product ID: 236, Distance: 426.50408935546875 
Product Name: Schleich North America Tyrannosaurus Rex Toy Figure, Red
Product ID: 5494, Distance: 428.3280334472656 
Product Name: Jurassic World Attack Pack Callovosaurs
Product ID: 6522, Distance: 430.98291015625 
Product Name: Educational Insights Dino Construction Company T-Rex Skid Loader
Product ID: 9061, Distance: 432.1407470703125 
Product Name: Knuckle-Headz Single Pack - Fang
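To make the distance values above more concrete, here is a minimal sketch of a brute-force nearest-neighbour search using Euclidean distance. It is not how the index is implemented (an index exists precisely to avoid scanning every vector); the function brute_force_knn and the toy 2-D vectors are assumptions made purely for illustration:

```python
import heapq
import math


def brute_force_knn(vectors, query, k=5):
    """Return (oid, distance) pairs for the k vectors closest to query.

    Here the oid is simply the position of the vector in the list.
    """
    # Euclidean distance between the query and every stored vector
    scored = ((math.dist(vec, query), oid) for oid, vec in enumerate(vectors))
    # Keep the k smallest distances, closest first
    return [(oid, dist) for dist, oid in heapq.nsmallest(k, scored)]


# Toy 2-D "embeddings"; real embeddings have many more dimensions
toy_vectors = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0)]
toy_query = (0.0, 0.0)

toy_results = brute_force_knn(toy_vectors, toy_query, k=3)
for oid, distance in toy_results:
    print(f'ID: {oid}, Distance: {distance}')
```

A smaller distance means the stored vector is closer to the query, which is why the results are listed in ascending order of distance.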

Retrieve Original Documents

To retrieve the original documents that were encoded (the product descriptions in our case), we can read them from the tape using the object IDs carried by each KnnItem. First, we need to define a callback function like the following:

from shapelets.native import Record


def get_original_document(result_object: Record) -> None:
    """Show the original document from Tape.
    
    Args:
        result_object (Record): Object stored in Tape associated with an ID
    """
    raw_data_decoded = result_object.data.tobytes().decode()
    print(raw_data_decoded)
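The decoding step in the callback can be tried in isolation. The sketch below assumes only that the record exposes a data attribute with a tobytes() method, as the callback above does. FakeRecord is a hypothetical stand-in that uses a memoryview; it is not the library's Record class:

```python
from dataclasses import dataclass


@dataclass
class FakeRecord:
    """Hypothetical stand-in for a tape record; it only mimics the data attribute."""
    data: memoryview


def decode_document(record) -> str:
    # Same chain as in the callback: raw buffer -> bytes -> UTF-8 text
    return record.data.tobytes().decode()


original_text = 'Dinosaur toy ✅ safe to use'
# encode() produces the UTF-8 bytes that would be stored alongside the ID
fake_record = FakeRecord(data=memoryview(original_text.encode()))

decoded_text = decode_document(fake_record)
print(decoded_text)
```

Because decode() defaults to UTF-8, non-ASCII characters such as the check mark survive the round trip as long as the text was also stored as UTF-8.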

Let us check the original document associated with the last similarity search result:

# Read from tape and print result
# `result` still holds the last item from the loop over the search results
tape.read(objId=result.oid, call_back=get_original_document)

Output
Make sure this fits by entering your model number. | ✅【Smooth 3D drawing experienced
the best 3D drawing experience by only using 3Doodler Create Plastics 
with 3Doodler Create+ and create 3D Printing pen. | ✅【Safe to use】...

Finally, we close the tape:

# Close tape
tape.close()
