Retrieve Data
Similarity Search
For context and intermediate steps, please check the previous tutorial Store & Index Data.
Once all the embeddings have been stored and indexed, we are ready to perform similarity search.
Given a query, we encode it into a vector representation and compare it with the data contained in our vector database:
from time import perf_counter

# Perform similarity search
query = embeddings_model.encode(
    sentences='Dinosaur toy'
)
start = perf_counter()
results = index.search(target=query, k=5)
end = perf_counter()
print(f'It took {end - start} seconds to retrieve results')
It took 0.07517380100034643 seconds to retrieve results
The search quickly returns a KnnResult object, which is an iterator of KnnItem objects. Each KnnItem has an oid (object ID) attribute and a distance (Euclidean distance) attribute.
# Display results
for result in results:
    doc_id = f'Product ID: {result.oid}'
    distance = f'Distance: {result.distance}'
    product = f'Product Name: {products[result.oid]}'
    print(f'{doc_id}, {distance}\n{product}')
Product ID: 1523, Distance: 359.94293212890625
Product Name: Fun Express Large PVC Dynamite Dinosaurs - Toys - 12 Pieces
Product ID: 236, Distance: 426.50408935546875
Product Name: Schleich North America Tyrannosaurus Rex Toy Figure, Red
Product ID: 5494, Distance: 428.3280334472656
Product Name: Jurassic World Attack Pack Callovosaurs
Product ID: 6522, Distance: 430.98291015625
Product Name: Educational Insights Dino Construction Company T-Rex Skid Loader
Product ID: 9061, Distance: 432.1407470703125
Product Name: Knuckle-Headz Single Pack - Fang
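For intuition about what the distances above mean, here is a minimal sketch of a brute-force k-nearest-neighbour search by Euclidean distance, written in plain Python. It is an illustration only and makes no claim about how index.search works internally. The function name brute_force_knn and the toy vectors are made up for this example.

```python
import heapq
import math


def brute_force_knn(vectors, query, k):
    """Return (index, Euclidean distance) pairs for the k vectors nearest to query."""
    # Distance from the query to every stored vector (hypothetical toy data)
    scored = [(i, math.dist(vec, query)) for i, vec in enumerate(vectors)]
    # Keep the k smallest distances, nearest first
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings"; real embeddings have many more dimensions
toy_vectors = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0)]
toy_query = (0.0, 0.0)

neighbours = brute_force_knn(toy_vectors, toy_query, k=3)
for row, dist in neighbours:
    print(f'Row: {row}, Distance: {dist}')
```

As in the results above, a smaller distance means a closer match, so the list is ordered from most to least similar.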
Retrieve Original Documents
If we are interested in retrieving the encoded documents (the product descriptions in our case), we can read them from tape using the external IDs obtained in each KnnItem object. First, we need to define a callback function like the following:
from shapelets.native import Record


def get_original_document(result_object: Record) -> None:
    """Show the original document from Tape.

    Args:
        result_object (Record): Object stored in Tape associated with an ID
    """
    raw_data_decoded = result_object.data.tobytes().decode()
    print(raw_data_decoded)
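The decoding step in the callback can be tried in isolation. The sketch below assumes only that the record's data attribute is a byte buffer exposing tobytes() (a memoryview is used here as a stand-in) and that the text was stored as UTF-8, which is what decode() uses by default. FakeRecord and decode_document are hypothetical names for this example and are not part of the library.

```python
from dataclasses import dataclass


@dataclass
class FakeRecord:
    """Hypothetical stand-in for a tape record; it only mimics the `data` attribute."""
    data: memoryview


def decode_document(record) -> str:
    """Turn the record's raw byte buffer back into text (UTF-8 is assumed)."""
    return record.data.tobytes().decode('utf-8')


# Round trip: text -> bytes buffer -> text, including a non-ASCII character
original = 'Dinosaur toy | ✅ Safe to use'
record = FakeRecord(data=memoryview(original.encode('utf-8')))
decoded = decode_document(record)
print(decoded)
```

Returning the string instead of printing it, as done here, makes the decoding easy to check independently of the read call.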
Let us check the original document associated with the last similarity search result:
# Read from tape and print result
tape.read(objId=result.oid, call_back=get_original_document)
Make sure this fits by entering your model number. | ✅【Smooth 3D drawing experienced
the best 3D drawing experience by only using 3Doodler Create Plastics
with 3Doodler Create+ and create 3D Printing pen. | ✅【Safe to use】...
Lastly, we close tape:
# Close tape
tape.close()