Search vs. Vector Search vs. AI Vector Search (but why?)
Yes, you won’t be surprised: there’s AI for everything these days.
Oracle recently released its latest, renamed AI-based database, Database 23ai, at Oracle OpenWorld 2024, powered by “AI Vector Search.” That made me dig into their model to see how it is different. Vector search has been around for some time, and I have used and loved MongoDB’s Vector Search capabilities. Beyond that, much of GenAI is about searching and getting one specific response rather than Google’s millions of result pages.
Before we dive into AI Vector Search, it’s important to understand regular search and where it all started.
Regular Search
Regular search, also known as keyword search, is the most basic form of search functionality. It works by matching keywords or phrases in a search query to exact matches in a database or index. This approach relies on pre-defined rules, such as Boolean logic, to filter and rank results. While effective for simple queries, regular search has limitations:
Exact matches only: Regular search only returns exact matches, which can lead to missed results or irrelevant matches.
No context understanding: Regular search doesn't comprehend the context or intent behind the search query, leading to inaccurate results.
Scalability issues: As data grows, regular search can become slow and inefficient.
Normal SQL Search
In a traditional relational database, SQL (Structured Query Language) is used to search for data. When you perform a normal SQL search, you're essentially asking the database to:
Match exact values: Find rows where a specific column matches a specific value (e.g., SELECT * FROM users WHERE name = 'Suren').
Use indexes: The database uses indexes to quickly locate the matching rows. Indexes are like a map that helps the database find the data.
Return exact matches: The database returns only the rows that exactly match the search criteria.
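To make the exact-match behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users table, its rows, and the index name are assumptions made up for illustration; only the query shape comes from the example above.

```python
import sqlite3

# In-memory database with a hypothetical `users` table (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("Suren",), ("Surendra",), ("suren",), ("Maria",)],
)
# An index lets the database locate matching rows without scanning every row.
conn.execute("CREATE INDEX idx_users_name ON users (name)")


def find_by_name(name):
    """Return the names of rows whose `name` column equals `name` exactly."""
    rows = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


exact_matches = find_by_name("Suren")
print(exact_matches)
```

With SQLite's default text comparison, close variants such as 'Surendra' or 'suren' are not returned, which is exactly the limitation described above.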
What is a Vector?
If you have learned a programming language, you have probably come across arrays. A vector is a similar data structure: an ordered list of numbers that acts as a summary or "fingerprint" of the data it represents. When a model produces that list, it is also known as an embedding.
Let’s use cars as an example to explain vectors.
Imagine you're creating a system that compares different car models. Instead of describing each car with words like "fast," "fuel-efficient," or "expensive," you can represent each car as a vector, where numbers capture various features like speed, fuel efficiency, and price.
Here’s an example of how two cars might be represented as vectors:
"Sedan A": [200, 30, 25000, 4.5]
200: top speed in mph
30: miles per gallon
25000: price in dollars
4.5: user rating out of 5
"SUV B": [180, 25, 35000, 4.2]
180: top speed in mph
25: miles per gallon
35000: price in dollars
4.2: user rating out of 5
Each vector encodes key features of the car. With these vectors, you can easily compare the two cars, for instance, by calculating how close or far apart their vectors are. This helps the system determine which cars are more similar in terms of their features.
In this way, a vector is a numerical summary of a car's characteristics, allowing the system to quickly make comparisons or decisions.
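As a rough sketch of "how close or far apart" two vectors are, the snippet below computes the Euclidean distance between the two cars. One caveat worth seeing in code: on raw values the price difference swamps every other feature, so the sketch also rescales each feature first. The FEATURE_MAX bounds are hypothetical values chosen only for this illustration.

```python
import math

# Feature order: [top speed (mph), miles per gallon, price ($), user rating]
cars = {
    "Sedan A": [200, 30, 25000, 4.5],
    "SUV B": [180, 25, 35000, 4.2],
}

# Hypothetical upper bounds used to bring every feature into the 0..1 range.
FEATURE_MAX = [250, 60, 100000, 5]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale(vector):
    """Divide each feature by its assumed maximum so no single feature dominates."""
    return [value / maximum for value, maximum in zip(vector, FEATURE_MAX)]


raw_distance = euclidean_distance(cars["Sedan A"], cars["SUV B"])
scaled_distance = euclidean_distance(scale(cars["Sedan A"]), scale(cars["SUV B"]))
print(raw_distance, scaled_distance)
```

The raw distance is almost entirely the 10,000 dollar price gap, while the scaled distance lets speed, mileage, and rating contribute as well.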
Vector Search: A Step Forward
Vector search is an improvement over regular search, particularly for complex data like images, audio, or text. It works by converting data into numerical vectors, which represent the features and patterns within the data. Vector search then measures the similarity between these vectors to find matches. This approach offers several advantages:
Similarity-based matching: Vector search returns results based on similarity, rather than exact matches, allowing for more nuanced and relevant results.
Improved scalability: With approximate nearest-neighbor indexes, vector search can stay fast even on large datasets.
Better handling of complex data: Vector search excels at searching complex data like images, audio, or text, where traditional keyword search falls short.
Under the hood, vector search works in three steps:
Represent data as vectors: Each item in the database (e.g., an image) is converted into a numerical vector (a list of numbers) that represents its features (e.g., colors, shapes).
Measure similarity: When you search, the database calculates the similarity between the search query (also represented as a vector) and the vectors in the database.
Return similar matches: The database returns the items with the highest similarity scores, which means they're the most similar to the search query.
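These three steps can be sketched in a few lines. This is a brute-force version that scores every stored vector, assuming cosine similarity as the measure; the file names and 3-dimensional vectors are made up, and real embeddings usually have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, items, top_k=2):
    """Score every stored vector against the query and return the best top_k."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in items.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:top_k]]


# Step 1: items already represented as (made-up) vectors.
items = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.8, 0.2, 0.1],
    "truck.jpg": [0.0, 0.2, 0.9],
}

# Steps 2 and 3: measure similarity to the query vector, return the closest items.
results = search([1.0, 0.1, 0.0], items)
print(results)
```

Scanning every vector is fine for a handful of items; larger collections typically rely on an index to avoid the full scan.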
Key differences from regular search:
Exact matches vs similarity: Normal SQL search looks for exact matches, while vector search looks for similar matches.
Data representation: Normal SQL search uses traditional data types (e.g., strings, integers), while vector search uses numerical vectors to represent complex data.
Example
Suppose you have a database of images, and you want to find all images that are similar to a picture of a cat.
Normal SQL search: You'd need to manually describe the cat image using keywords (e.g., SELECT * FROM images WHERE description = 'cat').
Vector search: You'd convert the cat image into a vector, and the database would find all images with similar vectors (e.g., pictures of cats, kittens, or even dogs that resemble cats).
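The contrast can be shown side by side on toy data. In this sketch the image records, their descriptions, the feature vectors, and the max_distance threshold are all invented for illustration; a real system would get the vectors from an image model.

```python
import math

# Hypothetical image records: a free-text description plus a made-up feature vector.
images = [
    {"file": "img1.jpg", "description": "cat", "vector": [0.9, 0.1, 0.1]},
    {"file": "img2.jpg", "description": "kitten on a sofa", "vector": [0.8, 0.2, 0.1]},
    {"file": "img3.jpg", "description": "fluffy pet", "vector": [0.85, 0.15, 0.2]},
    {"file": "img4.jpg", "description": "red truck", "vector": [0.1, 0.1, 0.9]},
]


def keyword_search(term):
    """Mimic WHERE description = term: only exact string matches qualify."""
    return [img["file"] for img in images if img["description"] == term]


def vector_search(query_vector, max_distance=0.3):
    """Return files whose vector lies within max_distance of the query vector."""
    return [
        img["file"]
        for img in images
        if math.dist(query_vector, img["vector"]) <= max_distance
    ]


cat_query = [0.9, 0.1, 0.1]  # stands in for the vector of the cat picture
keyword_hits = keyword_search("cat")
vector_hits = vector_search(cat_query)
print(keyword_hits, vector_hits)
```

The keyword query finds only the row whose description is literally 'cat', while the distance-based query also picks up the kitten and the fluffy pet and still leaves out the truck.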
Vector search is particularly useful for applications like image recognition, natural language processing, and recommendation systems, where traditional SQL searches may not be effective.
However, vector search still has limitations:
Requires pre-defined vectors: Vector search relies on pre-generated vectors, which can be time-consuming and computationally expensive to create.
Limited contextual understanding: While vector search understands some context, it's still limited in its ability to comprehend the intent and meaning behind the search query.
AI Vector Search: The Future of Search
AI vector search represents the next generation of search technology. By integrating artificial intelligence (AI) and machine learning (ML) with vector search, AI vector search overcomes the limitations of its predecessors. Here's how:
Learned representations: AI vector search uses neural networks to learn compact and meaningful vector representations of data, eliminating the need for pre-defined vectors.
Contextual understanding: AI vector search models comprehend the context and intent behind the search query, allowing for more accurate and relevant results.
Real-time adaptability: AI vector search models can learn from user feedback and adapt to changing data distributions in real-time, ensuring the search results remain accurate and relevant.
Key differences from regular search:
Learned representations: AI Vector Search uses neural networks to learn compact and meaningful vector representations of data, such as images, text, or audio. These representations capture subtle patterns and relationships in the data.
Contextual understanding: AI Vector Search models can understand the context in which the search query is being made, allowing for more accurate and relevant results.
Semantic search: AI Vector Search enables semantic search, which means it can understand the intent and meaning behind the search query, rather than just matching keywords.
Similarity metrics: AI Vector Search measures the similarity between vectors with metrics such as cosine similarity or Euclidean distance. Applied to learned embeddings, these metrics tend to track closeness in meaning better than keyword-overlap scoring does.
Efficient indexing: AI Vector Search uses optimized indexing techniques, such as graph-based indexing or hierarchical indexing, to quickly locate the most relevant vectors.
Real-time learning: AI Vector Search models can learn from user feedback and adapt to changing data distributions in real-time, ensuring the search results remain accurate and relevant.
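To give a feel for why indexing matters, here is a deliberately simple sketch. It is not graph-based or hierarchical indexing; it swaps in a plainer partition-based idea (group vectors into buckets around cluster centers, then search only the nearest bucket). The centroids and vectors are hand-picked assumptions; a real index would learn its partitions from the data.

```python
import math


def nearest(point, candidates):
    """Return the key of the candidate vector closest to `point`."""
    return min(candidates, key=lambda key: math.dist(point, candidates[key]))


# Hypothetical, hand-picked cluster centers (a real index would learn these).
centroids = {"animals": [0.9, 0.1], "vehicles": [0.1, 0.9]}

vectors = {
    "cat": [0.95, 0.05],
    "dog": [0.85, 0.2],
    "car": [0.1, 0.95],
    "bus": [0.2, 0.8],
}

# Build step: place each vector in the bucket of its nearest centroid.
buckets = {name: {} for name in centroids}
for key, vector in vectors.items():
    buckets[nearest(vector, centroids)][key] = vector


def indexed_search(query):
    """Compare the query only against vectors in the closest bucket."""
    bucket = buckets[nearest(query, centroids)]
    return nearest(query, bucket)


best = indexed_search([0.15, 0.9])
print(best)
```

The trade-off is that a true nearest neighbour sitting in another bucket can be missed, which is why such indexes are usually described as approximate.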
How AI Vector Search works
Data ingestion: The AI Vector Search system ingests data from various sources, such as databases, files, or APIs.
Vectorization: The system converts the ingested data into vector representations using neural networks or other ML algorithms.
Indexing: The vector representations are indexed using optimized indexing techniques to enable fast similarity searches.
Query processing: When a search query is submitted, the system converts the query into a vector representation and uses the indexed vectors to find the most similar matches.
Ranking: The system ranks the search results based on their similarity scores and returns the top results to the user.
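The five stages above can be wired together in a toy pipeline. Everything here is an assumption for illustration: the documents are invented, the "index" is a plain list, the score is a dot product, and vectorize is a word-count stand-in for the neural network, so it matches shared words and does not capture meaning the way a learned embedding would.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_vocabulary(documents):
    """Collect every distinct word across the ingested documents."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def vectorize(text, vocabulary):
    """Toy stand-in for an embedding model: word counts over the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


# 1. Data ingestion (hypothetical documents).
documents = [
    "red sports car on a race track",
    "family car parked in a garage",
    "cat sleeping on a sofa",
]
# 2. Vectorization.
vocabulary = build_vocabulary(documents)
# 3. Indexing: here simply a list of (document, vector) pairs.
index = [(doc, vectorize(doc, vocabulary)) for doc in documents]


def query(text, top_k=2):
    # 4. Query processing: vectorize the query the same way as the documents.
    query_vector = vectorize(text, vocabulary)
    # 5. Ranking: sort by similarity score, highest first.
    ranked = sorted(index, key=lambda pair: dot(query_vector, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


top_results = query("fast red car")
print(top_results)
```

Swapping vectorize for a real embedding model and the list for a proper vector index would keep the same overall shape.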
Show me an example
Too much information? Please bear with me. Let’s return to cars to illustrate the differences between regular search, vector search, and AI vector search.
Suppose we have a database of images, and we want to find all images that contain a car.
Regular Search: We'd use keywords like "car" or "automobile" to search for images. However, this approach would only return images with exact matches in their metadata, missing images that contain cars but don't have the keyword in their description.
Vector Search: We'd convert the images into numerical vectors representing their features, such as colors, shapes, and textures. Vector search would then return images with similar vectors, which might include images of cars, but also potentially images of trucks, buses, or other vehicles with similar features.
AI Vector Search: We'd use a neural network to learn vector representations of the images, taking into account the context and intent behind the search query. AI vector search would return images that not only contain cars but also understand the nuances of the search query, such as images of specific car models, colors, or environments.
The evolution of search from regular to vector to AI-powered has transformed the way we access information. While regular search is effective for simple queries, vector search improves scalability and handling of complex data. AI vector search, however, represents the future of search, offering unparalleled contextual understanding, adaptability, and accuracy.
As data continues to grow in volume and complexity, AI vector search will become increasingly important in various applications, from e-commerce and image search to natural language processing and recommendation systems. By harnessing the power of AI and ML, we can unlock new possibilities in search and revolutionize the way we interact with information.
What's Next?
The future of search is exciting, with ongoing research and innovations in AI vector search. Some potential developments on the horizon include:
Multimodal search: Searching across multiple data types, such as text, images, and audio, to provide a more comprehensive understanding of the search query.
Explainable AI: Developing techniques to provide transparency and interpretability into AI vector search models, enabling users to understand why certain results were returned.
Edge AI: Deploying AI vector search models on edge devices, such as smartphones or smart home devices, to enable faster and more efficient search capabilities.
The journey of search is far from over, and we're excited to see what the future holds. One thing is certain – AI vector search is poised to revolutionize the way we search, and we're just beginning to scratch the surface of its potential.
Further Reading
"Vector Search for Generative AI Apps" by DataStax: https://www.datastax.com/resources/whitepaper/vector-search-for-generative-ai-apps
"Vector Database White Paper" by Restackio: https://www.restack.io/p/vector-database-answer-white-paper-cat-ai
Pinecone documentation and blog on vector search: https://www.pinecone.io/
Faiss (Facebook AI) documentation and GitHub repository: https://faiss.ai/ and https://github.com/facebookresearch/faiss
NVIDIA developer blog and resources on GPU-accelerated vector search: https://developer.nvidia.com/ (search for "vector search")