Search Functionality
Tags are used to find specific resources based on the resource description. Tags come from an open-source taxonomy system that classifies resources, and they are assigned by an LLM using zero-shot classification.
Advantages:
- More fine-grained classification of resources
- Can be combined to get a specific resource
- Allow for discovery (sometimes people don't know what to search for)
Disadvantages:
- Require maintenance
- Could be inaccurate
Note: Three ways to maintain tags:
- By hand - The client goes through the list of resources to add or remove tags
- By ML - Tags are autogenerated as suggestions for a human to verify
- By request - The resource owner can request that a tag be added or removed
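As a rough illustration of how tags can be combined to narrow a search, the sketch below keeps only resources that carry every requested tag. The resource names, tag names, and the `find_by_tags` helper are all hypothetical; the actual taxonomy and storage format are not specified here.

```python
def find_by_tags(resources, required_tags):
    """Return the names of resources that carry every tag in required_tags."""
    required = set(required_tags)
    # A resource matches when the required tags are a subset of its tags.
    return [name for name, tags in resources.items() if required <= set(tags)]


# Hypothetical resources and tags, for illustration only.
resources = {
    "Resume Workshop": {"employment", "education"},
    "Job Fair": {"employment", "events"},
    "Food Bank": {"food"},
}

matches = find_by_tags(resources, ["employment", "education"])
print(matches)
```

Requiring all tags (set intersection) narrows the result as more tags are added. Matching any tag instead would widen it, which may suit discovery better.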
Groups are categories obtained from the actual file itself. Each resource belongs to at most one category.
Advantages:
- Provide some granularity to resources
- Accurate since obtained from file
Disadvantages:
- Might not be granular enough
- Not all resources have a category
TF-IDF
TF (term frequency) and IDF (inverse document frequency) score the relation between a term and a document by how often the term appears in that document and how rare it is across all documents (more frequent in the document = more related, rarer across documents = more related).
Advantages:
- Simple to implement
- Cheap to calculate
Disadvantages:
- No semantic information (Ex: searching "help find work" might not return a resource with the description "help build resume")
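The scoring idea can be sketched in a few lines of plain Python. This is a minimal version that assumes whitespace tokenization and the basic `tf * log(N / df)` weighting; production libraries use smoothed variants. The `tf_idf_scores` helper and the sample descriptions are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document against the query by summing TF * IDF per query term."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # Term absent from this document contributes nothing.
            tf = counts[term] / len(tokens)     # More frequent = more related.
            idf = math.log(n_docs / df[term])   # Rarer across documents = more related.
            score += tf * idf
        scores.append(score)
    return scores


# Hypothetical resource descriptions, for illustration only.
descriptions = [
    "help build resume",
    "find work near you",
    "resume writing workshop",
]
scores = tf_idf_scores("help find work", descriptions)
for description, score in zip(descriptions, scores):
    print(f"{score:.3f}  {description}")
```

In this sketch, "help build resume" scores only through the shared word "help", and "resume writing workshop" scores zero even though it is relevant to finding work. That is the missing semantic information noted above.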
Vector Search
The meaning of a search phrase might not match the words of a description directly. A vector embedding converts the search phrase into a high-dimensional vector that encodes semantic information, which is then compared with the embedding vector of each description.
Advantages:
- Semantic search
- Allow for natural language queries (Ex: search "a resource that will help me find work")
Disadvantages:
- Require LLM embeddings
- Can be expensive to run
- Could be unreliable (due to the LLM)
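The comparison step is commonly done with cosine similarity, although the similarity measure is not specified here. The sketch below ranks descriptions against a query vector. The three-dimensional vectors are hand-picked toy values, not real model output; in practice an embedding model would produce them, with hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, description_vectors):
    """Return (description, similarity) pairs, most similar first."""
    scored = [
        (description, cosine_similarity(query_vector, vector))
        for description, vector in description_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors chosen by hand for illustration; NOT produced by an embedding model.
description_vectors = {
    "help build resume": [0.9, 0.1, 0.0],
    "free food bank": [0.0, 0.2, 0.9],
}
# Stand-in for the embedding of the query "help find work".
query_vector = [0.8, 0.3, 0.1]

ranking = rank_by_similarity(query_vector, description_vectors)
for description, similarity in ranking:
    print(f"{similarity:.3f}  {description}")
```

With these toy values, "help build resume" ranks first because its vector points in nearly the same direction as the query vector. This is how a semantic match can surface without any shared keywords beyond chance.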