
Cost Reduction for Ask Astro #295

Closed
davidgxue opened this issue Feb 12, 2024 · 2 comments · Fixed by #307
davidgxue commented Feb 12, 2024

Main Issues

  1. Reduce the reranker's output from 10 documents to 8
  2. Clean up the data processing steps during ingestion to enforce a smaller document length at request time
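The first change amounts to a one-line configuration tweak. A minimal sketch, assuming a simple score-based reranker; the names (`rerank`, `RERANK_TOP_N`) are illustrative, not Ask Astro's actual code:

```python
# Hypothetical sketch: cap the reranker's output at 8 documents instead of 10.
RERANK_TOP_N = 8  # previously 10

def rerank(scored_docs, top_n=RERANK_TOP_N):
    """Return the top_n documents, sorted by descending relevance score."""
    ranked = sorted(scored_docs, key=lambda d: d["score"], reverse=True)
    return ranked[:top_n]

# Example: 12 candidate documents in, 8 out.
docs = [{"id": i, "score": i / 12} for i in range(12)]
top = rerank(docs)
print(len(top))  # 8
```

Fewer documents passed to the LLM means fewer prompt tokens per request, which is where the cost saving comes from.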

Other Potential Areas

Migrate to OpenAI's newer GPT-3.5 models, which are roughly 5x cheaper.

OpenAI (the 0125 model is the same as the 16k model in Azure):
[screenshot: OpenAI GPT-3.5 pricing]

Azure OpenAI:
[screenshot: Azure OpenAI pricing]

Migrate to OpenAI's new embedding model (5x cheaper embedding cost)

From OpenAI: "

Reduced price

text-embedding-3-small is also substantially more efficient than our previous generation text-embedding-ada-002 model.
Pricing for text-embedding-3-small has therefore been reduced by 5X compared to text-embedding-ada-002, from a price per 1k tokens of $0.0001 to $0.00002.
"
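The quoted pricing can be sanity-checked with a quick back-of-envelope calculation. The 100M-token corpus size below is an arbitrary example, not Ask Astro's actual volume:

```python
# Per-1k-token prices quoted by OpenAI above.
ADA_002_PER_1K = 0.0001    # text-embedding-ada-002
SMALL_3_PER_1K = 0.00002   # text-embedding-3-small

def embedding_cost(tokens, price_per_1k):
    """Total embedding cost in dollars for a given token count."""
    return tokens / 1000 * price_per_1k

tokens = 100_000_000  # e.g. re-embedding a 100M-token corpus (illustrative)
old = embedding_cost(tokens, ADA_002_PER_1K)   # $10.00
new = embedding_cost(tokens, SMALL_3_PER_1K)   # $2.00
print(old / new)  # 5.0
```

This confirms the claimed 5x reduction in embedding cost.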

Already tracked by #286

@davidgxue davidgxue self-assigned this Feb 13, 2024
@davidgxue davidgxue added this to the 0.3.0 milestone Feb 13, 2024
@davidgxue davidgxue modified the milestones: 0.3.0, 0.4.0 Feb 23, 2024
@davidgxue davidgxue modified the milestones: 0.4.0, 0.3.0 Feb 28, 2024
davidgxue added a commit that referenced this issue Feb 28, 2024
### Description
- [Original issue discussed here](#286)
- [Indirectly related issue about cost reduction (new model 5x cheaper but better accuracy)](#295)


### Technical Changes
- Only `schema.json` changes; this updates the vectorizer used during both ingestion and retrieval
- On ingestion, a new index is created if one doesn't already exist. During retrieval, that index's vectorizer is used to vectorize the user query
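For illustration, a vectorizer switch of this kind in a Weaviate-style `schema.json` might look roughly like the fragment below. The class name and exact module keys are assumptions, not the actual Ask Astro schema:

```json
{
  "classes": [
    {
      "class": "Docs",
      "vectorizer": "text2vec-openai",
      "moduleConfig": {
        "text2vec-openai": {
          "model": "text-embedding-3-small"
        }
      }
    }
  ]
}
```

Because the vectorizer is attached to the index, both ingestion and query-time vectorization pick up the new model from the same place.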

### Tests & Evaluation
- No significant difference; small quality improvements for some questions, and no quality degradation
- The model is far cheaper than the original ada-002 (V2) model
- See details here: #297 (comment)


### Related Issues
closes #286 
partially completes #295
@davidgxue (Contributor, Author) commented:

Decided to implement bullet points 1 and 2 plus the embedding model migration. After evaluation, the GPT-3.5 portion of the cost is small enough that migrating wouldn't make an immediate impact.

@davidgxue (Contributor, Author) commented:

It may be important to also explore this: #300
