Getting Started

FOLLOW THE GPU_README INSTRUCTIONS INSTEAD OF THESE FOR MAXIMUM PERFORMANCE ON ONE OR MORE GPUS

Requirements

Make sure Docker and Python 3.12 are installed on your system and functioning properly. Before running the script, install the dependencies listed in requirements.txt. Use a virtual environment to manage these packages:

pip install -r requirements.txt
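
To confirm the prerequisites before going further, a short check like the one below can help. It is only a sketch: the function name check_prerequisites is made up for illustration, and it assumes that "Python3.12" means the interpreter's major.minor version should be exactly 3.12 (this README does not say whether newer versions also work). It only checks that a docker executable is on the PATH, not that the Docker daemon is running.

```python
import shutil
import sys


def check_prerequisites(required=(3, 12), version=None):
    """Report whether Docker is on the PATH and which Python is in use.

    `required` is an assumption taken from the README text (Python 3.12).
    `version` defaults to the running interpreter's (major, minor).
    """
    if version is None:
        version = sys.version_info[:2]
    else:
        version = tuple(version[:2])
    return {
        # True only if a `docker` executable can be found on the PATH.
        "docker_on_path": shutil.which("docker") is not None,
        "python_version": version,
        "python_matches": version == tuple(required),
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

Run it inside the virtual environment you plan to use, so the reported Python version is the one that will run main.py.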

Step 1: Start the CLIP Server in a Docker Container

In your terminal, run the following command to start the Docker container that hosts the CLIP server:

docker run -p 51009:51000 -v $HOME/.cache:/home/cas/.cache jinaai/clip-server

This command pulls and runs the Jina AI CLIP server container, mapping the container's port 51000 to port 51009 on the host for local communication.

Add the "--gpus all" flag to the command if you have a GPU and Docker is configured for GPU acceleration with nvidia-container-toolkit. (Your dataset will be labeled much more quickly.)
Add the "--stats" flag to get a stats.csv file of statistics.
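
Before moving on to Step 2, it can be useful to confirm that something is listening on the host port from the docker command above. The sketch below is an assumption-laden helper, not part of this repository: server_is_reachable is a hypothetical name, and a successful TCP connection only shows that the port is open, not that the CLIP server has finished loading its model.

```python
import socket


def server_is_reachable(host="localhost", port=51009, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    51009 is the host port used in the docker command in this README.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if server_is_reachable() else "not reachable"
    print(f"CLIP server port 51009 is {state}")
```

If the port is not reachable, check that the container is still running and that the -p mapping matches the port you are testing.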

Step 2: Run the Dataset Labeling Script

In a second terminal window, ensure you are in the root directory of this cloned repository. With a virtual environment activated and all required packages from requirements.txt installed, run the following command to label a dataset:

python main.py --file test.csv --categories "cat,dog,bird,centipede,word,any words,any text string will work,this command will work,another example,you can add more"

Add the "--batch_size" flag to adjust the number of images processed in each batch. Increase it if you have a lot of RAM. The default is 100.
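
To make the two options above more concrete, here is a minimal sketch of how a comma-separated categories string and a batch size could be handled. It is not taken from main.py: parse_categories and batched are hypothetical helpers, and trimming whitespace around each category is an assumption.

```python
def parse_categories(raw):
    """Split a comma-separated string into labels, dropping empty entries.

    Spaces inside a label (e.g. "any words") are kept; surrounding
    whitespace is trimmed, which is an assumption for this sketch.
    """
    return [part.strip() for part in raw.split(",") if part.strip()]


def batched(items, batch_size=100):
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    100 mirrors the default mentioned in this README.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    labels = parse_categories("cat,dog,bird,any words")
    print(labels)
    # 250 items with the default batch size give batches of 100, 100 and 50.
    print([len(batch) for batch in batched(list(range(250)))])
```

A larger batch size means fewer batches, with each batch holding more items in memory at once, which is why the README ties it to available RAM.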

Input File Options:

CSV File: Each element in the CSV should be on its own line.
ZIP File of Images: A zip file containing images can be provided for labeling.
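
The two input formats above can be prepared with the Python standard library. The sketch below rests on several assumptions that this README does not confirm: that the CSV has no header row, that images sit at the top level of the zip, and that the listed file extensions are accepted. The helper names write_csv_input and zip_images are hypothetical.

```python
import csv
import zipfile
from pathlib import Path


def write_csv_input(items, path):
    """Write one element per line to a CSV file (no header row assumed)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for item in items:
            # One single-column row per element; csv quotes embedded commas.
            writer.writerow([item])
    return Path(path)


def zip_images(image_dir, zip_path, extensions=(".png", ".jpg", ".jpeg")):
    """Zip the image files found directly inside `image_dir`.

    The extension list is an assumption, not a documented requirement.
    """
    image_dir = Path(image_dir)
    with zipfile.ZipFile(zip_path, "w") as archive:
        for entry in sorted(image_dir.iterdir()):
            if entry.is_file() and entry.suffix.lower() in extensions:
                # Store each file under its bare name, without directories.
                archive.write(entry, arcname=entry.name)
    return Path(zip_path)
```

Either resulting file can then be passed to the script through the --file option shown in Step 2.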

Note:

Images Work Best: The model performs better on image datasets than on text datasets.
For text datasets, it is recommended to use more advanced options like the ChatGPT API for higher accuracy.

============================================================

Licensing Information

This project uses the CLIP server provided by Jina AI through a Docker container, which is licensed under the Apache License 2.0.

Portions of the CLIP server model, specifically model.py and simple_tokenizer.py, are licensed under the MIT License via OpenCLIP.

For more details on the license terms, please refer to the Apache License 2.0 and MIT License texts of the respective projects.
