Welcome to the Globant Enterprise AI Ingest utilities package, codename saia-ingest.
Its purpose is to provide sample code that connects to different data sources and helps external developers interact with the platform to upload documents. Check the configuration section for the available data sources.
Check the API section if you want to get sample code to explore the Enterprise AI API.
You can use this repository as a reference and extend it for other data sources; if you create new connectors, please let us know and contribute them back!
- Clone this repository.
- Install it locally.
- Choose the data source to use and configure a YAML file for it.
- Execute it using the samples provided.
To install the package and its dependencies, follow these steps:
- Create a new virtual environment and activate it; in this case we will create one called venv:
# Linux
sudo apt install python3-venv
python3 -m venv venv
source venv/bin/activate
# macOS
virtualenv venv
source venv/bin/activate
# Windows
python -m venv venv
.\venv\Scripts\Activate
- Install the package dependencies:
# install poetry first
pip install poetry
# every time you update
poetry install
- Set the PYTHONPATH environment variable to the path of the current directory:
export PYTHONPATH="$PYTHONPATH:$(pwd)"
The package is now installed locally. Continue by defining a configuration file.
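As a quick sanity check of the installation steps above, the following minimal sketch reports whether a virtual environment is active and whether the current directory is on PYTHONPATH. It is not part of saia-ingest; the function names in_virtualenv and pythonpath_contains are hypothetical helpers introduced only for illustration.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def pythonpath_contains(directory: Path, pythonpath: Optional[str] = None) -> bool:
    """Return True when `directory` is one of the PYTHONPATH entries.

    `pythonpath` defaults to the PYTHONPATH environment variable; passing it
    explicitly makes the function easy to try out.
    """
    raw = os.environ.get("PYTHONPATH", "") if pythonpath is None else pythonpath
    # Empty entries (for example from a leading separator) are skipped.
    entries = [Path(entry).resolve() for entry in raw.split(os.pathsep) if entry]
    return directory.resolve() in entries


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("current directory on PYTHONPATH:", pythonpath_contains(Path.cwd()))
```

Run it from the repository root after activating the environment; both lines should report True if the steps were followed.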
Depending on the command used you may need to set some environment variables such as:
export OPENAI_API_KEY=<your API Key>
# persist it for future shell sessions (replace X with your API key)
echo "export OPENAI_API_KEY=X" >> ~/.bashrc
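Before running a command, it can help to confirm that the variables it needs are actually set. The sketch below is an illustration only: missing_env_vars is a hypothetical helper, and OPENAI_API_KEY is the only variable name taken from this README; which variables a given command requires is not specified here.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        print("missing environment variables:", ", ".join(missing))
    else:
        print("all required environment variables are set")
```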
Make sure to place the correct YAML configuration file under the config folder. If the folder does not exist, create a config folder under the repository root. All commands use configuration files from that folder by default.
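The folder convention above can be sketched as a small script that creates the config folder when it is missing and lists the YAML files found in it. This is an assumption-laden illustration, not part of the package: list_config_files is a hypothetical name, and accepting both .yaml and .yml extensions is a guess.

```python
from pathlib import Path
from typing import List


def list_config_files(repo_root: Path) -> List[Path]:
    """Ensure `<repo_root>/config` exists and return its YAML files, sorted."""
    config_dir = repo_root / "config"
    # Create the folder if it is missing; do nothing if it already exists.
    config_dir.mkdir(exist_ok=True)
    return sorted(
        path
        for path in config_dir.iterdir()
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )


if __name__ == "__main__":
    for path in list_config_files(Path.cwd()):
        print(path)
```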
Run the associated operation using the saia-cli entry point; only the ingest verb is supported.
saia-cli ingest -c ./config/s3_sandbox.yaml
# using a timestamp
saia-cli ingest -c ./config/s3_sandbox.yaml -t 2023-12-21
# using a type
saia-cli ingest -c ./config/s3_sandbox.yaml --type test
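If you want to trigger an ingestion from a script rather than from the shell, the commands above can be wrapped with subprocess. The sketch below only uses the flags shown in this README (-c, -t and --type); build_ingest_command and run_ingest are hypothetical helper names, and whether -t and --type can be combined in one call is an assumption.

```python
import shlex
import subprocess
from typing import List, Optional


def build_ingest_command(
    config_path: str,
    timestamp: Optional[str] = None,
    ingest_type: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `saia-cli ingest`."""
    command = ["saia-cli", "ingest", "-c", config_path]
    if timestamp is not None:
        command += ["-t", timestamp]
    if ingest_type is not None:
        command += ["--type", ingest_type]
    return command


def run_ingest(
    config_path: str,
    timestamp: Optional[str] = None,
    ingest_type: Optional[str] = None,
) -> int:
    """Run the ingestion and return the process exit code."""
    command = build_ingest_command(config_path, timestamp, ingest_type)
    print("running:", shlex.join(command))
    # check=False: return the exit code instead of raising on failure.
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Requires saia-cli to be installed and available on PATH.
    raise SystemExit(run_ingest("./config/s3_sandbox.yaml", timestamp="2023-12-21"))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.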
The configuration file details all parameters needed to run the ingestion. Use the --type option to select the target ingestion; supported data sources are:
- fs (file system) config
- s3 config
- jira config
- confluence config
- github config
- gdrive (Google Drive) config
- sharepoint config
Check the debug folder, where every execution is logged.
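To find the log of the most recent execution quickly, a small helper can pick the newest file in the debug folder. This is a sketch under assumptions: the README does not describe the log file names or layout, so the code simply selects the most recently modified regular file, and latest_debug_file is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def latest_debug_file(debug_dir: Path) -> Optional[Path]:
    """Return the most recently modified file in `debug_dir`, or None."""
    if not debug_dir.is_dir():
        return None
    files = [path for path in debug_dir.iterdir() if path.is_file()]
    # max() with default=None handles an empty folder.
    return max(files, key=lambda path: path.stat().st_mtime, default=None)


if __name__ == "__main__":
    newest = latest_debug_file(Path("debug"))
    print(newest if newest is not None else "no log files found")
```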
To do: add more tests. So far there is only a simple check that the mechanism is working; make sure to create a configuration file before running it.
pytest tests/test_api.py
pytest tests/test_proxy.py