genexuslabs/saia-ingest: Globant Enterprise AI Ingest utilities

Welcome to the Globant Enterprise AI Ingest utilities package, codename saia-ingest.

Its purpose is to provide sample code that connects to different data sources and helps external developers interact with the platform to upload documents. See the Data Sources section for the available data sources.

See API.md if you want sample code to explore the Enterprise AI API.

You can use this repository as a reference for adding other data sources; if you create new connectors, please let us know and contribute them back!

Getting started

Clone this repository.
Install it locally.
Choose the data source to use and configure a YAML file for it.
Execute it using the samples provided.

Installation

To install the package and its dependencies, follow these steps:

Create a new virtual environment and activate it; in this example it is called venv:

# Linux (Debian/Ubuntu)
sudo apt install python3-venv
python3 -m venv venv
source venv/bin/activate
# macOS
python3 -m venv venv
source venv/bin/activate
# Windows
python -m venv venv
.\venv\Scripts\Activate
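Before installing dependencies it can help to confirm that the environment is really active. The sketch below relies on standard Python behavior: inside an environment created with `python -m venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The helper name `in_virtualenv` is illustrative and not part of saia-ingest.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    In a venv, sys.prefix is the environment directory and differs from
    sys.base_prefix, which still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Active virtual environment: {sys.prefix}")
    else:
        print("No virtual environment is active; activate venv first.")
```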

Install the package dependencies:

# install Poetry first (once)
pip install poetry
# run again after every update to the repository
poetry install

Add the current directory (the repository root) to the PYTHONPATH environment variable:

export PYTHONPATH="$PYTHONPATH:$(pwd)"
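A quick sanity check is to ask Python whether it can locate the package without importing it. This sketch assumes the top-level module is named `saia_ingest` (the name of the package folder in the repository); `package_importable` is a hypothetical helper built on the standard `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found.

```python
import importlib.util


def package_importable(name: str = "saia_ingest") -> bool:
    """Return True if a top-level module called `name` can be located on sys.path.

    find_spec only searches for the module; it does not execute it, so this
    check has no side effects.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("saia_ingest importable:", package_importable())
```

If it prints `False`, re-run the `export PYTHONPATH=...` line from the repository root in the same shell session.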

The package is now installed locally; next, define a configuration file.

Configuration

Variables

Depending on the command used, you may need to set environment variables such as:

export OPENAI_API_KEY=<your API Key>
# persist it for future shell sessions
echo 'export OPENAI_API_KEY=<your API Key>' >> ~/.bashrc
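If you write your own scripts around the ingestion, failing fast on a missing key gives a clearer error than a failed API call later. This is a minimal sketch and not how saia-ingest itself reads the variable; `require_env` is a hypothetical helper.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or contains only whitespace,
    so a misconfigured shell is reported before any network call is made.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it (e.g. in ~/.bashrc) before running saia-cli"
        )
    return value
```

For example, `api_key = require_env("OPENAI_API_KEY")`; avoid printing the full value, since execution output may end up in logs.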

YAML

Make sure the correct YAML configuration file is in the config folder. If the folder does not exist, create a config folder at the root of the repository. All commands use configuration files from that folder by default.
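The folder setup can be scripted. This sketch creates `config` under the repository root if it is missing and lists the YAML files already there; `list_configs` is a hypothetical helper, and treating both `.yaml` and `.yml` as configuration files is an assumption.

```python
from pathlib import Path
from typing import List


def list_configs(repo_root: str = ".") -> List[Path]:
    """Create <repo_root>/config if needed and return its YAML files, sorted by name."""
    config_dir = Path(repo_root) / "config"
    # exist_ok=True makes this a no-op when the folder is already there.
    config_dir.mkdir(parents=True, exist_ok=True)
    return sorted(
        p for p in config_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".yaml", ".yml"}
    )
```

Running it from the repository root with no arguments shows which files you can pass to `-c`.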

Run the associated operation using the saia-cli entry point; currently only the ingest verb is supported.

saia-cli ingest -c ./config/s3_sandbox.yaml
# using a timestamp
saia-cli ingest -c ./config/s3_sandbox.yaml -t 2023-12-21
# using a type
saia-cli ingest -c ./config/s3_sandbox.yaml --type test
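When ingestion is triggered from another Python script (for example a scheduled job), building the argument list explicitly avoids shell-quoting problems. This sketch uses only the flags shown above (`-c`, `-t`, `--type`); `build_ingest_cmd` is a hypothetical helper, and validating `-t` as an ISO date such as `2023-12-21` is an assumption based on the example.

```python
import datetime
from typing import List, Optional


def build_ingest_cmd(config: str, timestamp: Optional[str] = None,
                     ingest_type: Optional[str] = None) -> List[str]:
    """Build a saia-cli ingest argument list from the documented flags."""
    cmd = ["saia-cli", "ingest", "-c", config]
    if timestamp is not None:
        # Assumption: -t expects a date like 2023-12-21; raises ValueError otherwise.
        datetime.date.fromisoformat(timestamp)
        cmd += ["-t", timestamp]
    if ingest_type is not None:
        cmd += ["--type", ingest_type]
    return cmd
```

For example, `subprocess.run(build_ingest_cmd("./config/s3_sandbox.yaml", timestamp="2023-12-21"), check=True)` runs the second command above, assuming saia-cli is on the PATH of the active virtual environment.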

Data Sources

The configuration file details all parameters needed to run the ingestion; use --type to select the target ingestion. Supported data sources are:

fs (file system) config
s3 config
jira config
confluence config
github config
gdrive (Google Drive) config
sharepoint config
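A wrapper script can catch a typo in the data source name before starting a long ingestion. The dictionary below simply mirrors the list above; the keys are an assumption and may not match the exact values saia-cli accepts for `--type` (the earlier example also passes `--type test`). `SUPPORTED_SOURCES` and `check_source` are hypothetical names.

```python
# Mirrors the data source list above; keys are assumed, not taken from saia-cli.
SUPPORTED_SOURCES = {
    "fs": "file system",
    "s3": "Amazon S3",
    "jira": "Jira",
    "confluence": "Confluence",
    "github": "GitHub",
    "gdrive": "Google Drive",
    "sharepoint": "SharePoint",
}


def check_source(name: str) -> str:
    """Normalize `name` and raise ValueError if it is not a listed data source."""
    key = name.strip().lower()
    if key not in SUPPORTED_SOURCES:
        expected = ", ".join(sorted(SUPPORTED_SOURCES))
        raise ValueError(f"unsupported data source {name!r}; expected one of: {expected}")
    return key
```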

Logging

Check the debug folder, where every execution is logged.
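To inspect the most recent run, you can pick the newest file in that folder. This sketch assumes the folder is named `debug` relative to where you run the command and makes no assumption about log file names or extensions; `latest_log` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_log(debug_dir: str = "debug") -> Optional[Path]:
    """Return the most recently modified file under debug_dir (recursively), or None."""
    folder = Path(debug_dir)
    if not folder.is_dir():
        return None
    files = [p for p in folder.rglob("*") if p.is_file()]
    # default=None covers an existing but empty folder.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```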

Run Tests

TODO: add more tests. So far there are only simple tests that check the mechanism works; make sure to create a configuration file before running them.

pytest tests/test_api.py
pytest tests/test_proxy.py
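Because the tests depend on a local configuration file, a test can skip itself instead of failing when that file is missing. pytest also collects standard `unittest.TestCase` classes, so this stdlib-only sketch could live in a file such as `tests/test_config.py` (a hypothetical name); `CONFIG_PATH` is likewise an assumption to adjust to your own file.

```python
import unittest
from pathlib import Path

# Hypothetical path: point this at the configuration file you created.
CONFIG_PATH = Path("config") / "s3_sandbox.yaml"


class ConfigPresentTest(unittest.TestCase):
    # The condition is evaluated once, when the class is defined.
    @unittest.skipUnless(CONFIG_PATH.is_file(), f"create {CONFIG_PATH} first")
    def test_config_is_not_empty(self) -> None:
        self.assertGreater(CONFIG_PATH.stat().st_size, 0)
```

Run it with `pytest tests/test_config.py`; without the configuration file it is reported as skipped rather than failed.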

Contribution

See CONTRIBUTION.md.

License

See the LICENSE file.
