OCR-Marketplace-Fraud-Image-Classifier

Overview

The Marketplace Fraud Detection with EasyOCR and RuBERT project detects fraudulent images in online marketplaces by classifying each image as either fraudulent or non-fraudulent. The model is trained to identify images that contain deceptive text, such as "деньги за отзывы телеграм @mony_tg" ("money for reviews, Telegram @mony_tg"), which is commonly associated with scams.

Data

The dataset used for training the Marketplace Fraud Image Classifier consists of two main categories: fraudulent images and non-fraudulent (normal) images.

Fraudulent Images

The fraudulent data examples contain images that feature deceptive text, such as scam messages or suspicious offers. These images are crucial for training the model to accurately identify potential fraud attempts. Some examples of fraudulent images are shown below:

Non-Fraudulent (Normal) Images

The non-fraudulent data examples represent legitimate images that do not contain any suspicious content. These images serve as a contrast to the fraudulent data, helping the model distinguish between genuine and fraudulent listings. Examples of non-fraudulent images are provided below:

By training the model on a balanced dataset consisting of both fraudulent and non-fraudulent images, the Marketplace Fraud Image Classifier can learn to effectively distinguish between legitimate and suspicious content, ultimately enhancing the safety and trust in online marketplaces.

Model Architecture

EasyOCR

EasyOCR (Easy Optical Character Recognition) is a powerful library for extracting text from images. It supports a wide range of languages and is known for its accuracy and ease of use. In this project, EasyOCR is employed to extract text from the input images.
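The extraction step could be sketched as follows. This is a minimal example, assuming the `easyocr` package is installed and that Russian and English are the languages of interest; the README does not state which settings the notebook uses. The helper `join_fragments` and the `min_conf` threshold are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

# One EasyOCR detection with detail=1: (bounding box, text, confidence).
Detection = Tuple[Sequence, str, float]


def join_fragments(detections: Iterable[Detection], min_conf: float = 0.3) -> str:
    """Keep detections at or above min_conf and join their text with spaces.

    min_conf is an illustrative threshold, not a value taken from the project.
    """
    kept: List[str] = [text.strip() for _, text, conf in detections if conf >= min_conf]
    return " ".join(t for t in kept if t)


def extract_text(image_path: str, reader=None, languages=("ru", "en")) -> str:
    """Run EasyOCR on one image and return the recognized text as one string."""
    if reader is None:
        import easyocr  # imported lazily because loading the models is slow

        reader = easyocr.Reader(list(languages), gpu=False)
    # detail=1 makes readtext return (bbox, text, confidence) triples.
    return join_fragments(reader.readtext(image_path, detail=1))
```

When processing many images, it is usually better to build one `easyocr.Reader` and pass it in through `reader`, since constructing it reloads the detection and recognition models.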

Sage

Sage is a grammar correction library that helps improve the quality of extracted text. It analyzes the text and suggests corrections based on grammatical rules and context. By integrating Sage, the project ensures that the extracted text is grammatically correct and easier to process for further analysis.

RuBERT

RuBERT (Russian BERT) is a pre-trained language model based on BERT (Bidirectional Encoder Representations from Transformers). It is specifically designed for the Russian language and is trained on a large corpus of Russian text. RuBERT is used in this project for understanding the meaning of the extracted text and classifying it as either fraudulent or non-fraudulent.
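The classification step might look like the sketch below, assuming the Hugging Face `transformers` and `torch` packages. The checkpoint name `DeepPavlov/rubert-base-cased`, the label order in `LABELS`, and the helper names are assumptions; the README does not say which RuBERT checkpoint or training setup the notebook uses.

```python
import math
from typing import List, Sequence

LABELS = ("non-fraudulent", "fraudulent")  # assumed label order


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits: Sequence[float]) -> str:
    """Map two-class logits to the label with the highest probability."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]


def classify_text(text: str, checkpoint: str = "DeepPavlov/rubert-base-cased") -> str:
    """Classify one string with a RuBERT sequence-classification model.

    The checkpoint name is an assumption. Loaded as-is, its classification
    head is randomly initialized, so the model has to be fine-tuned on
    labeled fraud / non-fraud text before the output is meaningful.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    model.eval()
    inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)
```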

Features

Text Extraction: EasyOCR is used to extract text from input images.
Grammar Correction: Sage is employed to correct grammatical errors in the extracted text.
Text Understanding: RuBERT is utilized to understand the meaning of the corrected text and classify it as either fraudulent or non-fraudulent.
Fraud Detection: The model identifies fraudulent images based on the extracted and analyzed text.
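The four features above form a simple chain, which can be sketched as a single function. This is an illustration of the data flow only: the function name `classify_image`, its parameters, and the choice to label text-free images as non-fraudulent are assumptions, not the notebook's actual code. Each stage is passed in as a callable so that wrappers around EasyOCR, Sage, and RuBERT can be plugged in.

```python
from typing import Callable


def classify_image(
    image_path: str,
    ocr: Callable[[str], str],
    correct: Callable[[str], str],
    classify: Callable[[str], str],
    empty_label: str = "non-fraudulent",
) -> str:
    """Run OCR, then text correction, then text classification on one image.

    Images with no recognized text receive empty_label; that default is an
    assumption of this sketch.
    """
    text = ocr(image_path).strip()
    if not text:
        return empty_label
    return classify(correct(text))
```

Passing the stages as arguments keeps the slow model loading outside the function and makes the chain easy to test with stand-in callables.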

Results

The performance of the Marketplace Fraud Image Classifier was evaluated using various metrics, with a primary focus on the F1 score, which balances precision and recall.
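For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name `f1_score` and the counts in the usage line are made up to illustrate the formula and are not the project's evaluation data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision 0.90, recall 0.75.
print(round(f1_score(tp=90, fp=10, fn=30), 3))  # 0.818
```

Because the harmonic mean is pulled toward the smaller of the two values, a high F1 requires both precision and recall to be high.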

After training and validation, the final model achieved an F1 score of 0.96, which indicates that it produces few false positives and few false negatives when separating fraudulent from non-fraudulent images.

OCR-based pipelines are accurate but relatively slow, which may make them unsuitable for some applications. If speed matters, consider my CLIP-based solution instead.

Performance Metrics

F1 Score: 0.96
Speed: more than 5 minutes per 3,000 images
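To put the speed figure in per-image terms, the arithmetic can be written out as below. The function name `seconds_per_image` is hypothetical. Because the reported time is "more than" 5 minutes, the result is a lower bound on latency, not an exact measurement.

```python
def seconds_per_image(total_seconds: float, n_images: int) -> float:
    """Average processing time per image."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    return total_seconds / n_images


# 5 minutes for 3,000 images is 0.1 s per image, so the pipeline needs
# at least 0.1 s per image and handles at most 10 images per second.
latency = seconds_per_image(5 * 60, 3000)
throughput = 1 / latency
```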
