Web_Scraper

This is a simple web scraper that crawls the web, extracts all text from HTML pages, and stores it in Elasticsearch. The principle is simple:

1. Give it a start URL.
2. Download that page and extract all text and links.
3. Add the links to the queue file and the start URL to the crawled file.
4. Index the data in Elasticsearch.
5. Take the first URL in the queue as the new start URL and repeat.

To improve performance, the scraper uses multithreading...
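The steps above can be sketched as a short, single-threaded loop. This is a minimal illustration, not the code in `spider.py`: the names `PageParser`, `crawl`, `fetch`, `index`, and `max_pages` are hypothetical, the queue and crawled set are kept in memory rather than in files, and `fetch` and `index` are passed in as plain callables standing in for the HTTP download and the Elasticsearch indexing call.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and absolute link URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, index, max_pages=10):
    """Crawl breadth-first from start_url.

    fetch(url) -> HTML string and index(url, text) are hypothetical
    callables supplied by the caller (assumptions, not the repo's API).
    """
    queue = deque([start_url])   # step 1: seed the queue
    crawled = set()
    while queue and len(crawled) < max_pages:
        url = queue.popleft()    # step 5: next URL becomes the start URL
        if url in crawled:
            continue
        parser = PageParser(url)
        parser.feed(fetch(url))  # step 2: download and parse
        parser.close()
        crawled.add(url)         # step 3: record the page as crawled
        for link in parser.links:
            if link not in crawled and link not in queue:
                queue.append(link)
        index(url, " ".join(parser.text_parts))  # step 4: index the text
    return crawled
```

In a real run, `fetch` might wrap an HTTP download and `index` might wrap an Elasticsearch client call. Passing them in as parameters lets the loop be tried on canned pages first. A multithreaded variant would need a thread-safe queue and a lock around the crawled set.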

Folders and files:

- Links
- README.md
- domain.py
- elastic.py
- general.py
- main.py
- spider.py

YassinTalssis/Web_Scraper
