YassinTalssis/Web_Scraper

Web_Scraper

This is a simple web scraper that travels around the web, extracts all text from HTML pages, and stores it in Elasticsearch. The principle is simple:

1. Give it a start URL.
2. Download that page and extract all of its text and links.
3. Add the links to the queue file and the start URL to the crawled file.
4. Index the data in Elasticsearch.
5. Set the start URL to the first URL in the queue and repeat.

To improve performance, the scraper uses multithreading...
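The steps above can be sketched in Python roughly as follows. This is a minimal, single-threaded illustration and not the repository's actual code. The names `PageParser`, `extract`, `crawl`, `fetch`, `index`, and `max_pages` are all hypothetical. The queue and crawled files are replaced by in-memory structures, and downloading and Elasticsearch indexing are passed in as plain callables so the loop can run without a network or a cluster.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and absolute links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    """Step 2: return (text, links) for one downloaded page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def crawl(start_url, fetch, index, max_pages=10):
    """Run the crawl loop.

    fetch(url) -> HTML string or None  (assumed downloader)
    index(url, text) -> None           (assumed stand-in for Elasticsearch)
    max_pages is an assumed safety limit, not part of the original design.
    """
    queue = deque([start_url])  # stands in for the queue file
    crawled = set()             # stands in for the crawled file
    while queue and len(crawled) < max_pages:
        url = queue.popleft()   # step 5: next start URL is the queue head
        if url in crawled:
            continue
        html = fetch(url)
        crawled.add(url)        # step 3: record the URL as crawled
        if html is None:
            continue
        text, links = extract(html, url)
        for link in links:      # step 3: enqueue newly discovered links
            if link not in crawled and link not in queue:
                queue.append(link)
        index(url, text)        # step 4: hand the text to the indexer
    return crawled
```

In a real run, `fetch` would wrap an HTTP client and `index` would wrap a call to an Elasticsearch client. A multithreaded version would also need a thread-safe queue and a lock around the crawled set, which this sketch leaves out.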
