This is a simple web scraper that travels around the WWW, extracts all text from HTML pages, and stores it in Elasticsearch. The principle is simple:

1. I give it a start URL.
2. It downloads this page and extracts all text and links.
3. It adds the links to the queue file and the start URL to the crawled file.
4. It indexes the data in Elasticsearch.
5. The first URL in the queue becomes the new start URL, and the loop repeats.

To improve performance, I used multithreading...
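The loop described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code. The `fetch(url)` and `index(url, text)` callbacks are hypothetical stand-ins for downloading a page and sending it to Elasticsearch. An in-memory list and set replace the queue and crawled files. `ThreadPoolExecutor` is one possible way to add multithreading; how the original project uses threads is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and absolute href links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # resolve relative links against the page URL
                    self.links.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, index, max_pages=10, workers=4):
    """Hypothetical crawl loop: fetch(url) returns HTML, index(url, text) stores it."""
    queue = [start_url]  # stands in for the queue file
    crawled = set()      # stands in for the crawled file

    def process(url):
        try:
            html = fetch(url)  # step 2: download the page
        except Exception:
            return []          # skip pages that fail to download
        parser = PageParser(url)
        parser.feed(html)      # step 2: extract text and links
        parser.close()         # flush any buffered trailing text
        index(url, " ".join(parser.text_parts))  # step 4: index the text
        return parser.links

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while queue and len(crawled) < max_pages:
            # step 5: take the next URLs from the front of the queue
            batch = []
            while queue and len(crawled) + len(batch) < max_pages:
                url = queue.pop(0)
                if url not in crawled and url not in batch:
                    batch.append(url)
            crawled.update(batch)  # step 3: mark the batch as crawled
            # worker threads download, parse and index the batch in parallel
            for links in pool.map(process, batch):
                for link in links:  # step 3: enqueue newly discovered links
                    if link not in crawled and link not in queue:
                        queue.append(link)
    return crawled
```

Only the main thread touches `queue` and `crawled`, so they need no lock. The `index` callback, however, runs in worker threads and must be thread-safe. For real pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u, timeout=10).read().decode("utf-8", "replace")`. `index` could wrap whichever Elasticsearch client call the project actually uses.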
YassinTalssis/Web_Scraper