
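Description: StormCrawler is an open-source SDK for building low-latency, scalable web crawlers on Apache Storm, suited to crawling large websites efficiently. Because a crawl runs as a Storm topology, it scales horizontally by adding parallel workers, inherits Storm's fault tolerance, and can be wired to other Storm components, such as bolts that feed a machine learning pipeline.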
Type: Open Source Web Crawler SDK
Founded: 2011
Primary Use: Low-latency, large-scale web crawling
Supported Platforms: Java (JVM) on Apache Storm clusters
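Scaling a Storm-based crawler horizontally means running many parallel instances of the fetching component, which raises a politeness problem: two instances should not hit the same host at the same time. A common answer, and the idea behind StormCrawler's URL partitioning, is to route every URL of a given host to the same instance. The dependency-free Java sketch below (Java 11+) shows only that routing step. `HostPartitioner` and `partitionFor` are hypothetical names, not StormCrawler's API; in a real topology this key would typically drive a Storm fields grouping rather than be computed by hand.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not StormCrawler code: assigns URLs to fetcher tasks by host.
final class HostPartitioner {

    // Maps a URL to one of numTasks tasks by hashing its lower-cased host,
    // so all URLs of one host land on the same task and politeness can be enforced locally.
    static int partitionFor(String url, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(host.toLowerCase(Locale.ROOT).hashCode(), numTasks);
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "https://example.com/a",
                "https://example.com/b",
                "https://example.org/",
                "https://news.example.net/today");
        int numTasks = 3;

        // Group the sample URLs by the task they would be routed to.
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (String u : urls) {
            byTask.computeIfAbsent(partitionFor(u, numTasks), k -> new ArrayList<>()).add(u);
        }
        byTask.forEach((task, list) -> System.out.println("task " + task + ": " + list));
    }
}
```

Both example.com URLs always share a task; which task each host gets depends only on `String.hashCode`, so adding tasks reshuffles hosts, a trade-off a production partitioner may handle differently.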

Description: Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler, written in Java and released under the Apache License. It is designed to archive periodic captures of content from the web and from large intranets.
Type: Open Source Web Archiving Crawler
Founded: 2003
Primary Use: Web archiving (periodic captures of websites and intranets)
Supported Platforms: Java (JVM)
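"Archival-quality" means keeping the full HTTP exchange plus capture metadata, not just extracted text; Heritrix writes its captures to WARC files (ISO 28500). The Java sketch below (Java 11+, no WARC library) assembles one minimal WARC/1.0 `response` record to show that layout. `WarcRecordSketch` and `responseRecord` are hypothetical names, not Heritrix classes, and real Heritrix output carries further fields, such as payload digests and the server's IP address, that are omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

// Hypothetical illustration, not Heritrix code: builds a minimal WARC/1.0 response record.
final class WarcRecordSketch {

    // Record layout: header lines, blank line, the captured HTTP response bytes
    // (the "block"), then two CRLFs that terminate the record.
    static byte[] responseRecord(String targetUri, Instant captured, byte[] httpResponse) {
        String header = "WARC/1.0\r\n"
                + "WARC-Type: response\r\n"
                + "WARC-Target-URI: " + targetUri + "\r\n"
                // WARC-Date uses UTC to whole-second precision, e.g. 2024-01-01T12:00:05Z.
                + "WARC-Date: " + captured.truncatedTo(ChronoUnit.SECONDS) + "\r\n"
                + "WARC-Record-ID: <urn:uuid:" + UUID.randomUUID() + ">\r\n"
                + "Content-Type: application/http; msgtype=response\r\n"
                // Content-Length counts only the block, not the header.
                + "Content-Length: " + httpResponse.length + "\r\n"
                + "\r\n";
        byte[] head = header.getBytes(StandardCharsets.UTF_8);
        byte[] tail = "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

        byte[] record = new byte[head.length + httpResponse.length + tail.length];
        System.arraycopy(head, 0, record, 0, head.length);
        System.arraycopy(httpResponse, 0, record, head.length, httpResponse.length);
        System.arraycopy(tail, 0, record, head.length + httpResponse.length, tail.length);
        return record;
    }

    public static void main(String[] args) {
        byte[] http = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] rec = responseRecord("https://example.com/", Instant.parse("2024-01-01T12:00:05Z"), http);
        System.out.print(new String(rec, StandardCharsets.UTF_8));
    }
}
```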