Looking for a Heritrix alternative? We've compiled the best options based on user reviews, features, and pricing to help you find the right fit.
What is Heritrix?
Heritrix is an open-source, extensible, web-scale, archival-quality web crawler project built on the Apache stack. It is designed for archiving periodic captures of content from the web and large intranets.
WordPress i-Search Pro is a powerful site search engine plugin for WordPress sites. It lets you add advanced, fast, and …
Expertrec Search Engine is an intelligent search engine that understands natural language queries and provides highly relevant results. It uses …
StormCrawler is an open source web crawler designed to crawl large websites efficiently by scaling horizontally through Apache Storm. It …
Google Custom Search Engine is a service that allows you to create a custom search engine for your website or …
ACHE Crawler is an open-source web crawler written in Java. It is designed to efficiently crawl large websites and collect …
Apache Nutch is an open source web crawler software project written in Java. It is used to build web search …
Heritrix is an open-source web crawler software project that was originally developed by the Internet Archive. It is designed to systematically browse and archive web pages by recursively following hyperlinks and storing the content in the WARC file format.

Some key features of Heritrix include:

- Extensible and modular architecture based on Apache standards to support customization
- Respects robots.txt and other directives to avoid overloading servers
- Supports metadata extraction, post-processing of archived data, recovery from errors
- Distributed architecture for high performance, scalability and robustness
- Advanced configuration for …
Pricing: Open Source
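
To make the crawling behavior described for Heritrix more concrete, here is a minimal Python sketch of recursive link-following that honors robots.txt. It is not Heritrix's code (Heritrix itself is written in Java) and it does not write WARC files. The in-memory `PAGES` site, the `ROBOTS_TXT` rules, the `example.test` host, the `sketch-bot` agent name, and the `crawl` and `LinkExtractor` helpers are all hypothetical, chosen so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "site" standing in for real HTTP fetches.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/private/x">X</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": "no links here",
    "http://example.test/private/x": '<a href="/secret">S</a>',
}
# Hypothetical robots.txt rules for the same site.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, pages, robots_txt, agent="sketch-bot"):
    """Breadth-first crawl from seed; returns {url: content} for allowed pages."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    seen = {seed}
    queue = deque([seed])
    archived = {}
    while queue:
        url = queue.popleft()
        # Skip URLs that robots.txt disallows or that the fake site lacks.
        if not robots.can_fetch(agent, url) or url not in pages:
            continue
        archived[url] = pages[url]
        parser = LinkExtractor()
        parser.feed(pages[url])
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return archived


if __name__ == "__main__":
    for url in crawl("http://example.test/", PAGES, ROBOTS_TXT):
        print(url)
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, and checking `can_fetch` before reading a page means links that appear only on disallowed pages are never discovered. A real archival crawler would add politeness delays, HTTP fetching, and WARC output on top of this loop.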
| Software | Pricing | Score |
|---|---|---|
| Heritrix | Open Source | — |
| Algolia | Freemium | 23 |
| WordPress i-Search Pro | Paid | 22 |
| Apisearch | Freemium | 20 |
| Expertrec Search Engine | Paid | 20 |
| StormCrawler | Open Source | — |
| Google Custom Search Engine | N/A | — |
| ACHE Crawler | Open Source | — |
| Apache Nutch | Free | — |
| Mixnode | Open Source | — |