Best Heritrix Alternatives (21)

Looking for a Heritrix alternative? We've compiled the best options based on user reviews, features, and pricing to help you find the right fit.

What is Heritrix? Heritrix is an open-source, extensible, web-scale, archival-quality web crawler written in Java and released under the Apache License. It is designed for archiving periodic captures of content from the web and large intranets.

Top Alternatives to Heritrix

Algolia

Freemium

Algolia is a hosted search API that provides highly performant and relevant search results. It enables developers to quickly add …

Score: 23

WordPress i-Search Pro

Paid

WordPress i-Search Pro is a powerful site search engine plugin for WordPress sites. It lets you add advanced, fast, and …

Score: 22

Apisearch

Freemium

Apisearch is an open-source search engine that is focused on speed and relevance. It is designed to provide fast and …

Score: 20

Expertrec Search Engine

Paid

Expertrec Search Engine is an intelligent search engine that understands natural language queries and provides highly relevant results. It uses …

Score: 20

StormCrawler

Open Source

StormCrawler is an open source web crawler designed to crawl large websites efficiently by scaling horizontally through Apache Storm. It …

Google Custom Search Engine is a service that allows you to create a custom search engine for your website or …

ACHE Crawler

Open Source

ACHE Crawler is an open-source web crawler written in Java. It is designed to efficiently crawl large websites and collect …

Apache Nutch is an open source web crawler software project written in Java. It is used to build web search …

Mixnode

Open Source

Mixnode is a privacy-focused web browser that aims to prevent tracking and protect user data. It blocks ads and trackers …

Heritrix Overview

Heritrix is an open-source web crawler software project that was originally developed by the Internet Archive. It is designed to systematically browse and archive web pages by recursively following hyperlinks and storing the content in the WARC file format.

Some key features of Heritrix include:

- Extensible and modular architecture based on Apache standards to support customization
- Respects robots.txt and other directives to avoid overloading servers
- Supports metadata extraction, post-processing of archived data, recovery from errors
- Distributed architecture for high performance, scalability and robustness
- Advanced configuration for …
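To make the crawling behavior described above concrete, here is a minimal Python sketch of link-following that honors robots.txt. It is an illustration only, not Heritrix's implementation (Heritrix itself is a Java application). The in-memory `SITE` map, the `ROBOTS_TXT` rules, the `demo-crawler` agent name, and the `crawl` helper are all hypothetical stand-ins, and nothing is fetched from the network or written to WARC.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "site": each URL maps to the links found on that page.
# A real crawler would fetch and parse pages instead of reading this dict.
SITE = {
    "https://example.org/": ["https://example.org/a", "https://example.org/private/x"],
    "https://example.org/a": ["https://example.org/b", "https://example.org/"],
    "https://example.org/b": [],
    "https://example.org/private/x": ["https://example.org/secret"],
}

# Hypothetical robots.txt content for the site above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(seed, site, robots_txt, agent="demo-crawler"):
    """Breadth-first link following that skips URLs disallowed by robots.txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    seen = {seed}          # URLs already queued, so each is processed once
    order = []             # URLs actually "captured", in visit order
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue       # disallowed: do not capture, do not follow its links
        order.append(url)
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("https://example.org/", SITE, ROBOTS_TXT)
print(visited)
```

In this sketch the page under `/private/` is never captured, so the link it contains is never discovered either. An archival crawler would additionally write each fetched response to a WARC record and rate-limit its requests per host.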

Pricing: Open Source

Quick Comparison

Software                     Pricing      Score
Heritrix                     Open Source
Algolia                      Freemium     23
WordPress i-Search Pro       Paid         22
Apisearch                    Freemium     20
Expertrec Search Engine      Paid         20
StormCrawler                 Open Source
Google Custom Search Engine  N/A
ACHE Crawler                 Open Source
Apache Nutch                 Free
Mixnode                      Open Source
