ActiCrawl is a powerful web scraping and data extraction platform designed for developers and AI applications. It helps you collect clean, structured data from any website using advanced browser automation and intelligent content processing.
ActiCrawl works with virtually any public website, including modern JavaScript-heavy applications, SPAs (Single Page Applications), and dynamically loaded content. We handle complex scenarios like infinite scrolling, AJAX requests, and client-side rendering.
ActiCrawl is built for developers building AI applications, data scientists collecting training data, businesses monitoring competitors, researchers gathering information, and anyone else who needs to extract structured data from websites at scale.
ActiCrawl offers both open-source and commercial versions. The core scraping engine is available on GitHub for self-hosting, while our cloud platform provides additional features like distributed scraping, automatic scaling, and managed infrastructure.
ActiCrawl specializes in clean data extraction optimized for AI applications. We provide multiple output formats (Markdown, JSON, HTML), intelligent content detection, automatic data cleaning, and seamless integration with popular AI frameworks.
Scraping & Crawling
ActiCrawl uses real browser engines (Chromium-based) to fully render JavaScript and wait for dynamic content to load. Our Smart Wait technology automatically detects when pages are ready, ensuring you capture all the data you need.
A crawl can miss pages for several reasons: robots.txt restrictions, rate limiting, authentication requirements, or crawl depth limits. Check your crawl settings and make sure you have permission to access every page you want to crawl.
Yes, ActiCrawl can crawl a site that has no sitemap. It discovers pages by following links, analyzing navigation menus, and detecting URL patterns. Sitemaps make crawling more efficient, but they are not required.
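To make the idea of link-based discovery concrete, here is a minimal sketch using only the Python standard library. It is not ActiCrawl's crawler: the `LinkCollector` class, the `discover_links` helper, and the sample HTML are hypothetical, and the same-host filter is an assumption about what a crawl would normally follow.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute, same-host links found in <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop the #fragment part.
        absolute = urljoin(self.base_url, href).split("#", 1)[0]
        # Assumption: only links on the same host are followed.
        if urlparse(absolute).netloc == self.host:
            self.links.add(absolute)


def discover_links(base_url, html):
    """Return the sorted, de-duplicated same-host links in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return sorted(parser.links)


SAMPLE_HTML = """
<nav><a href="/docs">Docs</a> <a href="/pricing#plans">Pricing</a></nav>
<p><a href="https://other.example/page">External</a></p>
"""

if __name__ == "__main__":
    for link in discover_links("https://example.com/", SAMPLE_HTML):
        print(link)
```

A real crawl would repeat this step for each newly discovered page until a depth or page limit is reached.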
ActiCrawl supports multiple output formats: clean Markdown (perfect for LLMs), structured JSON, raw HTML, screenshots, and PDF. You can also define custom extraction rules for specific data structures.
We use advanced algorithms to remove ads, popups, navigation elements, and other noise. Our content extraction focuses on the main article or data, providing clean, readable output suitable for AI training and analysis.
Yes, ActiCrawl is built for large-scale crawling, with distributed crawling, automatic retry mechanisms, request queuing, and cloud infrastructure that can handle millions of pages. The platform scales automatically based on your needs.
Yes, ActiCrawl respects robots.txt by default. You can change this behavior in your crawl settings if you have explicit permission to bypass these restrictions for a legitimate use case.
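For readers unfamiliar with how a robots.txt check works, the following sketch uses Python's standard `urllib.robotparser` module. It only illustrates the general technique and does not show how ActiCrawl implements it. The rules in `ROBOTS_TXT`, the `is_allowed` helper, and the `my-crawler` user agent are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post-1", "/private/report"):
        url = "https://example.com" + path
        print(url, is_allowed(ROBOTS_TXT, "my-crawler", url))
```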
ActiCrawl includes intelligent rate limiting, automatic retries with exponential backoff, request distribution across multiple IPs, and smart caching to minimize redundant requests while respecting website resources.
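As an illustration of retries with exponential backoff, here is a minimal sketch of the general pattern. The function names, the default delays, and the 10% jitter are assumptions chosen for the example. They are not ActiCrawl's actual retry parameters.

```python
import random
import time


def backoff_delays(retries, base=1.0, cap=30.0):
    """Delay before each retry: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retries(func, retries=3, base=1.0, cap=30.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    A small random jitter (up to 10% of the delay) is added so that many
    clients retrying at once do not hit the server at the same moment.
    """
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = delays[attempt]
            sleep(delay + random.uniform(0, 0.1 * delay))


if __name__ == "__main__":
    print(backoff_delays(5, base=1.0, cap=10.0))
```

The `sleep` argument is injectable so the logic can be exercised in tests without actually waiting.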
ActiCrawl can handle basic authentication and maintain session cookies. For CAPTCHAs, we recommend using a specialized CAPTCHA-solving service or obtaining proper API access from the target website.
API Related
After signing up, you can find your API key in your dashboard under 'API Settings'. Each account has a unique API key that should be included in all API requests for authentication.
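The sketch below shows one common way to attach an API key to a request, using Python's standard library. The endpoint URL, the `Authorization: Bearer` header scheme, and the `url` body field are all assumptions made for illustration. Check the API documentation for the real endpoint and the way ActiCrawl expects the key to be sent. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from the official API docs.
API_URL = "https://api.example.com/v1/scrape"


def build_request(api_key, target_url):
    """Build (but do not send) an authenticated POST request."""
    if not api_key:
        raise ValueError("An API key is required")
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # Assumption: the key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_request("YOUR_API_KEY", "https://example.com")
    print(request.get_method(), request.full_url)
```

Keeping the key out of source code, for example by reading it from an environment variable, avoids leaking it through version control.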
Proxy
A proxy list is a collection of proxy servers that can be used to route your requests through. ActiCrawl provides rotating proxies from various regions to help avoid IP blocking and access geo-restricted content.
Proxy lists are automatically provided with Pro plans and above. You can use the 'use_proxy: true' parameter in API requests or configure default proxy settings in your dashboard.
The proxy country setting lets you route your requests through proxy servers in specific countries. This is useful for accessing region-restricted content or getting localized search results.
Use the 'proxy_country' parameter in your API requests to specify the desired country code (e.g., 'US', 'UK', 'JP'). The list of available countries can be found in the proxy settings of your dashboard.
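As a rough sketch of how the `use_proxy` and `proxy_country` parameters might be combined in a request body: the `build_scrape_payload` helper and the `url` field are hypothetical, and the rule that setting a country also turns the proxy on is an assumption, not documented behavior.

```python
def build_scrape_payload(url, use_proxy=False, proxy_country=None):
    """Build a request body with optional proxy settings.

    Assumption: choosing a proxy country implies use_proxy=True.
    """
    payload = {"url": url}
    if proxy_country is not None:
        code = proxy_country.strip().upper()
        # Country codes in the examples ('US', 'UK', 'JP') are two letters.
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"Expected a two-letter country code, got {proxy_country!r}")
        payload["use_proxy"] = True
        payload["proxy_country"] = code
    elif use_proxy:
        payload["use_proxy"] = True
    return payload


if __name__ == "__main__":
    print(build_scrape_payload("https://example.com", proxy_country="jp"))
```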
ActiCrawl offers residential proxies (high success rates), datacenter proxies (fast speeds), and mobile proxies (mobile-specific content). Availability varies by region, and each type has unique benefits and use cases.
Billing
ActiCrawl offers a generous free tier with 500 credits per month, perfect for testing and small projects. Paid plans start at $20/month for additional credits and advanced features.
Monthly credits reset automatically on your subscription renewal date. For example, if you subscribed on the 15th, your credits refresh on the 15th of every month. Unused credits do not roll over to the next month.
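The reset rule can be worked through as simple date arithmetic. The sketch below is an illustration, not the billing system's logic. In particular, clamping to the last day of a shorter month (for example, a subscription started on the 31st) is an assumption, since the text does not say how that case is handled.

```python
import calendar
from datetime import date


def next_reset(subscribed_on, today):
    """Return the first credit reset date strictly after `today`."""

    def reset_day_in(year, month):
        # Assumption: clamp to the month's last day when it is too short.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(subscribed_on.day, last_day))

    candidate = reset_day_in(today.year, today.month)
    if candidate > today:
        return candidate
    if today.month == 12:
        return reset_day_in(today.year + 1, 1)
    return reset_day_in(today.year, today.month + 1)


if __name__ == "__main__":
    print(next_reset(date(2024, 1, 15), date(2024, 3, 10)))
```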
Currently, we offer monthly subscription plans which provide better value for regular users. We're considering pay-as-you-go options for the future. Contact sales for custom enterprise arrangements.
Credit costs per request are: basic page scraping, 1 credit; deep crawling (multiple pages), 1 credit per page; AI-powered data extraction, 2-3 credits; screenshot capture, 1 credit; PDF generation, 2 credits. Exact costs may vary based on request complexity.
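To estimate a job's cost from these figures, the arithmetic can be sketched as follows. The operation names (`scrape`, `crawl_page`, and so on) are hypothetical labels, not API identifiers. Because AI-powered extraction costs 2-3 credits, the estimate is returned as a (low, high) range.

```python
# (min, max) credits per operation, taken from the costs listed above.
# The dictionary keys are illustrative labels only.
CREDIT_COSTS = {
    "scrape": (1, 1),
    "crawl_page": (1, 1),
    "ai_extraction": (2, 3),
    "screenshot": (1, 1),
    "pdf": (2, 2),
}


def estimate_credits(operations):
    """Return the (low, high) credit estimate for {operation: count}."""
    low = high = 0
    for name, count in operations.items():
        cost_low, cost_high = CREDIT_COSTS[name]
        low += cost_low * count
        high += cost_high * count
    return low, high


if __name__ == "__main__":
    job = {"crawl_page": 100, "ai_extraction": 10, "pdf": 5}
    print(estimate_credits(job))
```

For the sample job, the crawl costs 100 credits, extraction 20 to 30, and PDFs 10, for a total of 130 to 140 credits.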
No, we don't charge for failed requests. Credits are only deducted for successful data extraction. Failed requests due to server errors or timeouts are automatically retried at no additional cost.
We accept all major credit cards (Visa, MasterCard, American Express), debit cards, and corporate payment methods through our secure payment processor. Enterprise customers can also pay via invoice.
Still have questions?
Can't find the answer you're looking for? Our support team is here to help.