Frequently Asked Questions

Everything you need to know about ActiCrawl

General

ActiCrawl is a powerful web scraping and data extraction platform designed for developers and AI applications. It helps you collect clean, structured data from any website using advanced browser automation and intelligent content processing.

ActiCrawl works with virtually any public website, including modern JavaScript-heavy applications, SPAs (Single Page Applications), and dynamically loaded content. We handle complex scenarios like infinite scrolling, AJAX requests, and client-side rendering.

ActiCrawl is aimed at developers building AI applications, data scientists collecting training data, businesses monitoring competitors, researchers gathering information, and anyone else who needs to extract structured data from websites at scale.

ActiCrawl offers both open-source and commercial versions. The core scraping engine is available on GitHub for self-hosting, while our cloud platform provides additional features like distributed scraping, automatic scaling, and managed infrastructure.

ActiCrawl specializes in clean data extraction optimized for AI applications. We provide multiple output formats (Markdown, JSON, HTML), intelligent content detection, automatic data cleaning, and seamless integration with popular AI frameworks.

Scraping & Crawling

ActiCrawl uses real browser engines (Chromium-based) to fully render JavaScript and wait for dynamic content to load. Our Smart Wait technology automatically detects when pages are ready, ensuring you capture all the data you need.

A crawl can miss pages for several reasons: robots.txt restrictions, rate limiting, authentication requirements, or crawl depth limits. Check your crawl settings and make sure you have permission to access all the pages you want.

ActiCrawl can crawl websites that have no sitemap. It discovers pages by following links, analyzing navigation menus, and detecting URL patterns. Sitemaps make crawling more efficient, but they are not required.

ActiCrawl supports multiple output formats: clean Markdown (perfect for LLMs), structured JSON, raw HTML, screenshots, and PDF. You can also define custom extraction rules for specific data structures.

We use advanced algorithms to remove ads, popups, navigation elements, and other noise. Our content extraction focuses on the main article or data, providing clean, readable output suitable for AI training and analysis.

ActiCrawl is built for large-scale projects, with distributed crawling, automatic retry mechanisms, request queuing, and cloud infrastructure that can handle millions of pages. The platform scales automatically based on your needs.

Yes, ActiCrawl respects robots.txt by default. This can be configured in your crawl settings if you have explicit permission to bypass these restrictions for legitimate use cases.
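
To illustrate what respecting robots.txt means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not ActiCrawl's implementation; the sample rules, the `my-crawler` user agent, and the `is_allowed` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page", "https://example.com/private/data"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler", url))
```

A crawler that honors robots.txt would skip any URL for which this kind of check returns False.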

ActiCrawl includes intelligent rate limiting, automatic retries with exponential backoff, request distribution across multiple IPs, and smart caching to minimize redundant requests while respecting website resources.
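
If you add your own retries on the client side, exponential backoff can be sketched as follows. This is a generic illustration of the technique, not ActiCrawl's internal logic; the base delay, cap, attempt count, and function names are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def retry(fn: Callable[[], T], max_attempts: int = 4,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, sleeping with exponential backoff between tries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Production code would typically also add random jitter and retry only on errors known to be transient.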

ActiCrawl can handle basic authentication and maintain session cookies. For CAPTCHAs, we recommend using specialized CAPTCHA-solving services or obtaining proper API access from the target website.

API Related

After signing up, you can find your API key in your dashboard under 'API Settings'. Each account has a unique API key that should be included in all API requests for authentication.
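
As a rough illustration of including the key in a request, the sketch below builds (but does not send) an authenticated request with Python's standard library. The endpoint URL, the `Authorization: Bearer` header scheme, the `ACTICRAWL_API_KEY` environment variable, and the `url` body field are all assumptions for illustration; check the API documentation for the actual values.

```python
import json
import os
import urllib.request

# Hypothetical endpoint: replace with the one given in the official API docs.
API_URL = "https://api.example.com/v1/scrape"


def build_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build a POST request that carries the API key (nothing is sent here)."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # Assumed header scheme; the real API may expect a different one.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Read the key from the environment rather than hard-coding it.
    key = os.environ.get("ACTICRAWL_API_KEY", "your-api-key")
    req = build_request(key, "https://example.com")
    print(req.get_method(), req.full_url)
```

Keeping the key in an environment variable avoids committing it to source control.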

Proxy

A proxy list is a collection of proxy servers that can be used to route your requests through. ActiCrawl provides rotating proxies from various regions to help avoid IP blocking and access geo-restricted content.

Proxy lists are automatically provided with Pro plans and above. You can use the 'use_proxy: true' parameter in API requests or configure default proxy settings in your dashboard.

The proxy country setting routes your requests through proxy servers in a specific country. This is useful for accessing region-restricted content or getting localized search results.

Use the 'proxy_country' parameter in your API requests to specify the desired country code (e.g., 'US', 'UK', 'JP'). The list of available countries is in your dashboard's proxy settings.
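
The two proxy parameters can be combined in a request body along these lines. Only `use_proxy` and `proxy_country` come from this FAQ; the `url` field, the `build_scrape_payload` helper, the two-letter validation, and the idea that `proxy_country` is sent together with `use_proxy` are assumptions for illustration.

```python
from typing import Optional


def build_scrape_payload(target_url: str, proxy_country: Optional[str] = None) -> dict:
    """Build a request body, optionally routing through a proxy in one country."""
    payload = {"url": target_url}  # "url" is an assumed field name
    if proxy_country is not None:
        code = proxy_country.strip().upper()
        # Assumed validation: country codes are two letters, as in the examples.
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter country code, got {proxy_country!r}")
        payload["use_proxy"] = True
        payload["proxy_country"] = code
    return payload


if __name__ == "__main__":
    print(build_scrape_payload("https://example.com", "jp"))
```

Validating the code locally catches typos before a request is spent; the authoritative list of supported countries is still the one in the dashboard.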

ActiCrawl offers residential proxies (high success rates), datacenter proxies (fast speeds), and mobile proxies (mobile-specific content). Availability varies by region, and each type has unique benefits and use cases.

Billing

ActiCrawl offers a generous free tier with 500 credits per month, perfect for testing and small projects. Paid plans start at $20/month for additional credits and advanced features.

Monthly credits automatically reset on your subscription renewal date each month. For example, if you subscribed on the 15th, your credits will refresh on the 15th of every month. Unused credits do not roll over to the next month.
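
The reset rule can be sketched as a small date calculation. The `next_reset` helper is hypothetical, and the handling of short months (clamping a 31st anchor to the last day of the month) is an assumption; the FAQ does not say how that case is treated.

```python
import calendar
from datetime import date


def next_reset(today: date, anchor_day: int) -> date:
    """Return the first reset date strictly after `today`.

    anchor_day is the day of the month on which the subscription started.
    """
    def clamp(year: int, month: int) -> date:
        # Assumption: in short months the reset falls on the last day.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(anchor_day, last_day))

    candidate = clamp(today.year, today.month)
    if candidate > today:
        return candidate
    if today.month == 12:
        return clamp(today.year + 1, 1)
    return clamp(today.year, today.month + 1)


if __name__ == "__main__":
    # Subscribed on the 15th; today is 20 January 2024.
    print(next_reset(date(2024, 1, 20), 15))
```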

There is currently no pay-per-use plan. We offer monthly subscription plans, which provide better value for regular users, and we're considering pay-as-you-go options for the future. Contact sales for custom enterprise arrangements.

Costs per request type: basic page scraping, 1 credit; deep crawling (multiple pages), 1 credit per page; AI-powered data extraction, 2-3 credits; screenshot capture, 1 credit; PDF generation, 2 credits. Exact costs may vary based on request complexity.
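
For budgeting, the per-request costs above can be turned into a rough estimate. The sketch below returns a low/high range because AI-powered extraction is listed as 2-3 credits; the request-type keys and the `estimate_credits` helper are made up for the example, and actual usage may differ since exact costs vary with request complexity.

```python
# (min, max) credits per request, taken from the costs listed above.
CREDIT_COSTS = {
    "scrape": (1, 1),
    "crawl_page": (1, 1),   # deep crawling is charged per page
    "ai_extract": (2, 3),
    "screenshot": (1, 1),
    "pdf": (2, 2),
}


def estimate_credits(counts: dict) -> tuple:
    """Return (low, high) credit estimates for a mapping of request type to count."""
    low = sum(CREDIT_COSTS[kind][0] * n for kind, n in counts.items())
    high = sum(CREDIT_COSTS[kind][1] * n for kind, n in counts.items())
    return low, high


if __name__ == "__main__":
    # A 100-page crawl, 10 AI extractions, and 5 PDFs.
    print(estimate_credits({"crawl_page": 100, "ai_extract": 10, "pdf": 5}))
```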

We don't charge for failed requests (scrape, crawl, or extract). Credits are only deducted for successful data extraction. Failed requests due to server errors or timeouts are automatically retried at no additional cost.

We accept all major credit cards (Visa, MasterCard, American Express), debit cards, and corporate payment methods through our secure payment processor. Enterprise customers can also pay via invoice.

Still have questions?

Can't find the answer you're looking for? Our support team is here to help.

Contact Support

Frequently Asked Questions

General

What is ActiCrawl?

What sites work?

Who can benefit from using ActiCrawl?

Is ActiCrawl open-source?

What is the difference between ActiCrawl and other web scrapers?

Scraping & Crawling

How does ActiCrawl handle dynamic content on websites?

Why is it not crawling all the pages?

Can ActiCrawl crawl websites without a sitemap?

What formats can ActiCrawl convert web data into?

How does ActiCrawl ensure the cleanliness of the data?

Is ActiCrawl suitable for large-scale data scraping projects?

Does it respect robots.txt?

What measures does ActiCrawl take to handle web scraping challenges like rate limits and caching?

Does ActiCrawl handle captcha or authentication?

API Related

Where can I find my API key?

Proxy

What is a proxy list?

How do I get a proxy list?

What is proxy country setting?

How do I use proxy to change country?

What are the proxy types by location?

Billing

Is ActiCrawl free?

When do monthly credits reset?

Is there a pay per use plan instead of monthly?

What are the costs per request type?

Do you charge for failed requests (scrape, crawl, extract)?

What payment methods do you accept?
