
Learn how to automate your web scraping workflows with ActiCrawl

API Endpoints

ActiCrawl provides a RESTful API for web scraping and data extraction. All requests are made against the base URL below and authenticated by passing your API key as a Bearer token in the Authorization header.

Base URL

text
https://www.acticrawl.com/api/v1

Available Endpoints

1. Scrape Single URL

Extract content from a single web page with multiple output format options.

http
POST /scrape

Request Headers

http
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Request Body

json
{
  "url": "https://example.com",
  "output_format": "markdown",
  "execute_js": true,
  "wait_for": 3000,
  "timeout": 30000,
  "extract_main_content": true,
  "use_premium_proxy": false,
  "exclude_tags": "script,style,nav",
  "include_only_tags": "article,main",
  "scraping_mode": "real"
}
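
The same request expressed in Python — a minimal sketch using the third-party requests library; YOUR_API_KEY is a placeholder:

python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; use your real ActiCrawl API key

payload = {
    "url": "https://example.com",
    "output_format": "markdown",
    "execute_js": True,
    "wait_for": 3000,
    "timeout": 30000,
    "extract_main_content": True,
    "use_premium_proxy": False,
    "exclude_tags": "script,style,nav",
    "include_only_tags": "article,main",
    "scraping_mode": "real",
}

response = requests.post(
    "https://www.acticrawl.com/api/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,  # serializes the body and sets Content-Type: application/json
    timeout=35,    # client-side timeout a little above the API's 30s default
)
print(response.json())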

Parameters

Parameter             Type     Required  Description
url                   string   Yes       The URL to scrape
output_format         string   No        Output format: html, markdown, html_cleaned, links, json, screenshot (default: html)
execute_js            boolean  No        Enable JavaScript rendering (default: false)
wait_for              integer  No        Wait time in milliseconds before capture (default: 3000)
timeout               integer  No        Request timeout in milliseconds (default: 30000)
extract_main_content  boolean  No        Extract only main content (default: false)
use_premium_proxy     boolean  No        Use premium proxy network (requires plan support)
exclude_tags          string   No        Comma-separated tags to exclude
include_only_tags     string   No        Comma-separated tags to include only
scraping_mode         string   No        Scraping mode: simulation or real (default: simulation)

Output Formats

  • html - Raw HTML content (default)
  • markdown - Converted to Markdown format
  • html_cleaned - Cleaned HTML without scripts/styles
  • links - All links extracted from the page
  • json - Structured JSON with title, content, and links
  • screenshot - Page screenshot (when available)
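
The format also determines the Python type of data.content in the response. A quick sketch illustrating this (assumes the requests library; the scrape helper is hypothetical, not part of any SDK):

python
import requests

def scrape(url, fmt, api_key):
    """Hypothetical helper: fetch one URL in the given output format."""
    resp = requests.post(
        "https://www.acticrawl.com/api/v1/scrape",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": url, "output_format": fmt},
        timeout=35,
    )
    return resp.json()["data"]["content"]

for fmt in ("html", "markdown", "links", "json"):
    content = scrape("https://example.com", fmt, "YOUR_API_KEY")
    # links/json come back as an array/object; the rest are strings
    print(fmt, type(content).__name__)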

Response

json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "format": "markdown",
    "metadata": {
      "title": "Example Domain",
      "processing_time": 1.5,
      "plan": "Free",
      "scraping_mode": "real",
      "scraped_at": "2024-01-15T10:30:00Z",
      "response_time": 1500
    }
  }
}

Response Fields

Field          Type                 Description
success        boolean              Whether the request was successful
data.url       string               The scraped URL
data.content   string/array/object  The scraped content (shape depends on output_format)
data.format    string               The output format used
data.metadata  object               Additional metadata about the scrape

Error Response

json
{
  "success": false,
  "error": "Rate limit exceeded",
  "data": {
    "url": "https://example.com",
    "error": "Rate limit exceeded",
    "metadata": {
      "processing_time": 0.1,
      "plan": "Free",
      "scraping_mode": "simulation"
    }
  }
}
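
Success and error responses share the same envelope, so a client can branch on the top-level success flag. A minimal sketch — the handle_scrape_response helper is hypothetical, and resp is a requests.Response:

python
def handle_scrape_response(resp):
    """Return the scraped content, or raise with the API's error message."""
    body = resp.json()
    if resp.ok and body.get("success"):
        data = body["data"]
        meta = data.get("metadata", {})
        print(f"Scraped {data['url']} ({data['format']}) on the {meta.get('plan')} plan")
        return data["content"]
    # On failure the message appears in the top-level "error" field.
    raise RuntimeError(body.get("error", f"HTTP {resp.status_code}"))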

Rate Limiting

API rate limits are based on your subscription plan:

  • Free Plan: 100 requests/month, 1 concurrent request
  • Basic Plan: 10,000 requests/month, 5 concurrent requests
  • Pro Plan: 100,000 requests/month, 20 concurrent requests
  • Enterprise Plan: Unlimited requests, 100 concurrent requests

Rate limit information is included in response headers:

http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704844800
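
A small sketch that honors these headers by pausing when the remaining quota hits zero (assumes X-RateLimit-Reset is a Unix timestamp in seconds, as in the example above):

python
import time

def wait_if_throttled(resp):
    """Sleep until the rate-limit window resets when no requests remain."""
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
    if remaining == 0:
        reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))
        time.sleep(max(0.0, reset_at - time.time()))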

Examples

Basic HTML Scraping

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Markdown Extraction

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "markdown",
    "extract_main_content": true
  }'

JavaScript-Rendered Content

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "html_cleaned",
    "execute_js": true,
    "wait_for": 5000
  }'

Link Extraction

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "links"
  }'

Structured JSON Data

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "json",
    "extract_main_content": true
  }'

Best Practices

  1. Use appropriate output formats - Choose the format that best suits your needs to minimize processing time
  2. Set reasonable timeouts - Adjust timeout based on the complexity of the target site
  3. Use JavaScript rendering sparingly - Only enable when necessary as it increases processing time
  4. Implement retry logic - Handle transient failures with exponential backoff
  5. Cache results - Store scraped data to avoid unnecessary repeated requests
  6. Monitor rate limits - Check response headers to track your usage
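
A minimal sketch of the retry logic from practice 4, backing off exponentially on the statuses the Error Codes table marks as retryable (scrape_with_retry is a hypothetical helper, not part of any SDK):

python
import time
import requests

RETRYABLE = {429, 500}  # statuses worth retrying, per the Error Codes table

def scrape_with_retry(payload, api_key, max_attempts=4):
    """POST /scrape, backing off exponentially on transient failures."""
    resp = None
    for attempt in range(max_attempts):
        resp = requests.post(
            "https://www.acticrawl.com/api/v1/scrape",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=35,
        )
        if resp.status_code not in RETRYABLE:
            break
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    return resp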

Error Codes

HTTP Status  Error                  Description
400          Bad Request            Invalid request parameters
401          Unauthorized           Invalid or missing API key
403          Forbidden              Access denied (e.g., premium proxy on the Free plan)
422          Unprocessable Entity   Scraping failed
429          Too Many Requests      Rate limit exceeded
500          Internal Server Error  Server error; please retry