
Learn how to automate your web scraping workflows with ActiCrawl

API Endpoints

ActiCrawl provides a RESTful API for web scraping and data extraction. All requests are made against the base URL below and authenticated by passing your API key as a Bearer token in the Authorization header.

Base URL

text
https://www.acticrawl.com/api/v1

Available Endpoints

1. Scrape Single URL

Extract content from a single web page with multiple output format options.

http
POST /scrape

Request Headers

http
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Request Body

json
{
  "url": "https://example.com",
  "output_format": "markdown",
  "execute_js": true,
  "wait_for": 3000,
  "timeout": 30000,
  "extract_main_content": true,
  "use_premium_proxy": false,
  "exclude_tags": "script,style,nav",
  "include_only_tags": "article,main",
  "scraping_mode": "real"
}
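
The same request expressed in Python — a minimal sketch using the third-party requests library; YOUR_API_KEY is a placeholder:

python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; use your real ActiCrawl API key

payload = {
    "url": "https://example.com",
    "output_format": "markdown",
    "execute_js": True,
    "wait_for": 3000,
    "timeout": 30000,
    "extract_main_content": True,
    "use_premium_proxy": False,
    "exclude_tags": "script,style,nav",
    "include_only_tags": "article,main",
    "scraping_mode": "real",
}

response = requests.post(
    "https://www.acticrawl.com/api/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,  # serializes the body and sets Content-Type: application/json
    timeout=35,    # client-side timeout a little above the API's 30s default
)
print(response.json())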

Parameters

Parameter             Type     Required  Description
url                   string   Yes       The URL to scrape
output_format         string   No        Output format: html, markdown, html_cleaned, links, json, screenshot (default: html)
execute_js            boolean  No        Enable JavaScript rendering (default: false)
wait_for              integer  No        Wait time in milliseconds before capture (default: 3000)
timeout               integer  No        Request timeout in milliseconds (default: 30000)
extract_main_content  boolean  No        Extract only main content (default: false)
use_premium_proxy     boolean  No        Use premium proxy network (requires plan support)
exclude_tags          string   No        Comma-separated tags to exclude
include_only_tags     string   No        Comma-separated tags to include only
scraping_mode         string   No        Scraping mode: simulation or real (default: simulation)

Output Formats

  • html - Raw HTML content (default)
  • markdown - Converted to Markdown format
  • html_cleaned - Cleaned HTML without scripts/styles
  • links - All links extracted from the page
  • json - Structured JSON with title, content, and links
  • screenshot - Page screenshot (when available)
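
The format also determines the Python type of data.content in the response. A quick sketch illustrating this (assumes the requests library; the scrape helper is hypothetical, not part of any SDK):

python
import requests

def scrape(url, fmt, api_key):
    """Hypothetical helper: fetch one URL in the given output format."""
    resp = requests.post(
        "https://www.acticrawl.com/api/v1/scrape",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": url, "output_format": fmt},
        timeout=35,
    )
    return resp.json()["data"]["content"]

for fmt in ("html", "markdown", "links", "json"):
    content = scrape("https://example.com", fmt, "YOUR_API_KEY")
    # links/json come back as an array/object; the rest are strings
    print(fmt, type(content).__name__)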

Response

json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "format": "markdown",
    "metadata": {
      "title": "Example Domain",
      "processing_time": 1.5,
      "plan": "Free",
      "scraping_mode": "real",
      "scraped_at": "2024-01-15T10:30:00Z",
      "response_time": 1500
    }
  }
}

Response Fields

Field          Type                 Description
success        boolean              Whether the request was successful
data.url       string               The scraped URL
data.content   string/array/object  The scraped content (shape depends on output_format)
data.format    string               The output format used
data.metadata  object               Additional metadata about the scrape

Error Response

json
{
  "success": false,
  "error": "Rate limit exceeded",
  "data": {
    "url": "https://example.com",
    "error": "Rate limit exceeded",
    "metadata": {
      "processing_time": 0.1,
      "plan": "Free",
      "scraping_mode": "simulation"
    }
  }
}
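
Success and error responses share the same envelope, so a client can branch on the top-level success flag. A minimal sketch — the handle_scrape_response helper is hypothetical, and resp is a requests.Response:

python
def handle_scrape_response(resp):
    """Return the scraped content, or raise with the API's error message."""
    body = resp.json()
    if resp.ok and body.get("success"):
        data = body["data"]
        meta = data.get("metadata", {})
        print(f"Scraped {data['url']} ({data['format']}) on the {meta.get('plan')} plan")
        return data["content"]
    # On failure the message appears in the top-level "error" field.
    raise RuntimeError(body.get("error", f"HTTP {resp.status_code}"))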

Rate Limiting

API rate limits are based on your subscription plan:

  • Free Plan: 100 requests/month, 1 concurrent request
  • Basic Plan: 10,000 requests/month, 5 concurrent requests
  • Pro Plan: 100,000 requests/month, 20 concurrent requests
  • Enterprise Plan: Unlimited requests, 100 concurrent requests

Rate limit information is included in response headers:

http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704844800
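
A small sketch that honors these headers by pausing when the remaining quota hits zero (assumes X-RateLimit-Reset is a Unix timestamp in seconds, as in the example above):

python
import time

def wait_if_throttled(resp):
    """Sleep until the rate-limit window resets when no requests remain."""
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
    if remaining == 0:
        reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))
        time.sleep(max(0.0, reset_at - time.time()))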

Examples

Basic HTML Scraping

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Markdown Extraction

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "markdown",
    "extract_main_content": true
  }'

JavaScript-Rendered Content

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "html_cleaned",
    "execute_js": true,
    "wait_for": 5000
  }'

Link Extraction

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "links"
  }'

Structured JSON Data

bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "json",
    "extract_main_content": true
  }'

Best Practices

  1. Use appropriate output formats - Choose the format that best suits your needs to minimize processing time
  2. Set reasonable timeouts - Adjust timeout based on the complexity of the target site
  3. Use JavaScript rendering sparingly - Only enable when necessary as it increases processing time
  4. Implement retry logic - Handle transient failures with exponential backoff
  5. Cache results - Store scraped data to avoid unnecessary repeated requests
  6. Monitor rate limits - Check response headers to track your usage
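
A minimal sketch of the retry logic from practice 4, backing off exponentially on the statuses the Error Codes table marks as retryable (scrape_with_retry is a hypothetical helper, not part of any SDK):

python
import time
import requests

RETRYABLE = {429, 500}  # statuses worth retrying, per the Error Codes table

def scrape_with_retry(payload, api_key, max_attempts=4):
    """POST /scrape, backing off exponentially on transient failures."""
    resp = None
    for attempt in range(max_attempts):
        resp = requests.post(
            "https://www.acticrawl.com/api/v1/scrape",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=35,
        )
        if resp.status_code not in RETRYABLE:
            break
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    return resp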

Error Codes

HTTP Status  Error                  Description
400          Bad Request            Invalid request parameters
401          Unauthorized           Invalid or missing API key
403          Forbidden              Access denied (e.g., premium proxy on the Free plan)
422          Unprocessable Entity   Scraping failed
429          Too Many Requests      Rate limit exceeded
500          Internal Server Error  Server error; please retry