API Endpoints
ActiCrawl provides a RESTful API for web scraping and data extraction. Send every request to the base URL below and authenticate by passing your API key in the Authorization header.
Base URL
text
https://www.acticrawl.com/api/v1
Available Endpoints
1. Scrape Single URL
Extract content from a single web page in one of several output formats.
http
POST /scrape
Request Headers
http
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
Request Body
json
{
  "url": "https://example.com",
  "output_format": "markdown",
  "execute_js": true,
  "wait_for": 3000,
  "timeout": 30000,
  "extract_main_content": true,
  "use_premium_proxy": false,
  "exclude_tags": "script,style,nav",
  "include_only_tags": "article,main",
  "scraping_mode": "real"
}
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | The URL to scrape |
output_format | string | No | Output format: html, markdown, html_cleaned, links, json, screenshot (default: html) |
execute_js | boolean | No | Enable JavaScript rendering (default: false) |
wait_for | integer | No | Wait time in milliseconds before capture (default: 3000) |
timeout | integer | No | Request timeout in milliseconds (default: 30000) |
extract_main_content | boolean | No | Extract only main content (default: false) |
use_premium_proxy | boolean | No | Use premium proxy network (requires plan support) |
exclude_tags | string | No | Comma-separated tags to exclude |
include_only_tags | string | No | Comma-separated tags to include only |
scraping_mode | string | No | Scraping mode: simulation or real (default: simulation) |
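The parameter table can be turned into a small client-side check before a request is sent. The following is a minimal sketch, not part of the ActiCrawl API: `build_scrape_payload` is a hypothetical helper, and rejecting unknown keys or accepting tag lists as Python sequences are conveniences assumed here rather than documented server behavior.

```python
# Allowed values copied from the parameter table above.
OUTPUT_FORMATS = {"html", "markdown", "html_cleaned", "links", "json", "screenshot"}
SCRAPING_MODES = {"simulation", "real"}
OPTIONAL_PARAMS = {
    "output_format", "execute_js", "wait_for", "timeout", "extract_main_content",
    "use_premium_proxy", "exclude_tags", "include_only_tags", "scraping_mode",
}
TAG_LIST_PARAMS = {"exclude_tags", "include_only_tags"}


def build_scrape_payload(url, **options):
    """Return a dict for the POST /scrape JSON body (hypothetical helper).

    Parameters that are not passed are omitted, so the server-side
    defaults listed in the table apply.
    """
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if options.get("output_format", "html") not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {options['output_format']!r}")
    if options.get("scraping_mode", "simulation") not in SCRAPING_MODES:
        raise ValueError(f"unsupported scraping_mode: {options['scraping_mode']!r}")

    payload = {"url": url}
    for key, value in options.items():
        # The API expects comma-separated strings for tag filters;
        # joining a list here is a client-side convenience.
        if key in TAG_LIST_PARAMS and not isinstance(value, str):
            value = ",".join(value)
        payload[key] = value
    return payload
```

For example, `build_scrape_payload("https://example.com", output_format="markdown", exclude_tags=["script", "style"])` produces a body with `"exclude_tags": "script,style"`.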
Output Formats
- html - Raw HTML content (default)
- markdown - Converted to Markdown format
- html_cleaned - Cleaned HTML without scripts/styles
- links - All links extracted from the page
- json - Structured JSON with title, content, and links
- screenshot - Page screenshot (when available)
Response
json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "format": "markdown",
    "metadata": {
      "title": "Example Domain",
      "processing_time": 1.5,
      "plan": "Free",
      "scraping_mode": "real",
      "scraped_at": "2024-01-15T10:30:00Z",
      "response_time": 1500
    }
  }
}
Response Fields
Field | Type | Description |
---|---|---|
success | boolean | Whether the request was successful |
data.url | string | The scraped URL |
data.content | string/array/object | The scraped content (format depends on output_format) |
data.format | string | The output format used |
data.metadata | object | Additional information about the scraping |
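Because `data.content` may be a string, an array, or an object, client code has to handle all three. The sketch below shows one way to read the fields from the table; `read_scrape_result` and `content_as_text` are hypothetical names. The table does not say which output format yields which type, so the code branches on the decoded Python type rather than on `data.format`.

```python
import json


def read_scrape_result(raw_json):
    """Return (format, content, metadata) from a successful response body."""
    body = json.loads(raw_json)
    data = body["data"]
    # metadata is treated as optional here to stay tolerant of partial responses.
    return data["format"], data["content"], data.get("metadata", {})


def content_as_text(content):
    """Normalize content to text: strings pass through unchanged,
    arrays and objects are serialized as indented JSON."""
    if isinstance(content, str):
        return content
    return json.dumps(content, indent=2)
```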
Error Response
json
{
  "success": false,
  "error": "Rate limit exceeded",
  "data": {
    "url": "https://example.com",
    "error": "Rate limit exceeded",
    "metadata": {
      "processing_time": 0.1,
      "plan": "Free",
      "scraping_mode": "simulation"
    }
  }
}
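A client can turn the error shape above into an exception so that failures are not silently treated as results. This is a minimal sketch: `ScrapeError` and `raise_for_failure` are hypothetical names, and falling back from the top-level `error` to `data.error` is an assumption based on the example, where both carry the same message.

```python
class ScrapeError(Exception):
    """Raised when a response body reports success: false (hypothetical class)."""

    def __init__(self, message, url=None, metadata=None):
        super().__init__(message)
        self.url = url
        self.metadata = metadata or {}


def raise_for_failure(body):
    """Return body["data"] on success; raise ScrapeError otherwise.

    `body` is the already-decoded JSON response (a dict).
    """
    if body.get("success"):
        return body["data"]
    data = body.get("data") or {}
    message = body.get("error") or data.get("error") or "unknown error"
    raise ScrapeError(message, url=data.get("url"), metadata=data.get("metadata"))
```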
Rate Limiting
API rate limits are based on your subscription plan:
- Free Plan: 100 requests/month, 1 concurrent request
- Basic Plan: 10,000 requests/month, 5 concurrent requests
- Pro Plan: 100,000 requests/month, 20 concurrent requests
- Enterprise Plan: Unlimited requests, 100 concurrent requests
Rate limit information is included in response headers:
http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704844800
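These headers can be read after each response to decide whether to pause before the next request. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but the text above does not state; `parse_rate_limit` is a hypothetical helper.

```python
import time


def parse_rate_limit(headers, now=None):
    """Summarize the X-RateLimit-* headers from a response.

    `headers` is any mapping of header names to string values.
    `now` can be injected for testing and defaults to the current time.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    limit = int(normalized["x-ratelimit-limit"])
    remaining = int(normalized["x-ratelimit-remaining"])
    # Assumption: the reset value is seconds since the Unix epoch.
    reset_at = int(normalized["x-ratelimit-reset"])
    current = time.time() if now is None else now
    return {
        "limit": limit,
        "remaining": remaining,
        "used": limit - remaining,
        "seconds_until_reset": max(0.0, reset_at - current),
    }
```

A caller might sleep for `seconds_until_reset` when `remaining` reaches zero instead of sending requests that would be rejected with 429.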
Examples
Basic HTML Scraping
bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
Markdown Extraction
bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "markdown",
    "extract_main_content": true
  }'
JavaScript-Rendered Content
bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "html_cleaned",
    "execute_js": true,
    "wait_for": 5000
  }'
Link Extraction
bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "links"
  }'
Structured JSON Data
bash
curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "json",
    "extract_main_content": true
  }'
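The curl calls above translate directly to Python. The following sketch uses only the standard library; `build_scrape_request` and `scrape` are hypothetical helper names, and the 60-second client timeout is an arbitrary choice made here, not a documented value.

```python
import json
import urllib.request

BASE_URL = "https://www.acticrawl.com/api/v1"


def build_scrape_request(api_key, payload):
    """Build (but do not send) a POST /scrape request with the documented headers."""
    return urllib.request.Request(
        f"{BASE_URL}/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(api_key, payload, timeout=60):
    """Send the request and return the decoded JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    that want the JSON error body need to catch it and read from the exception.
    """
    request = build_scrape_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The markdown example would then be `scrape("YOUR_API_KEY", {"url": "https://example.com", "output_format": "markdown", "extract_main_content": True})`.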
Best Practices
- Use appropriate output formats - Choose the format that best suits your needs to minimize processing time
- Set reasonable timeouts - Adjust timeout based on the complexity of the target site
- Use JavaScript rendering sparingly - Only enable when necessary as it increases processing time
- Implement retry logic - Handle transient failures with exponential backoff
- Cache results - Store scraped data to avoid unnecessary repeated requests
- Monitor rate limits - Check response headers to track your usage
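Two of the practices above, retrying with exponential backoff and caching results, can be sketched as small wrappers around whatever function performs the request. Everything here is illustrative: `with_retries` and `cached` are hypothetical helpers, the delays (1 s, 2 s, 4 s) and attempt count are arbitrary, and the in-memory cache never expires entries.

```python
import json
import time


def with_retries(call, max_attempts=4, base_delay=1.0,
                 sleep=time.sleep, is_retryable=lambda exc: True):
    """Run call(); on failure wait base_delay * 2**attempt and try again.

    The last failure is re-raised once max_attempts is reached or when
    is_retryable(exc) returns False.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)


def cached(fetch):
    """Wrap fetch(payload) so identical payloads are only fetched once."""
    cache = {}

    def wrapper(payload):
        # sort_keys makes the cache key independent of dict ordering.
        key = json.dumps(payload, sort_keys=True)
        if key not in cache:
            cache[key] = fetch(payload)
        return cache[key]

    return wrapper
```

Passing `sleep` and `is_retryable` as parameters keeps the backoff logic testable without real waiting and lets callers skip retries for errors that will not succeed on a second attempt, such as an invalid API key.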
Error Codes
HTTP Status | Error | Description |
---|---|---|
400 | Bad Request | Invalid request parameters |
401 | Unauthorized | Invalid or missing API key |
403 | Forbidden | Access denied (e.g., premium proxy on free plan) |
422 | Unprocessable Entity | Scraping failed |
429 | Too Many Requests | Rate limit exceeded |
500 | Internal Server Error | Server error, please retry |
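The table can be encoded as a lookup that tells calling code whether a retry is worth attempting. This is a sketch with labeled assumptions: only the 500 row explicitly says to retry; treating 429 as retryable (after waiting for the limit to reset) and 422 as non-retryable are choices made here, and `describe_status` is a hypothetical helper.

```python
# status -> (error name from the table, retryable?)
# Only the 500 retry hint comes from the table; the others are assumptions.
ERROR_CODES = {
    400: ("Bad Request", False),
    401: ("Unauthorized", False),
    403: ("Forbidden", False),
    422: ("Unprocessable Entity", False),
    429: ("Too Many Requests", True),
    500: ("Internal Server Error", True),
}


def describe_status(status):
    """Return a dict describing an HTTP status code from the scrape endpoint."""
    if 200 <= status < 300:
        return {"status": status, "error": None, "retryable": False}
    # Statuses not listed in the table are reported as unknown and not retried.
    name, retryable = ERROR_CODES.get(status, ("Unknown", False))
    return {"status": status, "error": name, "retryable": retryable}
```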