API Endpoints
ActiCrawl provides a RESTful API for web scraping and data extraction. All API requests should be made to the base URL with your API key for authentication.
Base URL
text
https://api.acticrawl.com/v1
Available Endpoints
1. Scrape Single URL
Extract content from a single web page.
http
POST /scrape
Request Body
json
{
"url": "https://example.com",
"formats": ["markdown", "html", "screenshot"],
"options": {
"onlyMainContent": true,
"includeHtml": false,
"waitFor": 3000,
"timeout": 30000,
"usePremiumProxy": false,
"excludeTags": ["nav", ".ads", "#footer"],
"onlyIncludeTags": [],
"headers": {
"User-Agent": "Mozilla/5.0..."
}
}
}
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url |
string | Yes | The URL to scrape |
formats |
array | No | Output formats: markdown , html , cleanHtml , screenshot , pdf , json |
options |
object | No | Additional scraping options |
Options Object
Option | Type | Default | Description |
---|---|---|---|
onlyMainContent |
boolean | false | Extract only the main content, removing navigation, ads, etc. |
includeHtml |
boolean | false | Include raw HTML in the response |
waitFor |
integer | 0 | Wait time in milliseconds for JavaScript rendering |
timeout |
integer | 30000 | Maximum time in milliseconds to wait for page load |
usePremiumProxy |
boolean | false | Use premium proxy network |
excludeTags |
array | [] | CSS selectors or tags to exclude |
onlyIncludeTags |
array | [] | Only include these CSS selectors or tags |
headers |
object | {} | Custom HTTP headers to send with the request |
Response
json
{
"success": true,
"data": {
"markdown": "# Page Title\n\nContent here...",
"html": "<html>...</html>",
"cleanHtml": "<div>...</div>",
"screenshot": "base64_encoded_image",
"metadata": {
"title": "Page Title",
"description": "Page description",
"language": "en",
"author": "Author Name",
"publishedTime": "2024-01-01T00:00:00Z",
"modifiedTime": "2024-01-02T00:00:00Z",
"keywords": ["keyword1", "keyword2"],
"ogImage": "https://example.com/image.jpg"
}
},
"credits_used": 1
}
2. Crawl Multiple Pages
Crawl a website following links up to a specified depth.
http
POST /crawl
Request Body
json
{
"url": "https://example.com",
"match": "https://example.com/**",
"selector": "a[href]",
"limit": 10,
"depth": 2,
"formats": ["markdown"],
"options": {
"onlyMainContent": true
}
}
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url |
string | Yes | Starting URL for crawling |
match |
string | No | URL pattern to match (glob format) |
selector |
string | No | CSS selector for links to follow |
limit |
integer | No | Maximum number of pages to crawl (default: 10) |
depth |
integer | No | Maximum crawl depth (default: 2) |
formats |
array | No | Output formats for each page |
options |
object | No | Scraping options (same as single URL) |
Response
json
{
"success": true,
"data": [
{
"url": "https://example.com",
"markdown": "# Home Page...",
"metadata": {...}
},
{
"url": "https://example.com/about",
"markdown": "# About Us...",
"metadata": {...}
}
],
"credits_used": 2,
"pages_crawled": 2
}
3. Map Website Structure
Generate a sitemap of a website.
http
POST /map
Request Body
json
{
"url": "https://example.com",
"match": "https://example.com/**",
"selector": "a[href]",
"limit": 100,
"depth": 3
}
Response
json
{
"success": true,
"data": {
"urls": [
"https://example.com",
"https://example.com/about",
"https://example.com/contact",
"https://example.com/blog",
"https://example.com/blog/post-1"
],
"structure": {
"https://example.com": {
"title": "Home",
"links": [
"https://example.com/about",
"https://example.com/contact",
"https://example.com/blog"
]
}
}
},
"credits_used": 1,
"pages_analyzed": 5
}
4. Search Website Content
Search for specific content within a website.
http
POST /search
Request Body
json
{
"url": "https://example.com",
"query": "machine learning",
"match": "https://example.com/blog/**",
"limit": 20,
"depth": 2
}
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url |
string | Yes | Starting URL for search |
query |
string | Yes | Search query |
match |
string | No | URL pattern to search within |
limit |
integer | No | Maximum pages to search (default: 20) |
depth |
integer | No | Maximum search depth (default: 2) |
Response
json
{
"success": true,
"data": {
"results": [
{
"url": "https://example.com/blog/ml-intro",
"title": "Introduction to Machine Learning",
"excerpt": "...machine learning is a subset of artificial intelligence...",
"relevance_score": 0.95
}
],
"total_results": 3
},
"credits_used": 5,
"pages_searched": 5
}
5. Get Account Info
Retrieve your account information and usage statistics.
http
GET /account
Response
json
{
"success": true,
"data": {
"email": "user@example.com",
"plan": "Professional",
"credits": {
"used": 1250,
"limit": 10000,
"reset_date": "2024-02-01T00:00:00Z"
},
"rate_limits": {
"requests_per_minute": 60,
"concurrent_requests": 10
}
}
}
Error Responses
All endpoints return consistent error responses:
json
{
"success": false,
"error": {
"code": "INVALID_URL",
"message": "The provided URL is not valid",
"details": "URL must include protocol (http:// or https://)"
}
}
Common Error Codes
Code | Description |
---|---|
INVALID_API_KEY |
API key is missing or invalid |
RATE_LIMIT_EXCEEDED |
Too many requests |
INVALID_URL |
URL is malformed or not accessible |
TIMEOUT |
Request timed out |
INVALID_PARAMETERS |
Request parameters are invalid |
INSUFFICIENT_CREDITS |
Not enough credits in your account |
INTERNAL_ERROR |
Server error occurred |
Rate Limits
Rate limits vary by plan:
Plan | Requests/Minute | Concurrent Requests |
---|---|---|
Free | 10 | 1 |
Starter | 30 | 5 |
Professional | 60 | 10 |
Business | 120 | 20 |
Enterprise | Unlimited | Unlimited |
Rate limit information is included in response headers:
text
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200
Webhooks
For long-running operations, you can provide a webhook URL:
json
{
"url": "https://example.com",
"webhook_url": "https://your-app.com/webhook",
"formats": ["markdown"]
}
The webhook will receive a POST request when the operation completes:
json
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"result": {
"success": true,
"data": {...}
}
}