Docs

Documentation

Learn how to automate your web scraping workflows with ActiCrawl

API Endpoints

ActiCrawl provides a RESTful API for web scraping and data extraction. All API requests should be made to the base URL with your API key for authentication.

Base URL

text
https://api.acticrawl.com/v1

Available Endpoints

1. Scrape Single URL

Extract content from a single web page.

http
POST /scrape

Request Body

json
{
  "url": "https://example.com",
  "formats": ["markdown", "html", "screenshot"],
  "options": {
    "onlyMainContent": true,
    "includeHtml": false,
    "waitFor": 3000,
    "timeout": 30000,
    "usePremiumProxy": false,
    "excludeTags": ["nav", ".ads", "#footer"],
    "onlyIncludeTags": [],
    "headers": {
      "User-Agent": "Mozilla/5.0..."
    }
  }
}

Parameters

Parameter Type Required Description
url string Yes The URL to scrape
formats array No Output formats: markdown, html, cleanHtml, screenshot, pdf, json
options object No Additional scraping options

Options Object

Option Type Default Description
onlyMainContent boolean false Extract only the main content, removing navigation, ads, etc.
includeHtml boolean false Include raw HTML in the response
waitFor integer 0 Wait time in milliseconds for JavaScript rendering
timeout integer 30000 Maximum time in milliseconds to wait for page load
usePremiumProxy boolean false Use premium proxy network
excludeTags array [] CSS selectors or tags to exclude
onlyIncludeTags array [] Only include these CSS selectors or tags
headers object {} Custom HTTP headers to send with the request

Response

json
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent here...",
    "html": "<html>...</html>",
    "cleanHtml": "<div>...</div>",
    "screenshot": "base64_encoded_image",
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "language": "en",
      "author": "Author Name",
      "publishedTime": "2024-01-01T00:00:00Z",
      "modifiedTime": "2024-01-02T00:00:00Z",
      "keywords": ["keyword1", "keyword2"],
      "ogImage": "https://example.com/image.jpg"
    }
  },
  "credits_used": 1
}

2. Crawl Multiple Pages

Crawl a website following links up to a specified depth.

http
POST /crawl

Request Body

json
{
  "url": "https://example.com",
  "match": "https://example.com/**",
  "selector": "a[href]",
  "limit": 10,
  "depth": 2,
  "formats": ["markdown"],
  "options": {
    "onlyMainContent": true
  }
}

Parameters

Parameter Type Required Description
url string Yes Starting URL for crawling
match string No URL pattern to match (glob format)
selector string No CSS selector for links to follow
limit integer No Maximum number of pages to crawl (default: 10)
depth integer No Maximum crawl depth (default: 2)
formats array No Output formats for each page
options object No Scraping options (same as single URL)

Response

json
{
  "success": true,
  "data": [
    {
      "url": "https://example.com",
      "markdown": "# Home Page...",
      "metadata": {...}
    },
    {
      "url": "https://example.com/about",
      "markdown": "# About Us...",
      "metadata": {...}
    }
  ],
  "credits_used": 2,
  "pages_crawled": 2
}

3. Map Website Structure

Generate a sitemap of a website.

http
POST /map

Request Body

json
{
  "url": "https://example.com",
  "match": "https://example.com/**",
  "selector": "a[href]",
  "limit": 100,
  "depth": 3
}

Response

json
{
  "success": true,
  "data": {
    "urls": [
      "https://example.com",
      "https://example.com/about",
      "https://example.com/contact",
      "https://example.com/blog",
      "https://example.com/blog/post-1"
    ],
    "structure": {
      "https://example.com": {
        "title": "Home",
        "links": [
          "https://example.com/about",
          "https://example.com/contact",
          "https://example.com/blog"
        ]
      }
    }
  },
  "credits_used": 1,
  "pages_analyzed": 5
}

4. Search Website Content

Search for specific content within a website.

http
POST /search

Request Body

json
{
  "url": "https://example.com",
  "query": "machine learning",
  "match": "https://example.com/blog/**",
  "limit": 20,
  "depth": 2
}

Parameters

Parameter Type Required Description
url string Yes Starting URL for search
query string Yes Search query
match string No URL pattern to search within
limit integer No Maximum pages to search (default: 20)
depth integer No Maximum search depth (default: 2)

Response

json
{
  "success": true,
  "data": {
    "results": [
      {
        "url": "https://example.com/blog/ml-intro",
        "title": "Introduction to Machine Learning",
        "excerpt": "...machine learning is a subset of artificial intelligence...",
        "relevance_score": 0.95
      }
    ],
    "total_results": 3
  },
  "credits_used": 5,
  "pages_searched": 5
}

5. Get Account Info

Retrieve your account information and usage statistics.

http
GET /account

Response

json
{
  "success": true,
  "data": {
    "email": "user@example.com",
    "plan": "Professional",
    "credits": {
      "used": 1250,
      "limit": 10000,
      "reset_date": "2024-02-01T00:00:00Z"
    },
    "rate_limits": {
      "requests_per_minute": 60,
      "concurrent_requests": 10
    }
  }
}

Error Responses

All endpoints return consistent error responses:

json
{
  "success": false,
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid",
    "details": "URL must include protocol (http:// or https://)"
  }
}

Common Error Codes

Code Description
INVALID_API_KEY API key is missing or invalid
RATE_LIMIT_EXCEEDED Too many requests
INVALID_URL URL is malformed or not accessible
TIMEOUT Request timed out
INVALID_PARAMETERS Request parameters are invalid
INSUFFICIENT_CREDITS Not enough credits in your account
INTERNAL_ERROR Server error occurred

Rate Limits

Rate limits vary by plan:

Plan Requests/Minute Concurrent Requests
Free 10 1
Starter 30 5
Professional 60 10
Business 120 20
Enterprise Unlimited Unlimited

Rate limit information is included in response headers:

text
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200

Webhooks

For long-running operations, you can provide a webhook URL:

json
{
  "url": "https://example.com",
  "webhook_url": "https://your-app.com/webhook",
  "formats": ["markdown"]
}

The webhook will receive a POST request when the operation completes:

json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {
    "success": true,
    "data": {...}
  }
}