API Endpoints

ActiCrawl provides a RESTful API for web scraping and data extraction. All API requests should be made to the base URL with your API key for authentication.

Base URL

text

https://api.acticrawl.com/v1

Available Endpoints

1. Scrape Single URL

Extract content from a single web page.

http

POST /scrape

Request Body

            json
            
          

            {
  "url": "https://example.com",
  "formats": ["markdown", "html", "screenshot"],
  "options": {
    "onlyMainContent": true,
    "includeHtml": false,
    "waitFor": 3000,
    "timeout": 30000,
    "usePremiumProxy": false,
    "excludeTags": ["nav", ".ads", "#footer"],
    "onlyIncludeTags": [],
    "headers": {
      "User-Agent": "Mozilla/5.0..."
    }
  }
}

          

Parameters

Parameter	Type	Required	Description
`url`	string	Yes	The URL to scrape
`formats`	array	No	Output formats: `markdown`, `html`, `cleanHtml`, `screenshot`, `pdf`, `json`
`options`	object	No	Additional scraping options

Options Object

Option	Type	Default	Description
`onlyMainContent`	boolean	false	Extract only the main content, removing navigation, ads, etc.
`includeHtml`	boolean	false	Include raw HTML in the response
`waitFor`	integer	0	Wait time in milliseconds for JavaScript rendering
`timeout`	integer	30000	Maximum time in milliseconds to wait for page load
`usePremiumProxy`	boolean	false	Use premium proxy network
`excludeTags`	array	[]	CSS selectors or tags to exclude
`onlyIncludeTags`	array	[]	Only include these CSS selectors or tags
`headers`	object	{}	Custom HTTP headers to send with the request

Response

            json
            
          

            {
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent here...",
    "html": "<html>...</html>",
    "cleanHtml": "<div>...</div>",
    "screenshot": "base64_encoded_image",
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "language": "en",
      "author": "Author Name",
      "publishedTime": "2024-01-01T00:00:00Z",
      "modifiedTime": "2024-01-02T00:00:00Z",
      "keywords": ["keyword1", "keyword2"],
      "ogImage": "https://example.com/image.jpg"
    }
  },
  "credits_used": 1
}

          

2. Crawl Multiple Pages

Crawl a website following links up to a specified depth.

http

POST /crawl

Request Body

            json
            
          

            {
  "url": "https://example.com",
  "match": "https://example.com/**",
  "selector": "a[href]",
  "limit": 10,
  "depth": 2,
  "formats": ["markdown"],
  "options": {
    "onlyMainContent": true
  }
}

          

Parameters

Parameter	Type	Required	Description
`url`	string	Yes	Starting URL for crawling
`match`	string	No	URL pattern to match (glob format)
`selector`	string	No	CSS selector for links to follow
`limit`	integer	No	Maximum number of pages to crawl (default: 10)
`depth`	integer	No	Maximum crawl depth (default: 2)
`formats`	array	No	Output formats for each page
`options`	object	No	Scraping options (same as single URL)

Response

            json
            
          

            {
  "success": true,
  "data": [
    {
      "url": "https://example.com",
      "markdown": "# Home Page...",
      "metadata": {...}
    },
    {
      "url": "https://example.com/about",
      "markdown": "# About Us...",
      "metadata": {...}
    }
  ],
  "credits_used": 2,
  "pages_crawled": 2
}

          

3. Map Website Structure

Generate a sitemap of a website.

http

POST /map

Request Body

            json
            
          

            {
  "url": "https://example.com",
  "match": "https://example.com/**",
  "selector": "a[href]",
  "limit": 100,
  "depth": 3
}

          

Response

            json
            
          

            {
  "success": true,
  "data": {
    "urls": [
      "https://example.com",
      "https://example.com/about",
      "https://example.com/contact",
      "https://example.com/blog",
      "https://example.com/blog/post-1"
    ],
    "structure": {
      "https://example.com": {
        "title": "Home",
        "links": [
          "https://example.com/about",
          "https://example.com/contact",
          "https://example.com/blog"
        ]
      }
    }
  },
  "credits_used": 1,
  "pages_analyzed": 5
}

          

4. Search Website Content

Search for specific content within a website.

http

POST /search

Request Body

            json
            
          

            {
  "url": "https://example.com",
  "query": "machine learning",
  "match": "https://example.com/blog/**",
  "limit": 20,
  "depth": 2
}

          

Parameters

Parameter	Type	Required	Description
`url`	string	Yes	Starting URL for search
`query`	string	Yes	Search query
`match`	string	No	URL pattern to search within
`limit`	integer	No	Maximum pages to search (default: 20)
`depth`	integer	No	Maximum search depth (default: 2)

Response

            json
            
          

            {
  "success": true,
  "data": {
    "results": [
      {
        "url": "https://example.com/blog/ml-intro",
        "title": "Introduction to Machine Learning",
        "excerpt": "...machine learning is a subset of artificial intelligence...",
        "relevance_score": 0.95
      }
    ],
    "total_results": 3
  },
  "credits_used": 5,
  "pages_searched": 5
}

          

5. Get Account Info

Retrieve your account information and usage statistics.

http

GET /account

Response

            json
            
          

            {
  "success": true,
  "data": {
    "email": "user@example.com",
    "plan": "Professional",
    "credits": {
      "used": 1250,
      "limit": 10000,
      "reset_date": "2024-02-01T00:00:00Z"
    },
    "rate_limits": {
      "requests_per_minute": 60,
      "concurrent_requests": 10
    }
  }
}

          

Error Responses

All endpoints return consistent error responses:

            json
            
          

            {
  "success": false,
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid",
    "details": "URL must include protocol (http:// or https://)"
  }
}

          

Common Error Codes

Code	Description
`INVALID_API_KEY`	API key is missing or invalid
`RATE_LIMIT_EXCEEDED`	Too many requests
`INVALID_URL`	URL is malformed or not accessible
`TIMEOUT`	Request timed out
`INVALID_PARAMETERS`	Request parameters are invalid
`INSUFFICIENT_CREDITS`	Not enough credits in your account
`INTERNAL_ERROR`	Server error occurred

Rate Limits

Rate limits vary by plan:

Plan	Requests/Minute	Concurrent Requests
Free	10	1
Starter	30	5
Professional	60	10
Business	120	20
Enterprise	Unlimited	Unlimited

Rate limit information is included in response headers:

text

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200

Webhooks

For long-running operations, you can provide a webhook URL:

            json
            
            {
  "url": "https://example.com",
  "webhook_url": "https://your-app.com/webhook",
  "formats": ["markdown"]
}

The webhook will receive a POST request when the operation completes:

            json
            
          

            {
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {
    "success": true,
    "data": {...}
  }
}

          

Documentation

API Endpoints

Base URL

Available Endpoints

1. Scrape Single URL

Request Body

Parameters

Options Object

Response

2. Crawl Multiple Pages

Request Body

Parameters

Response

3. Map Website Structure

Request Body

Response

4. Search Website Content

Request Body

Parameters

Response

5. Get Account Info

Response

Error Responses

Common Error Codes

Rate Limits

Webhooks