Documentation

Learn how to automate your web scraping workflows with ActiCrawl

Quick Start

Get up and running with ActiCrawl in just a few minutes. This guide will walk you through creating your first web scraping request.

Prerequisites

Before you begin, make sure you have:
- An ActiCrawl account (Sign up here)
- An API key (Generate one in your dashboard)

Your First Request

Here's the simplest way to scrape a webpage:

bash
curl -X POST https://api.acticrawl.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'

Replace YOUR_API_KEY with your actual API key.

Response Format

The API will return a JSON response:

json
{
  "success": true,
  "data": {
    "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain for documentation",
      "url": "https://example.com"
    }
  },
  "usage": {
    "credits_used": 1,
    "credits_remaining": 99
  }
}

Common Use Cases

1. Extract Clean Text (Markdown)

Perfect for feeding content to LLMs:

javascript
const response = await fetch('https://api.acticrawl.com/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/article',
    format: 'markdown',
    clean: true  // Removes ads, navigation, footers
  })
});

const data = await response.json();
console.log(data.data.content);

2. Extract Structured Data (JSON)

Extract specific data points:

python
import requests

response = requests.post(
    'https://api.acticrawl.com/v1/scrape',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://example.com/product',
        'format': 'json',
        'schema': {
            'title': 'h1',
            'price': '.price',
            'description': '.product-description',
            'images': 'img[src]'
        }
    }
)

data = response.json()
print(data['data']['extracted'])

3. Handle JavaScript-Heavy Sites

For sites that require JavaScript rendering:

ruby
require 'net/http'
require 'json'

uri = URI('https://api.acticrawl.com/v1/scrape')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri)
request['Authorization'] = 'Bearer YOUR_API_KEY'
request['Content-Type'] = 'application/json'

request.body = {
  url: 'https://example.com/spa',
  format: 'html',
  wait_for: 'networkidle',  # Wait for all network requests to complete
  timeout: 30000
}.to_json

response = http.request(request)
data = JSON.parse(response.body)

Advanced Options

Wait Strategies

Control when the scraper considers the page ready (a short example follows the list):

- load - Wait for the page load event, i.e. all resources have been fetched
- domcontentloaded - Wait for the DOM to be parsed (fastest)
- networkidle - Wait for network activity to be idle (best for SPAs)
- Custom CSS selector - Wait for a specific element to appear, e.g. wait_for: '#content'
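
For example, to scrape a single-page app only after its main container has rendered, pass a selector as wait_for. This is a minimal sketch reusing the /v1/scrape endpoint and parameters shown above; the '#content' selector is illustrative:

javascript
// Wait for a specific element before extracting content.
// '#content' is an illustrative selector - use one that exists on your target page.
const response = await fetch('https://api.acticrawl.com/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/spa',
    format: 'markdown',
    wait_for: '#content',  // or 'load', 'domcontentloaded', 'networkidle'
    timeout: 30000         // give slow pages up to 30s to settle
  })
});

const data = await response.json();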

Proxy Usage

Use proxies for geo-restricted content:

json
{
  "url": "https://example.com",
  "proxy": {
    "country": "US",
    "type": "residential"
  }
}

Screenshots

Capture a screenshot of the rendered page:

json
{
  "url": "https://example.com",
  "format": "screenshot",
  "screenshot_options": {
    "full_page": true,
    "type": "png"
  }
}
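
The response shape for screenshots isn't shown above. Assuming the image comes back base64-encoded in the data.content field (an assumption - verify against the API reference), saving it from Node.js could look like this sketch:

javascript
import { writeFile } from 'node:fs/promises';

const response = await fetch('https://api.acticrawl.com/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com',
    format: 'screenshot',
    screenshot_options: { full_page: true, type: 'png' }
  })
});

const data = await response.json();
// Assumption: the PNG arrives base64-encoded in data.data.content.
await writeFile('screenshot.png', Buffer.from(data.data.content, 'base64'));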

Rate Limits

- Free tier: 1 request per second
- Basic tier: 5 requests per second
- Pro tier: 20 requests per second
- Expert tier: 50 requests per second
- Enterprise: Unlimited
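
Requests sent faster than your tier allows will be rejected, so a simple client-side safeguard is to retry with exponential backoff. This sketch assumes throttled requests return HTTP 429 (a common convention, but verify against the actual error responses):

javascript
// Retry with exponential backoff when throttled.
// Assumes the API responds with HTTP 429 on rate-limit rejection - verify
// against the actual error responses for your tier.
async function scrapeWithRetry(body, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch('https://api.acticrawl.com/v1/scrape', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(body)
    });

    if (response.status !== 429) {
      return response.json();
    }

    // Back off: 1s, 2s, 4s, ... before retrying.
    await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
  }
  throw new Error('Rate limit retries exhausted');
}

// Usage:
const result = await scrapeWithRetry({ url: 'https://example.com', format: 'markdown' });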

Error Handling

Always check for errors in your implementation:

javascript
try {
  const response = await fetch('https://api.acticrawl.com/v1/scrape', {
    // ... request options
  });

  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }

  const data = await response.json();

  if (!data.success) {
    console.error('Scraping failed:', data.error);
    return;
  }

  // Process successful response
  console.log(data.data.content);

} catch (error) {
  console.error('Request failed:', error);
}

Next Steps

Need Help?