# Quick Start

Get up and running with ActiCrawl in just a few minutes. This guide walks you through creating your first web scraping request.
## Prerequisites

Before you begin, make sure you have:

- An ActiCrawl account (Sign up here)
- An API key (Generate one in your dashboard)
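To avoid pasting the key into every command, you can keep it in an environment variable. The variable name `ACTICRAWL_API_KEY` is this guide's own convention, not something the API requires:

```shell
# Export once per shell session (variable name is this guide's choice)
export ACTICRAWL_API_KEY="YOUR_API_KEY"

# Later requests can reference it instead of the literal key:
echo "Authorization: Bearer $ACTICRAWL_API_KEY"
# prints: Authorization: Bearer YOUR_API_KEY
```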
## Your First Request

Here's the simplest way to scrape a webpage:

```bash
curl -X POST https://api.acticrawl.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'
```

Replace `YOUR_API_KEY` with your actual API key.
## Response Format

The API will return a JSON response:

```json
{
  "success": true,
  "data": {
    "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain for documentation",
      "url": "https://example.com"
    }
  },
  "usage": {
    "credits_used": 1,
    "credits_remaining": 99
  }
}
```
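Each field can be read straight off the parsed JSON. A minimal sketch in Python, using a hard-coded copy of the example response above rather than a live call:

```python
import json

# A copy of the example response above, as a client would receive it
raw = '''
{
  "success": true,
  "data": {
    "content": "# Example Domain\\n\\nThis domain is for use in illustrative examples...",
    "metadata": {"title": "Example Domain", "url": "https://example.com"}
  },
  "usage": {"credits_used": 1, "credits_remaining": 99}
}
'''

result = json.loads(raw)

if result["success"]:
    print(result["data"]["metadata"]["title"])   # Example Domain
    print(result["usage"]["credits_remaining"])  # 99
```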
## Common Use Cases

### 1. Extract Clean Text (Markdown)

Perfect for feeding content to LLMs:

```javascript
const response = await fetch('https://api.acticrawl.com/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/article',
    format: 'markdown',
    clean: true // Removes ads, navigation, footers
  })
});

const data = await response.json();
console.log(data.data.content);
```
### 2. Extract Structured Data (JSON)

Extract specific data points:

```python
import requests

response = requests.post(
    'https://api.acticrawl.com/v1/scrape',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://example.com/product',
        'format': 'json',
        'schema': {
            'title': 'h1',
            'price': '.price',
            'description': '.product-description',
            'images': 'img[src]'
        }
    }
)

data = response.json()
print(data['data']['extracted'])
```
### 3. Handle JavaScript-Heavy Sites

For sites that require JavaScript rendering:

```ruby
require 'net/http'
require 'json'

uri = URI('https://api.acticrawl.com/v1/scrape')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri)
request['Authorization'] = 'Bearer YOUR_API_KEY'
request['Content-Type'] = 'application/json'
request.body = {
  url: 'https://example.com/spa',
  format: 'html',
  wait_for: 'networkidle', # Wait for all network requests to complete
  timeout: 30000
}.to_json

response = http.request(request)
data = JSON.parse(response.body)
```
## Advanced Options

### Wait Strategies

Control when the scraper considers the page ready:

- `load` - Wait for the page load event (fastest)
- `domcontentloaded` - Wait for the DOM to be fully loaded
- `networkidle` - Wait for the network to be idle (best for SPAs)
- Custom CSS selector - Wait for a specific element: `wait_for: '#content'`
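Put together, a request that waits for a specific element might carry a payload like the one below. This sketch only builds and prints the JSON body (mirroring the Ruby example above) without making a live call:

```python
import json

# Scrape payload; wait_for accepts a named strategy or a CSS selector
payload = {
    "url": "https://example.com/spa",
    "format": "html",
    "wait_for": "#content",  # wait until this element appears
    "timeout": 30000,        # give up after 30 seconds
}

# This is the JSON body you would POST to /v1/scrape
body = json.dumps(payload)
print(body)
```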
### Proxy Usage

Use proxies for geo-restricted content:

```json
{
  "url": "https://example.com",
  "proxy": {
    "country": "US",
    "type": "residential"
  }
}
```
### Screenshots

Capture visual representations:

```json
{
  "url": "https://example.com",
  "format": "screenshot",
  "screenshot_options": {
    "full_page": true,
    "type": "png"
  }
}
```
## Rate Limits

- Free tier: 1 request per second
- Basic tier: 5 requests per second
- Pro tier: 20 requests per second
- Expert tier: 50 requests per second
- Enterprise: Unlimited
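If you send requests faster than your tier allows, the API will reject them, so a client-side retry with backoff is a reasonable guard. In this sketch, the HTTP 429 status code and the exponential backoff schedule are our assumptions, not documented ActiCrawl behavior:

```python
import time

def post_with_backoff(send, max_retries=5, base_delay=1.0):
    """Call send() and retry with exponential backoff on HTTP 429.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, e.g. a functools.partial around requests.post.
    The 429 status is an assumption about how rate limits are signaled.
    """
    for attempt in range(max_retries):
        response = send()
        if response.status_code != 429:  # assumed rate-limit status
            return response
        time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return response
```

Wrapping your real call is then one line, e.g. `post_with_backoff(lambda: requests.post(url, headers=headers, json=payload))`.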
## Error Handling

Always check for errors in your implementation:

```javascript
try {
  const response = await fetch('https://api.acticrawl.com/v1/scrape', {
    // ... request options
  });

  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }

  const data = await response.json();

  if (!data.success) {
    console.error('Scraping failed:', data.error);
    return;
  }

  // Process successful response
  console.log(data.data.content);
} catch (error) {
  console.error('Request failed:', error);
}
```
## Next Steps

- Explore the API Reference for all available options
- Learn about Authentication methods
- Check out more Examples
- Read about Best Practices
## Need Help?

- Join our Discord community
- Check the FAQ
- Contact support@acticrawl.com