API Endpoints

ActiCrawl provides a RESTful API for web scraping and data extraction. Every API request must be sent to the base URL with an API key for authentication.

Base URL

text

https://www.acticrawl.com/api/v1

Available Endpoints

1. Scrape a Single URL

Extracts content from a single web page in the output format you choose.

http

POST /scrape

Request Headers

http

Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
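
The headers above map directly onto a client request. The following is a minimal Python sketch, using only the standard library, of how such a request might be assembled. The names `build_scrape_request` and `BASE_URL` are illustrative and not part of the ActiCrawl API.

```python
import json
import urllib.request

# Base URL as documented above.
BASE_URL = "https://www.acticrawl.com/api/v1"


def build_scrape_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request with the documented headers."""
    return urllib.request.Request(
        url=f"{BASE_URL}/scrape",
        data=json.dumps(payload).encode("utf-8"),  # JSON body as bytes
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen(req, timeout=60)` would send it; the 60-second client-side timeout is an arbitrary choice for this sketch.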

Request Body

json

{
  "url": "https://example.com",
  "output_format": "markdown",
  "execute_js": true,
  "wait_for": 3000,
  "timeout": 30000,
  "extract_main_content": true,
  "use_premium_proxy": false,
  "exclude_tags": "script,style,nav",
  "include_only_tags": "article,main",
  "scraping_mode": "real"
}

Parameters

Parameter	Type	Required	Description
`url`	string	Yes	URL to scrape
`output_format`	string	No	Output format: `html`, `markdown`, `html_cleaned`, `links`, `json`, `screenshot` (default: `html`)
`execute_js`	boolean	No	Enable JavaScript rendering (default: `false`)
`wait_for`	integer	No	Time to wait before capture, in milliseconds (default: `3000`)
`timeout`	integer	No	Request timeout, in milliseconds (default: `30000`)
`extract_main_content`	boolean	No	Extract only the main content (default: `false`)
`use_premium_proxy`	boolean	No	Use the premium proxy network (requires a plan that supports it)
`exclude_tags`	string	No	Tags to exclude (comma-separated)
`include_only_tags`	string	No	Include only these tags (comma-separated)
`scraping_mode`	string	No	Scraping mode: `simulation` or `real` (default: `simulation`)
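
The parameter table can be turned into a small client-side check before a request is sent. This is a sketch only: the helper name `build_payload` is hypothetical, and accepting tag lists and joining them with commas is a convenience assumed here, not something the API itself offers.

```python
# Allowed values copied from the parameter table.
OUTPUT_FORMATS = {"html", "markdown", "html_cleaned", "links", "json", "screenshot"}
SCRAPING_MODES = {"simulation", "real"}


def build_payload(url, output_format="html", scraping_mode="simulation",
                  exclude_tags=None, include_only_tags=None, **options):
    """Assemble a /scrape request body, rejecting values the table does not list."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if scraping_mode not in SCRAPING_MODES:
        raise ValueError(f"unsupported scraping_mode: {scraping_mode!r}")

    payload = {
        "url": url,
        "output_format": output_format,
        "scraping_mode": scraping_mode,
        **options,  # e.g. execute_js, wait_for, timeout
    }
    # The API expects comma-separated strings for the tag filters.
    if exclude_tags:
        payload["exclude_tags"] = ",".join(exclude_tags)
    if include_only_tags:
        payload["include_only_tags"] = ",".join(include_only_tags)
    return payload
```

For example, `build_payload("https://example.com", output_format="markdown", exclude_tags=["script", "style", "nav"])` yields a body whose `exclude_tags` value is `"script,style,nav"`.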

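Output Formats

html - Raw HTML content (default)
markdown - Content converted to Markdown
html_cleaned - Clean HTML with scripts and styles removed
links - All links extracted from the page
json - Structured JSON containing the title, content, and links
screenshot - Screenshot of the page (when available)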

Response

json

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "format": "markdown",
    "metadata": {
      "title": "Example Domain",
      "processing_time": 1.5,
      "plan": "Free",
      "scraping_mode": "real",
      "scraped_at": "2024-01-15T10:30:00Z",
      "response_time": 1500
    }
  }
}

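Response Fields

Field	Type	Description
`success`	boolean	Whether the request succeeded
`data.url`	string	The URL that was scraped
`data.content`	string/array/object	The scraped content (its type depends on `output_format`)
`data.format`	string	The output format that was used
`data.metadata`	object	Additional information about the scrape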
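
Because `data.content` may be a string, an array, or an object, a client typically keeps it loosely typed. The sketch below shows one way to unpack a decoded success response. `ScrapeResult` and `parse_scrape_response` are hypothetical names, and the code assumes the response body has already been parsed from JSON into a `dict`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScrapeResult:
    url: str
    content: Any  # str, list, or dict depending on output_format
    format: str
    metadata: dict = field(default_factory=dict)


def parse_scrape_response(body: dict) -> ScrapeResult:
    """Map the documented response fields onto a ScrapeResult."""
    data = body["data"]
    return ScrapeResult(
        url=data["url"],
        content=data["content"],
        format=data["format"],
        metadata=data.get("metadata", {}),
    )
```

Callers can then branch on `result.format` before deciding how to treat `result.content`.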

Error Response

json

{
  "success": false,
  "error": "Rate limit exceeded",
  "data": {
    "url": "https://example.com",
    "error": "Rate limit exceeded",
    "metadata": {
      "processing_time": 0.1,
      "plan": "Free",
      "scraping_mode": "simulation"
    }
  }
}
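
An error body like the one above can be converted into an exception on the client side. The following is a minimal sketch under that assumption; `ScrapeError` and `raise_for_error` are hypothetical names, and falling back from the top-level `error` to `data.error` is a defensive choice made here because the sample shows the message in both places.

```python
class ScrapeError(Exception):
    """Raised when a decoded response has success == false."""

    def __init__(self, message, url=None, metadata=None):
        super().__init__(message)
        self.url = url
        self.metadata = metadata or {}


def raise_for_error(body: dict) -> dict:
    """Return the body unchanged on success, otherwise raise ScrapeError."""
    if body.get("success"):
        return body
    data = body.get("data") or {}
    message = body.get("error") or data.get("error") or "unknown error"
    raise ScrapeError(message, url=data.get("url"), metadata=data.get("metadata"))
```

Wrapping each decoded response in `raise_for_error(...)` keeps the success path free of repeated `success` checks.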

          

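Rate Limits

API rate limits depend on your subscription plan:

Free plan: 100 requests per month, 1 concurrent request
Basic plan: 10,000 requests per month, 5 concurrent requests
Pro plan: 100,000 requests per month, 20 concurrent requests
Enterprise plan: unlimited requests, 100 concurrent requests

Rate limit information is included in the response headers: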

http

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704844800
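
These headers can be read into a small summary on the client. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds; the sample value is consistent with that reading, but the documentation does not state the unit. The function name `parse_rate_limit` is illustrative.

```python
from datetime import datetime, timezone


def parse_rate_limit(headers) -> dict:
    """Summarize the X-RateLimit-* headers from a response."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    # Assumption: the reset value is seconds since the Unix epoch (UTC).
    reset_at = datetime.fromtimestamp(int(headers["X-RateLimit-Reset"]), tz=timezone.utc)
    return {
        "limit": limit,
        "remaining": remaining,
        "used": limit - remaining,
        "reset_at": reset_at,
    }
```

Under that assumption, the sample headers describe 5 requests used out of 100, with the quota resetting at 2024-01-10 00:00:00 UTC.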

Examples

Basic HTML Scraping

bash

curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Markdown Extraction

bash

curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "markdown",
    "extract_main_content": true
  }'

JavaScript-Rendered Content

bash

curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "html_cleaned",
    "execute_js": true,
    "wait_for": 5000
  }'

Link Extraction

bash

curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "links"
  }'

Structured JSON Data

bash

curl -X POST https://www.acticrawl.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "output_format": "json",
    "extract_main_content": true
  }'

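Best Practices

Use the appropriate output format - Choose the format that best fits your needs to minimize processing time
Set reasonable timeouts - Adjust the timeout to the complexity of the target site
Render JavaScript only when needed - It increases processing time, so enable it only when the page requires it
Implement retry logic - Handle transient failures with exponential backoff
Cache results - Store scraped data to avoid unnecessary repeat requests
Monitor rate limits - Check the response headers to track your usage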
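
The retry advice above can be sketched as a small wrapper. This is one possible shape, not a prescribed implementation: the name `retry_with_backoff`, the default of four attempts, the one-second base delay, and treating `OSError` (which covers `urllib.error.URLError` and timeouts) as the transient failure class are all assumptions made for illustration.

```python
import time


def retry_with_backoff(call, max_attempts=4, base_delay=1.0,
                       transient=(OSError,), sleep=time.sleep):
    """Run call(), retrying transient failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except transient:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the wrapper can be exercised without real waiting. In practice, `call` would be a zero-argument function that sends one scrape request.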

Error Codes

HTTP Status	Error	Description
400	Bad Request	Invalid request parameters
401	Unauthorized	Invalid or missing API key
403	Forbidden	Access denied (e.g., premium proxy on the Free plan)
422	Unprocessable Entity	Scraping failed
429	Too Many Requests	Rate limit exceeded
500	Internal Server Error	Server error; retry the request
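
A client can use this table to decide whether a failed request is worth retrying. The sketch below encodes one such policy. Only the 500 entry is marked retryable on the table's own authority; marking 429 as retryable (after the limit resets) and 422 as not retryable are assumptions of this example, and `STATUS_TABLE` and `classify_status` are hypothetical names.

```python
# Status codes from the error table, mapped to (error name, retryable?).
# Assumptions: 429 is retried once the rate limit resets; 422 is not retried.
STATUS_TABLE = {
    400: ("Bad Request", False),
    401: ("Unauthorized", False),
    403: ("Forbidden", False),
    422: ("Unprocessable Entity", False),
    429: ("Too Many Requests", True),
    500: ("Internal Server Error", True),
}


def classify_status(status: int):
    """Return (error name, retryable) for a status code; unknown codes are not retried."""
    return STATUS_TABLE.get(status, ("Unknown", False))
```

For instance, `classify_status(500)` returns `("Internal Server Error", True)`, while a 401 is reported as not retryable because resending the same invalid key cannot succeed.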
