# cPanel cron jobs — KomoonWebCrawler

Production host: **webcrawler.komoon.app**. The Node app lives in the directory below, which is also the `cd` target for every cron job:

```text
/home/bytescorp/webcrawler.komoon.app/app
```

JSON output is written to the **sibling** folder:

```text
/home/bytescorp/webcrawler.komoon.app/data
```

Cron jobs run without the activated nodevenv environment, so each command must call the Node binary by its absolute **nodevenv** path:

```text
/home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node
```

Verify that the binary exists and is executable:

```bash
ls -la /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node
```
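
The same check can be scripted so that it fails loudly before a cron command is saved. This is a minimal sketch: `check_node_bin` is a hypothetical helper, not part of the project.

```bash
# Hypothetical helper: succeed only if the given path is an executable file.
# Symlinks are followed, so a linked binary or wrapper script also passes.
check_node_bin() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or not executable: $1" >&2
    return 1
  fi
}
```

For example, `check_node_bin /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node` should print an `ok:` line on the production host; a non-zero exit status means the path in the cron commands needs fixing.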

To activate the virtual environment in a manual SSH session:

```bash
source /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/activate && cd /home/bytescorp/webcrawler.komoon.app/app
```

---

## KomoonWebCrawler jobs

### 1. Zacatecas state (state-wide providers)

Runs the state's web-only providers whose `municipality` is **empty** (state-wide providers). `CRAWL_MUNICIPALITY` is not set for this job.

**Schedule:** `*/30 * * * *` (every 30 minutes)

```bash
cd /home/bytescorp/webcrawler.komoon.app/app && NODE_OPTIONS="--disable-wasm-trap-handler" CRAWL_COUNTRY=MX CRAWL_STATE=Zacatecas /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node src/runBatch.js
```

---

### 2. JSON cleanup (monthly)

Deletes date folders under `data/` that are older than the retention period (default: 30 days). Does **not** remove `sync_manifest.json` or `pending_sync.ndjson`.

**Schedule:** `0 3 3 * *` (03:00 on the 3rd of each month)

```bash
cd /home/bytescorp/webcrawler.komoon.app/app && /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node src/cleanOldNewsJson.js
```
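
Before relying on the monthly job, it can help to preview which folders fall outside the retention window. The sketch below is a dry run that only lists folders and never deletes anything. It assumes the date folders are named `YYYY-MM-DD`, which this document does not confirm, and it is not the logic of `cleanOldNewsJson.js`; `classify_folder` and `list_expired` are hypothetical helpers.

```bash
# Hypothetical helper: print "expired" when a YYYY-MM-DD folder name is
# strictly older than the cutoff date, and "keep" otherwise.
# ASSUMPTION: date folders are named YYYY-MM-DD.
classify_folder() {
  name=$1
  cutoff=$2
  case "$name" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo keep; return 0 ;;  # not a date folder (e.g. sync_manifest.json)
  esac
  # With the dashes removed, YYYYMMDD values compare correctly as integers.
  if [ "$(echo "$name" | tr -d '-')" -lt "$(echo "$cutoff" | tr -d '-')" ]; then
    echo expired
  else
    echo keep
  fi
}

# Dry run: print the names of expired date folders directly under a directory.
list_expired() {
  for d in "$1"/*/; do
    [ -d "$d" ] || continue
    base=$(basename "$d")
    if [ "$(classify_folder "$base" "$2")" = expired ]; then
      echo "$base"
    fi
  done
  return 0
}
```

With GNU `date`, a 30-day cutoff can be computed as `cutoff=$(date -d '30 days ago' +%F)` and passed as `list_expired /home/bytescorp/webcrawler.komoon.app/data "$cutoff"`. Files such as `sync_manifest.json` and `pending_sync.ndjson` are never listed because only directories with date-shaped names are considered.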

---

### 3. Jerez municipality

Runs only the providers whose `municipality` matches `Jerez`.

**Schedule (suggested):** `*/30 * * * *` or `0,30 * * * *` (equivalent: both run at minutes 0 and 30 of every hour)

```bash
cd /home/bytescorp/webcrawler.komoon.app/app && NODE_OPTIONS="--disable-wasm-trap-handler" CRAWL_COUNTRY=MX CRAWL_STATE=Zacatecas CRAWL_MUNICIPALITY=Jerez /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node src/runBatch.js
```

---

### 4. Zacatecas municipality (city)

Runs only the providers whose `municipality` is set to `Zacatecas`.

**Schedule (suggested):** `*/30 * * * *` or `0,30 * * * *` (equivalent: both run at minutes 0 and 30 of every hour)

```bash
cd /home/bytescorp/webcrawler.komoon.app/app && NODE_OPTIONS="--disable-wasm-trap-handler" CRAWL_COUNTRY=MX CRAWL_STATE=Zacatecas CRAWL_MUNICIPALITY=Zacatecas /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node src/runBatch.js
```
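
The three crawl commands (jobs 1, 3, and 4) differ only in `CRAWL_MUNICIPALITY`, so they could be generated by one small wrapper instead of three long lines. This is a sketch only: `run_crawl` is a hypothetical function that does not exist in the repository, and the paths and variables are copied from the commands above.

```bash
APP_DIR=/home/bytescorp/webcrawler.komoon.app/app
NODE_BIN=/home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node

# Hypothetical wrapper: run one batch for the state of Zacatecas.
# $1 is an optional municipality; when omitted, CRAWL_MUNICIPALITY stays
# unset, as in the state-wide job.
# The body runs in a subshell so the exports do not leak to the caller.
run_crawl() (
  cd "$APP_DIR" || exit 1
  export NODE_OPTIONS="--disable-wasm-trap-handler"
  export CRAWL_COUNTRY=MX
  export CRAWL_STATE=Zacatecas
  if [ -n "${1:-}" ]; then
    export CRAWL_MUNICIPALITY="$1"
  else
    unset CRAWL_MUNICIPALITY
  fi
  exec "$NODE_BIN" src/runBatch.js
)
```

Saved in a script that calls `run_crawl "$@"` as its last line, the cron commands would reduce to that script with no argument, with `Jerez`, or with `Zacatecas`. The explicit commands above remain the reference.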

---

## Manual test (SSH)

```bash
source /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/activate && cd /home/bytescorp/webcrawler.komoon.app/app
NODE_OPTIONS="--disable-wasm-trap-handler" CRAWL_COUNTRY=MX CRAWL_STATE=Zacatecas node src/runBatch.js
```

## Deploy updated code

Upload the changed files from the local `KomoonWebCrawler` project to `/home/bytescorp/webcrawler.komoon.app/app` (via FTP or Git), then **restart** the Node app in cPanel.

## Backend fallback sync

Ensure the Firebase function `newsArticlesSyncFromJson` is configured with `WEB_CRAWLER_DATA_BASE_URL=https://webcrawler.komoon.app/data` (see `news_sync.ts` in KomoonFBFunctions).
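
A quick way to confirm that the base URL is reachable from outside is to request a known file under it. The sketch below assumes that `sync_manifest.json` is served directly at the root of the data URL, which this document does not confirm; `join_url` and `data_status` are hypothetical helpers.

```bash
# Join a base URL and a relative path with exactly one slash between them.
join_url() {
  base=${1%/}
  path=${2#/}
  echo "$base/$path"
}

# Hypothetical check: print the HTTP status code for a file under the base URL.
# ASSUMPTION: the file is publicly served at <base>/<file>.
data_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(join_url "$1" "$2")"
}
```

For example, `data_status https://webcrawler.komoon.app/data sync_manifest.json` prints the status code returned by the server; anything other than `200` suggests the fallback sync would not be able to read the JSON output.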
