# ChartCap — Supplementary Code

This package accompanies our ICCV 2025 paper **“ChartCap: Mitigating Hallucination of Dense Chart Captioning.”**

It provides

* inference code for *Phi-3.5-vision-instruct* fine-tuned on ChartCap, with a **zero-shot captioning demo**, and
* evaluation code for **Visual Consistency Score (VCS)** and **OCRScore**.

---

## Directory Structure
```
.
├── Phi-3.5-vision-instruct-ChartCap/
│   ├── Phi.py
│   ├── example.png                       # Figure 8 of the paper (unseen by either model)
│   └── requirements.txt
└── visual_consistency_score/
    ├── evaluate.py
    ├── requirements.txt
    └── README.md
```

---

## 1.  Inference Demo

```bash
conda create -n phi python=3.10 -y
conda activate phi
pip install -r Phi-3.5-vision-instruct-ChartCap/requirements.txt
pip install flash-attn==2.5.8 --no-cache-dir --no-build-isolation

# captions example.png with base and fine-tuned checkpoints
python Phi-3.5-vision-instruct-ChartCap/Phi.py
```

| Checkpoint ID                                  | Console alias                  |
| ---------------------------------------------- | ------------------------------ |
| `microsoft/Phi-3.5-vision-instruct`            | **Phi-3.5-Vision-4B**           |
| `junyoung-00/Phi-3.5-vision-instruct-ChartCap` | **Phi-3.5-Vision-4B\_ChartCap** |

Neither model has seen *example.png*, so the run is strictly zero-shot.

---

## 2.  Visual Consistency Score & OCRScore

### 2.1. Environment

```bash
conda create -n vcs python=3.10 -y
conda activate vcs
pip install -r visual_consistency_score/requirements.txt
pip install flash-attn==2.5.8 --no-cache-dir --no-build-isolation
pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
pip install paddleocr==2.9.1
```

### 2.2. Input Format

`data/captions.json` must be a JSON list of objects, each containing **both** of the following keys:

```jsonc
[
  {
    "image_filename": "chart_0001.png",
    "generated_caption": "A bar chart showing …"
  },
  ...
]
```
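Checking the file before a long evaluation run can save API calls. The following is a minimal sketch of such a check, not part of `evaluate.py`: the function names `validate_captions` and `load_and_validate` are hypothetical, and the optional image-existence check assumes that `image_filename` is resolved relative to the directory passed as `--image_dir`.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("image_filename", "generated_caption")


def validate_captions(entries, image_dir=None):
    """Return a list of problems found; an empty list means the input looks fine."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"item {i}: expected an object")
            continue
        # Both keys must be present and hold non-empty strings.
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {i}: missing or empty '{key}'")
        # Optional check (an assumption): the image lives directly under image_dir.
        name = entry.get("image_filename")
        if image_dir is not None and isinstance(name, str) and name.strip():
            if not (Path(image_dir) / name).is_file():
                problems.append(f"item {i}: image not found: {name}")
    return problems


def load_and_validate(json_path, image_dir=None):
    """Read the captions file from disk and validate its contents."""
    entries = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return validate_captions(entries, image_dir)
```

Calling `load_and_validate("data/captions.json", "data/images")` and printing the returned list would surface malformed entries before any model is loaded.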

### 2.3. Run

```bash
export ANTHROPIC_API_KEY="your-api-key"

python visual_consistency_score/evaluate.py \
       --input_json  data/captions.json \
       --image_dir   data/images
```

*What the script does*

1. **Caption → Matplotlib code** via **Claude 3.5 Sonnet**
2. **Chart regeneration** by running the generated code (up to three debug attempts)
3. **VCS** using SigLIP 2 cosine similarity between the original and regenerated charts
4. **OCRScore** using PaddleOCR
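Steps 1 and 2 amount to a generate, run, repair loop. The sketch below illustrates that control flow under stated assumptions and is not the code in `evaluate.py`: `generate_code` and `repair_code` are hypothetical callables standing in for the LLM calls, the script is run in a separate Python process, and "up to three debug attempts" is read as one initial run plus at most three repairs.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

MAX_DEBUG_ATTEMPTS = 3  # assumed reading of "up to three debug attempts"


def run_script(code, workdir, timeout=60):
    """Run code in a fresh interpreter; return (succeeded, stderr text)."""
    script = Path(workdir) / "chart.py"
    script.write_text(code, encoding="utf-8")
    try:
        proc = subprocess.run(
            [sys.executable, script.name],
            cwd=workdir, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return proc.returncode == 0, proc.stderr


def regenerate_chart(caption, generate_code, repair_code, run=run_script):
    """Return code that ran successfully, or None if every attempt failed."""
    code = generate_code(caption)  # hypothetical LLM call: caption -> Matplotlib code
    with tempfile.TemporaryDirectory() as workdir:
        for attempt in range(MAX_DEBUG_ATTEMPTS + 1):
            ok, error = run(code, workdir)
            if ok:
                return code
            if attempt < MAX_DEBUG_ATTEMPTS:
                # hypothetical LLM call: failing code + error message -> revised code
                code = repair_code(code, error)
    return None
```

Passing the runner as a parameter keeps the loop easy to exercise without spawning processes; a caption whose code never runs would simply not count as a successful reconstruction.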

### 2.4. Console Output

```
=== Metrics Summary ===
Total items:                 1234
Successful reconstructions:  1234 (100.00%)
Average VCS Score:           0.1234 (±0.0254)
Average OCR F1 Score:        0.1234 (±0.2041)
```

`vcs_ocr_results.json` stores per-image scores and dataset-level aggregates.
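
For readers who want to sanity-check the reported numbers, the sketch below shows how such scores and their summary lines could be computed. It rests on several assumptions that this README does not confirm: the image embeddings are plain numeric vectors, the OCR F1 is computed over bags of recognized text tokens, and the value after `±` is a population standard deviation. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ocr_f1(reference_tokens, candidate_tokens):
    """F1 over multisets of OCR tokens (assumed matching rule: exact token equality)."""
    ref, cand = Counter(reference_tokens), Counter(candidate_tokens)
    overlap = sum((ref & cand).values())  # tokens found in both, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def summarize(label, scores):
    """Format a 'mean (±std)' line; population std is an assumption."""
    return f"{label} {mean(scores):.4f} (±{pstdev(scores):.4f})"
```

Here `reference_tokens` would come from OCR on the original chart and `candidate_tokens` from OCR on the regenerated one; `summarize("Average VCS Score:", scores)` then yields a line in the same shape as the console summary.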
