# Seeking the Shape of Sound

An implementation of the paper (ID 2586): **Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association**

## Requirements
This code is implemented with PyTorch (tested on 1.4.0).
See [requirement.txt](requirement.txt) for the dependencies.

## Data preparation
Download [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) and [VGGFace](https://drive.google.com/file/d/1qmxGwW5_lNQbTqwW81yPObJ-S-n3rpXp/view) and place them in `./data`.

## Training
The training process consists of four steps:
1. Download pretrained models for backbones:
```bash
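# Note: these are Google Drive share links, so plain wget may save an HTML page
# instead of the model file. If that happens, download them through a browser.
wget "https://drive.google.com/file/d/1z3iTyfEvfLeLuMvpgyazj9rxeRpKmyua/view?usp=sharing" && \
wget "https://drive.google.com/file/d/1aC_TpAIBKm2vaMWw8DKGhUsqgb8nf4aA/view?usp=sharing"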
```
2. Train the model and update identity weights:
```bash
python3 train.py config/train_reweight.yaml
```
3. Extract identity weights from the saved model file:
```bash
python3 train.py config/extract_id_weight.yaml
```
4. Retrain the final model:
```bash
python3 train.py config/train_main.yaml
```
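
Steps 2 to 4 run the same script with different configs, so they can be chained. The following is a minimal sketch, not part of the repository: the config paths come from the steps above, while `build_commands` and `run_all` are hypothetical helper names introduced for illustration. It assumes the script is launched from the repository root.

```python
import subprocess
import sys

# Config paths as listed in steps 2-4 of this README.
STAGE_CONFIGS = [
    "config/train_reweight.yaml",     # step 2: train and update identity weights
    "config/extract_id_weight.yaml",  # step 3: extract identity weights
    "config/train_main.yaml",         # step 4: retrain the final model
]


def build_commands(configs=STAGE_CONFIGS, python=sys.executable):
    """Return one argument list per training stage, in execution order."""
    return [[python, "train.py", cfg] for cfg in configs]


def run_all(commands):
    """Run each stage in turn and stop at the first failing stage."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later stage never starts after an earlier one fails.
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands())` would then launch the three stages one after another.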
Alternatively, the trained final model can be downloaded directly:
```bash
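# Note: this is a Google Drive share link, so plain wget may save an HTML page
# instead of the model file. If that happens, download it through a browser.
wget "https://drive.google.com/file/d/1ZCPMk_0kKz8YO37ciAVJRDnmnTCqNhoG/view?usp=sharing"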
```
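
The model links in this section are Google Drive share links of the form `.../file/d/<ID>/view`. As a hedged sketch, the file ID can be pulled out of such a link and rewritten into the commonly used `uc?export=download` form. The helper names `drive_file_id` and `direct_download_url` are hypothetical, and whether the rewritten URL serves the file without an extra confirmation page (typical for large files) is an assumption that should be checked.

```python
import re

# Matches the file ID in share links of the form .../file/d/<ID>/view...
_DRIVE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url):
    """Extract the Google Drive file ID from a share link."""
    match = _DRIVE_ID.search(share_url)
    if match is None:
        raise ValueError("not a Google Drive file share link: " + share_url)
    return match.group(1)


def direct_download_url(share_url):
    """Build a direct-download style URL from a share link (assumed URL pattern)."""
    return "https://drive.google.com/uc?export=download&id=" + drive_file_id(share_url)


final_model = "https://drive.google.com/file/d/1ZCPMk_0kKz8YO37ciAVJRDnmnTCqNhoG/view?usp=sharing"
print(direct_download_url(final_model))
```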

## Evaluation
1. Adjust the settings in `config/train_main.yaml` as needed.
2. Run the evaluation script:
```bash
python3 eval.py config/train_main.yaml
```
