# Uncertainty-aware Vision-based Metric Cross-view Geolocalization

## Installation

We provide a Dockerfile that lists all dependencies together with the commands required to install them. Either build the Docker image via

`docker build -t cross-view-loc PATH_TO_DOCKERFILE_PARENT_DIR`

(replace `PATH_TO_DOCKERFILE_PARENT_DIR` with the directory that contains the Dockerfile) or run the commands from the Dockerfile directly on your system.

If you use the provided Docker image, first start a shell inside a container via

`docker run -v /HOST_PATH:/DOCKER_PATH --gpus all -it cross-view-loc bash`

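which mounts a host directory into the container. Mount the host directories where the datasets are stored and where the script outputs will be saved (replace `HOST_PATH` and `DOCKER_PATH`, and repeat the `-v` option for each additional directory).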

## Data preparation

Download all datasets that we use in the paper and store them in separate folders as follows:
```
./ford-avdata
./argoverse-v1
./argoverse-v2
./kitti360
./nuscenes
./lyft
./pandaset
```

Run the scripts in `packages/georegdata/scripts/prepare` on the respective dataset folders to convert the data into the format expected by our code and to downsize the images.
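Before running the preparation scripts, it can help to confirm that the ground dataset folders listed above are in place. The following is a minimal sketch and not part of the repository; the helper name `missing_folders` and the example root path are hypothetical.

```python
from pathlib import Path

# Ground dataset folder names as listed above.
GROUND_FOLDERS = [
    "ford-avdata",
    "argoverse-v1",
    "argoverse-v2",
    "kitti360",
    "nuscenes",
    "lyft",
    "pandaset",
]


def missing_folders(root, expected=GROUND_FOLDERS):
    """Return the expected folder names that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

For example, `missing_folders("/data/ground")` (hypothetical path) returns an empty list when every dataset folder is present.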

Download aerial images from the orthophoto providers you want to use and store them in separate folders as follows:

```
./googlemaps
./bingmaps
./dcgisYEAR
./massgisYEAR
./stratmapYEAR
```

## Training

Set the environment variables `AERIAL_DATA` and `GROUND_DATA` to the directories that contain your aerial and ground dataset folders, respectively. If you want to use pseudo-labels, set the environment variable `PSEUDOLABELS` to the path of the file `pseudolabels.txt` (anonymous download link: https://osf.io/82u37/?view_only=0e122cfa6a0a4968b13c4444f1b47163).
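A quick sanity check of these variables before starting a long training run could look like the sketch below. It is not part of the repository; the function name `check_data_env` is hypothetical, and it only assumes that `AERIAL_DATA` and `GROUND_DATA` are directories and `PSEUDOLABELS`, if set, is a file.

```python
import os
from pathlib import Path


def check_data_env(env=None):
    """Return a list of human-readable problems with the data variables."""
    env = os.environ if env is None else env
    problems = []
    for name in ("AERIAL_DATA", "GROUND_DATA"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} does not point to a directory: {value}")
    # PSEUDOLABELS is only needed when training with pseudo-labels.
    pseudo = env.get("PSEUDOLABELS")
    if pseudo and not Path(pseudo).is_file():
        problems.append(f"PSEUDOLABELS does not point to a file: {pseudo}")
    return problems
```

Calling `check_data_env()` without arguments inspects the current process environment; an empty list means no problem was found.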

The training hyperparameters are set via environment variables. The following is an example training command (replace `TRAINING_LOG_DIR` with the directory where training results will be stored):

`CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true VARIANT=point-pillars LOSS_CORR_TYPE=entropy PP_FC_HEADS=1 PP_FC_FILTERS_QK=8 PP_TYPE=wv PP_HEADS=4 PP_GG_KEYVALUE_SOURCE=decoded PP_MIX=1segformer-sr4-mlp2 LR=3e-4 SCHEDULE=poly-2 WEIGHT_DECAY=1e-4 PP_POSENC_PER_BLOCK=0 PP_MASK_BEFORE=1 BACKBONE=timm.convnext_nano_imagenet1k_224 LAYER_DECAY=1.0 PP_HEIGHT_MIN=-5 PP_HEIGHT_MAX=10 PP_HEIGHT_NUM=16 LOSS_CORR_WEIGHT=1.0 LOSS_CORR_TRANSLATION_STD=0.5 LOSS_CORR_LABELSMOOTH=0.0 AERIAL_FINAL_SHAPE=256 BEV_FINAL_SHAPE=192 AERIAL_STRIDE=1 GROUND_ATTN_STRIDES=4,4,4 AERIAL_ATTN_STRIDES=4,4,1 BEV_FILTERS=32,32,32 GROUND_ATTN_FILTERS_V=32,32,32 METERS_PER_PIXEL=0.5 AERIAL_DECODER_FILTERS=32 GROUND_DECODER_FILTERS=64 PP_GC_TYPE=global-mean GROUND_DECODER_STRIDE=4 VALIDATE=1 TRAIN_MAX_BG_DEGREES=0 TRAIN_MAX_BA_DEGREES=10 VAL_MAX_BA_DEGREES=10 TRAIN_MODEL_ANGLES_RANGE=10 TRAIN_MODEL_ANGLES_NUM=3 VAL_MODEL_ANGLES_RANGE=21 VAL_MODEL_ANGLES_NUM=10 METERS_PER_CHUNK=1 SAMPLES=100000 PP_GG_DEFORM_FACTOR=0.01 DATAPRUNE_DROP_HARD=0.01 CHUNK_ORDER=easiest TRAIN_BATCHSIZE=4 VAL_BATCHSIZE=2 PP_FC_SHORTCUT_LOGITS=1 PP_FC_LEARN_SCALE=1 CROSS_ATTENTION_SHORTCUT=111 FUSE_SPECIALISTS=GS VAL_PERIOD=999999999 python3 scripts/train --log TRAINING_LOG_DIR`
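Because the command above is long, one option is to keep the hyperparameters in a Python dictionary and launch the training script from there. The sketch below is an assumption about how such a launcher could look and is not part of the repository; the names `EXAMPLE_HYPERPARAMETERS`, `build_training_call` and `run_training` are hypothetical. The dictionary shows only a subset of the variables from the example command, so the remaining ones would need to be added for an equivalent run.

```python
import os
import subprocess

# Subset of the variables from the example command above. Values are strings
# because environment variables are always strings.
EXAMPLE_HYPERPARAMETERS = {
    "VARIANT": "point-pillars",
    "BACKBONE": "timm.convnext_nano_imagenet1k_224",
    "LR": "3e-4",
    "SCHEDULE": "poly-2",
    "WEIGHT_DECAY": "1e-4",
    "METERS_PER_PIXEL": "0.5",
    "TRAIN_BATCHSIZE": "4",
    "VAL_BATCHSIZE": "2",
}


def build_training_call(log_dir, hyperparameters, base_env=None):
    """Return (command, environment) for a training run without starting it."""
    base_env = dict(os.environ) if base_env is None else dict(base_env)
    # Hyperparameters override any variable of the same name in the base env.
    env = {**base_env, **{k: str(v) for k, v in hyperparameters.items()}}
    cmd = ["python3", "scripts/train", "--log", str(log_dir)]
    return cmd, env


def run_training(log_dir, hyperparameters):
    """Start the training script and raise if it exits with an error."""
    cmd, env = build_training_call(log_dir, hyperparameters)
    return subprocess.run(cmd, env=env, check=True)
```

With this, `run_training("TRAINING_LOG_DIR", EXAMPLE_HYPERPARAMETERS)` would start a run from the repository root, and individual values can be changed in the dictionary rather than in the shell command.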

## Evaluation

We provide the position and bearing offsets used for our evaluation in the folder `ford_eval_offsets`. To evaluate on the Ford AV dataset, run

`python scripts/eval_ford.py --output OUTPUT_PATH --rotation 30 --translation 50 --offsets .../ford_eval_offsets --train-dir TRAIN_DIR` 

and replace `TRAIN_DIR` with the logging directory of your training run and `OUTPUT_PATH` with the path where evaluation results will be stored.

To evaluate with the protocol defined by Shi et al. (CVPR 2022), download the offsets from https://github.com/shiyujiao/HighlyAccurate, set the environment variable `SHIETAL` to point to the download folder, and run the above command with the following arguments in place of the original `--rotation`, `--translation` and `--offsets` values:

`--rotation 20 --translation 28.28 --offsets shietal`
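To make the substitution explicit, the sketch below assembles the full argument list for the Shi et al. protocol from the evaluation command given earlier. It is an illustration only; the helper names `ford_eval_args` and `shi_et_al_args` are hypothetical, and `SHIETAL` still has to be set in the environment before the command is run.

```python
def ford_eval_args(train_dir, output, offsets, rotation, translation):
    """Build the argument list for scripts/eval_ford.py."""
    return [
        "python", "scripts/eval_ford.py",
        "--output", str(output),
        "--rotation", str(rotation),
        "--translation", str(translation),
        "--offsets", str(offsets),
        "--train-dir", str(train_dir),
    ]


def shi_et_al_args(train_dir, output):
    """Arguments for the protocol of Shi et al., as stated above."""
    return ford_eval_args(
        train_dir, output, offsets="shietal", rotation=20, translation=28.28
    )
```

Joining the result with spaces, for example `" ".join(shi_et_al_args("TRAIN_DIR", "OUTPUT_PATH"))`, yields the command line to run.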

