Results: 2D Nuclei Segmentation QPBO Benchmark

Faster Multi-Object Segmentation using Parallel Quadratic Pseudo-Boolean Optimization, ICCV 2021 Paper

Author: Niels Jeppesen (niejep@dtu.dk)

This notebook is used to analyze the benchmark results from the ParallelNucleiSegmentationPart2.ipynb notebook. The benchmark tests the performance of three QPBO implementations: K-QPBO, M-QPBO and P-QPBO. K-QPBO is the implementation found in the thinqpbo package, which is almost identical to the original implementation by Vladimir Kolmogorov. P-QPBO is our new parallel QPBO implementation and M-QPBO is our serial QPBO implementation.

In [1]:
import os
from glob import glob

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Load and display results

First we load the benchmark results from the CSV files. Once we've loaded the results we display the dataframe. Change the variables in the cell below to save figures or change directories.

In [2]:
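save_figures = False
figure_dir = 'figures'
if save_figures:
    os.makedirs(figure_dir, exist_ok=True)  # savefig does not create missing directories
benchmark_dir = '../benchmark/nuclei_benchmarks/qpbo/'
# Sort the paths, since glob returns files in arbitrary order.
benchmark_paths = sorted(glob(os.path.join(benchmark_dir, '*nuclei*.csv')))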
benchmark_paths
Out[2]:
['../benchmark\\parallel_qpbo_nuclei_benchmark_results_20201011-152342.csv',
 '../benchmark\\parallel_qpbo_nuclei_benchmark_results_20201011-172857.csv']
In [3]:
# DataFrame.append was removed in pandas 2.0; pd.concat combines all files in one call.
df_all = pd.concat([pd.read_csv(p, index_col=0) for p in benchmark_paths], ignore_index=True)
    
df_all
Out[3]:
Name NucleiCount Class NodeCount EdgeCount BuildTime SolveTime WeakPersistenciesTime TwiceEnergy Timestamp CpuCount ShortName SystemName SystemCpu SystemCpuCount
0 00071198d059ba7f5914a526d124d28e6d010c92466da2... 27 ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32 437400 1967306 0.149466 0.411771 0.001930 2160848 2020-10-11 10:08:31.056918 1 Para (1) A59 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
1 00071198d059ba7f5914a526d124d28e6d010c92466da2... 27 ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32 437400 1967306 0.140851 0.406718 0.002056 2160848 2020-10-11 10:08:31.610540 1 Para (1) A59 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
2 00071198d059ba7f5914a526d124d28e6d010c92466da2... 27 ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32 437400 1967306 0.139885 0.404456 0.001970 2160848 2020-10-11 10:08:32.159786 1 Para (1) A59 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
3 00071198d059ba7f5914a526d124d28e6d010c92466da2... 27 ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32 437400 1967306 0.144639 0.396539 0.001968 2160848 2020-10-11 10:08:32.706093 1 Para (1) A59 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
4 00071198d059ba7f5914a526d124d28e6d010c92466da2... 27 ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32 437400 1967306 0.190569 0.402814 0.002028 2160848 2020-10-11 10:08:33.307226 1 Para (1) A59 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
56945 ff599c7301daa1f783924ac8cbe3ce7b42878f15a39c2d... 24 QPBOInt 388800 1230964 0.238975 0.249744 0.003852 2711522 2020-10-11 17:28:55.790497 -1 QPBO n-62-11-55 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
56946 ff599c7301daa1f783924ac8cbe3ce7b42878f15a39c2d... 24 QPBOInt 388800 1230964 0.227038 0.249213 0.003882 2711522 2020-10-11 17:28:56.273778 -1 QPBO n-62-11-55 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
56947 ff599c7301daa1f783924ac8cbe3ce7b42878f15a39c2d... 24 QPBOInt 388800 1230964 0.227834 0.249074 0.003837 2711522 2020-10-11 17:28:56.757492 -1 QPBO n-62-11-55 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
56948 ff599c7301daa1f783924ac8cbe3ce7b42878f15a39c2d... 24 QPBOInt 388800 1230964 0.226489 0.249139 0.003852 2711522 2020-10-11 17:28:57.240062 -1 QPBO n-62-11-55 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32
56949 ff599c7301daa1f783924ac8cbe3ce7b42878f15a39c2d... 24 QPBOInt 388800 1230964 0.226337 0.249996 0.003848 2711522 2020-10-11 17:28:57.723178 -1 QPBO n-62-11-55 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 32

56950 rows × 15 columns

Show configurations

To get an overview of the data we've loaded, we print the different configurations.

In [4]:
print('Classes:')
for n in df_all['Class'].unique().tolist():
    print(f'\t{n}')

print('CPU counts:')
for n in df_all['SystemCpuCount'].unique().tolist():
    print(f'\t{n}')
    
print('Images:', len(df_all['Name'].unique()))
Classes:
	ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32
	QpboCapInt32ArcIdxUInt32NodeIdxUInt32
	QPBOInt
CPU counts:
	32
Images: 670
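Listing the distinct values does not show how often each configuration was run. Counting rows per configuration and image would, for example, relate the counts in Out[13] (6,700 for most configurations, 3,350 for P-QPBO with 16 threads) to the 670 images. A minimal sketch on a hypothetical toy log; the image names, the shortened class name and the run counts are made up:

```python
import pandas as pd

# Hypothetical toy log with the same column names as df_all; values are made up.
# 'ParallelQpbo' is a shortened, hypothetical class name used only for illustration.
df_toy = pd.DataFrame({
    'Name': ['img_a'] * 5 + ['img_b'] * 5,
    'Class': ['QPBOInt', 'QPBOInt', 'ParallelQpbo', 'ParallelQpbo', 'ParallelQpbo'] * 2,
    'CpuCount': [-1, -1, 1, 1, 2] * 2,
})

# Rows = (Class, CpuCount) configurations, columns = images, cells = number of runs.
runs = df_toy.groupby(['Class', 'CpuCount', 'Name']).size().unstack('Name')
print(runs)
```

On the real data, the same expression applied to df_all would give one row per configuration; a row with uneven counts would point to images with missing runs.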

Change short name

For the purpose of plotting, we update the ShortName column values.

In [5]:
df = df_all.copy()
df.loc[df['Class'] == 'QPBOInt', 'ShortName'] = 'K-QPBO'
df.loc[df['Class'].str.startswith('QpboCap'), 'ShortName'] = 'M-QPBO'
df.loc[df['Class'].str.startswith('ParallelQpboCap'), 'ShortName'] = 'P-QPBO'
# Append the thread count to the P-QPBO names, e.g. 'P-QPBO (4)' (np.str was removed in NumPy 1.24).
df.loc[df['CpuCount'] != -1, 'ShortName'] += ' (' + df['CpuCount'].astype(str) + ')'
df['SystemCpuCount'] = df['SystemCpuCount'].astype(np.int16)
df['TotalTime'] = df['BuildTime'] + df['SolveTime'] + df['WeakPersistenciesTime']

Filter out results for other systems and configurations

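We only want to work with results from a specific system and configuration, so we filter out the others: we keep only runs on the Intel Xeon Gold 6226R CPU and drop classes with integer capacities other than 32-bit (CapInt32). This should have no effect on the data included in the supplementary material.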

In [6]:
mask = df['SystemCpu'].str.contains('Gold 6226R')
mask &= (~df['Class'].str.contains('CapInt') | df['Class'].str.contains('CapInt32'))

df = df[mask]
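The class condition is easy to misread: it removes classes whose name contains CapInt but not CapInt32, and keeps every class without CapInt in its name. In the sketch below, the first three names are the classes printed by In [4]; the last two are hypothetical and only illustrate the behaviour:

```python
import pandas as pd

# First three names appear in the benchmark; the last two are hypothetical examples.
classes = pd.Series([
    'ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32',
    'QpboCapInt32ArcIdxUInt32NodeIdxUInt32',
    'QPBOInt',
    'QpboCapInt64ArcIdxUInt32NodeIdxUInt32',  # hypothetical: removed by the filter
    'QpboCapFloatArcIdxUInt32NodeIdxUInt32',  # hypothetical: kept, since it lacks 'CapInt'
])

# Same condition as in In [6]: keep names without 'CapInt', or with 'CapInt32'.
keep = ~classes.str.contains('CapInt') | classes.str.contains('CapInt32')
print(classes[keep].tolist())
```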

Graph sizes

As the number of nuclei varies a lot between the images, so does the size of the graphs.

In [7]:
df_graph_size = df.groupby('Name')[['NodeCount', 'EdgeCount']].first().reset_index()
In [8]:
df_graph_size.describe()
Out[8]:
NodeCount EdgeCount
count 6.700000e+02 6.700000e+02
mean 7.123406e+05 3.118199e+06
std 7.769930e+05 4.490210e+06
min 1.620000e+04 4.770000e+04
25% 2.470500e+05 8.549065e+05
50% 4.374000e+05 1.589060e+06
75% 8.748000e+05 3.905568e+06
max 6.075000e+06 6.028413e+07
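The two images shown in Out[3] hint at why graph size follows nucleus count: 437,400 / 27 = 388,800 / 24 = 16,200 nodes per nucleus, which also equals the minimum NodeCount above. This is an observation from those rows only, not a documented property of the graph construction. A sketch of the check, which could be applied to all images by also keeping NucleiCount in df_graph_size:

```python
import pandas as pd

# The (NucleiCount, NodeCount) pairs are copied from the rows shown in Out[3].
rows = pd.DataFrame({
    'NucleiCount': [27, 24],
    'NodeCount': [437400, 388800],
})

# A constant ratio would be consistent with a fixed-size subgraph per nucleus.
rows['NodesPerNucleus'] = rows['NodeCount'] / rows['NucleiCount']
print(rows)
```

The edge counts in the same rows are not proportional to the nucleus count, so no such ratio is suggested for EdgeCount.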

Node distribution

In [9]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2))
dist = df_graph_size['NodeCount']
dist.hist(ax=ax, bins=50)
ax.axvline(dist.min(), c=plt.cm.Set1(4), ls=':', label=f'Min = {int(dist.min()):,}')
ax.axvline(dist.median(), c=plt.cm.Set1(0), ls='--', label=f'Med = {int(dist.median()):,}')
ax.axvline(dist.max(), c=plt.cm.Set1(2), ls=':', label=f'Max = {int(dist.max()):,}')
ax.legend(loc='upper center')
ax.set_xlabel('Nodes')
ax.grid(False)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, f'nodes_dist_2d.pdf'))
plt.show()

Edge distribution

In [10]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2))
dist = df_graph_size['EdgeCount']
dist.hist(ax=ax, bins=50)
ax.axvline(dist.min(), c=plt.cm.Set1(4), ls=':', label=f'Min = {int(dist.min()):,}')
ax.axvline(dist.median(), c=plt.cm.Set1(0), ls='--', label=f'Med = {int(dist.median()):,}')
ax.axvline(dist.max(), c=plt.cm.Set1(2), ls=':', label=f'Max = {int(dist.max()):,}')
ax.legend(loc='upper center')
ax.set_xlabel('Edges')
ax.grid(False)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, f'edges_dist_2d.pdf'))
plt.show()
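The node and edge cells differ only in the column, the axis label and the file name. A hypothetical helper, `plot_count_distribution` (not part of the original notebook), could factor out the shared code. The `matplotlib.use('Agg')` line is only there so the sketch runs without a display; drop it inside the notebook, where it would disable inline plots:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_count_distribution(dist, xlabel, bins=50):
    """Hypothetical helper: histogram with min/median/max markers, as in the cells above."""
    fig, ax = plt.subplots(1, 1, figsize=(5, 2))
    dist.hist(ax=ax, bins=bins)
    # (legend label, value, Set1 colour index, line style), matching the original cells
    markers = [('Min', dist.min(), 4, ':'), ('Med', dist.median(), 0, '--'), ('Max', dist.max(), 2, ':')]
    for label, value, color_idx, style in markers:
        ax.axvline(value, c=plt.cm.Set1(color_idx), ls=style, label=f'{label} = {int(value):,}')
    ax.legend(loc='upper center')
    ax.set_xlabel(xlabel)
    ax.grid(False)
    fig.tight_layout()
    return fig, ax
```

In the notebook, `fig, ax = plot_count_distribution(df_graph_size['EdgeCount'], 'Edges')`, followed by the existing `save_figures` check and `plt.show()`, would reproduce the edge cell.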

Group and show solve times

We can now group the data to get an overview of the solve times for N1 and N2 for each algorithm and thread configuration. The results are used in the paper, where we report the minimum (best) solve time for each group.

In [11]:
df_an = df[df['CpuCount'] <= 16].reset_index(drop=True)
In [12]:
# This is perhaps not how we should calculate total time.
df_an['TotalTime'] = df_an['BuildTime'] + df_an['SolveTime'] + df_an['WeakPersistenciesTime']
In [13]:
df_group = df_an.groupby(['Class', 'SystemCpu', 'CpuCount'])
df_group[['SolveTime']].describe()
Out[13]:
SolveTime
count mean std min 25% 50% 75% max
Class SystemCpu CpuCount
ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz 1 6700.0 0.742131 1.759518 0.006489 0.113988 0.267364 0.868934 33.139519
2 6700.0 0.466397 1.070148 0.006584 0.078040 0.162288 0.531583 19.165917
4 6700.0 0.348649 0.798531 0.006701 0.060217 0.123298 0.387379 15.575568
6 6700.0 0.347912 0.734691 0.006807 0.064912 0.131527 0.389747 13.653808
8 6700.0 0.366607 0.732269 0.006953 0.070558 0.144284 0.424805 13.448907
16 3350.0 0.446636 0.826603 0.008229 0.097924 0.182390 0.539946 14.263395
QPBOInt Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz -1 6700.0 0.992313 1.864919 0.001829 0.151800 0.395636 1.215376 28.132384
QpboCapInt32ArcIdxUInt32NodeIdxUInt32 Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz -1 6700.0 0.539458 1.013378 0.002365 0.093574 0.228175 0.639945 15.639238

For each image, we take the minimum over the repeated runs of each configuration.

In [14]:
df_group = df_an.groupby(['Name', 'Class', 'SystemCpu', 'CpuCount'])
df_min = df_group.min().reset_index()
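Note that `groupby(...).min()` takes the minimum of each column separately. For SolveTime, the quantity analysed below, this is exactly the best run, but BuildTime and SolveTime in df_min may come from different runs. If complete runs are wanted, `idxmin` on SolveTime selects them. A sketch with hypothetical timings:

```python
import pandas as pd

# Hypothetical timings for one image and one configuration, three repeated runs.
runs = pd.DataFrame({
    'Name': ['img_a', 'img_a', 'img_a'],
    'Class': ['QPBOInt'] * 3,
    'BuildTime': [0.20, 0.15, 0.18],
    'SolveTime': [0.40, 0.45, 0.38],
})

# Column-wise minimum, as in In [14]: BuildTime (0.15) and SolveTime (0.38)
# are taken from different runs.
col_min = runs.groupby(['Name', 'Class']).min().reset_index()

# Alternative: keep the complete row of the run with the smallest SolveTime.
best_idx = runs.groupby(['Name', 'Class'])['SolveTime'].idxmin()
best_run = runs.loc[best_idx].reset_index(drop=True)
print(col_min, best_run, sep='\n')
```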

Separate the configurations, sorting each by image name so that the rows line up across implementations.

In [15]:
df_qpbo = df_min[df_min['Class'] == 'QPBOInt'].sort_values('Name').reset_index()
df_mqpbo = df_min[df_min['Class'] == 'QpboCapInt32ArcIdxUInt32NodeIdxUInt32'].sort_values('Name').reset_index()
dfs_pqpbo = {}
for k, g in df_min[df_min['Class'] == 'ParallelQpboCapInt32ArcIdxUInt32NodeIdxUInt32'].groupby('CpuCount'):
    # Sort before resetting the index, so that row labels follow the image order, as for df_qpbo.
    dfs_pqpbo[k] = g.sort_values('Name').reset_index()
In [16]:
assert (np.array(df_qpbo['Name']) == np.array(df_mqpbo['Name'])).all()
for k in dfs_pqpbo:
    assert (np.array(df_qpbo['Name']) == np.array(dfs_pqpbo[k]['Name'])).all()

Compute speed-up

To investigate the performance difference between the three QPBO implementations, we compute the relative speed-up on each image/task for M-QPBO and P-QPBO compared to K-QPBO.
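Written out, with t_c,i the minimum solve time of configuration c on image i, In [17] computes the relative solve time r and the difference Δ below, and the speed-up shown in the plots is s = 1/r. Both s > 1 and Δ < 0 mean that configuration c is faster than K-QPBO on image i:

```latex
r_{c,i} = \frac{t_{c,i}}{t_{\text{K-QPBO},i}}, \qquad
s_{c,i} = \frac{1}{r_{c,i}} = \frac{t_{\text{K-QPBO},i}}{t_{c,i}}, \qquad
\Delta_{c,i} = t_{c,i} - t_{\text{K-QPBO},i}
```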

In [17]:
df_qpbo['RelativeSolveTime'] = 1
df_mqpbo['RelativeSolveTime'] = df_mqpbo['SolveTime'] / df_qpbo['SolveTime']
for k in dfs_pqpbo:
    dfs_pqpbo[k]['RelativeSolveTime'] = dfs_pqpbo[k]['SolveTime'] / df_qpbo['SolveTime']
    
df_qpbo['DiffSolveTime'] = 0
df_mqpbo['DiffSolveTime'] = df_mqpbo['SolveTime'] - df_qpbo['SolveTime']
for k in dfs_pqpbo:
    dfs_pqpbo[k]['DiffSolveTime'] = dfs_pqpbo[k]['SolveTime'] - df_qpbo['SolveTime']
In [18]:
short_names = df_min['ShortName'].unique().tolist()
# One column per configuration; P-QPBO columns are labelled by thread count, e.g. 'P-QPBO (4)'.
df_rel = pd.DataFrame({'M-QPBO': df_mqpbo['RelativeSolveTime']})
for k, dfp in dfs_pqpbo.items():
    df_rel[f'P-QPBO ({k})'] = dfp['RelativeSolveTime']
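In [15] to In [18] line the implementations up by sorting each by Name and relying on matching row order, which the asserts in In [16] check. An alternative (not the notebook's code) is to pivot the per-image minima into one wide table keyed by image name, so pandas aligns rows by Name and the column labels come directly from ShortName. This assumes a single SystemCpu, as after the filter in In [6]. A sketch with hypothetical times, deliberately listed out of order:

```python
import pandas as pd

# Hypothetical per-image minimum solve times (seconds) in long format, like df_min.
df_min_toy = pd.DataFrame({
    'Name': ['img_b', 'img_a', 'img_a', 'img_b', 'img_a', 'img_b'],
    'ShortName': ['K-QPBO', 'K-QPBO', 'M-QPBO', 'M-QPBO', 'P-QPBO (4)', 'P-QPBO (4)'],
    'SolveTime': [2.0, 1.0, 0.5, 1.0, 0.25, 0.25],
})

# One row per image, one column per configuration; rows are matched by Name, not by position.
times = df_min_toy.pivot(index='Name', columns='ShortName', values='SolveTime')

# Relative solve time w.r.t. K-QPBO (row-wise division) and its reciprocal, the speed-up.
rel = times.drop(columns='K-QPBO').div(times['K-QPBO'], axis=0)
speedup = 1 / rel
print(speedup)
```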

Speed-up histogram for all images

We can plot a histogram of the speed-ups. However, it is a bit difficult to interpret due to the number of configurations.

In [19]:
ax = (1 / df_rel).plot.hist(bins=50, figsize=(15, 7), histtype='step')
ax.set_xlabel('Relative speed-up (times)')
ax.set_title('Relative speed-up compared to K-QPBO')
plt.show()

Plotting only three configurations (M-QPBO, P-QPBO (2) and P-QPBO (4)) makes the histogram easier to read.

In [20]:
ax = (1 / df_rel.iloc[:, [0, 2, 3]]).plot.hist(bins=50, figsize=(15, 7), alpha=0.3)
ax.set_xlabel('Relative speed-up (times)')
ax.set_title('Relative speed-up compared to K-QPBO')
plt.show()

Speed-up boxplot for all images

Boxplots summarize the key properties of each configuration's distribution (median, quartiles and outliers) in a compact form. This figure is included in the paper.

In [21]:
ax = (1 / df_rel).plot.box(figsize=(5, 3))
ax.set_ylabel('Solve time speed-up (times)')
ax.axhline(1, label='K-QPBO', c='r', ls='--')
ax.grid(axis='y')
ax.legend()
ax.set_ylim(0, 6)
plt.xticks(rotation=25)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, f'qpbp_boxplot_nuclei_cap32.pdf'))
plt.show()

The plot shows that M-QPBO and P-QPBO provide a substantial speed-up over K-QPBO for most of the images; the median speed-up is about 1.77 for M-QPBO and about 2.92 for P-QPBO (4) (see the table below). However, the very small tasks (images with only a few nuclei) are of little interest, as all three implementations segment them very quickly.
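To put a number on "most of the images", one could count, per configuration, the share of images with a speed-up above 1 and the number of images where it falls below 1. Applied to `1 / df_rel`, the same two lines would also check the "except one image" statement for the subset with 16 or more nuclei. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical relative solve times (configuration time / K-QPBO time), one row per image.
df_rel_toy = pd.DataFrame({
    'M-QPBO': [0.5, 0.6, 1.25, 0.8],
    'P-QPBO (4)': [0.25, 0.4, 0.5, 0.5],
})

speedup = 1 / df_rel_toy
# Fraction of images on which each configuration beats K-QPBO (speed-up above 1).
share_faster = (speedup > 1).mean()
# Number of images on which each configuration is slower than K-QPBO.
n_slower = (speedup < 1).sum()
print(share_faster, n_slower, sep='\n')
```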

Information about distributions.

In [22]:
(1 / df_rel).describe()
Out[22]:
M-QPBO P-QPBO (1) P-QPBO (2) P-QPBO (4) P-QPBO (6) P-QPBO (8) P-QPBO (16)
count 670.000000 670.000000 670.000000 670.000000 670.000000 670.000000 670.000000
mean 1.698791 1.353813 2.128279 2.915371 2.866759 2.687505 2.041342
std 0.268916 0.249575 0.490023 0.869601 0.933576 0.909977 0.628836
min 0.709141 0.259230 0.260700 0.254822 0.255327 0.245099 0.197078
25% 1.520549 1.222950 1.861748 2.393597 2.323949 2.192586 1.663632
50% 1.765695 1.405977 2.222361 2.919290 2.861779 2.643926 2.016557
75% 1.912108 1.515362 2.479360 3.525337 3.433107 3.155668 2.313386
max 2.519490 1.912989 3.127080 5.296966 5.570979 5.557092 4.340782

Boxplot for images with 16 or more nuclei

We can make the same boxplot, but include only the images with 16 or more nuclei. This figure is included in the paper.

In [23]:
nuclei_count = 16
mask_slow = (df_qpbo['NucleiCount'] >= nuclei_count)
In [24]:
# One column per configuration; P-QPBO columns are labelled by thread count, e.g. 'P-QPBO (4)'.
df_rel = pd.DataFrame({'M-QPBO': df_mqpbo['RelativeSolveTime']})
for k, dfp in dfs_pqpbo.items():
    df_rel[f'P-QPBO ({k})'] = dfp['RelativeSolveTime']
df_rel = df_rel[mask_slow]
print('Images:', len(df_rel))
Images: 502
In [25]:
ax = (1 / df_rel).plot.box(figsize=(5, 3))
ax.set_ylabel('Solve time speed-up (times)')
ax.axhline(1, label='K-QPBO', c='r', ls='--')
ax.grid(axis='y')
ax.legend()
ax.set_ylim(0, 6)
plt.xticks(rotation=25)
plt.tight_layout()
if save_figures:
    plt.savefig(os.path.join(figure_dir, f'qpbp_boxplot_nuclei_n{nuclei_count}_i{mask_slow.sum()}_cap32.pdf'))
ax.set_title(f'{mask_slow.sum()} images with at least {nuclei_count} nuclei (32-bit capacities)')
plt.show()

The plot shows that M-QPBO and P-QPBO provide a substantial speed-up over K-QPBO for all images with 16 or more nuclei, except for one image with P-QPBO (1).

Information about distributions.

In [26]:
(1 / df_rel).describe()
Out[26]:
M-QPBO P-QPBO (1) P-QPBO (2) P-QPBO (4) P-QPBO (6) P-QPBO (8) P-QPBO (16)
count 502.000000 502.000000 502.000000 502.000000 502.000000 502.000000 502.000000
mean 1.794348 1.445294 2.331425 3.252707 3.236718 3.039319 2.271621
std 0.189610 0.160563 0.307224 0.661667 0.709131 0.714984 0.520601
min 1.262034 0.840897 1.518157 1.925828 1.909225 1.667458 1.221357
25% 1.653646 1.353139 2.122769 2.751178 2.732074 2.523913 1.934765
50% 1.826892 1.447605 2.340645 3.190736 3.086310 2.870403 2.151989
75% 1.942272 1.551779 2.543543 3.685401 3.614466 3.390410 2.453047
max 2.519490 1.912989 3.127080 5.296966 5.570979 5.557092 4.340782

Reduction in solve time

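The actual reduction in solve time for each image shows the practical benefit (time saved) of M-QPBO and P-QPBO, depending on the size of the task. DiffSolveTime is the solve time of a configuration minus that of K-QPBO, so negative values mean time saved.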

In [27]:
df_qpbo['DiffSolveTime'] = 0
df_mqpbo['DiffSolveTime'] = df_mqpbo['SolveTime'] - df_qpbo['SolveTime']
for k in dfs_pqpbo:
    dfs_pqpbo[k]['DiffSolveTime'] = dfs_pqpbo[k]['SolveTime'] - df_qpbo['SolveTime']
In [28]:
# One column per configuration; P-QPBO columns are labelled by thread count, e.g. 'P-QPBO (4)'.
df_diff = pd.DataFrame({'M-QPBO': df_mqpbo['DiffSolveTime']})
for k, dfp in dfs_pqpbo.items():
    df_diff[f'P-QPBO ({k})'] = dfp['DiffSolveTime']
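Because DiffSolveTime is negative when time is saved, summing the negated differences gives the total time saved over the data set, and the column-wise maximum gives the worst-case slowdown that the "functionally equivalent" argument below refers to. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical DiffSolveTime values in seconds (configuration time minus K-QPBO time).
# Negative values mean the configuration was faster than K-QPBO on that image.
df_diff_toy = pd.DataFrame({
    'M-QPBO': [-0.5, -0.25, 0.125],
    'P-QPBO (4)': [-1.0, -0.5, 0.25],
})

# Total seconds saved over all images (positive = time saved).
total_saved = -df_diff_toy.sum()
# Worst case per configuration: the largest slowdown relative to K-QPBO.
worst_case = df_diff_toy.max()
# Row position of that worst case, similar to the outlier lookup in In [30].
worst_image = df_diff_toy.idxmax()
print(total_saved, worst_case, worst_image, sep='\n')
```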
In [29]:
ax = df_diff.plot.hist(bins=50, figsize=(15, 7), histtype='step')
ax.set_xlabel('Difference (s)')
ax.set_title('Difference compared to K-QPBO')
plt.show()
In [30]:
ax = df_diff.plot.box(figsize=(15, 7))
ax.set_ylabel('Difference (s)')
ax.axhline(0, label='K-QPBO', c='r', ls='--')
ax.grid(axis='y')
outlier_idx = (dfs_pqpbo[1]['SolveTime'] - df_qpbo['SolveTime']).argmax()
ax.scatter(range(1, len(df_diff.columns) + 1), df_diff.iloc[outlier_idx], c='r', label=f'Image {outlier_idx}')
ax.legend()
plt.show()

Except for one outlier, we see that P-QPBO and M-QPBO can reduce the solve time considerably in the best cases, while being functionally equivalent to K-QPBO in the worst cases. By functionally equivalent, we mean that the difference in wall-clock time is so small that it is irrelevant in almost all practical use cases.

Image 421 is a special case. It is an image with densely packed nuclei and the most overlap between the SLG objects. Its size makes it well suited for our fast M-QPBO and P-QPBO implementations; however, the density of the nuclei negatively impacts the bottom-up merging, which is particularly noticeable for P-QPBO (1).
