Benchmark: Interactions between Convolution and Pooling Layers in Graph Neural Networks

Why Consider Graphs?

Context of GNNs

Source: Lin et al., 2021

\(\hookrightarrow\) GNNs can be seen as an extension of CNNs to arbitrary topologies.

The Notion of Homophily

Illustration of homophily

Source: [Zheng2022GraphNN]

\(\hookrightarrow\) Homophily characterizes the extent to which a node’s features resemble those of its neighbors.
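As a concrete sketch (assuming PyTorch, and the node-label notion of homophily that the dataset table below reports), the edge homophily ratio is the fraction of edges whose endpoints share a label:

    import torch

    def edge_homophily(edge_index, y):
        """edge_index: (2, m) tensor of edges, y: (n,) tensor of node labels."""
        src, dst = edge_index
        # Fraction of edges connecting two nodes with the same label.
        return (y[src] == y[dst]).float().mean().item()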

Generic Architecture

Generic Framework of a GNN

A GNN boils down to:

  • Convolution (local dependencies)

  • Local Pooling (reduces the graph size)

  • Global Pooling (= Readout): transforms the graph into a fixed-size vector

  • MLP classifier
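A minimal sketch of this conv / local-pool / readout / MLP pipeline, assuming PyTorch Geometric (the layer choices here are illustrative, not the benchmarked configurations):

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv, TopKPooling, global_max_pool

    class GenericGNN(torch.nn.Module):
        def __init__(self, in_dim, hidden_dim, num_classes):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden_dim)    # convolution: local dependencies
            self.pool1 = TopKPooling(hidden_dim)        # local pooling: shrink the graph
            self.conv2 = GCNConv(hidden_dim, hidden_dim)
            self.mlp = torch.nn.Linear(hidden_dim, num_classes)  # classifier

        def forward(self, x, edge_index, batch):
            x = F.relu(self.conv1(x, edge_index))
            x, edge_index, _, batch, _, _ = self.pool1(x, edge_index, batch=batch)
            x = F.relu(self.conv2(x, edge_index))
            x = global_max_pool(x, batch)               # readout: graph -> vector
            return self.mlp(x)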

Graph Convolutional Network

Convolutional filters, analogous to those of CNNs, are applied over the graph nodes to update each node with respect to its neighbors' features. Definition:

\[h_i' = \sigma \left( \sum_{j \in \mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i| |\mathcal{N}_j|}} W h_j \right)\]
  • \(W\): parameters of the layer
  • \(h_i\): features at node \(i\)
  • \(\mathcal{N}_i\): neighborhood of node \(i\)
  • \(\sigma\): a nonlinear activation function
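A direct dense implementation of this update, as a sketch (library layers such as PyTorch Geometric's GCNConv use sparse message passing and typically add self-loops, which the formula above omits):

    import torch

    def gcn_layer(H, A, W, sigma=torch.relu):
        """H: (n, d) node features, A: (n, n) adjacency matrix, W: (d, d') weights."""
        deg = A.sum(dim=1)                                       # |N_i| for each node
        d_inv_sqrt = deg.clamp(min=1.0).pow(-0.5)                # guard isolated nodes
        A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # 1/sqrt(|N_i||N_j|)
        return sigma(A_norm @ H @ W)                             # aggregate, transform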

Graph Attention Networks

Each node aggregates features from its neighbors, weighted by attention coefficients. Definition:

\[h'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W} h_j\right)\]

  • \(\alpha_{ij}\): attention coefficient indicating the importance of node \(j\)'s features to node \(i\)

\[\alpha_{ij} = \frac{\exp\left(\text{LeakyReLU}\left(\mathbf{a}^T[\mathbf{W}h_i \| \mathbf{W}h_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\text{LeakyReLU}\left(\mathbf{a}^T[\mathbf{W}h_i \| \mathbf{W}h_k]\right)\right)}\]
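A sketch of these attention coefficients for a single node \(i\) and one head, in plain PyTorch (a full layer vectorizes this over all nodes and edges):

    import torch
    import torch.nn.functional as F

    def attention_coefficients(h_i, h_neigh, W, a):
        """h_i: (d,), h_neigh: (k, d) neighbor features, W: (d, d2), a: (2*d2,)."""
        Wh_i = h_i @ W                                         # (d2,)
        Wh_j = h_neigh @ W                                     # (k, d2)
        cat = torch.cat([Wh_i.expand_as(Wh_j), Wh_j], dim=-1)  # [W h_i || W h_j]
        e = F.leaky_relu(cat @ a)                              # unnormalized scores
        return torch.softmax(e, dim=0)                         # alpha_ij over N_i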

Graph Isomorphism Network Convolution

Designed to maximize the representational/discriminative power of a GNN, matching the expressive power of the Weisfeiler-Leman isomorphism test. Definition:

\[\mathbf{h}'_i = \text{MLP}\left((1 + \epsilon) \cdot \mathbf{h}_i + \sum_{j \in \mathcal{N}_i} \mathbf{h}_j\right)\]

  • \(\epsilon\): a learnable parameter or a fixed scalar
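A sketch assuming PyTorch Geometric, whose GINConv wraps a user-supplied MLP exactly as in the formula above:

    import torch
    from torch_geometric.nn import GINConv

    mlp = torch.nn.Sequential(
        torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 32)
    )
    # Computes MLP((1 + eps) * h_i + sum_{j in N_i} h_j).
    conv = GINConv(mlp, eps=0.0, train_eps=True)
    # h_new = conv(x, edge_index)  with x: (n, 16), edge_index: (2, m)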

Top-K pooling operator

Top-K Pooling retains only the K nodes with the highest scores, where the scores are computed from a learnable projection of the node features.

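A usage sketch, assuming PyTorch Geometric's TopKPooling:

    from torch_geometric.nn import TopKPooling

    pool = TopKPooling(in_channels=32, ratio=0.5)  # keep the 50% highest-scoring nodes
    # x2, edge_index2, _, batch2, perm, score = pool(x, edge_index, batch=batch)
    # perm: indices of the retained nodes; score: their (tanh-squashed) scores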

Self-Attention Graph Pooling

Top-K Pooling where the node scores are produced by self-attention, i.e. by a GNN layer.

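A usage sketch, assuming PyTorch Geometric's SAGPooling; the only change from Top-K is that a GNN layer computes the scores:

    from torch_geometric.nn import GCNConv, SAGPooling

    pool = SAGPooling(in_channels=32, ratio=0.5, GNN=GCNConv)  # attention via a GCN
    # x2, edge_index2, _, batch2, perm, score = pool(x, edge_index, batch=batch)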

MEWIS Pool (Maximum Entropy Weighted Independent Set Pooling)

Retains a maximum-weight independent set of nodes, with node weights derived by maximizing a Shannon entropy criterion.

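MEWISPool has no off-the-shelf layer in the common libraries; the sketch below illustrates only the entropy ingredient (a per-node Shannon entropy usable as a node weight), not the independent-set selection itself:

    import torch

    def node_entropy(p, eps=1e-12):
        """p: (n, d) nonnegative per-node scores; returns (n,) Shannon entropies."""
        p = p / p.sum(dim=1, keepdim=True).clamp(min=eps)  # rows -> distributions
        return -(p * (p + eps).log()).sum(dim=1)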

EDGE pooling

Contracts edges by greedily pairing nodes according to learned edge scores; each matched pair is merged into a single node.

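A usage sketch, assuming PyTorch Geometric's EdgePooling:

    from torch_geometric.nn import EdgePooling

    pool = EdgePooling(in_channels=32)
    # x2, edge_index2, batch2, unpool_info = pool(x, edge_index, batch)
    # unpool_info allows reversing the contraction in encoder-decoder settings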

Our datasets

                     MUTAG   PROTEINS   ENZYMES   NCI1
Number of graphs     188     1113       600       4110
Number of classes    2       2          6         2
Number of features   7       3          3         37
Homophily            0.721   0.657      0.667     0.631
Some graphs from the MUTAG dataset (Source: bui2022ingrex)

Mean vs Max Readout

We test the differences between mean and max readout with Wilcoxon tests.

Dataset    p-value   Mean difference   Best architecture
MUTAG      0.258     -0.008            GINConv_EDGE_max
PROTEINS   0.330     0.009             GCN_EDGE_max
ENZYMES    0.207     -0.010            GINConv_EDGE_mean
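A sketch of one such test with scipy.stats.wilcoxon; the per-fold accuracies below are placeholders, not our measurements:

    from scipy.stats import wilcoxon

    acc_mean = [0.82, 0.79, 0.85, 0.80, 0.83]    # hypothetical folds, mean readout
    acc_max  = [0.83, 0.80, 0.84, 0.81, 0.84]    # hypothetical folds, max readout
    stat, p_value = wilcoxon(acc_mean, acc_max)  # paired signed-rank test
    print(f"p-value = {p_value:.3f}")            # p > 0.05: no significant difference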

Result

Since all p-values are above 0.05, we cannot reject the hypothesis that mean and max readout perform equally well. \(\hookrightarrow\) We therefore keep only the global max pooling readout in the following experiments.

Test accuracy vs number of parameters on MUTAG

Test accuracy vs train time on MUTAG

Test accuracy vs homophily

Results by pooling

Pooling   Conv   ENZYMES         MUTAG           NCI1            PROTEINS        Train time
EDGE      GCN    0.294 ± 0.026   0.703 ± 0.081   0.717 ± 0.015   0.753 ± 0.024   1327
          GIN    0.353 ± 0.039   0.847 ± 0.063   0.735 ± 0.010   0.731 ± 0.017   1156
MEWIS     GIN    0.309 ± 0.055   0.789 ± 0.077   0.744 ± 0.006   0.743 ± 0.016   4365
None      GCN    0.316 ± 0.044   0.703 ± 0.065   0.651 ± 0.015   0.743 ± 0.029   40
          GIN    0.327 ± 0.042   0.803 ± 0.068   0.734 ± 0.018   0.733 ± 0.028   59
SAG       GAT    0.189 ± 0.025   0.676 ± 0.062   0.617 ± 0.024   0.722 ± 0.050   112
          GCN    0.195 ± 0.033   0.682 ± 0.073   0.630 ± 0.021   0.689 ± 0.041   53
          GIN    0.188 ± 0.040   0.761 ± 0.081   0.639 ± 0.036   0.714 ± 0.039   59
TOPK      GAT    0.208 ± 0.054   0.689 ± 0.093   0.623 ± 0.045   0.682 ± 0.033   110
          GCN    0.176 ± 0.035   0.739 ± 0.075   0.631 ± 0.034   0.694 ± 0.032   55
          GIN    0.205 ± 0.056   0.761 ± 0.079   0.617 ± 0.033   0.697 ± 0.027   56

Best architecture per pooling:

Pooling   ENZYMES   MUTAG   NCI1   PROTEINS
EDGE      GIN       GIN     GIN    GCN
MEWIS     GIN       GIN     GIN    GIN
None      GIN       GIN     GIN    GCN
SAG       GCN       GIN     GIN    GAT
TOPK      GAT       GIN     GCN    GIN

Results by architecture

Conv   Pooling   ENZYMES         MUTAG           NCI1            PROTEINS        Total time
GAT    MEWIS     0.295 ± 0.040   0.742 ± 0.086   0.693 ± 0.008   0.722 ± 0.022   3225
       None      0.310 ± 0.053   0.679 ± 0.087   0.659 ± 0.023   0.734 ± 0.027   90
GCN    EDGE      0.294 ± 0.026   0.703 ± 0.081   0.717 ± 0.015   0.753 ± 0.024   1327
       None      0.316 ± 0.044   0.703 ± 0.065   0.651 ± 0.015   0.743 ± 0.029   40
       TOPK      0.176 ± 0.035   0.739 ± 0.075   0.631 ± 0.034   0.694 ± 0.032   55
GIN    EDGE      0.353 ± 0.039   0.847 ± 0.063   0.735 ± 0.010   0.731 ± 0.017   1156
       MEWIS     0.309 ± 0.055   0.789 ± 0.077   0.744 ± 0.006   0.743 ± 0.016   4365

Best pooling per architecture:

Conv   ENZYMES   MUTAG   NCI1    PROTEINS
GAT    None      MEWIS   MEWIS   None
GCN    None      TOPK    EDGE    EDGE
GIN    EDGE      EDGE    MEWIS   MEWIS

Conclusion

Key idea

  • GNN vs CNN: convolution and pooling carry over from grids to arbitrary graphs
  • Best pairing: GINConv with EDGE or MEWIS pooling
  • Caveat: the best combination depends on both the dataset and the architecture

Work to be done

  • Larger datasets
  • Tuning the architectures
  • Other pooling and convolution methods