Robust Graph Neural Network Against Poisoning Attacks via Transfer Learning
Xianfeng Tang†, Yandong Li§, Yiwei Sun†, Huaxiu Yao†, Prasenjit Mitra†, Suhang Wang†∗
†Pennsylvania State University §University of Central Florida
{tangxianfeng, lyndon.leeseu}@outlook.com
{yus162, huy144, pum10, szw494}@psu.edu
ABSTRACT
Graph neural networks (GNNs) are widely used in many applications. However, their robustness against adversarial attacks has been criticized. Prior studies show that unnoticeable modifications to graph topology or nodal features can significantly degrade the performance of GNNs. Designing robust graph neural networks against poisoning attacks is very challenging, and several efforts have been made in this direction. Existing work aims at reducing the negative impact of adversarial edges using only the poisoned graph, which is suboptimal because it cannot discriminate adversarial edges from normal ones. On the other hand, clean graphs from domains similar to that of the target poisoned graph are usually available in the real world. By perturbing these clean graphs, we create supervised knowledge for training the ability to detect adversarial edges, thereby elevating the robustness of GNNs. However, this potential of clean graphs is neglected by existing work. To this end, we investigate a novel problem of improving the robustness of GNNs against poisoning attacks by exploring clean graphs. Specifically, we propose PA-GNN, which relies on a penalized aggregation mechanism that directly restricts the negative impact of adversarial edges by assigning them lower attention coefficients. To optimize PA-GNN for a poisoned graph, we design a meta-optimization algorithm that trains PA-GNN to penalize perturbations using clean graphs and their adversarial counterparts, and transfers this ability to improve the robustness of PA-GNN on the poisoned graph. Experimental results on four real-world datasets demonstrate the robustness of PA-GNN against poisoning attacks on graphs.
ACM Reference format:
Xianfeng Tang†, Yandong Li§, Yiwei Sun†, Huaxiu Yao†, Prasenjit Mitra†, Suhang Wang†. 2016. Robust Graph Neural Network Against Poisoning Attacks via Transfer Learning. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference'17), 9 pages. DOI: 10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Graph neural networks (GNNs) [8, 13, 18], which explore the power of neural networks for graph data, have achieved remarkable results
∗ Corresponding Author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference'17, Washington, DC, USA
© 2016 ACM. 978-x-xxxx-xxxx-x/YY/MM...$15.00 DOI: 10.1145/nnnnnnn.nnnnnnn
in various applications such as social recommendation [9] and natural language processing [16, 40]. The key to the success of GNNs is their signal-passing process [34], where information from neighbors is aggregated for every node in each layer. The collected information enriches node representations, preserving both nodal feature characteristics and topological structure.
Though GNNs are effective for modeling graph data, the way GNNs aggregate neighbor nodes' information for representation learning makes them vulnerable to adversarial attacks [7, 35, 37, 41, 42]. A poisoning attack on a graph [41], which adds/deletes carefully chosen edges in the graph topology or injects carefully designed perturbations into nodal features, can contaminate the neighborhoods of nodes, bring noise/errors to node representations, and degrade the performance of GNNs significantly. This lack of robustness becomes a critical issue for GNNs in applications such as financial systems and risk management [1]. For example, fake accounts created by a hacker can befriend normal users on social networks to promote the scores a GNN model predicts for them. A model that is not robust enough to resist such "cheap" attacks could lead to serious consequences. Hence, it is important to develop robust GNNs against adversarial attacks. Recent studies of adversarial attacks on GNNs suggest that adding perturbed edges is more effective than deleting edges or adding noise to node features [35]. This is because node features are usually high-dimensional, requiring larger budgets to attack. Deleting edges only results in the loss of some information, whereas adding edges is a cheap way to contaminate information passing dramatically. For example, adding a few bridge edges connecting two communities can affect the latent representations of many nodes. Thus, we focus on defense against the more effective poisoning attacks, in which a training graph is poisoned with injected adversarial edges.
To defend against injected adversarial edges, a natural idea is to delete these edges or reduce their negative impact. Several efforts have been made in this direction [17, 35, 39]. For example, Wu et al. [35] utilize the Jaccard similarity of features to prune perturbed graphs, under the assumption that connected nodes have high feature similarity. RGCN [39] introduces Gaussian constraints on the model parameters to absorb the effects of adversarial changes. The aforementioned models rely only on the poisoned graph for training, leading to suboptimal solutions. The lack of supervised information about the real perturbations in a poisoned graph prevents these models from modeling the distribution of adversarial edges. Therefore, exploring alternative supervision for learning the ability to reduce the negative effects of adversarial edges is promising.
arXiv:1908.07558v1 [cs.LG] 20 Aug 2019

There usually exist clean graphs with topological distributions and attribute features similar to those of the poisoned graph. For example, Yelp and Foursquare have similar co-review networks where the
nodes are restaurants and two restaurants are linked if the number of co-reviewers exceeds a threshold. Facebook and Twitter can be treated as social networks that share similar domains. It is not difficult to acquire graphs similar to the targeted perturbed one. As shown in existing work [19, 29], because of the similarity of topological and attribute features, we can transfer knowledge from source graphs to target ones so that the performance on the target graphs is elevated. Similarly, we can inject adversarial edges into clean graphs as supervision for training robust GNNs that are able to penalize adversarial edges. Such ability can be further transferred to improve the robustness of GNNs on the poisoned graph. Leveraging clean graphs to build robust GNNs is thus a promising direction. However, prior studies in this direction are rather limited.
Therefore, in this paper, we investigate a novel problem of exploring clean graphs to improve the robustness of GNNs against poisoning attacks. The basic idea is to first learn to discriminate adversarial edges, thereby reducing their negative effects, and then transfer this ability to a GNN on the poisoned graph. In essence, we face two challenges: (i) how to mathematically utilize clean graphs to equip GNNs with the ability to reduce the negative impact of adversarial edges; and (ii) how to effectively transfer the ability learned on clean graphs to improve the robustness of GNNs on a poisoned graph. In an attempt to solve these challenges, we propose a novel framework, Penalized Aggregation GNN (PA-GNN). Firstly, clean graphs are attacked by adding adversarial edges, which serve as supervision of known perturbations. With these known adversarial edges, a penalized aggregation mechanism is then designed to learn the ability to alleviate the negative influence of perturbations. We further transfer this negative-effect alleviation ability to the target poisoned graph with a special meta-optimization approach, so that the robustness of GNNs is elevated. To the best of our knowledge, we are the first to propose a GNN that can directly penalize perturbations and to leverage transfer learning to enhance the robustness of GNN models. The main contributions of this paper are:
• We study a new problem and propose a principled approach of exploring clean graphs for learning a robust GNN against poisoning attacks on a target graph;
• We provide a novel framework, PA-GNN, which is able to alleviate the negative effects of adversarial edges with a carefully designed penalized aggregation mechanism, and to transfer the alleviation ability to the target poisoned graph with meta-optimization;
• We conduct extensive experiments on real-world datasets to demonstrate the effectiveness of PA-GNN against various poisoning attacks and to understand its behavior.
The rest of the paper is organized as follows. We review related work in Section 2. We define our problem in Section 3. We introduce the details of PA-GNN in Section 4. Extensive experiments and their results are illustrated and analyzed in Section 5. We conclude the paper in Section 6.
2 RELATED WORK
In this section, we briefly review related work on graph neural networks and on adversarial attacks and defenses on graphs.
2.1 Graph Neural Networks
In general, graph neural networks refer to all deep learning methods for graph data [36]. They can be divided into two categories, i.e., spectral-based and spatial-based. Spectral-based GNNs define "convolution" following spectral graph theory [3]. The first generation of GCNs is developed by Bruna et al. [3] using spectral graph theory, and various spectral-based GCNs are developed later on [8, 14, 18, 20]. To improve efficiency, spatial-based GNNs are proposed [11, 13, 23, 25]. Because spatial-based GNNs directly aggregate neighbor nodes as the convolution and are trained on mini-batches, they are more scalable than spectral-based ones. Recently, Veličković et al. [32] propose the graph attention network (GAT), which leverages self-attention over neighbor nodes in the aggregation process. The major idea of GATs [38] is to focus on the most important neighbors and assign higher weights to them during the information passing. However, existing GNNs aggregate neighbors' information for representation learning, making them vulnerable to adversarial attacks, especially perturbed edges added to the graph topology. Next, we review adversarial attack and defense methods on graphs.
2.2 Adversarial Attack and Defense on Graphs
Neural networks are widely criticized for their lack of robustness [5, 6, 12, 21], and the same holds for GNNs. Various adversarial attack methods have been designed, demonstrating the vulnerability of GNNs [2, 4, 7]. There are two major categories of adversarial attack methods, namely evasion attacks and poisoning attacks. An evasion attack focuses on generating fake samples for a trained model. Dai et al. [7] introduce an evasion attack algorithm based on reinforcement learning. On the contrary, a poisoning attack changes the training data, which can decrease the performance of GNNs significantly. For example, Zügner et al. [41] propose nettack, which makes GNNs fail on any selected node by modifying its neighbor connections. They further develop metattack [42], which reduces the overall performance of GNNs. Compared with evasion attacks, poisoning attack methods are usually stronger and can lead to extremely low performance [39, 41] because they destroy the training data. Besides, it is almost impossible to clean up a graph that is already poisoned. Therefore, we focus on defending against poisoning attacks on graph data in this paper.
How to improve the robustness of GNNs against adversarial poisoning attacks is attracting increasing attention, and initial efforts have been made [17, 35, 37, 39]. For example, Wu et al. [35] utilize the Jaccard similarity of features to prune perturbed graphs, under the assumption that connected nodes should have high feature similarity. RGCN [39] adopts Gaussian distributions as the node representations in each convolutional layer so that the effects of adversarial changes are absorbed into the variances of the Gaussian distributions. The basic idea of the aforementioned robust GNNs against poisoning attacks is to alleviate the negative effects of the perturbed edges. However, perturbed edges are treated the same as normal edges during aggregation in existing robust GNNs.
The proposed PA-GNN is inherently different from existing works: (i) instead of being trained purely on the poisoned target graph, it adopts clean graphs from similar domains to learn the ability to alleviate the negative effects of adversarial edges; and (ii) it investigates meta-learning to transfer this ability to the target poisoned graph, improving robustness.
3 PRELIMINARIES

3.1 Notations
We use $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{X})$ to denote a graph, where $\mathcal{V} = \{1, \ldots, N\}$ is the set of $N$ nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ represents the set of edges, and $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ indicates the node features. In a semi-supervised setting, part of the nodes come with labels and are defined as $\mathcal{V}^L$, where the corresponding label of node $i$ is denoted by $y_i$. Note that the topology structure of $\mathcal{G}$ is damaged, and the original clean version is unknown. In addition to the poisoned graph $\mathcal{G}$, we assume there exist $M$ clean graphs sharing similar domains with $\mathcal{G}$. For example, when $\mathcal{G}$ is a citation network of publications in the data mining field, a similar graph can be another citation network from physics. We use $\{\mathcal{G}_1, \ldots, \mathcal{G}_M\}$ to represent the clean graphs. Similarly, each clean graph consists of nodes and edges. We use $\mathcal{V}^L_i$ to denote the labeled nodes in graph $\mathcal{G}_i$.
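The notation above can be made concrete with a small container type. The sketch below is ours, not part of the paper; the field and method names are illustrative:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Graph:
    """G = (V, E, X): nodes are 0..N-1, edges are directed index pairs,
    X holds one feature vector per node; labels cover only V^L."""
    edges: set                                   # E, set of (i, j) pairs
    X: np.ndarray                                # (N, d) node features
    labels: dict = field(default_factory=dict)   # node -> label, for V^L

    @property
    def num_nodes(self):
        return self.X.shape[0]

    def neighbors(self, i):
        # first-order neighbors N_i of node i
        return [j for (u, j) in self.edges if u == i]

# a poisoned target graph G plus M clean graphs from similar domains
target = Graph(edges={(0, 1), (1, 0)}, X=np.zeros((3, 4)), labels={0: 1})
clean_graphs = [Graph(edges=set(), X=np.zeros((2, 4))) for _ in range(2)]
```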
3.2 Basic GNN Design
We introduce the general architecture of a graph neural network. A graph neural network contains multiple layers. Each layer transforms its input node features into another Euclidean space as output. Different from fully-connected layers, a GNN layer takes first-order neighbors' information into consideration when transforming the feature vector of a node. This "message-passing" mechanism ensures that, as the network goes deeper, the initial features of any two nodes can affect each other even if they are far-away neighbors. The input node features to the $l$-th layer in an $L$-layer GNN can be represented by a set of vectors $\mathbf{H}^l = \{\mathbf{h}^l_1, \ldots, \mathbf{h}^l_N\}$, $\mathbf{h}^l_i \in \mathbb{R}^{d_l}$, where $\mathbf{h}^l_i$ corresponds to node $i$. Obviously, $\mathbf{H}^1 = \mathbf{X}$. The output node features of the $l$-th layer, which also form the input to the next layer, are generated as follows:

$$\mathbf{h}^{l+1}_i = \mathrm{Update}\big(\mathbf{h}^l_i,\ \mathrm{Agg}(\mathbf{h}^l_j \mid j \in \mathcal{N}_i)\big), \tag{1}$$
where $\mathcal{N}_i$ is the set of first-order neighbors of node $i$, $\mathrm{Agg}(\cdot)$ indicates a generic aggregation function over neighbor nodes, and $\mathrm{Update}(\cdot)$ is an update function that generates a new node representation vector from the previous one and the messages from neighbors. Most graph neural networks follow the above definition. For example, Hamilton et al. [13] introduce mean, pooling and LSTM aggregation functions; Veličković et al. [32] leverage a self-attention mechanism to update node representations. A GNN can be represented by a parameterized function $f_\theta$, where $\theta$ represents the parameters, and its loss function can be represented as $\mathcal{L}_c(\theta)$. In semi-supervised learning, the cross-entropy loss function for node classification takes the form:
$$\mathcal{L}_c(\theta) = -\sum_{i \in \mathcal{V}^L} \log \hat{y}_i, \tag{2}$$

where $\hat{y}_i$ is the predicted probability of the true label of node $i$, generated by passing the output of the final GNN layer through a softmax function.
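As a concrete illustration of Equation (1), the toy layer below uses mean aggregation and a concatenate-then-linear update with a ReLU. These specific choices, and all the names, are ours for illustration; they are not the paper's architecture:

```python
import numpy as np

def gnn_layer(H, neighbors, W):
    """One generic GNN layer in the sense of Eq. (1):
    mean-aggregate first-order neighbor features (Agg),
    then update each node via a shared linear map + ReLU (Update).
    H: (N, d_in) node features; neighbors: list of neighbor-index lists;
    W: (2*d_in, d_out) weights (an illustrative Update choice)."""
    N, d = H.shape
    H_new = np.empty((N, W.shape[1]))
    for i in range(N):
        agg = H[neighbors[i]].mean(axis=0) if neighbors[i] else np.zeros(d)
        H_new[i] = np.concatenate([H[i], agg]) @ W  # Update([h_i ; Agg])
    return np.maximum(H_new, 0.0)                   # ReLU

# tiny example: 3 nodes on a path graph 0-1-2
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = [[1], [0, 2], [1]]
W = np.eye(4, 2)  # toy weights: copy the node's own features through
out = gnn_layer(H, neighbors, W)
```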
Figure 1: Overall framework of PA-GNN. Thicker arrows indicate higher attention coefficients. $\theta^*$ denotes the model initialization from meta-optimization.
3.3 Problem Definition
With the aforementioned notations and definitions, the problem of exploring clean graphs for learning a robust GNN against poisoning attacks on a target graph is formally defined as:

Given the target graph $\mathcal{G}$ that is poisoned with adversarial edges, a set of clean graphs $\{\mathcal{G}_1, \ldots, \mathcal{G}_M\}$ from similar domains as $\mathcal{G}$, and the partially labeled nodes of each graph (i.e., $\{\mathcal{V}^L_1, \ldots, \mathcal{V}^L_M; \mathcal{V}^L\}$), we aim at learning a robust GNN to predict the unlabeled nodes of $\mathcal{G}$.

It is worth noting that, in this paper, we learn a robust GNN for semi-supervised node classification. The proposed PA-GNN is a general framework for learning robust GNNs for various graph mining tasks such as link prediction.
4 PROPOSED FRAMEWORK

In this section, we give the details of PA-GNN. An illustration of the framework is shown in Figure 1. Firstly, the clean graphs $\{\mathcal{G}_1, \ldots, \mathcal{G}_M\}$ are introduced to generate perturbed edges. The generated perturbations then serve as supervised knowledge to train a model initialization for PA-GNN using meta-optimization. Finally, we fine-tune the initialization on the target poisoned graph for the best performance. Thanks to the meta-optimization, the ability to reduce the negative effects of adversarial attacks is retained after adapting to $\mathcal{G}$. In the following sections, we introduce the technical details of PA-GNN.
4.1 Penalized Aggregation Mechanism
We begin by analyzing why GNNs are vulnerable to adversarial attacks, starting from the general definition of GNNs in Equation (1). Suppose the graph data fed into a GNN is perturbed. The aggregation function $\mathrm{Agg}(\cdot)$ treats "fake" neighbors the same as normal ones and propagates their information to update other nodes. As a result, GNNs fail to generate the desired outputs under the influence of adversarial attacks. Conversely, if the messages passing through perturbed edges are filtered out, the aggregation function can focus on the "true" neighbors. In the ideal condition, GNNs work well if all perturbed edges produced by attackers are ignored.

Motivated by the above analysis, we design a novel GNN with a penalized aggregation mechanism (PA-GNN) that automatically restricts message-passing through perturbed edges. Firstly, we adopt an implementation similar to [31] and define the self-attention coefficient $a^l_{ij}$ for the node features of $i$ and $j$ on the $l$-th layer using a non-linear function:

$$a^l_{ij} = \mathrm{LeakyReLU}\big((\mathbf{a}^l)^{\top} [\mathbf{W}^l \mathbf{h}^l_i \oplus \mathbf{W}^l \mathbf{h}^l_j]\big), \tag{3}$$
where $\mathbf{a}^l$ and $\mathbf{W}^l$ are parameters, $\top$ represents transposition, and $\oplus$ indicates the concatenation of vectors. Note that the coefficients are only defined for first-order neighbors. Taking node $i$ as an example, we only compute $a^l_{ij}$ for $j \in \mathcal{N}_i$, the set of direct neighbors of $i$. The attention coefficients related to $i$ are further normalized over all nodes in $\mathcal{N}_i$ to make the scores comparable:
$$\alpha^l_{ij} = \frac{\exp\big(a^l_{ij}\big)}{\sum_{k \in \mathcal{N}_i} \exp\big(a^l_{ik}\big)}. \tag{4}$$
We use the normalized attention coefficients to generate a linear combination of the corresponding node features. This linear combination serves as the aggregation process, and its result is used to update the node features. More concretely, a graph neural network layer is constructed as follows:

$$\mathbf{h}^{l+1}_i = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha^l_{ij} \mathbf{W}^l \mathbf{h}^l_j\Big). \tag{5}$$
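Equations (3)–(5) can be sketched for a single layer as follows. The LeakyReLU slope of 0.2 and the tanh non-linearity for $\sigma$ are our illustrative assumptions, and the function also returns the raw scores $a^l_{ij}$ because the penalty term introduced below operates on them:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_layer(H, neighbors, W, a_vec):
    """One attention layer in the style of Eqs. (3)-(5):
    score each edge, softmax-normalize per neighborhood,
    then aggregate neighbor features with those weights.
    Returns new features and the raw per-edge scores a_ij."""
    Z = H @ W                       # W h_j for every node
    H_new = np.zeros_like(Z)
    raw_scores = {}                 # (i, j) -> a_ij
    for i, nbrs in enumerate(neighbors):
        a = np.array([leaky_relu(a_vec @ np.concatenate([Z[i], Z[j]]))
                      for j in nbrs])                        # Eq. (3)
        alpha = np.exp(a - a.max())
        alpha /= alpha.sum()                                 # Eq. (4)
        for j, s in zip(nbrs, a):
            raw_scores[(i, j)] = s
        H_new[i] = alpha @ Z[nbrs]                           # Eq. (5)
    return np.tanh(H_new), raw_scores

H = np.ones((3, 2))
neighbors = [[1, 2], [0], [0]]
W = np.ones((2, 2))
a_vec = np.ones(4)
out, sc = attention_layer(H, neighbors, W, a_vec)
```

With identical features, every neighbor receives the same normalized weight, which matches the softmax in Equation (4).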
A similar definition can be found in [32]. Clearly, the above design of a GNN layer cannot discriminate perturbed edges, let alone alleviate their negative effects on the "message-passing" mechanism, because there is no supervision teaching it how to honor normal edges and punish perturbed ones. A natural solution to this problem is to reduce the attention coefficients of all perturbed edges in a poisoned graph. Because of the exponential rectifier in Equation (4), a lower attention coefficient only allows little information to pass through its corresponding edge, which mitigates the negative effects if the edge is an adversarial one. Moreover, since the normalized attention coefficients of one node always sum up to 1, reducing the attention coefficients of perturbed edges also shifts more attention to clean neighbors. To measure the attention coefficients received by perturbed edges, we propose the following metric:

$$S_p = \sum_{l=1}^{L} \sum_{e_{ij} \in \mathcal{P}} a^l_{ij}, \tag{6}$$

where $L$ is the total number of layers in the network, and $\mathcal{P}$ denotes the perturbed edges. Generally, a smaller $S_p$ indicates less attention received by adversarial edges. To further train GNNs such that a lower $S_p$ is guaranteed, we design the following loss function to penalize perturbed edges:
$$\mathcal{L}_{dist} = -\min\Big(\eta,\ \mathbb{E}_{\substack{e_{ij} \in \mathcal{E} \setminus \mathcal{P} \\ 1 \le l \le L}} a^l_{ij} - \mathbb{E}_{\substack{e_{ij} \in \mathcal{P} \\ 1 \le l \le L}} a^l_{ij}\Big), \tag{7}$$
where $\eta$ is a hyper-parameter controlling the margin between the mean values of the two distributions, $\mathcal{E} \setminus \mathcal{P}$ represents the normal edges in the graph, and $\mathbb{E}$ computes the expectation. Using the expectation of the attention coefficients of all normal edges as an anchor, $\mathcal{L}_{dist}$ aims at reducing the average attention coefficient of the perturbed edges until a discrepancy of $\eta$ between the two mean values is attained. Note that minimizing $S_p$ directly instead of $\mathcal{L}_{dist}$ leads to unstable attention coefficients, making PA-GNN hard to converge. The expectations of the attention coefficients are estimated by their empirical means:
$$\mathbb{E}_{\substack{e_{ij} \in \mathcal{E} \setminus \mathcal{P} \\ 1 \le l \le L}} a^l_{ij} = \frac{1}{L\,|\mathcal{E} \setminus \mathcal{P}|} \sum_{l=1}^{L} \sum_{e_{ij} \in \mathcal{E} \setminus \mathcal{P}} a^l_{ij}, \tag{8}$$

$$\mathbb{E}_{\substack{e_{ij} \in \mathcal{P} \\ 1 \le l \le L}} a^l_{ij} = \frac{1}{L\,|\mathcal{P}|} \sum_{l=1}^{L} \sum_{e_{ij} \in \mathcal{P}} a^l_{ij}, \tag{9}$$
where $|\cdot|$ denotes the cardinality of a set. We combine $\mathcal{L}_{dist}$ with the original cross-entropy loss $\mathcal{L}_c$ and create the following learning objective for PA-GNN:

$$\min_{\theta} \mathcal{L} = \min_{\theta} \big(\mathcal{L}_c + \lambda \mathcal{L}_{dist}\big), \tag{10}$$

where $\lambda$ balances the semi-supervised classification loss and the attention coefficient scores on perturbed edges.
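Given the raw attention scores per edge, Equations (7)–(10) amount to a margin between two empirical means plus the classification loss. A sketch under our own naming (the paper's exact batching over layers is not reproduced here; `scores` maps each edge to its list of per-layer scores):

```python
import numpy as np

def dist_loss(scores, perturbed, eta):
    """Eqs. (7)-(9): margin loss between the mean attention score
    of normal edges and that of known perturbed edges P.
    scores: dict mapping edge (i, j) -> list of a^l_ij over layers
    perturbed: set of adversarial edges P
    eta: margin hyper-parameter"""
    normal = [a for e, per_layer in scores.items()
              if e not in perturbed for a in per_layer]
    bad = [a for e, per_layer in scores.items()
           if e in perturbed for a in per_layer]
    gap = np.mean(normal) - np.mean(bad)   # E_normal - E_perturbed
    return -min(eta, gap)                  # Eq. (7)

def total_loss(ce_loss, scores, perturbed, eta=0.5, lam=1.0):
    """Eq. (10): classification loss plus weighted penalty."""
    return ce_loss + lam * dist_loss(scores, perturbed, eta)
```

Once the gap exceeds the margin $\eta$, the penalty saturates at $-\eta$, so training pressure shifts back to the classification term.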
Training PA-GNN with the above objective directly is non-trivial, because it is unlikely that the exact perturbed edges $\mathcal{P}$ can be distinguished from normal edges in a poisoned graph. However, it is practical to discover vulnerable edges in clean graphs with adversarial attack methods on graphs. For example, metattack poisons a clean graph to reduce the performance of GNNs by adding adversarial edges, which can be treated as the set $\mathcal{P}$. Therefore, we explore clean graphs from domains similar to the poisoned graph. Specifically, as shown in Figure 1, we first inject perturbation edges into clean graphs using adversarial attack methods, then leverage those adversarial counterparts to train the ability to penalize perturbed edges. This ability is further transferred to GNNs on the target graph, so that the robustness is improved. In the following section, we discuss in detail how we transfer the ability to penalize perturbed edges from clean graphs to the target poisoned graph.
4.2 Transfer with Meta-Optimization
As discussed above, it is very challenging to train PA-GNN for a poisoned graph because the adversarial edge distribution remains unknown. We turn to exploiting clean graphs from similar domains to create adversarial counterparts that serve as supervised knowledge. One simple way to utilize them is pre-training PA-GNN on the clean graphs with perturbations, which form the set of adversarial edges $\mathcal{P}$. The pre-trained model is then fine-tuned on the target graph $\mathcal{G}$ purely with the node classification objective. However, the performance of pre-training with clean graphs and adversarial edges is rather limited, because graphs have different data distributions, making it difficult to equip GNNs with a generalized ability to discriminate perturbations. Our experimental results in Section 5.3 also confirm this analysis.
In recent years, meta-learning has shown promising results in various applications [24, 27, 30, 33]. The goal of meta-learning is to train a model on a variety of learning tasks such that it can solve new tasks with a small amount of supervision, or even none [10, 15]. Finn et al. [10] propose a model-agnostic meta-learning algorithm in which the model is trained explicitly such that a small number of gradient steps and little training data from a new task can produce good generalization performance on that task. This motivates us to train a meta-model with a generalized ability to penalize perturbed edges (i.e., to assign them lower attention coefficients). The meta-model serves as the initialization of PA-GNN, and its fast-adaptation capability helps retain the penalizing ability as much as possible on the target poisoned graph. To achieve this goal, we propose a meta-optimization algorithm that trains the initialization of PA-GNN. With manually generated perturbations on clean graphs, PA-GNN receives full supervision and its initialization preserves the penalizing ability. The model fine-tuned on the poisoned graph $\mathcal{G}$ is then able to defend against adversarial attacks and maintain excellent performance.
We begin by generating perturbations on the clean graphs. The state-of-the-art adversarial attack method for graphs, metattack [42], is chosen. Let $\mathcal{P}_i$ represent the set of adversarial edges created for clean graph $\mathcal{G}_i$. Next, we define the learning tasks for the meta-optimization. The learning objective of each task is defined in Equation (10), which aims at classifying nodes accurately while assigning low attention coefficients to the perturbed edges of the corresponding graph. Let $\mathcal{T}_i$ denote the task for $\mathcal{G}_i$; namely, there are $M$ tasks in accordance with the clean graphs. Because a clean graph is specified for every task, we use $\mathcal{L}_{\mathcal{T}_i}(\theta)$ to denote the loss function of task $\mathcal{T}_i$. We then compile support and query sets for the learning tasks. The labeled nodes of each clean graph are split into two groups, one serving as the support set and the other as the query set. Let $\mathcal{S}_i$ and $\mathcal{Q}_i$ denote the support set and the query set of $\mathcal{G}_i$, respectively.
Given the $M$ learning tasks, the optimization algorithm first adapts the initial model parameters to every learning task separately. Formally, $\theta$ becomes $\theta'_i$ when adapting to $\mathcal{T}_i$. We use gradient descent to compute the updated model parameters $\theta'_i$. The gradient is evaluated using $\mathcal{L}_{\mathcal{T}_i}(\theta)$ on the corresponding support set $\mathcal{S}_i$, and the initial model parameters $\theta$ are updated as follows:

$$\theta'_i = \theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(\theta), \tag{11}$$
where $\alpha$ controls the learning rate. Note that only one gradient step is shown in Equation (11), but using multiple gradient updates is a straightforward extension, as suggested by [10]. There are $M$ different versions of the initial model (i.e., $f_{\theta'_1}, \ldots, f_{\theta'_M}$) constructed in accordance with the learning tasks.
The model parameters are trained by optimizing the performance of $f_{\theta'_i}$ with respect to $\theta$ across all tasks. More concretely, we define the following objective function for the meta-optimization:

$$\min_{\theta} \sum_{i=1}^{M} \mathcal{L}_{\mathcal{T}_i}(\theta'_i) = \min_{\theta} \sum_{i=1}^{M} \mathcal{L}_{\mathcal{T}_i}\big(\theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(\theta)\big). \tag{12}$$
Because the objective of PA-GNN considers both classifying nodes and penalizing adversarial edges, the model parameters preserve the ability to reduce the negative effects of adversarial attacks while maintaining high classification accuracy. Note that we perform the meta-optimization over $\theta$, with the objective computed using the updated model parameters $\theta'_i$ of all tasks. Consequently, the model parameters are optimized such that a few gradient steps on a new task will produce maximally effective behavior on that task. This fast adaptation on new tasks helps the model retain the ability to penalize perturbed edges on $\mathcal{G}$, which is supported by the experimental results in Section 5.3.1. Formally, stochastic gradient descent (SGD) is used to update the model parameters $\theta$ across tasks:

$$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{i=1}^{M} \mathcal{L}_{\mathcal{T}_i}(\theta'_i). \tag{13}$$
In practice, the above gradients are estimated using the labeled nodes from the query sets $\mathcal{Q}_i$ of all tasks. Our empirical results suggest that splitting the support and query sets on-the-fly through the iterations of the meta-optimization improves the overall performance. We adopt this strategy in the training procedure of PA-GNN.
Training Algorithm. An overview of the training procedure of PA-GNN is given in Algorithm 1. Adversarial edges are injected into the clean graphs using metattack (Lines 2 to 4). Support and query sets are randomly split on-the-fly for each task in Line 7. We then adapt the initial model parameters $\theta$ to $\theta'_i$ for each task $\mathcal{T}_i$. The model parameters are updated by optimizing the performance of all $f_{\theta'_i}$ w.r.t. $\theta$ using the labeled nodes from the query sets (Line 11). Finally, we adapt the trained model initialization to the target poisoned graph by minimizing the classification loss $\mathcal{L}_c$ on $\mathcal{G}$.
Algorithm 1: The training framework of PA-GNN
Input: $\mathcal{G}$ and $\{\mathcal{G}_1, \ldots, \mathcal{G}_M\}$
Output: Model parameters $\theta$
1: Randomly initialize $\theta$;
2: for $\mathcal{G}_i = \mathcal{G}_1, \ldots, \mathcal{G}_M$ do
3:   Select perturbed edge set $\mathcal{P}_i$ with metattack;
4: end
5: while not early-stop do
6:   for $\mathcal{G}_i = \mathcal{G}_1, \ldots, \mathcal{G}_M$ do
7:     Split the labeled nodes of $\mathcal{G}_i$ into support set $\mathcal{S}_i$ and query set $\mathcal{Q}_i$;
8:     Evaluate $\nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(\theta)$ with $\mathcal{S}_i$ and $\mathcal{L}_{\mathcal{T}_i}$;
9:     Compute adapted parameters with gradient descent: $\theta'_i \leftarrow \theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(\theta)$;
10:  end
11:  Update $\theta$ on $\{\mathcal{Q}_1, \ldots, \mathcal{Q}_M\}$ with: $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{i=1}^{M} \mathcal{L}_{\mathcal{T}_i}(\theta'_i)$;
12: end
13: Fine-tune $\theta$ on $\mathcal{G}$ using $\mathcal{L}_c$;
5 EXPERIMENTS
In this section, we conduct experiments to evaluate the effectiveness of PA-GNN. We aim to answer the following questions:
• Can PA-GNN outperform existing robust GNNs under representative and state-of-the-art adversarial attacks on graphs?
• How do the penalized aggregation mechanism and the meta-optimization algorithm contribute to PA-GNN?
• How sensitive is PA-GNN to its hyper-parameters?
Next, we introduce the experimental settings, followed by experiments on node classification to answer these questions.
5.1 Experimental Setup
5.1.1 Datasets. To conduct comprehensive studies of PA-GNN, we conduct experiments under two different settings:
• Same-domain setting: We sample the poisoned graph and clean graphs from the same data distribution. Two popular benchmark networks (i.e., Pubmed [28] and Reddit [13]) are selected as large graphs. Pubmed is a citation network where nodes are documents and edges represent citations; Reddit is compiled