triton: fix context schedule/wakeup race

Allow triton_context_wakeup() to run before triton_context_schedule(). When that happens, triton_context_schedule() now lets the context running instead of putting it in sleep mode. Note that, even though triton now allows triton_context_wakeup() to happen before triton_context_schedule(), these two functions still need to be paired and not nested. That is, in a sequence like the following, triton_context_wakeup() triton_context_wakeup() triton_context_schedule() triton_context_schedule() the second triton_context_schedule() would put the context in sleep mode. No matter how many triton_context_wakeup() have been called, the first triton_context_schedule() "consumes" them all. Being immune to schedule/wakeup inversion allows to fix the pppd_compat module. This module needs to fork() to execute external programs. The parent then waits for completion of its child using triton_context_schedule(). When child terminates, the sigchld module runs a callback that has to call triton_context_wakeup() to resume execution of the parent. The problem is that there is no synchronisation between the parent and its child. When under stress, the child may execute faster than its parent and the sigchld callback might run triton_context_wakeup() before the parent had time to call triton_context_schedule(). Then accel-ppp might crash because the triton thread might have reset ctx->thread to NULL, making triton_context_wakeup() write to invalid memory when trying to insert the context in ctx->thread->wakeup_list[]. Synchronising the parent and its child completion's callback would require cooperation from triton_context_schedule(). Otherwise we would still have a time frame between the moment we let the callback waking up the context and the moment we put the context in sleep mode. Allowing schedule/wakeup call inversion in triton looks simpler since it avoids modifying the current API. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
author: Guillaume Nault <g.nault@alphalink.fr> 2018-10-22 12:00:00 +0200
committer: Dmitry Kozlov <xeb@mail.ru> 2018-11-03 09:00:11 +0300
commit: c709a12656b64f33cd553b7d01842381d8fdce7a (patch)
tree: 789c0d59e1316cee0660cfa5ced818f5f6a2f54c /accel-pppd/triton
parent: be2b604a01c22780673a77675c268354dd33995e (diff)
download: accel-ppp-c709a12656b64f33cd553b7d01842381d8fdce7a.tar.gz
accel-ppp-c709a12656b64f33cd553b7d01842381d8fdce7a.zip
2 files changed, 16 insertions, 3 deletions
diff --git a/accel-pppd/triton/triton.c b/accel-pppd/triton/triton.c
index d5bd01d3..3ea72f02 100644
--- a/accel-pppd/triton/triton.c
+++ b/accel-pppd/triton/triton.c
@@ -125,7 +125,7 @@ static void* triton_thread(struct _triton_thread_t *thread)
 	while (1) {
 		spin_lock(&threads_lock);
 		if (!need_config_reload && check_ctx_queue_empty(thread)) {
-			if (thread->ctx->wakeup) {
+			if (thread->ctx->asleep && thread->ctx->wakeup) {
 				log_debug2("thread: %p: wakeup ctx %p\n", thread, thread->ctx);
 				list_del(&thread->ctx->entry2);
 				spin_unlock(&threads_lock);
@@ -503,6 +503,7 @@ void __export triton_context_schedule()
 
 	spin_lock(&threads_lock);
 	if (ctx->wakeup) {
+		ctx->asleep = 0;
 		ctx->wakeup = 0;
 		spin_unlock(&threads_lock);
 		_free(ctx->uc);
@@ -510,6 +511,7 @@ void __export triton_context_schedule()
 		__sync_sub_and_fetch(&triton_stat.context_sleeping, 1);
 		log_debug2("ctx %p: exit schedule\n", ctx);
 	} else {
+		ctx->asleep = 1;
 		ctx->thread->ctx = NULL;
 		spin_unlock(&threads_lock);
 		longjmp(jmp_env, 1);
@@ -532,9 +534,19 @@ void __export triton_context_wakeup(struct triton_context_t *ud)
 		spin_unlock(&ctx->lock);
 	} else {
 		spin_lock(&threads_lock);
+		/* In some cases (pppd_compat.c), triton_context_wakeup() might
+		 * be called before triton_context_schedule(). When that
+		 * happens, we must not add 'ctx' to the wakeup_list as it is
+		 * still awake. However we need to set the 'wakeup' flag. This
+		 * way, when triton_context_schedule() will run, it will
+		 * realise that triton_context_wakeup() was already executed
+		 * and will avoid putting 'ctx' in sleep mode.
+		 */
 		ctx->wakeup = 1;
-		list_add_tail(&ctx->entry2, &ctx->thread->wakeup_list[ctx->priority]);
-		r = ctx->thread->ctx == NULL;
+		if (ctx->asleep) {
+			list_add_tail(&ctx->entry2, &ctx->thread->wakeup_list[ctx->priority]);
+			r = ctx->thread->ctx == NULL;
+		}
 		spin_unlock(&threads_lock);
 	}
 
diff --git a/accel-pppd/triton/triton_p.h b/accel-pppd/triton/triton_p.h
index fe735bf6..a727e62e 100644
--- a/accel-pppd/triton/triton_p.h
+++ b/accel-pppd/triton/triton_p.h
@@ -40,6 +40,7 @@ struct _triton_context_t
 	int init;
 	int queued;
 	int wakeup;
+	int asleep;
 	int need_close;
 	int need_free;
 	int pending;
author	Guillaume Nault <g.nault@alphalink.fr>	2018-10-22 12:00:00 +0200
committer	Dmitry Kozlov <xeb@mail.ru>	2018-11-03 09:00:11 +0300
commit	c709a12656b64f33cd553b7d01842381d8fdce7a (patch)
tree	789c0d59e1316cee0660cfa5ced818f5f6a2f54c /accel-pppd/triton
parent	be2b604a01c22780673a77675c268354dd33995e (diff)
download	accel-ppp-c709a12656b64f33cd553b7d01842381d8fdce7a.tar.gz accel-ppp-c709a12656b64f33cd553b7d01842381d8fdce7a.zip