Add syscall sandboxing using seccomp-bpf (Linux secure computing mode) #20487

practicalswift · 2020-11-25T09:27:44Z

Add experimental syscall sandboxing using seccomp-bpf (Linux secure computing mode).

Enable filtering of system calls using seccomp-bpf: allow only explicitly allowlisted (expected) syscalls to be called.

The syscall sandboxing implemented in this PR is an experimental feature currently available only under Linux x86-64.

To enable the experimental syscall sandbox the -sandbox=<mode> option must be passed to bitcoind:

  -sandbox=<mode>
       Use the experimental syscall sandbox in the specified mode
       (-sandbox=log-and-abort or -sandbox=abort). Allow only expected
       syscalls to be used by bitcoind. Note that this is an
       experimental new feature that may cause bitcoind to exit or crash
       unexpectedly: use with caution. In the "log-and-abort" mode the
       invocation of an unexpected syscall results in a debug handler
       being invoked which will log the incident and terminate the
       program (without executing the unexpected syscall). In the
       "abort" mode the invocation of an unexpected syscall results in
       the entire process being killed immediately by the kernel without
       executing the unexpected syscall.

The allowed syscalls are defined on a per thread basis.

I've used this feature since summer 2020 and I find it to be a helpful testing/debugging addition which makes it much easier to reason about the actual capabilities required of each type of thread in Bitcoin Core.

Quick start guide:

$ ./configure
$ src/bitcoind -regtest -debug=util -sandbox=log-and-abort
…
2021-06-09T12:34:56Z Experimental syscall sandbox enabled (-sandbox=log-and-abort): bitcoind will terminate if an unexpected (not allowlisted) syscall is invoked.
…
2021-06-09T12:34:56Z Syscall filter installed for thread "addcon"
2021-06-09T12:34:56Z Syscall filter installed for thread "dnsseed"
2021-06-09T12:34:56Z Syscall filter installed for thread "net"
2021-06-09T12:34:56Z Syscall filter installed for thread "msghand"
2021-06-09T12:34:56Z Syscall filter installed for thread "opencon"
2021-06-09T12:34:56Z Syscall filter installed for thread "init"
…
# A simulated execve call to show the sandbox in action:
2021-06-09T12:34:56Z ERROR: The syscall "execve" (syscall number 59) is not allowed by the syscall sandbox in thread "msghand". Please report.
…
Aborted (core dumped)
$

About seccomp and seccomp-bpf:

In computer security, seccomp (short for secure computing mode) is a facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit(), sigreturn(), and read() and write() to already-open file descriptors. Should it attempt any other system calls, the kernel will terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely.

[…]

seccomp-bpf is an extension to seccomp that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on Chrome OS and Linux. (In this regard seccomp-bpf achieves similar functionality, but with more flexibility and higher performance, to the older systrace—which seems to be no longer supported for Linux.)

laanwj · 2020-11-25T09:29:48Z

Concept ACK, nice work!

DrahtBot · 2020-11-25T15:48:25Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#22956 (validation: log CChainState::CheckBlockIndex() consistency checks by jonatack)
#21943 (Dedup and RAII-fy the creation of a copy of CConnman::vNodes by vasild)
#21878 (Make all networking code mockable by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jb55 · 2020-11-25T18:58:15Z

Very cool, Concept ACK

laanwj · 2021-10-04T19:14:22Z

Code review and lightly tested ACK 4747da3

…secure …

maflcko · 2021-10-05T08:14:56Z

src/util/syscall_sandbox.cpp

+    const sock_fprog prog = {
+        .len = static_cast<uint16_t>(filter.size()),
+        .filter = filter.data(),
+    };


https://en.cppreference.com/w/cpp/language/aggregate_initialization

Neither I, nor CI could find a compiler that rejects this C++20 code, so does that mean we are allowed to use it in new code now?

Looks like this is supported since clang 3.2 and gcc 4.7, but maybe not msvc?

See #23183

msvc: error C7555: use of designated initializers requires at least '/std:c++20'

FYI adding -Wpedantic turns this into a warning:
$ g++ -Wall -std=c++17 -pedantic pedantic.cpp -c -o pedantic.o

pedantic.cpp: In function ‘int main()’:
pedantic.cpp:8:13: warning: C++ designated initializers only available with ‘-std=c++2a’ or ‘-std=gnu++2a’ [-Wpedantic]
8 | foo bar{.a = 1};

I actually thought we had -Wpedantic on by default, I'm not sure why we don't. @fanquake @dongcarl is there an obvious reason I'm forgetting?

Edit: Clang also has -Wc++20-designator.

Edit2: pushed a quick branch to demonstrate the changes needed to build bitcoind (only, everything else untested) with -Wpedantic, which is meant to warn about the use of compiler exensions: https://github.com/theuni/bitcoin/tree/build-pedantic

…d filesystem syscalls 44d77d2 sandbox: add copy_file_range to allowed filesystem syscalls (fanquake) ee08741 sandbox: add newfstatat to allowed filesystem syscalls (fanquake) Pull request description: Similar to #23178, this is a follow up to #20487, which has broken running the unit tests for some developers. Fix this by adding `newfstatat` to the list of allowed filesystem related calls. ACKs for top commit: achow101: ACK 44d77d2 laanwj: Code review ACK 44d77d2 practicalswift: cr ACK 44d77d2 Tree-SHA512: ce9d1b441ebf25bd2cf290566e05864223c1418dab315c962e1094ad877db5dd9fcab94ab98a46da8b712a8f5f46675d62ca3349215d8df46ec5b3c4d72dbaa6

… allowed filesystem syscalls 44d77d2 sandbox: add copy_file_range to allowed filesystem syscalls (fanquake) ee08741 sandbox: add newfstatat to allowed filesystem syscalls (fanquake) Pull request description: Similar to bitcoin#23178, this is a follow up to bitcoin#20487, which has broken running the unit tests for some developers. Fix this by adding `newfstatat` to the list of allowed filesystem related calls. ACKs for top commit: achow101: ACK 44d77d2 laanwj: Code review ACK 44d77d2 practicalswift: cr ACK 44d77d2 Tree-SHA512: ce9d1b441ebf25bd2cf290566e05864223c1418dab315c962e1094ad877db5dd9fcab94ab98a46da8b712a8f5f46675d62ca3349215d8df46ec5b3c4d72dbaa6

2d02799 util: Make sure syscall numbers used in profile are defined (W. J. van der Laan) 8289d19 util: Define SECCOMP_RET_KILL_PROCESS if not provided by the headers (W. J. van der Laan) Pull request description: Looks like we've broke the GUIX build in #20487. This attempts to fix it: - Define `__NR_statx` `__NR_getrandom` `__NR_membarrier` as some kernel headers lack them, and it's important to have the same profile independent on what kernel is used for building. - Define `SECCOMP_RET_KILL_PROCESS` as it isn't defined in the headers. ACKs for top commit: practicalswift: cr ACK 2d02799 Tree-SHA512: c264c66f90af76bf364150e44d0a31876c2ef99f05777fcdd098a23f1e80efef43028f54bf9b3dad016110056d303320ed9741b0cb4c6266175fa9d5589b4277

ryanofsky · 2021-10-05T18:14:27Z

Just curious, but merging this seems to have caused a few build problems that required followups: #23178 #23179 #23196, and I guess some test problems in local developer checkouts.

I'm wondering if there are any CI improvements that might have caught these problems. Are too many of the CI builds using old kernels, and could adding CI builds with newer kernels catch these problems? Are too many of the CI builds using depends and could adding CI builds with system dependencies catch these problems? Is there some extended draftbot build that might have caught these problems?

It seems like it could make sense to run some extended CI on PRs that modify configure.ac or build scripts, but not clear what might have been useful in this case.

laanwj · 2021-10-05T18:48:56Z

could adding CI builds with newer kernels catch these problems

For the record, it's not the running kernel version that makes a difference here. But the kernel headers. What would have helped find the problem is compiling with older kernel headers. We also should have done a GUIX build.

It's the first time these matter at all for our compilation. Besides syscall(SYS_getrandom… for random context gathering, there is no code that directly interfaces with the Linux kernel at all.

I think it's a good lesson that we need to make sure to define everything we need from the kernel ourselves, to support compilation on a wide range of installations. I knew about this problem but underestimated how much it's still a problem nowadays.

ryanofsky · 2021-10-05T19:20:37Z

I just don't understand the basics here. What CI build could you theoretically add to catch these problems? Would it be a CI build with a really old kernel and really old headers? A really new kernel and new headers? An old kernel with new headers? And new kernel with old headers?

2d02799 util: Make sure syscall numbers used in profile are defined (W. J. van der Laan) 8289d19 util: Define SECCOMP_RET_KILL_PROCESS if not provided by the headers (W. J. van der Laan) Pull request description: Looks like we've broke the GUIX build in bitcoin#20487. This attempts to fix it: - Define `__NR_statx` `__NR_getrandom` `__NR_membarrier` as some kernel headers lack them, and it's important to have the same profile independent on what kernel is used for building. - Define `SECCOMP_RET_KILL_PROCESS` as it isn't defined in the headers. ACKs for top commit: practicalswift: cr ACK 2d02799 Tree-SHA512: c264c66f90af76bf364150e44d0a31876c2ef99f05777fcdd098a23f1e80efef43028f54bf9b3dad016110056d303320ed9741b0cb4c6266175fa9d5589b4277

maflcko · 2021-10-06T09:10:20Z

Are too many of the CI builds using old kernels, and could adding CI builds with newer kernels catch these problems?

Generally developers tend to run the latest software, so this is usually not something that will catch a lot of fish as a CI task. In fact, CI tasks with bleeding edge software might tend to break more often. (I do run them, but not part of this project).

Is there some extended draftbot build that might have caught these problems?

Simply adding the "DrahtBot Guix build requested" label would have caught this. I think any pull request that is tagged with "Build system" should be Guix-built as well. As DrahtBot is running on limited resources, it might take a day or two to create the guix build, but I think this is an acceptable delay. If there is an "emergency-fix" devs can always do a local guix build.

maflcko · 2021-10-06T10:06:47Z

And when it comes to preventing build errors via the CI, I think this is not something we should try to achieve at all costs. Due to compiler bugs (or just software bugs in general), there will always be at least one software configuration and build flag configuration that will fail to build. I think it is reasonable to have tests for common software configurations on the latest LTS releases of operating systems. Though, when it comes to supporting any software configuration, waiting for an actual user to report the issue instead of integrating into the CI works probably better.

ryanofsky · 2021-10-06T12:29:25Z

Thanks for the replies. So is the answer to my question of what CI build would catch all these problems (#23178 #23179 #23196) that any CI build using newer kernel headers would catch all these problems? That all CI builds are using older headers and practicalswift was also using older headers, and this slipped past review?

My motivation for asking is just to understand when I am reviewing a PR what class of errors I could expect CI to catch, and what class of errors I should be testing manually or asking the PR author about. In this case, I guess I failed to ask practicalswift how is this code affected when new syscalls are added? And in the future, when I'm reviewing a PR that makes significant changes to the build system, I can request GUIX builds?

I don't really have opinions on what CI should check for. I just want to understand what CI is checking for so I can review PRs better. For example, it is useful to know if CI is only checking old kernels and we are depending on developers to manually test newer kernels. It is also useful to know more generally if CI builds tend to use older or different dependencies than actual the GUIX release.

maflcko · 2021-10-06T12:41:01Z

I can request GUIX builds?

Sure, just add the label.

It is also useful to know more generally if CI builds tend to use older or different dependencies than actual the GUIX release.

CI is a mix of cross-compile depends builds that try to mimic guix (older LTS releases with older gcc-8) and recent clang with native sanitizer builds from system packages (except for msan, which is also using depends). Obviously Ubuntu/Debian ship different compilers than guix, so the CI is just a proxy and not a replacement for a guix build.

Sjors · 2021-11-15T10:44:19Z

test/functional/test_framework/test_framework.py

@@ -468,6 +471,10 @@ def get_bin_from_version(version, bin_name, bin_default):
            extra_args = [[]] * num_nodes
        if versions is None:
            versions = [None] * num_nodes
+        if self.is_syscall_sandbox_compiled() and not self.disable_syscall_sandbox:
+            for i in range(len(extra_args)):
+                if versions[i] is None or versions[i] >= 219900:


This is actually not in the v22.0 binary and also not backported (yet). I made a commit to bump the version: 4672c1f

practicalswift force-pushed the seccomp-bpf-2020-11 branch from fb9e303 to 2cba28c Compare November 25, 2020 11:01

DrahtBot added Build system GUI P2P RPC/REST/ZMQ Utils/log/libs UTXO Db and Indexes Validation labels Nov 25, 2020

laanwj removed GUI P2P RPC/REST/ZMQ UTXO Db and Indexes Validation labels Nov 25, 2020

DrahtBot mentioned this pull request Nov 25, 2020

Disable and fix tests for when BDB is not compiled #20267

Merged

practicalswift force-pushed the seccomp-bpf-2020-11 branch from 2cba28c to 54df39b Compare November 25, 2020 19:53

practicalswift force-pushed the seccomp-bpf-2020-11 branch from 4343f1f to 4747af3 Compare October 1, 2021 13:42

Add syscall sandboxing (seccomp-bpf)

4747da3

practicalswift force-pushed the seccomp-bpf-2020-11 branch from 4747af3 to 4747da3 Compare October 1, 2021 13:54

laanwj merged commit 9e530c6 into bitcoin:master Oct 4, 2021

This was referenced Oct 4, 2021

Other platform support for Linux syscall sandboxing #23177

Closed

util: Fix GUIX build with syscall sandbox #23178

Merged

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 4, 2021

Merge bitcoin#20487: Add syscall sandboxing using seccomp-bpf (Linux …

48e1f25

…secure …

fanquake mentioned this pull request Oct 5, 2021

sandbox: add newfstatat & copy_file_range to allowed filesystem syscalls #23179

Merged

maflcko reviewed Oct 5, 2021

View reviewed changes

laanwj mentioned this pull request Oct 5, 2021

util: Make syscall sandbox compilable with kernel 4.4.0 #23196

Merged

maflcko added DrahtBot Guix build requested and removed DrahtBot Guix build requested labels Oct 6, 2021

maflcko mentioned this pull request Oct 11, 2021

Bitcoin Core crashes under clang-13 libc++ with "Bad system call (core dumped)" #23248

Closed

practicalswift mentioned this pull request Oct 11, 2021

util: Add mincore and clone3 to syscall sandbox #23255

Merged

maflcko mentioned this pull request Oct 26, 2021

Discussion: Upgrading to C++20 #23363

Closed

Sjors reviewed Nov 15, 2021

View reviewed changes

laanwj mentioned this pull request Apr 5, 2022

Outcome of the syscall sandbox experiment #24771

Closed

bitcoin locked and limited conversation to collaborators Nov 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add syscall sandboxing using seccomp-bpf (Linux secure computing mode) #20487

Add syscall sandboxing using seccomp-bpf (Linux secure computing mode) #20487

practicalswift commented Nov 25, 2020 •

edited

laanwj commented Nov 25, 2020

DrahtBot commented Nov 25, 2020 •

edited

jb55 commented Nov 25, 2020

laanwj commented Oct 4, 2021

maflcko Oct 5, 2021

maflcko Oct 5, 2021 •

edited

theuni Oct 25, 2021 •

edited

ryanofsky commented Oct 5, 2021

laanwj commented Oct 5, 2021 •

edited

ryanofsky commented Oct 5, 2021

maflcko commented Oct 6, 2021 •

edited

maflcko commented Oct 6, 2021

ryanofsky commented Oct 6, 2021

maflcko commented Oct 6, 2021

Sjors Nov 15, 2021

Add syscall sandboxing using seccomp-bpf (Linux secure computing mode) #20487

Add syscall sandboxing using seccomp-bpf (Linux secure computing mode) #20487

Conversation

practicalswift commented Nov 25, 2020 • edited

laanwj commented Nov 25, 2020

DrahtBot commented Nov 25, 2020 • edited

Conflicts

jb55 commented Nov 25, 2020

laanwj commented Oct 4, 2021

maflcko Oct 5, 2021

Choose a reason for hiding this comment

maflcko Oct 5, 2021 • edited

Choose a reason for hiding this comment

theuni Oct 25, 2021 • edited

Choose a reason for hiding this comment

ryanofsky commented Oct 5, 2021

laanwj commented Oct 5, 2021 • edited

ryanofsky commented Oct 5, 2021

maflcko commented Oct 6, 2021 • edited

maflcko commented Oct 6, 2021

ryanofsky commented Oct 6, 2021

maflcko commented Oct 6, 2021

Sjors Nov 15, 2021

Choose a reason for hiding this comment

practicalswift commented Nov 25, 2020 •

edited

DrahtBot commented Nov 25, 2020 •

edited

maflcko Oct 5, 2021 •

edited

theuni Oct 25, 2021 •

edited

laanwj commented Oct 5, 2021 •

edited

maflcko commented Oct 6, 2021 •

edited