Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add syscall sandboxing using seccomp-bpf (Linux secure computing mode) #20487

Merged
merged 1 commit into from
Oct 4, 2021

Conversation

practicalswift
Copy link
Contributor

@practicalswift practicalswift commented Nov 25, 2020

Add experimental syscall sandboxing using seccomp-bpf (Linux secure computing mode).

Enable filtering of system calls using seccomp-bpf: allow only explicitly allowlisted (expected) syscalls to be called.

The syscall sandboxing implemented in this PR is an experimental feature currently available only under Linux x86-64.

To enable the experimental syscall sandbox the -sandbox=<mode> option must be passed to bitcoind:

  -sandbox=<mode>
       Use the experimental syscall sandbox in the specified mode
       (-sandbox=log-and-abort or -sandbox=abort). Allow only expected
       syscalls to be used by bitcoind. Note that this is an
       experimental new feature that may cause bitcoind to exit or crash
       unexpectedly: use with caution. In the "log-and-abort" mode the
       invocation of an unexpected syscall results in a debug handler
       being invoked which will log the incident and terminate the
       program (without executing the unexpected syscall). In the
       "abort" mode the invocation of an unexpected syscall results in
       the entire process being killed immediately by the kernel without
       executing the unexpected syscall.

The allowed syscalls are defined on a per thread basis.

I've used this feature since summer 2020 and I find it to be a helpful testing/debugging addition which makes it much easier to reason about the actual capabilities required of each type of thread in Bitcoin Core.


Quick start guide:

$ ./configure
$ src/bitcoind -regtest -debug=util -sandbox=log-and-abort
…
2021-06-09T12:34:56Z Experimental syscall sandbox enabled (-sandbox=log-and-abort): bitcoind will terminate if an unexpected (not allowlisted) syscall is invoked.
…
2021-06-09T12:34:56Z Syscall filter installed for thread "addcon"
2021-06-09T12:34:56Z Syscall filter installed for thread "dnsseed"
2021-06-09T12:34:56Z Syscall filter installed for thread "net"
2021-06-09T12:34:56Z Syscall filter installed for thread "msghand"
2021-06-09T12:34:56Z Syscall filter installed for thread "opencon"
2021-06-09T12:34:56Z Syscall filter installed for thread "init"
…
# A simulated execve call to show the sandbox in action:
2021-06-09T12:34:56Z ERROR: The syscall "execve" (syscall number 59) is not allowed by the syscall sandbox in thread "msghand". Please report.
…
Aborted (core dumped)
$

About seccomp and seccomp-bpf:

In computer security, seccomp (short for secure computing mode) is a facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit(), sigreturn(), and read() and write() to already-open file descriptors. Should it attempt any other system calls, the kernel will terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely.

[…]

seccomp-bpf is an extension to seccomp that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on Chrome OS and Linux. (In this regard seccomp-bpf achieves similar functionality, but with more flexibility and higher performance, to the older systrace—which seems to be no longer supported for Linux.)

@laanwj
Copy link
Member

laanwj commented Nov 25, 2020

Concept ACK, nice work!

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 25, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #22956 (validation: log CChainState::CheckBlockIndex() consistency checks by jonatack)
  • #21943 (Dedup and RAII-fy the creation of a copy of CConnman::vNodes by vasild)
  • #21878 (Make all networking code mockable by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jb55
Copy link
Contributor

jb55 commented Nov 25, 2020

Very cool, Concept ACK

@laanwj
Copy link
Member

laanwj commented Oct 4, 2021

Code review and lightly tested ACK 4747da3

const sock_fprog prog = {
.len = static_cast<uint16_t>(filter.size()),
.filter = filter.data(),
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.cppreference.com/w/cpp/language/aggregate_initialization

Neither I, nor CI could find a compiler that rejects this C++20 code, so does that mean we are allowed to use it in new code now?

Copy link
Member

@maflcko maflcko Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is supported since clang 3.2 and gcc 4.7, but maybe not msvc?

See #23183

msvc: error C7555: use of designated initializers requires at least '/std:c++20'

Copy link
Member

@theuni theuni Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI adding -Wpedantic turns this into a warning:
$ g++ -Wall -std=c++17 -pedantic pedantic.cpp -c -o pedantic.o

pedantic.cpp: In function ‘int main()’:
pedantic.cpp:8:13: warning: C++ designated initializers only available with ‘-std=c++2a’ or ‘-std=gnu++2a’ [-Wpedantic]
8 | foo bar{.a = 1};

I actually thought we had -Wpedantic on by default, I'm not sure why we don't. @fanquake @dongcarl is there an obvious reason I'm forgetting?

Edit: Clang also has -Wc++20-designator.

Edit2: pushed a quick branch to demonstrate the changes needed to build bitcoind (only, everything else untested) with -Wpedantic, which is meant to warn about the use of compiler exensions: https://github.com/theuni/bitcoin/tree/build-pedantic

maflcko pushed a commit that referenced this pull request Oct 5, 2021
…d filesystem syscalls

44d77d2 sandbox: add copy_file_range to allowed filesystem syscalls (fanquake)
ee08741 sandbox: add newfstatat to allowed filesystem syscalls (fanquake)

Pull request description:

  Similar to #23178, this is a follow up to #20487, which has broken running the unit tests for some developers. Fix this by adding `newfstatat` to the list of allowed filesystem related calls.

ACKs for top commit:
  achow101:
    ACK 44d77d2
  laanwj:
    Code review ACK  44d77d2
  practicalswift:
    cr ACK 44d77d2

Tree-SHA512: ce9d1b441ebf25bd2cf290566e05864223c1418dab315c962e1094ad877db5dd9fcab94ab98a46da8b712a8f5f46675d62ca3349215d8df46ec5b3c4d72dbaa6
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 5, 2021
… allowed filesystem syscalls

44d77d2 sandbox: add copy_file_range to allowed filesystem syscalls (fanquake)
ee08741 sandbox: add newfstatat to allowed filesystem syscalls (fanquake)

Pull request description:

  Similar to bitcoin#23178, this is a follow up to bitcoin#20487, which has broken running the unit tests for some developers. Fix this by adding `newfstatat` to the list of allowed filesystem related calls.

ACKs for top commit:
  achow101:
    ACK 44d77d2
  laanwj:
    Code review ACK  44d77d2
  practicalswift:
    cr ACK 44d77d2

Tree-SHA512: ce9d1b441ebf25bd2cf290566e05864223c1418dab315c962e1094ad877db5dd9fcab94ab98a46da8b712a8f5f46675d62ca3349215d8df46ec5b3c4d72dbaa6
laanwj added a commit that referenced this pull request Oct 5, 2021
2d02799 util: Make sure syscall numbers used in profile are defined (W. J. van der Laan)
8289d19 util: Define SECCOMP_RET_KILL_PROCESS if not provided by the headers (W. J. van der Laan)

Pull request description:

  Looks like we've broke the GUIX build in #20487. This attempts to fix it:

  - Define `__NR_statx` `__NR_getrandom` `__NR_membarrier` as some kernel headers lack them, and it's important to have the same profile independent on what kernel is used for building.
  - Define `SECCOMP_RET_KILL_PROCESS` as it isn't defined in the headers.

ACKs for top commit:
  practicalswift:
    cr ACK 2d02799

Tree-SHA512: c264c66f90af76bf364150e44d0a31876c2ef99f05777fcdd098a23f1e80efef43028f54bf9b3dad016110056d303320ed9741b0cb4c6266175fa9d5589b4277
@ryanofsky
Copy link
Contributor

Just curious, but merging this seems to have caused a few build problems that required followups: #23178 #23179 #23196, and I guess some test problems in local developer checkouts.

I'm wondering if there are any CI improvements that might have caught these problems. Are too many of the CI builds using old kernels, and could adding CI builds with newer kernels catch these problems? Are too many of the CI builds using depends and could adding CI builds with system dependencies catch these problems? Is there some extended draftbot build that might have caught these problems?

It seems like it could make sense to run some extended CI on PRs that modify configure.ac or build scripts, but not clear what might have been useful in this case.

@laanwj
Copy link
Member

laanwj commented Oct 5, 2021

could adding CI builds with newer kernels catch these problems

For the record, it's not the running kernel version that makes a difference here. But the kernel headers. What would have helped find the problem is compiling with older kernel headers. We also should have done a GUIX build.

It's the first time these matter at all for our compilation. Besides syscall(SYS_getrandom… for random context gathering, there is no code that directly interfaces with the Linux kernel at all.

I think it's a good lesson that we need to make sure to define everything we need from the kernel ourselves, to support compilation on a wide range of installations. I knew about this problem but underestimated how much it's still a problem nowadays.

@ryanofsky
Copy link
Contributor

I just don't understand the basics here. What CI build could you theoretically add to catch these problems? Would it be a CI build with a really old kernel and really old headers? A really new kernel and new headers? An old kernel with new headers? And new kernel with old headers?

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 5, 2021
2d02799 util: Make sure syscall numbers used in profile are defined (W. J. van der Laan)
8289d19 util: Define SECCOMP_RET_KILL_PROCESS if not provided by the headers (W. J. van der Laan)

Pull request description:

  Looks like we've broke the GUIX build in bitcoin#20487. This attempts to fix it:

  - Define `__NR_statx` `__NR_getrandom` `__NR_membarrier` as some kernel headers lack them, and it's important to have the same profile independent on what kernel is used for building.
  - Define `SECCOMP_RET_KILL_PROCESS` as it isn't defined in the headers.

ACKs for top commit:
  practicalswift:
    cr ACK 2d02799

Tree-SHA512: c264c66f90af76bf364150e44d0a31876c2ef99f05777fcdd098a23f1e80efef43028f54bf9b3dad016110056d303320ed9741b0cb4c6266175fa9d5589b4277
@maflcko
Copy link
Member

maflcko commented Oct 6, 2021

Are too many of the CI builds using old kernels, and could adding CI builds with newer kernels catch these problems?

Generally developers tend to run the latest software, so this is usually not something that will catch a lot of fish as a CI task. In fact, CI tasks with bleeding edge software might tend to break more often. (I do run them, but not part of this project).

Is there some extended draftbot build that might have caught these problems?

Simply adding the "DrahtBot Guix build requested" label would have caught this. I think any pull request that is tagged with "Build system" should be Guix-built as well. As DrahtBot is running on limited resources, it might take a day or two to create the guix build, but I think this is an acceptable delay. If there is an "emergency-fix" devs can always do a local guix build.

@maflcko
Copy link
Member

maflcko commented Oct 6, 2021

And when it comes to preventing build errors via the CI, I think this is not something we should try to achieve at all costs. Due to compiler bugs (or just software bugs in general), there will always be at least one software configuration and build flag configuration that will fail to build. I think it is reasonable to have tests for common software configurations on the latest LTS releases of operating systems. Though, when it comes to supporting any software configuration, waiting for an actual user to report the issue instead of integrating into the CI works probably better.

@ryanofsky
Copy link
Contributor

Thanks for the replies. So is the answer to my question of what CI build would catch all these problems (#23178 #23179 #23196) that any CI build using newer kernel headers would catch all these problems? That all CI builds are using older headers and practicalswift was also using older headers, and this slipped past review?

My motivation for asking is just to understand when I am reviewing a PR what class of errors I could expect CI to catch, and what class of errors I should be testing manually or asking the PR author about. In this case, I guess I failed to ask practicalswift how is this code affected when new syscalls are added? And in the future, when I'm reviewing a PR that makes significant changes to the build system, I can request GUIX builds?

I don't really have opinions on what CI should check for. I just want to understand what CI is checking for so I can review PRs better. For example, it is useful to know if CI is only checking old kernels and we are depending on developers to manually test newer kernels. It is also useful to know more generally if CI builds tend to use older or different dependencies than actual the GUIX release.

@maflcko
Copy link
Member

maflcko commented Oct 6, 2021

I can request GUIX builds?

Sure, just add the label.

It is also useful to know more generally if CI builds tend to use older or different dependencies than actual the GUIX release.

CI is a mix of cross-compile depends builds that try to mimic guix (older LTS releases with older gcc-8) and recent clang with native sanitizer builds from system packages (except for msan, which is also using depends). Obviously Ubuntu/Debian ship different compilers than guix, so the CI is just a proxy and not a replacement for a guix build.

@@ -468,6 +471,10 @@ def get_bin_from_version(version, bin_name, bin_default):
extra_args = [[]] * num_nodes
if versions is None:
versions = [None] * num_nodes
if self.is_syscall_sandbox_compiled() and not self.disable_syscall_sandbox:
for i in range(len(extra_args)):
if versions[i] is None or versions[i] >= 219900:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not in the v22.0 binary and also not backported (yet). I made a commit to bump the version: 4672c1f

@bitcoin bitcoin locked and limited conversation to collaborators Nov 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet