Commit 6f43aea

GH-31603: [C++] Add SecureString implementation to arrow/util/ (#46626)
### Rationale for this change

Arrow deals with secrets like encryption / decryption keys which must be kept private. One way of leaking such secrets is through memory allocation: another process may allocate memory that previously held the secret, because that memory was not cleared before being freed.

### What changes are included in this PR?

Uses various implementations of securely clearing memory, notably:

- `SecureZeroMemory` (Windows)
- `memset_s` (STDC)
- `OPENSSL_cleanse` (OpenSSL >= 3)
- `explicit_bzero` (glibc 2.25+)
- volatile `memset` (fallback)

### Are these changes tested?

Unit tests.

### Are there any user-facing changes?

This only adds the `SecureString` class and tests. Using this new infrastructure is done in follow-up pull requests.

* GitHub Issue: #31603

Lead-authored-by: Enrico Minack <github@enrico.minack.dev>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
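
The fallback strategy named in the commit message (a volatile `memset`) can be sketched in isolation. This is a minimal illustration, not the Arrow implementation: `FallbackSecureZero` is a hypothetical name, and the sketch assumes that calling `memset` through a volatile function pointer is enough to discourage the optimizer from dropping the call, which the language standard does not strictly guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Arrow): zero `size` bytes starting at `data`.
// The function pointer is volatile, so the compiler has to load it at run time
// and cannot easily prove that the call is a removable dead store.
inline void FallbackSecureZero(std::uint8_t* data, std::size_t size) {
  static void* (*const volatile memset_v)(void*, int, std::size_t) = &std::memset;
  memset_v(data, 0, size);
}
```

Where a platform primitive such as `SecureZeroMemory`, `memset_s`, `OPENSSL_cleanse` or `explicit_bzero` is available, it is generally preferable to a hand-written fallback like this one.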
1 parent 5a4c8b6 commit 6f43aea

5 files changed

Lines changed: 775 additions & 0 deletions

cpp/src/arrow/CMakeLists.txt

Lines changed: 6 additions & 0 deletions
@@ -515,6 +515,7 @@ set(ARROW_UTIL_SRCS
     util/memory.cc
     util/mutex.cc
     util/ree_util.cc
+    util/secure_string.cc
     util/string.cc
     util/string_builder.cc
     util/task_group.cc
@@ -574,6 +575,11 @@ if(ARROW_USE_GLOG)
     target_link_libraries(${ARROW_UTIL_TARGET} PRIVATE glog::glog)
   endforeach()
 endif()
+if(ARROW_USE_OPENSSL)
+  foreach(ARROW_UTIL_TARGET ${ARROW_UTIL_TARGETS})
+    target_link_libraries(${ARROW_UTIL_TARGET} PRIVATE ${ARROW_OPENSSL_LIBS})
+  endforeach()
+endif()
 if(ARROW_USE_XSIMD)
   foreach(ARROW_UTIL_TARGET ${ARROW_UTIL_TARGETS})
     target_link_libraries(${ARROW_UTIL_TARGET} PRIVATE ${ARROW_XSIMD})

cpp/src/arrow/util/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ add_arrow_test(utility-test
                ree_util_test.cc
                reflection_test.cc
                rows_to_batches_test.cc
+               secure_string_test.cc
                small_vector_test.cc
                span_test.cc
                stl_util_test.cc

cpp/src/arrow/util/secure_string.cc

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// __STDC_WANT_LIB_EXT1__ and string.h are required by memset_s:
+// https://en.cppreference.com/w/c/string/byte/memset
+#define __STDC_WANT_LIB_EXT1__ 1
+#include <string.h>
+#include <utility>
+
+#if defined(ARROW_USE_OPENSSL)
+#  include <openssl/crypto.h>
+#  include <openssl/opensslv.h>
+#endif
+
+#include "arrow/util/windows_compatibility.h"
+#if defined(_WIN32)
+#  include <windows.h>
+#endif
+
+#include "arrow/util/logging.h"
+#include "arrow/util/secure_string.h"
+#include "arrow/util/span.h"
+
+namespace arrow::util {
+
+/// Note:
+/// A std::string is securely moved into a SecureString in two steps:
+/// 1. the std::string is moved via std::move(string)
+/// 2. the std::string is securely cleared
+///
+/// The std::move has two different effects, depending on the size of the string.
+/// A very short string (called local string) stores the string in a local buffer,
+/// a long string stores a pointer to allocated memory that stores the string.
+///
+/// If the string is a local string, std::move copies the local buffer.
+/// If the string is a long string, std::move moves the pointer and then resets the
+/// string size to 0 (which turns the string into a local string).
+///
+/// In both cases, after a std::move(string), the string uses the local buffer.
+///
+/// Thus, after a std::move(string), calling SecureClear(std::string*) only
+/// securely clears the **local buffer** of the string. Therefore, std::move(string)
+/// must move the pointer of a long string into SecureString (which later clears the
+/// string). Otherwise, the content of the string cannot be securely cleared.
+///
+/// This condition is checked by SecureMove.
+
+namespace {
+void SecureMove(std::string& string, std::string& dst) {
+  auto ptr = string.data();
+  dst = std::move(string);
+
+  // We require the buffer address string.data() to remain (not be freed) as is,
+  // or to be reused by dst. Otherwise, we cannot securely clear string after std::move
+  ARROW_CHECK(string.data() == ptr || dst.data() == ptr);
+}
+}  // namespace
+
+void SecureString::SecureClear(std::string* secret) {
+  // call SecureClear first just in case secret->clear() frees some memory
+  SecureClear(reinterpret_cast<uint8_t*>(secret->data()), secret->capacity());
+  secret->clear();
+}
+
+inline void SecureString::SecureClear(uint8_t* data, size_t size) {
+  // There is various prior art for this:
+  // https://www.cryptologie.net/article/419/zeroing-memory-compiler-optimizations-and-memset_s/
+  // - libb2's `secure_zero_memory` at
+  // https://github.com/BLAKE2/libb2/blob/30d45a17c59dc7dbf853da3085b71d466275bd0a/src/blake2-impl.h#L140-L160
+  // - libsodium's `sodium_memzero` at
+  // https://github.com/jedisct1/libsodium/blob/be58b2e6664389d9c7993b55291402934b43b3ca/src/libsodium/sodium/utils.c#L78:L101
+  // Note:
+  // https://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html
+#if defined(_WIN32)
+  // SecureZeroMemory is meant to not be optimized away
+  SecureZeroMemory(data, size);
+#elif defined(__STDC_LIB_EXT1__)
+  // memset_s is meant to not be optimized away
+  memset_s(data, size, 0, size);
+#elif defined(OPENSSL_VERSION_NUMBER) && OPENSSL_VERSION_NUMBER >= 0x30000000
+  // rely on some implementation in OpenSSL cryptographic library
+  OPENSSL_cleanse(data, size);
+#elif defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 25))
+  // explicit_bzero is meant to not be optimized away
+  explicit_bzero(data, size);
+#else
+  // Volatile pointer to memset function is an attempt to avoid
+  // that the compiler optimizes away the memset function call.
+  // pretty much what OPENSSL_cleanse above does
+  // https://github.com/openssl/openssl/blob/3423c30db3aa044f46e1f0270e2ecd899415bf5f/crypto/mem_clr.c#L22
+  static const volatile auto memset_v = &memset;
+  memset_v(data, 0, size);
+
+#  if defined(__GNUC__) || defined(__clang__)
+  // __asm__ only supported by GCC and Clang
+  // not supported by MSVC on the ARM and x64 processors
+  // https://en.cppreference.com/w/c/language/asm.html
+  // https://en.cppreference.com/w/cpp/language/asm.html
+
+  // Additional attempt on top of volatile memset_v above
+  // to avoid that the compiler optimizes away the memset function call.
+  // Assembler code that tells the compiler 'data' has side effects.
+  // https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html:
+  // - "volatile": the asm produces side effects
+  // - "memory": effectively forms a read/write memory barrier for the compiler
+  __asm__ __volatile__("" /* no actual code */
+                       : /* no output */
+                       : "r"(data) /* input */
+                       : "memory" /* memory side effects beyond input and output */);
+#  endif
+#endif
+}
+
+SecureString::SecureString(SecureString&& other) noexcept {
+  SecureMove(other.secret_, secret_);
+  other.Dispose();
+}
+
+SecureString::SecureString(std::string&& secret) noexcept {
+  SecureMove(secret, secret_);
+  SecureClear(&secret);
+}
+
+SecureString::SecureString(size_t n, char c) noexcept : secret_(n, c) {}
+
+SecureString& SecureString::operator=(SecureString&& other) noexcept {
+  if (this == &other) {
+    // self-assignment
+    return *this;
+  }
+  Dispose();
+  SecureMove(other.secret_, secret_);
+  other.Dispose();
+  return *this;
+}
+
+SecureString& SecureString::operator=(const SecureString& other) {
+  if (this == &other) {
+    // self-assignment
+    return *this;
+  }
+  Dispose();
+  secret_ = other.secret_;
+  return *this;
+}
+
+SecureString& SecureString::operator=(std::string&& secret) noexcept {
+  Dispose();
+  SecureMove(secret, secret_);
+  SecureClear(&secret);
+  return *this;
+}
+
+bool SecureString::operator==(const SecureString& other) const {
+  return secret_ == other.secret_;
+}
+
+bool SecureString::operator!=(const SecureString& other) const {
+  return secret_ != other.secret_;
+}
+
+bool SecureString::empty() const { return secret_.empty(); }
+
+std::size_t SecureString::size() const { return secret_.size(); }
+
+std::size_t SecureString::length() const { return secret_.length(); }
+
+std::size_t SecureString::capacity() const { return secret_.capacity(); }
+
+span<uint8_t> SecureString::as_span() {
+  return {reinterpret_cast<uint8_t*>(secret_.data()), secret_.size()};
+}
+
+span<const uint8_t> SecureString::as_span() const {
+  return {reinterpret_cast<const uint8_t*>(secret_.data()), secret_.size()};
+}
+
+std::string_view SecureString::as_view() const {
+  return {secret_.data(), secret_.size()};
+}
+
+void SecureString::Dispose() { SecureClear(&secret_); }
+
+}  // namespace arrow::util
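
The condition that `SecureMove` checks can be reproduced with the standard library alone. The sketch below is an illustration under assumptions, not code from this commit: `MoveKeepsBufferReachable` is a hypothetical name, and the result depends on how the standard library implements short-string storage, so it is expected (not guaranteed by the C++ standard) to hold with the common implementations.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the check in SecureMove: move `src` into `dst` and
// report whether the bytes that `src` held can still be reached for wiping.
inline bool MoveKeepsBufferReachable(std::string& src, std::string& dst) {
  const char* before = src.data();
  dst = std::move(src);
  // Local (short) string: the characters were copied, and src keeps its own
  // local buffer, which a later clear of src can wipe.
  // Long string: the heap pointer is expected to be owned by dst now, so the
  // owner of dst can wipe it later.
  return src.data() == before || dst.data() == before;
}
```

If neither address matches, the original bytes live in memory that neither string owns any more, which is the situation the `ARROW_CHECK` in `SecureMove` is meant to catch.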

cpp/src/arrow/util/secure_string.h

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <cstdint>
+#include <string>
+
+#include "arrow/util/span.h"
+#include "arrow/util/visibility.h"
+
+namespace arrow::util {
+/**
+ * A secure string that ensures the wrapped string is cleared from memory on
+ * destruction. This class can only be created from a std::string that is securely
+ * erased after creation.
+ *
+ * Note: This class does not provide a constructor / assignment operator that copies a
+ * std::string because that would allow code to create a SecureString without
+ * noticing the need to securely erase the argument after invoking the constructor /
+ * calling the assignment operator.
+ */
+class ARROW_EXPORT SecureString {
+ public:
+  SecureString() noexcept = default;
+  SecureString(SecureString&&) noexcept;
+  SecureString(const SecureString&) = default;
+  explicit SecureString(std::string&&) noexcept;
+  explicit SecureString(size_t, char) noexcept;
+
+  SecureString& operator=(SecureString&&) noexcept;
+  SecureString& operator=(const SecureString&);
+  SecureString& operator=(std::string&&) noexcept;
+
+  bool operator==(const SecureString&) const;
+  bool operator!=(const SecureString&) const;
+
+  ~SecureString() { Dispose(); }
+
+  [[nodiscard]] bool empty() const;
+  [[nodiscard]] std::size_t size() const;
+  [[nodiscard]] std::size_t length() const;
+  [[nodiscard]] std::size_t capacity() const;
+
+  [[nodiscard]] span<uint8_t> as_span();
+  [[nodiscard]] span<const uint8_t> as_span() const;
+  [[nodiscard]] std::string_view as_view() const;
+
+  void Dispose();
+
+  static void SecureClear(std::string*);
+  static void SecureClear(uint8_t* data, size_t size);
+
+ private:
+  std::string secret_;
+};
+
+}  // namespace arrow::util
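
The ownership contract described in the class comment (construct only from an rvalue `std::string`, wipe the source, wipe again on destruction) can be illustrated with a self-contained miniature. This is a sketch under assumptions, not the Arrow class: `WipingString` and `Wipe` are hypothetical names, copying is disabled for brevity (the real class allows copies between `SecureString` objects), and a plain volatile byte loop stands in for the platform-specific clearing routines.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical miniature of the contract; it is not the Arrow implementation.
class WipingString {
 public:
  // Accepts only an rvalue, so the caller visibly gives up the secret.
  explicit WipingString(std::string&& secret) : secret_(std::move(secret)) {
    Wipe(&secret);  // wipe whatever the moved-from string still holds
  }
  WipingString(const WipingString&) = delete;
  WipingString& operator=(const WipingString&) = delete;
  ~WipingString() { Wipe(&secret_); }

  std::string_view as_view() const { return {secret_.data(), secret_.size()}; }

 private:
  static void Wipe(std::string* s) {
    // Grow to the full capacity first so the bytes beyond size() (for example
    // a local buffer left behind by a move) are part of the string's contents.
    s->resize(s->capacity());
    volatile char* p = s->data();
    for (std::size_t i = 0; i < s->size(); ++i) p[i] = 0;
    s->clear();
  }

  std::string secret_;
};
```

A caller would write `WipingString key(std::move(raw));`, after which `raw` is empty and the secret is only reachable through `key.as_view()`.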
