[cdc_rsync] Enable local syncing (#75)

Adds support for local syncs of files and folders on the same Windows
machine, e.g. cdc_rsync C:\source C:\dest. The main changes are:

- Skip the check whether the port is available remotely with PortManager.
- Do not deploy cdc_rsync_server.
- Run cdc_rsync_server directly, not through an SSH tunnel.

The current implementation is not optimal, as it still starts
cdc_rsync_server as a separate process and communicates with it via a
TCP port.
This commit is contained in:
Lutz Justen
2023-01-26 09:57:19 +01:00
committed by GitHub
parent 9cf71cae65
commit f8c10ce7bd
12 changed files with 168 additions and 73 deletions

@@ -1,8 +1,8 @@
# CDC File Transfer
Born from the ashes of Stadia, this repository contains tools for syncing and
streaming files from Windows to Linux. They are based on Content Defined
Chunking (CDC), in particular
streaming files from Windows to Windows or Linux. The tools are based on Content
Defined Chunking (CDC), in particular
[FastCDC](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia),
to split up files into chunks.
@@ -132,9 +132,9 @@ difference operation. It does not involve a per-byte hash map lookup.
## CDC Stream
`cdc_stream` is a tool to stream files and directories from a Windows machine to a
Linux device. Conceptually, it is similar to [sshfs](https://github.com/libfuse/sshfs),
but it is optimized for read speed.
`cdc_stream` is a tool to stream files and directories from a Windows machine to
a Linux device. Conceptually, it is similar to
[sshfs](https://github.com/libfuse/sshfs), but it is optimized for read speed.
* It caches streamed data on the Linux device.
* If a file is re-read on Linux after it changed on Windows, only the
differences are streamed again. The rest is read from the cache.
@@ -161,6 +161,34 @@ In one case, the game is streamed via `sshfs`, in the other case we use
<img src="docs/cdc_stream_vs_sshfs.png" alt="Comparison of cdc_stream and sshfs" width="752" />
</p>
# Supported Platforms
| `cdc_rsync` | From | To |
|:-----------------------------|:--------------------:|:--------------------:|
| Windows x86_64 | &check; | &check; <sup>1</sup> |
| Ubuntu 22.04 x86_64 | &cross; <sup>2</sup> | &check; |
| Ubuntu 22.04 aarch64 | &cross; | &cross; |
| macOS 13 x86_64 <sup>3</sup> | &cross; | &cross; |
| macOS 13 aarch64 <sup>3</sup>| &cross; | &cross; |
| `cdc_stream` | From | To |
|:-----------------------------|:--------------------:|:--------------------:|
| Windows x86_64 | &check; | &cross; |
| Ubuntu 22.04 x86_64 | &cross; | &check; |
| Ubuntu 22.04 aarch64 | &cross; | &cross; |
| macOS 13 x86_64 <sup>3</sup> | &cross; | &cross; |
| macOS 13 aarch64 <sup>3</sup>| &cross; | &cross; |
<span style="font-size: 0.8rem">
<sup>1</sup> Only local syncs, e.g. `cdc_rsync C:\src\* C:\dst`. Support for
remote syncs is being added, see
[#61](https://github.com/google/cdc-file-transfer/issues/61).
<sup>2</sup> See [#56](https://github.com/google/cdc-file-transfer/issues/56).
<sup>3</sup> See [#62](https://github.com/google/cdc-file-transfer/issues/62).
</span>
# Getting Started
Download the precompiled binaries from the
@@ -190,7 +218,7 @@ To build the tools from source, the following steps have to be executed on
git submodule update --init --recursive
```
Finally, install an SSH client on the Windows device if not present.
Finally, install an SSH client on the Windows machine if not present.
The file transfer tools require `ssh.exe` and `sftp.exe`.
## Building
@@ -304,6 +332,10 @@ To get per file progress, add `-v`:
```
cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -vr
```
The tool also supports local syncs:
```
cdc_rsync C:\path\to\assets\* C:\path\to\destination -vr
```
### CDC Stream

@@ -30,7 +30,9 @@
#include "common/gamelet_component.h"
#include "common/log.h"
#include "common/path.h"
#include "common/port_manager.h"
#include "common/process.h"
#include "common/remote_util.h"
#include "common/status.h"
#include "common/status_macros.h"
#include "common/stopwatch.h"
@@ -45,8 +47,6 @@ constexpr int kExitCodeCouldNotExecute = 126;
// Bash exit code if binary was not found.
constexpr int kExitCodeNotFound = 127;
constexpr char kCdcRsyncFilename[] = "cdc_rsync.exe";
SetOptionsRequest::FilterRule::Type ToProtoType(PathFilter::Rule::Type type) {
switch (type) {
case PathFilter::Rule::Type::kInclude:
@@ -98,20 +98,27 @@ CdcRsyncClient::CdcRsyncClient(const Options& options,
: options_(options),
sources_(std::move(sources)),
destination_(std::move(destination)),
remote_util_(std::move(user_host), options.verbosity, options.quiet,
&process_factory_,
/*forward_output_to_log=*/false),
port_manager_("cdc_rsync_ports_f77bcdfe-368c-4c45-9f01-230c5e7e2132",
options.forward_port_first, options.forward_port_last,
&process_factory_, &remote_util_),
printer_(options.quiet, Util::IsTTY() && !options.json),
progress_(&printer_, options.verbosity, options.json) {
// If there is no |user_host|, we sync files locally!
if (!user_host.empty()) {
remote_util_ =
std::make_unique<RemoteUtil>(std::move(user_host), options.verbosity,
options.quiet, &process_factory_,
/*forward_output_to_log=*/false);
if (!options_.ssh_command.empty()) {
remote_util_.SetSshCommand(options_.ssh_command);
remote_util_->SetSshCommand(options_.ssh_command);
}
if (!options_.sftp_command.empty()) {
remote_util_.SetSftpCommand(options_.sftp_command);
remote_util_->SetSftpCommand(options_.sftp_command);
}
}
// Note that remote_util_.get() may be null.
port_manager_ = std::make_unique<PortManager>(
"cdc_rsync_ports_f77bcdfe-368c-4c45-9f01-230c5e7e2132",
options.forward_port_first, options.forward_port_last, &process_factory_,
remote_util_.get());
}
CdcRsyncClient::~CdcRsyncClient() {
@@ -123,7 +130,10 @@ absl::Status CdcRsyncClient::Run() {
int port;
ASSIGN_OR_RETURN(port, FindAvailablePort(), "Failed to find available port");
ServerArch server_arch(ServerArch::Detect(destination_));
// If |remote_util_| is not set, it's a local sync.
ServerArch::Type arch_type =
remote_util_ ? ServerArch::Detect(destination_) : ServerArch::LocalType();
ServerArch server_arch(arch_type);
// Start the server process.
absl::Status status = StartServer(port, server_arch);
@@ -174,7 +184,7 @@ absl::StatusOr<int> CdcRsyncClient::FindAvailablePort() {
}
absl::StatusOr<int> port =
port_manager_.ReservePort(options_.connection_timeout_sec);
port_manager_->ReservePort(options_.connection_timeout_sec);
if (absl::IsDeadlineExceeded(port.status())) {
// Server didn't respond in time.
return SetTag(port.status(), Tag::kConnectionTimeout);
@@ -203,15 +213,28 @@ absl::Status CdcRsyncClient::StartServer(int port, const ServerArch& arch) {
return MakeStatus(
"Required instance component not found. Make sure the file "
"%s resides in the same folder as %s.",
arch.CdcServerFilename(), kCdcRsyncFilename);
arch.CdcServerFilename(), ServerArch::CdcRsyncFilename());
}
std::string component_args = GameletComponent::ToCommandLineArgs(components);
ProcessStartInfo start_info;
start_info.name = "cdc_rsync_server";
if (remote_util_) {
// Run cdc_rsync_server on the remote instance.
std::string remote_command = arch.GetStartServerCommand(
kExitCodeNotFound, absl::StrFormat("%i %s", port, component_args));
ProcessStartInfo start_info =
remote_util_.BuildProcessStartInfoForSshPortForwardAndCommand(
start_info = remote_util_->BuildProcessStartInfoForSshPortForwardAndCommand(
port, port, /*reverse=*/false, remote_command);
start_info.name = "cdc_rsync_server";
} else {
// Run cdc_rsync_server locally.
std::string exe_dir;
RETURN_IF_ERROR(path::GetExeDir(&exe_dir), "Failed to get exe directory");
std::string server_path = path::Join(exe_dir, arch.CdcServerFilename());
start_info.command =
absl::StrFormat("%s %i %s", server_path, port, component_args);
}
// Capture stdout, but forward to stdout for debugging purposes.
start_info.stdout_handler = [this](const char* data, size_t /*data_size*/) {
@@ -254,6 +277,12 @@ absl::Status CdcRsyncClient::StartServer(int port, const ServerArch& arch) {
return GetServerExitStatus(server_exit_code_, server_error_);
}
// Don't re-deploy if we're not copying to a remote device. We can start
// cdc_rsync_server from the original location directly.
if (!remote_util_) {
return GetServerExitStatus(server_exit_code_, server_error_);
}
// Server exited before it started listening, most likely because of
// outdated components (code kServerExitCodeOutOfDate) or because the server
// wasn't deployed at all yet (code kExitCodeNotFound). Instruct caller
@@ -394,6 +423,7 @@ absl::Status CdcRsyncClient::Sync() {
absl::Status CdcRsyncClient::DeployServer(const ServerArch& arch) {
assert(!server_process_);
assert(remote_util_);
std::string exe_dir;
absl::Status status = path::GetExeDir(&exe_dir);
@@ -415,7 +445,7 @@ absl::Status CdcRsyncClient::DeployServer(const ServerArch& arch) {
// sftp cdc_rsync_server to the target.
std::string commands = arch.GetDeploySftpCommands();
RETURN_IF_ERROR(remote_util_.Sftp(commands, exe_dir, /*compress=*/false),
RETURN_IF_ERROR(remote_util_->Sftp(commands, exe_dir, /*compress=*/false),
"Failed to deploy cdc_rsync_server");
return absl::OkStatus();
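
The client changes above all hinge on one idea: `remote_util_` becomes an optional dependency, and a null pointer means "local sync". A minimal, self-contained sketch of that pattern follows. `Remote`, `Client`, `StartMode` and `NeedsDeploy` are hypothetical names chosen for illustration and are not part of the repository.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for RemoteUtil; it only remembers the target host.
struct Remote {
  std::string user_host;
};

enum class StartMode { kLocalProcess, kSshTunnel };

class Client {
 public:
  // An empty |user_host| selects local mode, so no Remote is created.
  explicit Client(std::string user_host) {
    if (!user_host.empty()) {
      remote_ = std::make_unique<Remote>(Remote{std::move(user_host)});
    }
  }

  // Local syncs start the server directly, remote syncs go through SSH.
  StartMode Mode() const {
    return remote_ ? StartMode::kSshTunnel : StartMode::kLocalProcess;
  }

  // Only remote targets need the server binary to be copied over.
  bool NeedsDeploy() const { return remote_ != nullptr; }

 private:
  std::unique_ptr<Remote> remote_;  // nullptr means local sync.
};
```

Every call site that dereferences the pointer has to be guarded by the same null check, which is why the diff adds `assert(remote_util_)` to `DeployServer`.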

@@ -21,16 +21,18 @@
#include <vector>
#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "cdc_rsync/base/message_pump.h"
#include "cdc_rsync/client_socket.h"
#include "cdc_rsync/progress_tracker.h"
#include "common/path_filter.h"
#include "common/port_manager.h"
#include "common/remote_util.h"
#include "common/process.h"
namespace cdc_ft {
class PortManager;
class Process;
class RemoteUtil;
class ServerArch;
class ZstdStream;
@@ -129,8 +131,8 @@ class CdcRsyncClient {
std::vector<std::string> sources_;
const std::string destination_;
WinProcessFactory process_factory_;
RemoteUtil remote_util_;
PortManager port_manager_;
std::unique_ptr<RemoteUtil> remote_util_;
std::unique_ptr<PortManager> port_manager_;
std::unique_ptr<SocketFinalizer> socket_finalizer_;
ClientSocket socket_;
MessagePump message_pump_{&socket_, MessagePump::PacketReceivedDelegate()};

@@ -38,22 +38,23 @@ void PrintError(const absl::FormatSpec<Args...>& format, Args... args) {
enum class OptionResult { kConsumedKey, kConsumedKeyValue, kError };
const char kHelpText[] =
R"(Copy local files to a gamelet
R"(Synchronize files and directories
Synchronizes local files and files on a gamelet. Matching files are skipped.
For partially matching files only the deltas are transferred.
Matching files are skipped based on file size and modified time. For partially
matching files only the differences are transferred. The destination directory
can be the same Windows machine or a remote Windows or Linux device.
Usage:
cdc_rsync [options] source [source]... [user@]host:destination
cdc_rsync [options] source [source]... [[user@]host:]destination
Parameters:
source Local file or directory to be copied
source Local file or directory to be copied or synced
user Remote SSH user name
host Remote host or IP address
destination Remote destination directory
destination Local or remote destination directory
Options:
--contimeout sec Gamelet connection timeout in seconds (default: 10)
--contimeout sec Remote connection timeout in seconds (default: 10)
-q, --quiet Quiet mode, only print errors
-v, --verbose Increase output verbosity
--json Print JSON progress
@@ -81,7 +82,7 @@ Options:
Can also be specified by the CDC_SFTP_COMMAND environment variable.
--forward-port <port> TCP port or range used for SSH port forwarding (default: 44450-44459).
If a range is specified, searches for available ports (slower).
-h --help Help for cdc_rsync
-h, --help Help for cdc_rsync
)";
constexpr char kSshCommandEnvVar[] = "CDC_SSH_COMMAND";
@@ -375,14 +376,6 @@ bool ValidateParameters(const Parameters& params, bool help) {
return false;
}
if (params.user_host.empty()) {
PrintError(
"No remote host specified in destination '%s'. "
"Expected [user@]host:destination.",
params.destination);
return false;
}
return true;
}
@@ -408,16 +401,15 @@ bool CheckOptionResult(OptionResult result, const std::string& name,
// afterward and |user_host| is |user@foo.com|. Does not touch Windows drives,
// e.g. C:\foo.
void PopUserHost(std::string* destination, std::string* user_host) {
user_host->clear();
// Don't mistake the C part of C:\foo or \\share\C:\foo as user/host.
if (!path::GetDrivePrefix(*destination).empty()) return;
std::vector<std::string> parts =
absl::StrSplit(*destination, absl::MaxSplits(':', 1));
if (parts.size() < 2) return;
// Don't mistake the C part of C:\foo as user/host.
if (parts[0].size() == 1 && toupper(parts[0][0]) >= 'A' &&
toupper(parts[0][0]) <= 'Z') {
return;
}
*user_host = parts[0];
*destination = parts[1];
}
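
The destination parsing above can be illustrated with a standalone sketch. It assumes a simplified drive check (`StartsWithDrive`, a hypothetical helper that only recognizes `C:`, `\\?\C:` and `\\.\C:`) in place of `path::GetDrivePrefix`, and uses `std::string::find` instead of `absl::StrSplit`, so it approximates the behavior rather than copying the implementation.

```cpp
#include <cctype>
#include <string>

// Hypothetical, simplified stand-in for path::GetDrivePrefix(): returns true
// if |path| starts with a drive letter, optionally preceded by \\?\ or \\.\ .
bool StartsWithDrive(const std::string& path) {
  std::string p = path;
  if (p.rfind("\\\\?\\", 0) == 0 || p.rfind("\\\\.\\", 0) == 0) {
    p = p.substr(4);  // Strip the 4-character device prefix.
  }
  return p.size() >= 2 && std::isalpha(static_cast<unsigned char>(p[0])) &&
         p[1] == ':';
}

// Splits "[user@]host:destination" at the first ':'. For local paths,
// |destination| is left untouched and |user_host| is cleared.
void SplitUserHost(std::string* destination, std::string* user_host) {
  user_host->clear();
  if (StartsWithDrive(*destination)) return;
  const std::string::size_type pos = destination->find(':');
  if (pos == std::string::npos) return;
  *user_host = destination->substr(0, pos);
  *destination = destination->substr(pos + 1);
}
```

With this sketch, `user@host:~/assets` yields the host `user@host` and the destination `~/assets`, while `C:\foo` is left as is with an empty host. As in the original code, a one-letter host name would be mistaken for a drive.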

@@ -228,18 +228,25 @@ TEST_F(ParamsTest, ParseSucceedsWithNoSftpCommand) {
ExpectError(NeedsValueError("sftp-command"));
}
TEST_F(ParamsTest, ParseFailsOnNoUserHost) {
TEST_F(ParamsTest, ParseSucceedsOnNoUserHost) {
const char* argv[] = {"cdc_rsync.exe", kSrc, kDst, NULL};
EXPECT_FALSE(
Parse(static_cast<int>(std::size(argv)) - 1, argv, &parameters_));
ExpectError("No remote host specified");
EXPECT_TRUE(Parse(static_cast<int>(std::size(argv)) - 1, argv, &parameters_));
}
TEST_F(ParamsTest, ParseDoesNotThinkCIsAHost) {
TEST_F(ParamsTest, ParseDoesNotThinkDriveIsAHost) {
const char* argv[] = {"cdc_rsync.exe", kSrc, "C:\\foo", NULL};
EXPECT_FALSE(
Parse(static_cast<int>(std::size(argv)) - 1, argv, &parameters_));
ExpectError("No remote host specified");
EXPECT_TRUE(Parse(static_cast<int>(std::size(argv)) - 1, argv, &parameters_));
EXPECT_TRUE(parameters_.user_host.empty());
const char* argv2[] = {"cdc_rsync.exe", kSrc, "\\\\.\\C:\\foo", NULL};
EXPECT_TRUE(
Parse(static_cast<int>(std::size(argv2)) - 1, argv2, &parameters_));
EXPECT_TRUE(parameters_.user_host.empty());
const char* argv3[] = {"cdc_rsync.exe", kSrc, "\\\\?\\C:\\foo", NULL};
EXPECT_TRUE(
Parse(static_cast<int>(std::size(argv3)) - 1, argv3, &parameters_));
EXPECT_TRUE(parameters_.user_host.empty());
}
TEST_F(ParamsTest, ParseWithoutParametersFailsOnMissingSourceAndDestination) {

@@ -20,6 +20,7 @@
#include "absl/strings/str_format.h"
#include "absl/strings/str_split.h"
#include "common/path.h"
#include "common/platform.h"
#include "common/remote_util.h"
#include "common/util.h"
@@ -58,6 +59,28 @@ ServerArch::Type ServerArch::Detect(const std::string& destination) {
return Type::kLinux;
}
// static
ServerArch::Type ServerArch::LocalType() {
#if PLATFORM_WINDOWS
return ServerArch::Type::kWindows;
#elif PLATFORM_LINUX
return ServerArch::Type::kLinux;
#endif
}
// static
std::string ServerArch::CdcRsyncFilename() {
switch (LocalType()) {
case Type::kWindows:
return "cdc_rsync.exe";
case Type::kLinux:
return "cdc_rsync";
default:
assert(!kErrorArchTypeUnhandled);
return std::string();
}
}
ServerArch::ServerArch(Type type) : type_(type) {}
ServerArch::~ServerArch() {}
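
`LocalType()` relies on the repository's `PLATFORM_WINDOWS` and `PLATFORM_LINUX` macros and has no fallback branch. A standalone sketch of the same idea is shown below. It is an assumption-laden illustration: it uses the compiler-predefined macros `_WIN32` and `__linux__` instead, and adds a hypothetical `kUnknown` value so that every platform gets a return value.

```cpp
#include <string>

// Hypothetical mirror of ServerArch::Type with an extra fallback value.
enum class ArchType { kLinux, kWindows, kUnknown };

// Picks the arch type of the current process at compile time.
constexpr ArchType LocalArchType() {
#if defined(_WIN32)
  return ArchType::kWindows;
#elif defined(__linux__)
  return ArchType::kLinux;
#else
  return ArchType::kUnknown;
#endif
}

// Returns the local client binary name, or an empty string if unknown.
std::string LocalRsyncFilename() {
  switch (LocalArchType()) {
    case ArchType::kWindows:
      return "cdc_rsync.exe";
    case ArchType::kLinux:
      return "cdc_rsync";
    case ArchType::kUnknown:
      break;
  }
  return std::string();
}
```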

@@ -29,10 +29,16 @@ class ServerArch {
kWindows = 1,
};
// Detects the architecture type based on the destination path, e.g. path
// Detects the arch type based on the destination path, e.g. a path
// starting with C: indicates Windows.
static Type Detect(const std::string& destination);
// Returns the arch type that matches the current process's type.
static Type LocalType();
// Returns the (local!) arch specific filename of cdc_rsync[.exe].
static std::string CdcRsyncFilename();
ServerArch(Type type);
~ServerArch();

@@ -291,8 +291,8 @@ std::string GetDrivePrefix(const std::string& path) {
if (path[0] != '\\') {
size_t pos = path.find(":");
if (pos == std::string::npos) {
// E.g. "\path\to\file" or "path\to\file".
if (pos != 1) {
// E.g. "\path\to\file", "path\to\file" or "user@host:file".
return std::string();
}

@@ -302,6 +302,7 @@ TEST_F(PathTest, GetDrivePrefix) {
EXPECT_EQ(path::GetDrivePrefix("C:\\"), "C:");
EXPECT_EQ(path::GetDrivePrefix("C:\\dir"), "C:");
EXPECT_EQ(path::GetDrivePrefix("C:\\dir\\file"), "C:");
EXPECT_EQ(path::GetDrivePrefix("host:C:\\dir\\file"), "");
}
#endif

@@ -40,8 +40,8 @@ class PortManager {
// synchronize port reservation. The range of possible ports managed by this
// instance is [|first_port|, |last_port|]. |process_factory| is a valid
// pointer to a ProcessFactory instance to run processes locally.
// |remote_util| is a valid pointer to a RemoteUtil instance to run processes
// remotely.
// |remote_util| is the RemoteUtil instance to run processes remotely. If it
// is nullptr, no remote ports are reserved.
PortManager(std::string unique_name, int first_port, int last_port,
ProcessFactory* process_factory, RemoteUtil* remote_util,
SystemClock* system_clock = DefaultSystemClock::GetInstance(),

@@ -131,11 +131,13 @@ absl::StatusOr<int> PortManager::ReservePort(int remote_timeout_sec) {
// Find available port on remote instance.
std::unordered_set<int> remote_ports = local_ports;
if (remote_util_ != nullptr) {
ASSIGN_OR_RETURN(remote_ports,
FindAvailableRemotePorts(first_port_, last_port_, "0.0.0.0",
process_factory_, remote_util_,
remote_timeout_sec, steady_clock_),
FindAvailableRemotePorts(
first_port_, last_port_, "0.0.0.0", process_factory_,
remote_util_, remote_timeout_sec, steady_clock_),
"Failed to find available ports on instance");
}
// Fetch shared memory.
void* mem;
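
The effect of the null check in `ReservePort` can be sketched in isolation: without a remote side, the set of remote ports simply defaults to the local ports. In the sketch below, `CandidatePorts` and the `find_remote` callback are hypothetical names. The final intersection step is an assumption about how the two sets are combined, since that part of `ReservePort` is not shown in this hunk.

```cpp
#include <functional>
#include <unordered_set>

using PortSet = std::unordered_set<int>;

// Returns the ports that are usable on both ends. |find_remote| stands in for
// the remote port query; pass an empty function (or nullptr) for local syncs.
PortSet CandidatePorts(const PortSet& local_ports,
                       const std::function<PortSet()>& find_remote) {
  // Default: no remote side, so every locally free port qualifies.
  PortSet remote_ports = local_ports;
  if (find_remote) remote_ports = find_remote();

  // Assumed combination step: keep ports that are free locally and remotely.
  PortSet result;
  for (int port : local_ports) {
    if (remote_ports.count(port) > 0) result.insert(port);
  }
  return result;
}
```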

@@ -240,7 +240,7 @@ class ChunkerTmpl {
}
// Init hash to all 1's to avoid zero-length chunks with min_size=0.
uint64_t hash = (uint64_t)-1;
uint64_t hash = UINT64_MAX;
// Skip the first min_size bytes, but "warm up" the rolling hash for 64
// rounds to make sure the 64-bit hash has gathered full "content history".
size_t i = cfg_.min_size > 64 ? cfg_.min_size - 64 : 0;
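
The hash initialization change is purely cosmetic: both spellings denote the all-ones 64-bit value. The short sketch below checks that at compile time and isolates the warm-up start index from the last line of the hunk. `WarmupStart` is a hypothetical helper name, not a function in the chunker.

```cpp
#include <cstddef>
#include <cstdint>

// (uint64_t)-1 and UINT64_MAX are the same value, so behavior is unchanged.
static_assert(static_cast<std::uint64_t>(-1) == UINT64_MAX,
              "both spellings denote the all-ones 64-bit value");

// Index at which hashing starts: 64 bytes before |min_size|, clamped at 0,
// so the rolling hash sees 64 bytes before the first possible chunk boundary.
constexpr std::size_t WarmupStart(std::size_t min_size) {
  return min_size > 64 ? min_size - 64 : 0;
}
```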