diff --git a/README.md b/README.md index 292901a..3f07af4 100644 --- a/README.md +++ b/README.md @@ -1,52 +1,109 @@ # CDC File Transfer -This repository contains tools for synching and streaming files. They are based -on Content Defined Chunking (CDC), in particular +Born from the ashes of Stadia, this repository contains tools for synching and +streaming files from Windows to Linux. They are based on Content Defined +Chunking (CDC), in particular [FastCDC](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia), to split up files into chunks. +## History + +At Stadia, game developers had access to Linux cloud instances to run games. +Most developers wrote their games on Windows, though. Therefore, they needed a +way to make them available on the remote Linux instance. + +As developers had SSH access to those instances, they could use `scp` to copy +the game content. However, this was impractical, especially with the shift to +working from home during the pandemic with sub-par internet connections. `scp` +always copies full files, there is no "delta mode" to copy only the things that +changed, it is slow for many small files, and there is no fast compression. + +To help this situation, we developed two tools, `cdc_rsync` and `cdc_stream`, +which enable developers to quickly iterate on their games without repeatedly +incurring the cost of transmitting dozens of GBs. + ## CDC RSync -CDC RSync is a tool to sync files from a Windows machine to a Linux device, +`cdc_rsync` is a tool to sync files from a Windows machine to a Linux device, similar to the standard Linux [rsync](https://linux.die.net/man/1/rsync). It is basically a copy tool, but optimized for the case where there is already an old version of the files available in the target directory. -* It skips files quickly if timestamp and file size match. +* It quickly skips files if timestamp and file size match. * It uses fast compression for all data transfer. * If a file changed, it determines which parts changed and only transfers the differences. +

+ cdc_rsync demo +

+ The remote diffing algorithm is based on CDC. In our tests, it is up to 30x faster than the one used in rsync (1500 MB/s vs 50 MB/s). +The following chart shows a comparison of `cdc_rsync` and Linux rsync running +under Cygwin on Windows. The test data consists of 58 development builds +of some game provided to us for evaluation purposes. The builds are 40-45 GB +large. For this experiment, we uploaded the first build, then synced the second +build with each of the two tools and measured the time. For example, syncing +from build 1 to build 2 took 210 seconds with the Linux rsync, but only 75 +seconds with `cdc_rsync`. The three outliers are probably feature drops from +another development branch, where the delta was much higher. Overall, +`cdc_rsync` syncs files about **3 times faster** than Linux rsync. + +

+ Comparison of cdc_rsync and Linux rsync running in Cygwin +

+ ## CDC Stream -CDC Stream is a tool to stream files and directories from a Windows machine to a +`cdc_stream` is a tool to stream files and directories from a Windows machine to a Linux device. Conceptually, it is similar to [sshfs](https://github.com/libfuse/sshfs), but it is optimized for read speed. * It caches streamed data on the Linux device. * If a file is re-read on Linux after it changed on Windows, only the - differences are streamed again. The rest is read from cache. + differences are streamed again. The rest is read from the cache. * Stat operations are very fast since the directory metadata (filenames, permissions etc.) is provided in a streaming-friendly way. To efficiently determine which parts of a file changed, the tool uses the same -CDC-based diffing algorithm as CDC RSync. Changes to Windows files are almost +CDC-based diffing algorithm as `cdc_rsync`. Changes to Windows files are almost immediately reflected on Linux, with a delay of roughly (0.5s + 0.7s x total size of changed files in GB). +

+ cdc_stream demo +

+ The tool does not support writing files back from Linux to Windows; the Linux directory is readonly. +The following chart compares times from starting a game to reaching the menu. +In one case, the game is streamed via `sshfs`, in the other case we use +`cdc_stream`. Overall, we see a **2x to 5x speedup**. + +

+ Comparison of cdc_stream and sshfs +

+ # Getting Started -The project has to be built both on Windows and Linux. +Download the precompiled binaries from the +[latest release](https://github.com/google/cdc-file-transfer/releases). +We currently provide Linux binaries compiled on +[Github's latest Ubuntu](https://github.com/actions/runner-images) version. +If the binaries work for you, you can skip the following two sections. + +Alternatively, the project can be built from source. Some binaries have to be +built on Windows, some on Linux. ## Prerequisites -The following steps have to be executed on **both Windows and Linux**. +To build the tools from source, the following steps have to be executed on +**both Windows and Linux**. -* Download and install Bazel from https://bazel.build/install. +* Download and install Bazel from [here](https://bazel.build/install). See + [workflow logs](https://github.com/google/cdc-file-transfer/actions) for the + currently used version. * Clone the repository. ``` git clone https://github.com/google/cdc-file-transfer @@ -64,15 +121,15 @@ The file transfer tools require `ssh.exe` and `scp.exe`. The two tools can be built and used independently. -### CDC Sync +### CDC RSync * Build Linux components ``` - bazel build --config linux --compilation_mode=opt //cdc_rsync_server + bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_rsync_server ``` * Build Windows components ``` - bazel build --config windows --compilation_mode=opt //cdc_rsync + bazel build --config windows --compilation_mode=opt --copt=/GL //cdc_rsync ``` * Copy the Linux build output file `cdc_rsync_server` from `bazel-bin/cdc_rsync_server` on the Linux system to `bazel-bin\cdc_rsync` @@ -82,11 +139,11 @@ The two tools can be built and used independently. * Build Linux components ``` - bazel build --config linux --compilation_mode=opt //cdc_fuse_fs + bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_fuse_fs ``` * Build Windows components ``` - bazel build --config windows --compilation_mode=opt //asset_stream_manager + bazel build --config windows --compilation_mode=opt --copt=/GL //asset_stream_manager ``` * Copy the Linux build output files `cdc_fuse_fs` and `libfuse.so` from `bazel-bin/cdc_fuse_fs` on the Linux system to `bazel-bin\asset_stream_manager` @@ -94,25 +151,101 @@ The two tools can be built and used independently. ## Usage -### CDC Sync -To copy the contents of the Windows directory `C:\path\to\assets` to `~/assets` -on the Linux device `linux.machine.com`, run -``` -cdc_rsync --ssh-command=C:\path\to\ssh.exe --scp-command=C:\path\to\scp.exe C:\path\to\assets\* user@linux.machine.com:~/assets -vr -``` -Depending on your setup, you may have to specify additional arguments for the -ssh and scp commands, including proper quoting, e.g. -``` -cdc_rsync --ssh-command="\"C:\path with space\to\ssh.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" --scp-command="\"C:\path with space\to\scp.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" C:\path\to\assets\* user@linux.machine.com:~/assets -vr -``` -Lengthy ssh/scp commands that rarely change can also be put into environment -variables `CDC_SSH_COMMAND` and `CDC_SCP_COMMAND`, e.g. -``` -set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file""" +The tools require a setup where you can use SSH and SCP from the Windows machine +to the Linux device without entering a password, e.g. by using key-based +authentication. -set CDC_SCP_COMMAND="C:\path with space\to\scp.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file""" +### Configuring SSH and SCP -cdc_rsync C:\path\to\assets\* user@linux.machine.com:~/assets -vr +By default, the tools search `ssh.exe` and `scp.exe` from the path environment +variable. If you can run the following commands in a Windows cmd without +entering your password, you are all set: +``` +ssh user@linux.device.com +scp somefile.txt user@linux.device.com: +``` +Here, `user` is the Linux user and `linux.device.com` is the Linux host to +SSH into or copy the file to. + +If `ssh.exe` or `scp.exe` cannot be found, or if additional arguments are +required, it is recommended to set the environment variables `CDC_SSH_COMMAND` +and `CDC_SCP_COMMAND`. The following example specifies a custom path to the SSH +and SCP binaries, a custom SSH config file, a key file and a known hosts file: +``` +set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe" -F C:\path\to\ssh_config -i C:\path\to\id_rsa -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""C:\path\to\known_hosts""" +set CDC_SCP_COMMAND="C:\path with space\to\scp.exe" -F C:\path\to\ssh_config -i C:\path\to\id_rsa -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""C:\path\to\known_hosts""" +``` + +#### Google Specific + +For Google internal usage, set the following environment variables to enable SSH +authentication using a Google security key: +``` +set CDC_SSH_COMMAND=C:\gnubby\bin\ssh.exe +set CDC_SCP_COMMAND=C:\gnubby\bin\scp.exe +``` +Note that you will have to touch the security key multiple times during the +first run. Subsequent runs only require a single touch. + +### CDC RSync + +`cdc_rsync` is used similar to `scp` or the Linux `rsync` command. To sync a +single Windows file `C:\path\to\file.txt` to the home directory `~` on the Linux +device `linux.device.com`, run +``` +cdc_rsync C:\path\to\file.txt user@linux.device.com:~ +``` +`cdc_rsync` understands the usual Windows wildcards `*` and `?`. +``` +cdc_rsync C:\path\to\*.txt user@linux.device.com:~ +``` +To sync the contents of the Windows directory `C:\path\to\assets` recursively to +`~/assets` on the Linux device, run +``` +cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -r +``` +To get per file progress, add `-v`: +``` +cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -vr ``` ### CDC Stream + +`cdc_stream` consists of a background service called `asset_stream_manager`, +which has to be started in advance with +``` +asset_stream_manager +``` +The service logs to `%APPDATA%\cdc-file-transfer\logs` by default. Try +`asset_stream_manager --helpfull` to get a list of available flags. + +To stream the Windows directory `C:\path\to\assets` to `~/assets` on the Linux +device, run +``` +cdc_stream start C:\path\to\assets user@linux.device.com:~/assets +``` +This makes all files and directories of `C:\path\to\assets` available on +`~/assets` immediately, as if it were a local copy. However, data is streamed +from Windows to Linux as files are accessed. + +To stop the streaming session, enter +``` +cdc_stream stop user@linux.device.com:~/assets +``` + +## Troubleshooting + +`cdc_rsync` always logs to the console. By default, the `asset_stream_manager` +service logs to a timestamped file in `%APPDATA%\cdc-file-transfer\logs`. It can +be switched to log to console by starting it with `--log_to_stdout`: +``` +asset_stream_manager --log_to_stdout +``` + +Both `cdc_rsync` and `asset_stream_manager` support command line flags to control log +verbosity. Passing `-vvv` prints debug logs, `-vvvv` prints verbose logs. The +debug logs contain all SSH and SCP commands that are attempted to run, which is +very useful for troubleshooting. + +`cdc_stream` is just a thin client for the asset streaming service. Nothing ever +goes wrong with it [citation needed]. diff --git a/docs/cdc_rsync_recursive_upload_demo.gif b/docs/cdc_rsync_recursive_upload_demo.gif new file mode 100644 index 0000000..c7fa10e Binary files /dev/null and b/docs/cdc_rsync_recursive_upload_demo.gif differ diff --git a/docs/cdc_rsync_vs_cygwin_rsync.png b/docs/cdc_rsync_vs_cygwin_rsync.png new file mode 100644 index 0000000..6d2eab3 Binary files /dev/null and b/docs/cdc_rsync_vs_cygwin_rsync.png differ diff --git a/docs/cdc_stream_demo.gif b/docs/cdc_stream_demo.gif new file mode 100644 index 0000000..9cb2b3a Binary files /dev/null and b/docs/cdc_stream_demo.gif differ diff --git a/docs/cdc_stream_vs_sshfs.png b/docs/cdc_stream_vs_sshfs.png new file mode 100644 index 0000000..39b7840 Binary files /dev/null and b/docs/cdc_stream_vs_sshfs.png differ