mirror of
https://github.com/nestriness/cdc-file-transfer.git
synced 2026-01-30 12:25:35 +02:00
Improve readme (#19)
Improve readme This CL adds - a history section with references to Stadia - benchmarks - animated gifs with demos - a troubleshooting section - and more info about cdc_stream
This commit is contained in:
197
README.md
197
README.md
@@ -1,52 +1,109 @@
|
||||
# CDC File Transfer
|
||||
|
||||
This repository contains tools for synching and streaming files. They are based
|
||||
on Content Defined Chunking (CDC), in particular
|
||||
Born from the ashes of Stadia, this repository contains tools for synching and
|
||||
streaming files from Windows to Linux. They are based on Content Defined
|
||||
Chunking (CDC), in particular
|
||||
[FastCDC](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia),
|
||||
to split up files into chunks.
|
||||
|
||||
## History
|
||||
|
||||
At Stadia, game developers had access to Linux cloud instances to run games.
|
||||
Most developers wrote their games on Windows, though. Therefore, they needed a
|
||||
way to make them available on the remote Linux instance.
|
||||
|
||||
As developers had SSH access to those instances, they could use `scp` to copy
|
||||
the game content. However, this was impractical, especially with the shift to
|
||||
working from home during the pandemic with sub-par internet connections. `scp`
|
||||
always copies full files, there is no "delta mode" to copy only the things that
|
||||
changed, it is slow for many small files, and there is no fast compression.
|
||||
|
||||
To help this situation, we developed two tools, `cdc_rsync` and `cdc_stream`,
|
||||
which enable developers to quickly iterate on their games without repeatedly
|
||||
incurring the cost of transmitting dozens of GBs.
|
||||
|
||||
## CDC RSync
|
||||
|
||||
CDC RSync is a tool to sync files from a Windows machine to a Linux device,
|
||||
`cdc_rsync` is a tool to sync files from a Windows machine to a Linux device,
|
||||
similar to the standard Linux [rsync](https://linux.die.net/man/1/rsync). It is
|
||||
basically a copy tool, but optimized for the case where there is already an old
|
||||
version of the files available in the target directory.
|
||||
* It skips files quickly if timestamp and file size match.
|
||||
* It quickly skips files if timestamp and file size match.
|
||||
* It uses fast compression for all data transfer.
|
||||
* If a file changed, it determines which parts changed and only transfers the
|
||||
differences.
|
||||
|
||||
<p align="center">
|
||||
<img src="docs/cdc_rsync_recursive_upload_demo.gif" alt="cdc_rsync demo" width="688" />
|
||||
</p>
|
||||
|
||||
The remote diffing algorithm is based on CDC. In our tests, it is up to 30x
|
||||
faster than the one used in rsync (1500 MB/s vs 50 MB/s).
|
||||
|
||||
The following chart shows a comparison of `cdc_rsync` and Linux rsync running
|
||||
under Cygwin on Windows. The test data consists of 58 development builds
|
||||
of some game provided to us for evaluation purposes. The builds are 40-45 GB
|
||||
large. For this experiment, we uploaded the first build, then synced the second
|
||||
build with each of the two tools and measured the time. For example, syncing
|
||||
from build 1 to build 2 took 210 seconds with the Linux rsync, but only 75
|
||||
seconds with `cdc_rsync`. The three outliers are probably feature drops from
|
||||
another development branch, where the delta was much higher. Overall,
|
||||
`cdc_rsync` syncs files about **3 times faster** than Linux rsync.
|
||||
|
||||
<p align="center">
|
||||
<img src="docs/cdc_rsync_vs_cygwin_rsync.png" alt="Comparison of cdc_rsync and Linux rsync running in Cygwin" width="753" />
|
||||
</p>
|
||||
|
||||
## CDC Stream
|
||||
|
||||
CDC Stream is a tool to stream files and directories from a Windows machine to a
|
||||
`cdc_stream` is a tool to stream files and directories from a Windows machine to a
|
||||
Linux device. Conceptually, it is similar to [sshfs](https://github.com/libfuse/sshfs),
|
||||
but it is optimized for read speed.
|
||||
* It caches streamed data on the Linux device.
|
||||
* If a file is re-read on Linux after it changed on Windows, only the
|
||||
differences are streamed again. The rest is read from cache.
|
||||
differences are streamed again. The rest is read from the cache.
|
||||
* Stat operations are very fast since the directory metadata (filenames,
|
||||
permissions etc.) is provided in a streaming-friendly way.
|
||||
|
||||
To efficiently determine which parts of a file changed, the tool uses the same
|
||||
CDC-based diffing algorithm as CDC RSync. Changes to Windows files are almost
|
||||
CDC-based diffing algorithm as `cdc_rsync`. Changes to Windows files are almost
|
||||
immediately reflected on Linux, with a delay of roughly (0.5s + 0.7s x total
|
||||
size of changed files in GB).
|
||||
|
||||
<p align="center">
|
||||
<img src="docs/cdc_stream_demo.gif" alt="cdc_stream demo" width="688" />
|
||||
</p>
|
||||
|
||||
The tool does not support writing files back from Linux to Windows; the Linux
|
||||
directory is readonly.
|
||||
|
||||
The following chart compares times from starting a game to reaching the menu.
|
||||
In one case, the game is streamed via `sshfs`, in the other case we use
|
||||
`cdc_stream`. Overall, we see a **2x to 5x speedup**.
|
||||
|
||||
<p align="center">
|
||||
<img src="docs/cdc_stream_vs_sshfs.png" alt="Comparison of cdc_stream and sshfs" width="752" />
|
||||
</p>
|
||||
|
||||
# Getting Started
|
||||
|
||||
The project has to be built both on Windows and Linux.
|
||||
Download the precompiled binaries from the
|
||||
[latest release](https://github.com/google/cdc-file-transfer/releases).
|
||||
We currently provide Linux binaries compiled on
|
||||
[Github's latest Ubuntu](https://github.com/actions/runner-images) version.
|
||||
If the binaries work for you, you can skip the following two sections.
|
||||
|
||||
Alternatively, the project can be built from source. Some binaries have to be
|
||||
built on Windows, some on Linux.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
The following steps have to be executed on **both Windows and Linux**.
|
||||
To build the tools from source, the following steps have to be executed on
|
||||
**both Windows and Linux**.
|
||||
|
||||
* Download and install Bazel from https://bazel.build/install.
|
||||
* Download and install Bazel from [here](https://bazel.build/install). See
|
||||
[workflow logs](https://github.com/google/cdc-file-transfer/actions) for the
|
||||
currently used version.
|
||||
* Clone the repository.
|
||||
```
|
||||
git clone https://github.com/google/cdc-file-transfer
|
||||
@@ -64,15 +121,15 @@ The file transfer tools require `ssh.exe` and `scp.exe`.
|
||||
|
||||
The two tools can be built and used independently.
|
||||
|
||||
### CDC Sync
|
||||
### CDC RSync
|
||||
|
||||
* Build Linux components
|
||||
```
|
||||
bazel build --config linux --compilation_mode=opt //cdc_rsync_server
|
||||
bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_rsync_server
|
||||
```
|
||||
* Build Windows components
|
||||
```
|
||||
bazel build --config windows --compilation_mode=opt //cdc_rsync
|
||||
bazel build --config windows --compilation_mode=opt --copt=/GL //cdc_rsync
|
||||
```
|
||||
* Copy the Linux build output file `cdc_rsync_server` from
|
||||
`bazel-bin/cdc_rsync_server` on the Linux system to `bazel-bin\cdc_rsync`
|
||||
@@ -82,11 +139,11 @@ The two tools can be built and used independently.
|
||||
|
||||
* Build Linux components
|
||||
```
|
||||
bazel build --config linux --compilation_mode=opt //cdc_fuse_fs
|
||||
bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_fuse_fs
|
||||
```
|
||||
* Build Windows components
|
||||
```
|
||||
bazel build --config windows --compilation_mode=opt //asset_stream_manager
|
||||
bazel build --config windows --compilation_mode=opt --copt=/GL //asset_stream_manager
|
||||
```
|
||||
* Copy the Linux build output files `cdc_fuse_fs` and `libfuse.so` from
|
||||
`bazel-bin/cdc_fuse_fs` on the Linux system to `bazel-bin\asset_stream_manager`
|
||||
@@ -94,25 +151,101 @@ The two tools can be built and used independently.
|
||||
|
||||
## Usage
|
||||
|
||||
### CDC Sync
|
||||
To copy the contents of the Windows directory `C:\path\to\assets` to `~/assets`
|
||||
on the Linux device `linux.machine.com`, run
|
||||
```
|
||||
cdc_rsync --ssh-command=C:\path\to\ssh.exe --scp-command=C:\path\to\scp.exe C:\path\to\assets\* user@linux.machine.com:~/assets -vr
|
||||
```
|
||||
Depending on your setup, you may have to specify additional arguments for the
|
||||
ssh and scp commands, including proper quoting, e.g.
|
||||
```
|
||||
cdc_rsync --ssh-command="\"C:\path with space\to\ssh.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" --scp-command="\"C:\path with space\to\scp.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" C:\path\to\assets\* user@linux.machine.com:~/assets -vr
|
||||
```
|
||||
Lengthy ssh/scp commands that rarely change can also be put into environment
|
||||
variables `CDC_SSH_COMMAND` and `CDC_SCP_COMMAND`, e.g.
|
||||
```
|
||||
set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file"""
|
||||
The tools require a setup where you can use SSH and SCP from the Windows machine
|
||||
to the Linux device without entering a password, e.g. by using key-based
|
||||
authentication.
|
||||
|
||||
set CDC_SCP_COMMAND="C:\path with space\to\scp.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file"""
|
||||
### Configuring SSH and SCP
|
||||
|
||||
cdc_rsync C:\path\to\assets\* user@linux.machine.com:~/assets -vr
|
||||
By default, the tools search `ssh.exe` and `scp.exe` from the path environment
|
||||
variable. If you can run the following commands in a Windows cmd without
|
||||
entering your password, you are all set:
|
||||
```
|
||||
ssh user@linux.device.com
|
||||
scp somefile.txt user@linux.device.com:
|
||||
```
|
||||
Here, `user` is the Linux user and `linux.device.com` is the Linux host to
|
||||
SSH into or copy the file to.
|
||||
|
||||
If `ssh.exe` or `scp.exe` cannot be found, or if additional arguments are
|
||||
required, it is recommended to set the environment variables `CDC_SSH_COMMAND`
|
||||
and `CDC_SCP_COMMAND`. The following example specifies a custom path to the SSH
|
||||
and SCP binaries, a custom SSH config file, a key file and a known hosts file:
|
||||
```
|
||||
set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe" -F C:\path\to\ssh_config -i C:\path\to\id_rsa -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""C:\path\to\known_hosts"""
|
||||
set CDC_SCP_COMMAND="C:\path with space\to\scp.exe" -F C:\path\to\ssh_config -i C:\path\to\id_rsa -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""C:\path\to\known_hosts"""
|
||||
```
|
||||
|
||||
#### Google Specific
|
||||
|
||||
For Google internal usage, set the following environment variables to enable SSH
|
||||
authentication using a Google security key:
|
||||
```
|
||||
set CDC_SSH_COMMAND=C:\gnubby\bin\ssh.exe
|
||||
set CDC_SCP_COMMAND=C:\gnubby\bin\scp.exe
|
||||
```
|
||||
Note that you will have to touch the security key multiple times during the
|
||||
first run. Subsequent runs only require a single touch.
|
||||
|
||||
### CDC RSync
|
||||
|
||||
`cdc_rsync` is used similar to `scp` or the Linux `rsync` command. To sync a
|
||||
single Windows file `C:\path\to\file.txt` to the home directory `~` on the Linux
|
||||
device `linux.device.com`, run
|
||||
```
|
||||
cdc_rsync C:\path\to\file.txt user@linux.device.com:~
|
||||
```
|
||||
`cdc_rsync` understands the usual Windows wildcards `*` and `?`.
|
||||
```
|
||||
cdc_rsync C:\path\to\*.txt user@linux.device.com:~
|
||||
```
|
||||
To sync the contents of the Windows directory `C:\path\to\assets` recursively to
|
||||
`~/assets` on the Linux device, run
|
||||
```
|
||||
cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -r
|
||||
```
|
||||
To get per file progress, add `-v`:
|
||||
```
|
||||
cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -vr
|
||||
```
|
||||
|
||||
### CDC Stream
|
||||
|
||||
`cdc_stream` consists of a background service called `asset_stream_manager`,
|
||||
which has to be started in advance with
|
||||
```
|
||||
asset_stream_manager
|
||||
```
|
||||
The service logs to `%APPDATA%\cdc-file-transfer\logs` by default. Try
|
||||
`asset_stream_manager --helpfull` to get a list of available flags.
|
||||
|
||||
To stream the Windows directory `C:\path\to\assets` to `~/assets` on the Linux
|
||||
device, run
|
||||
```
|
||||
cdc_stream start C:\path\to\assets user@linux.device.com:~/assets
|
||||
```
|
||||
This makes all files and directories of `C:\path\to\assets` available on
|
||||
`~/assets` immediately, as if it were a local copy. However, data is streamed
|
||||
from Windows to Linux as files are accessed.
|
||||
|
||||
To stop the streaming session, enter
|
||||
```
|
||||
cdc_stream stop user@linux.device.com:~/assets
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
`cdc_rsync` always logs to the console. By default, the `asset_stream_manager`
|
||||
service logs to a timestamped file in `%APPDATA%\cdc-file-transfer\logs`. It can
|
||||
be switched to log to console by starting it with `--log_to_stdout`:
|
||||
```
|
||||
asset_stream_manager --log_to_stdout
|
||||
```
|
||||
|
||||
Both `cdc_rsync` and `asset_stream_manager` support command line flags to control log
|
||||
verbosity. Passing `-vvv` prints debug logs, `-vvvv` prints verbose logs. The
|
||||
debug logs contain all SSH and SCP commands that are attempted to run, which is
|
||||
very useful for troubleshooting.
|
||||
|
||||
`cdc_stream` is just a thin client for the asset streaming service. Nothing ever
|
||||
goes wrong with it <sup>[citation needed]</sup>.
|
||||
|
||||
BIN
docs/cdc_rsync_recursive_upload_demo.gif
Normal file
BIN
docs/cdc_rsync_recursive_upload_demo.gif
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 8.3 MiB |
BIN
docs/cdc_rsync_vs_cygwin_rsync.png
Normal file
BIN
docs/cdc_rsync_vs_cygwin_rsync.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 60 KiB |
BIN
docs/cdc_stream_demo.gif
Normal file
BIN
docs/cdc_stream_demo.gif
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 12 MiB |
BIN
docs/cdc_stream_vs_sshfs.png
Normal file
BIN
docs/cdc_stream_vs_sshfs.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 32 KiB |
Reference in New Issue
Block a user