Mirror of https://github.com/nestriness/cdc-file-transfer.git (synced 2026-01-30 14:35:37 +02:00)
Improve readme (#19)
Improve readme

This CL adds
- a history section with references to Stadia
- benchmarks
- animated gifs with demos
- a troubleshooting section
- and more info about cdc_stream
197
README.md
@@ -1,52 +1,109 @@
|
|||||||
# CDC File Transfer
|
# CDC File Transfer
|
||||||
|
|
||||||
This repository contains tools for synching and streaming files. They are based
|
Born from the ashes of Stadia, this repository contains tools for synching and
|
||||||
on Content Defined Chunking (CDC), in particular
|
streaming files from Windows to Linux. They are based on Content Defined
|
||||||
|
Chunking (CDC), in particular
|
||||||
[FastCDC](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia),
|
[FastCDC](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia),
|
||||||
to split up files into chunks.
|
to split up files into chunks.
|
||||||
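To illustrate the idea of content-defined chunking that the README text above refers to, here is a minimal Python sketch. It is neither FastCDC nor the code in this repository: the gear-style rolling hash, the `GEAR` table, `MASK`, and the size limits are all assumptions chosen for illustration.

```python
import random

# Hypothetical parameters, chosen for illustration only.
MIN_SIZE = 64          # never cut a chunk before this many bytes
MAX_SIZE = 1024        # always cut a chunk at this many bytes
MASK = (1 << 8) - 1    # a cut is expected roughly every 256 bytes past MIN_SIZE

# Fixed table of pseudo-random 32-bit values, one per possible byte value.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk_boundaries(data):
    """Return the end offsets of the content-defined chunks of data."""
    boundaries = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Rolling hash: each step shifts older bytes out of the 32-bit state,
        # so the low bits depend only on the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size < MIN_SIZE:
            continue
        # Cut where the content makes the masked hash zero, or at MAX_SIZE.
        if (h & MASK) == 0 or size >= MAX_SIZE:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing chunk, may be short
    return boundaries


def chunks(data):
    """Split data into chunks at the content-defined boundaries."""
    out, prev = [], 0
    for end in chunk_boundaries(data):
        out.append(data[prev:end])
        prev = end
    return out
```

Because cut points depend on nearby bytes rather than on absolute offsets, a local edit tends to change only the chunks around it, which is what makes chunk-based diffing attractive for syncing.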
|
|
||||||
|
## History
|
||||||
|
|
||||||
|
At Stadia, game developers had access to Linux cloud instances to run games.
|
||||||
|
Most developers wrote their games on Windows, though. Therefore, they needed a
|
||||||
|
way to make them available on the remote Linux instance.
|
||||||
|
|
||||||
|
As developers had SSH access to those instances, they could use `scp` to copy
|
||||||
|
the game content. However, this was impractical, especially with the shift to
|
||||||
|
working from home during the pandemic with sub-par internet connections. `scp`
|
||||||
|
always copies full files: it has no "delta mode" to copy only the things that
|
||||||
|
changed, it is slow for many small files, and it has no fast compression.
|
||||||
|
|
||||||
|
To improve this situation, we developed two tools, `cdc_rsync` and `cdc_stream`,
|
||||||
|
which enable developers to quickly iterate on their games without repeatedly
|
||||||
|
incurring the cost of transmitting dozens of GBs.
|
||||||
|
|
||||||
## CDC RSync
|
## CDC RSync
|
||||||
|
|
||||||
CDC RSync is a tool to sync files from a Windows machine to a Linux device,
|
`cdc_rsync` is a tool to sync files from a Windows machine to a Linux device,
|
||||||
similar to the standard Linux [rsync](https://linux.die.net/man/1/rsync). It is
|
similar to the standard Linux [rsync](https://linux.die.net/man/1/rsync). It is
|
||||||
basically a copy tool, but optimized for the case where there is already an old
|
basically a copy tool, but optimized for the case where there is already an old
|
||||||
version of the files available in the target directory.
|
version of the files available in the target directory.
|
||||||
* It skips files quickly if timestamp and file size match.
|
* It quickly skips files if timestamp and file size match.
|
||||||
* It uses fast compression for all data transfer.
|
* It uses fast compression for all data transfer.
|
||||||
* If a file changed, it determines which parts changed and only transfers the
|
* If a file changed, it determines which parts changed and only transfers the
|
||||||
differences.
|
differences.
|
||||||
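The first bullet above (skipping files whose timestamp and size match) can be sketched as follows. This is a hedged illustration, not the `cdc_rsync` implementation: the function name `needs_sync` is hypothetical, and comparing timestamps at whole-second granularity is an assumption, since the README does not specify the exact comparison.

```python
import os


def needs_sync(local_path, remote_size, remote_mtime):
    """Return True if the local file differs from the remote metadata.

    remote_size and remote_mtime stand for the size (bytes) and modification
    time (seconds since the epoch) reported for the copy on the target.
    """
    st = os.stat(local_path)
    # Assumption: timestamps are compared at whole-second granularity.
    same_time = int(st.st_mtime) == int(remote_mtime)
    same_size = st.st_size == remote_size
    return not (same_time and same_size)
```

Only files for which such a check reports a difference would need the more expensive chunk-based comparison.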
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="docs/cdc_rsync_recursive_upload_demo.gif" alt="cdc_rsync demo" width="688" />
|
||||||
|
</p>
|
||||||
|
|
||||||
The remote diffing algorithm is based on CDC. In our tests, it is up to 30x
|
The remote diffing algorithm is based on CDC. In our tests, it is up to 30x
|
||||||
faster than the one used in rsync (1500 MB/s vs 50 MB/s).
|
faster than the one used in rsync (1500 MB/s vs 50 MB/s).
|
||||||
|
|
||||||
|
The following chart shows a comparison of `cdc_rsync` and Linux rsync running
|
||||||
|
under Cygwin on Windows. The test data consists of 58 development builds
|
||||||
|
of some game provided to us for evaluation purposes. The builds are 40-45 GB
|
||||||
|
large. For this experiment, we uploaded the first build, then synced the second
|
||||||
|
build with each of the two tools and measured the time. For example, syncing
|
||||||
|
from build 1 to build 2 took 210 seconds with the Linux rsync, but only 75
|
||||||
|
seconds with `cdc_rsync`. The three outliers are probably feature drops from
|
||||||
|
another development branch, where the delta was much higher. Overall,
|
||||||
|
`cdc_rsync` syncs files about **3 times faster** than Linux rsync.
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="docs/cdc_rsync_vs_cygwin_rsync.png" alt="Comparison of cdc_rsync and Linux rsync running in Cygwin" width="753" />
|
||||||
|
</p>
|
||||||
|
|
||||||
## CDC Stream
|
## CDC Stream
|
||||||
|
|
||||||
CDC Stream is a tool to stream files and directories from a Windows machine to a
|
`cdc_stream` is a tool to stream files and directories from a Windows machine to a
|
||||||
Linux device. Conceptually, it is similar to [sshfs](https://github.com/libfuse/sshfs),
|
Linux device. Conceptually, it is similar to [sshfs](https://github.com/libfuse/sshfs),
|
||||||
but it is optimized for read speed.
|
but it is optimized for read speed.
|
||||||
* It caches streamed data on the Linux device.
|
* It caches streamed data on the Linux device.
|
||||||
* If a file is re-read on Linux after it changed on Windows, only the
|
* If a file is re-read on Linux after it changed on Windows, only the
|
||||||
differences are streamed again. The rest is read from cache.
|
differences are streamed again. The rest is read from the cache.
|
||||||
* Stat operations are very fast since the directory metadata (filenames,
|
* Stat operations are very fast since the directory metadata (filenames,
|
||||||
permissions etc.) is provided in a streaming-friendly way.
|
permissions etc.) is provided in a streaming-friendly way.
|
||||||
|
|
||||||
To efficiently determine which parts of a file changed, the tool uses the same
|
To efficiently determine which parts of a file changed, the tool uses the same
|
||||||
CDC-based diffing algorithm as CDC RSync. Changes to Windows files are almost
|
CDC-based diffing algorithm as `cdc_rsync`. Changes to Windows files are almost
|
||||||
immediately reflected on Linux, with a delay of roughly (0.5s + 0.7s x total
|
immediately reflected on Linux, with a delay of roughly (0.5s + 0.7s x total
|
||||||
size of changed files in GB).
|
size of changed files in GB).
|
||||||
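As a worked example of the delay estimate quoted above (0.5s + 0.7s x total size of changed files in GB), the formula can be written as a small helper. The function name is an assumption made for illustration; the constants come from the README text.

```python
def estimated_delay_seconds(changed_gb):
    """Rough delay until changes show up on Linux, per the README's estimate."""
    return 0.5 + 0.7 * changed_gb
```

For example, 2 GB of changed files gives roughly 0.5 + 0.7 x 2 = 1.9 seconds.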
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="docs/cdc_stream_demo.gif" alt="cdc_stream demo" width="688" />
|
||||||
|
</p>
|
||||||
|
|
||||||
The tool does not support writing files back from Linux to Windows; the Linux
|
The tool does not support writing files back from Linux to Windows; the Linux
|
||||||
directory is readonly.
|
directory is readonly.
|
||||||
|
|
||||||
|
The following chart compares times from starting a game to reaching the menu.
|
||||||
|
In one case, the game is streamed via `sshfs`; in the other case, we use
|
||||||
|
`cdc_stream`. Overall, we see a **2x to 5x speedup**.
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="docs/cdc_stream_vs_sshfs.png" alt="Comparison of cdc_stream and sshfs" width="752" />
|
||||||
|
</p>
|
||||||
|
|
||||||
# Getting Started
|
# Getting Started
|
||||||
|
|
||||||
The project has to be built both on Windows and Linux.
|
Download the precompiled binaries from the
|
||||||
|
[latest release](https://github.com/google/cdc-file-transfer/releases).
|
||||||
|
We currently provide Linux binaries compiled on
|
||||||
|
[GitHub's latest Ubuntu](https://github.com/actions/runner-images) version.
|
||||||
|
If the binaries work for you, you can skip the following two sections.
|
||||||
|
|
||||||
|
Alternatively, the project can be built from source. Some binaries have to be
|
||||||
|
built on Windows, some on Linux.
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
The following steps have to be executed on **both Windows and Linux**.
|
To build the tools from source, the following steps have to be executed on
|
||||||
|
**both Windows and Linux**.
|
||||||
|
|
||||||
* Download and install Bazel from https://bazel.build/install.
|
* Download and install Bazel from [here](https://bazel.build/install). See
|
||||||
|
[workflow logs](https://github.com/google/cdc-file-transfer/actions) for the
|
||||||
|
currently used version.
|
||||||
* Clone the repository.
|
* Clone the repository.
|
||||||
```
|
```
|
||||||
git clone https://github.com/google/cdc-file-transfer
|
git clone https://github.com/google/cdc-file-transfer
|
||||||
@@ -64,15 +121,15 @@ The file transfer tools require `ssh.exe` and `scp.exe`.
|
|||||||
|
|
||||||
The two tools can be built and used independently.
|
The two tools can be built and used independently.
|
||||||
|
|
||||||
### CDC Sync
|
### CDC RSync
|
||||||
|
|
||||||
* Build Linux components
|
* Build Linux components
|
||||||
```
|
```
|
||||||
bazel build --config linux --compilation_mode=opt //cdc_rsync_server
|
bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_rsync_server
|
||||||
```
|
```
|
||||||
* Build Windows components
|
* Build Windows components
|
||||||
```
|
```
|
||||||
bazel build --config windows --compilation_mode=opt //cdc_rsync
|
bazel build --config windows --compilation_mode=opt --copt=/GL //cdc_rsync
|
||||||
```
|
```
|
||||||
* Copy the Linux build output file `cdc_rsync_server` from
|
* Copy the Linux build output file `cdc_rsync_server` from
|
||||||
`bazel-bin/cdc_rsync_server` on the Linux system to `bazel-bin\cdc_rsync`
|
`bazel-bin/cdc_rsync_server` on the Linux system to `bazel-bin\cdc_rsync`
|
||||||
@@ -82,11 +139,11 @@ The two tools can be built and used independently.
|
|||||||
|
|
||||||
* Build Linux components
|
* Build Linux components
|
||||||
```
|
```
|
||||||
bazel build --config linux --compilation_mode=opt //cdc_fuse_fs
|
bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_fuse_fs
|
||||||
```
|
```
|
||||||
* Build Windows components
|
* Build Windows components
|
||||||
```
|
```
|
||||||
bazel build --config windows --compilation_mode=opt //asset_stream_manager
|
bazel build --config windows --compilation_mode=opt --copt=/GL //asset_stream_manager
|
||||||
```
|
```
|
||||||
* Copy the Linux build output files `cdc_fuse_fs` and `libfuse.so` from
|
* Copy the Linux build output files `cdc_fuse_fs` and `libfuse.so` from
|
||||||
`bazel-bin/cdc_fuse_fs` on the Linux system to `bazel-bin\asset_stream_manager`
|
`bazel-bin/cdc_fuse_fs` on the Linux system to `bazel-bin\asset_stream_manager`
|
||||||
@@ -94,25 +151,101 @@ The two tools can be built and used independently.
|
|||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
### CDC Sync
|
The tools require a setup where you can use SSH and SCP from the Windows machine
|
||||||
To copy the contents of the Windows directory `C:\path\to\assets` to `~/assets`
|
to the Linux device without entering a password, e.g. by using key-based
|
||||||
on the Linux device `linux.machine.com`, run
|
authentication.
|
||||||
```
|
|
||||||
cdc_rsync --ssh-command=C:\path\to\ssh.exe --scp-command=C:\path\to\scp.exe C:\path\to\assets\* user@linux.machine.com:~/assets -vr
|
|
||||||
```
|
|
||||||
Depending on your setup, you may have to specify additional arguments for the
|
|
||||||
ssh and scp commands, including proper quoting, e.g.
|
|
||||||
```
|
|
||||||
cdc_rsync --ssh-command="\"C:\path with space\to\ssh.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" --scp-command="\"C:\path with space\to\scp.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" C:\path\to\assets\* user@linux.machine.com:~/assets -vr
|
|
||||||
```
|
|
||||||
Lengthy ssh/scp commands that rarely change can also be put into environment
|
|
||||||
variables `CDC_SSH_COMMAND` and `CDC_SCP_COMMAND`, e.g.
|
|
||||||
```
|
|
||||||
set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file"""
|
|
||||||
|
|
||||||
set CDC_SCP_COMMAND="C:\path with space\to\scp.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file"""
|
### Configuring SSH and SCP
|
||||||
|
|
||||||
cdc_rsync C:\path\to\assets\* user@linux.machine.com:~/assets -vr
|
By default, the tools search for `ssh.exe` and `scp.exe` in the `PATH` environment
|
||||||
|
variable. If you can run the following commands in a Windows cmd without
|
||||||
|
entering your password, you are all set:
|
||||||
|
```
|
||||||
|
ssh user@linux.device.com
|
||||||
|
scp somefile.txt user@linux.device.com:
|
||||||
|
```
|
||||||
|
Here, `user` is the Linux user and `linux.device.com` is the Linux host to
|
||||||
|
SSH into or copy the file to.
|
||||||
|
|
||||||
|
If `ssh.exe` or `scp.exe` cannot be found, or if additional arguments are
|
||||||
|
required, it is recommended to set the environment variables `CDC_SSH_COMMAND`
|
||||||
|
and `CDC_SCP_COMMAND`. The following example specifies a custom path to the SSH
|
||||||
|
and SCP binaries, a custom SSH config file, a key file and a known hosts file:
|
||||||
|
```
|
||||||
|
set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe" -F C:\path\to\ssh_config -i C:\path\to\id_rsa -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""C:\path\to\known_hosts"""
|
||||||
|
set CDC_SCP_COMMAND="C:\path with space\to\scp.exe" -F C:\path\to\ssh_config -i C:\path\to\id_rsa -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""C:\path\to\known_hosts"""
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Google Specific
|
||||||
|
|
||||||
|
For Google internal usage, set the following environment variables to enable SSH
|
||||||
|
authentication using a Google security key:
|
||||||
|
```
|
||||||
|
set CDC_SSH_COMMAND=C:\gnubby\bin\ssh.exe
|
||||||
|
set CDC_SCP_COMMAND=C:\gnubby\bin\scp.exe
|
||||||
|
```
|
||||||
|
Note that you will have to touch the security key multiple times during the
|
||||||
|
first run. Subsequent runs only require a single touch.
|
||||||
|
|
||||||
|
### CDC RSync
|
||||||
|
|
||||||
|
`cdc_rsync` is used similarly to `scp` or the Linux `rsync` command. To sync a
|
||||||
|
single Windows file `C:\path\to\file.txt` to the home directory `~` on the Linux
|
||||||
|
device `linux.device.com`, run
|
||||||
|
```
|
||||||
|
cdc_rsync C:\path\to\file.txt user@linux.device.com:~
|
||||||
|
```
|
||||||
|
`cdc_rsync` understands the usual Windows wildcards `*` and `?`.
|
||||||
|
```
|
||||||
|
cdc_rsync C:\path\to\*.txt user@linux.device.com:~
|
||||||
|
```
|
||||||
|
To sync the contents of the Windows directory `C:\path\to\assets` recursively to
|
||||||
|
`~/assets` on the Linux device, run
|
||||||
|
```
|
||||||
|
cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -r
|
||||||
|
```
|
||||||
|
To get per-file progress, add `-v`:
|
||||||
|
```
|
||||||
|
cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -vr
|
||||||
```
|
```
|
||||||
|
|
||||||
### CDC Stream
|
### CDC Stream
|
||||||
|
|
||||||
|
`cdc_stream` consists of a background service called `asset_stream_manager`,
|
||||||
|
which has to be started in advance with
|
||||||
|
```
|
||||||
|
asset_stream_manager
|
||||||
|
```
|
||||||
|
The service logs to `%APPDATA%\cdc-file-transfer\logs` by default. Try
|
||||||
|
`asset_stream_manager --helpfull` to get a list of available flags.
|
||||||
|
|
||||||
|
To stream the Windows directory `C:\path\to\assets` to `~/assets` on the Linux
|
||||||
|
device, run
|
||||||
|
```
|
||||||
|
cdc_stream start C:\path\to\assets user@linux.device.com:~/assets
|
||||||
|
```
|
||||||
|
This makes all files and directories of `C:\path\to\assets` available on
|
||||||
|
`~/assets` immediately, as if it were a local copy. However, data is streamed
|
||||||
|
from Windows to Linux as files are accessed.
|
||||||
|
|
||||||
|
To stop the streaming session, enter
|
||||||
|
```
|
||||||
|
cdc_stream stop user@linux.device.com:~/assets
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
`cdc_rsync` always logs to the console. By default, the `asset_stream_manager`
|
||||||
|
service logs to a timestamped file in `%APPDATA%\cdc-file-transfer\logs`. It can
|
||||||
|
be switched to log to console by starting it with `--log_to_stdout`:
|
||||||
|
```
|
||||||
|
asset_stream_manager --log_to_stdout
|
||||||
|
```
|
||||||
|
|
||||||
|
Both `cdc_rsync` and `asset_stream_manager` support command line flags to control log
|
||||||
|
verbosity. Passing `-vvv` prints debug logs, `-vvvv` prints verbose logs. The
|
||||||
|
debug logs contain all SSH and SCP commands that the tools attempt to run, which is
|
||||||
|
very useful for troubleshooting.
|
||||||
|
|
||||||
|
`cdc_stream` is just a thin client for the asset streaming service. Nothing ever
|
||||||
|
goes wrong with it <sup>[citation needed]</sup>.
|
||||||
|
|||||||
BIN
docs/cdc_rsync_recursive_upload_demo.gif
Normal file
Binary file not shown.
Size after change: 8.3 MiB
BIN
docs/cdc_rsync_vs_cygwin_rsync.png
Normal file
Binary file not shown.
Size after change: 60 KiB
BIN
docs/cdc_stream_demo.gif
Normal file
Binary file not shown.
Size after change: 12 MiB
BIN
docs/cdc_stream_vs_sshfs.png
Normal file
Binary file not shown.
Size after change: 32 KiB