ljusten 9fdccb3548 Remove GGP dependencies from CDC RSync (#1)
* Remove dependencies of cdc_sync from GGP

Allows overriding the SSH and SCP commands via command line flags.
Hence, strict host checking, SSH config etc. can be removed since it
is passed in by command line flags for GGP. Also deploys
cdc_rsync_server to ~/.cache/cdc_file_transfer/ and creates that dir
if it does not exist.

* Tweak RemoteUtil

Replaces localhost: by //./ in the workaround for scp since localhost:
had two disadvantages: 1) It required 2 gnubby touches for gLinux and
2) it didn't work for ggp. //./ works for both. Also tweaks quoting,
which didn't quite work for ggp.

* Don't check remote ports in cdc_rsync

Turns off checking remote ports in PortManager. In the future, the
server should return available ports after failing to connect to the
provided port.

Since now the first remote connection is running cdc_rsync_server,
the timeout check has to be done when running that process.

* Remove now-unused kInstancePickerNotAvailableInQuietMode enum

* Add more details to the readme

* [cdc_rsync] Accept [user@]host:destination

Removes the --ip command line argument and assumes user/host are
passed in along with the destination, so it works in the same way as
other popular tools.

* [ggp_rsync] Combine server deploy commands

Combines two chmod and one mv command into one ssh command. This makes
deploy a bit quicker, especially if each ssh command involves touching
your gnubby.

* Remove GGP specific stuff from VS build commands

* [cdc_rsync] Get rid of cdc_rsync.dll

Compile the CDC RSync client as a static library instead. This removes
quite a bit of boilerplate and makes string handling easier since
we can now pass std::strings instead of const chars.

Also fixes an issue where we were sometimes trying to assign nullptr
to std::strings, which is forbidden.

* Allow specifying ssh/scp commands with env vars

* Rename GgpRsync* to CdcRsync*

* Merge ggp_rsync_cli into ggp_rsync

* [cdc_rsync] Refactor cdc_rsync.cc/h

Merges cdc_rsync.cc/h with main.cc and CdcRsyncClient since code is
closer to where it's being used and should be more readable.
2022-11-15 12:48:09 +01:00

CDC File Transfer

This repository contains tools for syncing and streaming files. The tools use Content Defined Chunking (CDC), in particular FastCDC, to split files into chunks.

CDC RSync

CDC RSync is a tool to sync files from a Windows machine to a Linux device, similar to the standard Linux rsync. At its core it is a copy tool, optimized for the case where an older version of the files already exists in the target directory.

  • It quickly skips files whose timestamp and file size match.
  • It uses fast compression for all data transfer.
  • If a file changed, it determines which parts changed and only transfers the differences.

The remote diffing algorithm is based on CDC. In our tests, it is up to 30x faster than the one used in rsync (1500 MB/s vs 50 MB/s).
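
To illustrate why content-defined chunk boundaries make diffing cheap, here is a minimal sketch of a gear-hash chunker. It is not the FastCDC implementation used by this repository; the names `split_chunks` and `chunks_to_send` and the constants `MASK`, `MIN_SIZE` and `MAX_SIZE` are assumptions chosen for the example, and the chunk sizes are far smaller than a real tool would use.

```python
import hashlib
import random

# Fixed pseudo-random table for the gear hash; the seed is arbitrary.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

MASK = 0xFC000000  # cut when the top 6 hash bits are zero (about 1 in 64 positions)
MIN_SIZE = 32      # illustrative minimum chunk size in bytes
MAX_SIZE = 512     # force a cut so chunks cannot grow without bound


def split_chunks(data: bytes) -> list:
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting drops old bytes, so h depends on at most the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunks_to_send(old: bytes, new: bytes) -> list:
    """Return the chunks of `new` whose hash is not among the chunks of `old`."""
    known = {hashlib.sha256(c).digest() for c in split_chunks(old)}
    return [c for c in split_chunks(new) if hashlib.sha256(c).digest() not in known]
```

Because boundaries are chosen from the content itself rather than from fixed offsets, inserting a few bytes in the middle of a file typically changes only the chunks around the edit; later boundaries re-align with the unchanged data, so `chunks_to_send` stays small.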

CDC Stream

CDC Stream is a tool to stream files and directories from a Windows machine to a Linux device. Conceptually, it is similar to sshfs, but it is optimized for read speed.

  • It caches streamed data on the Linux device.
  • If a file is re-read on Linux after it changed on Windows, only the differences are streamed again. The rest is read from cache.
  • Stat operations are very fast since the directory metadata (filenames, permissions, etc.) is provided in a streaming-friendly way.
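
One way to picture the caching behavior in the list above is a content-addressed chunk store that only fetches chunks it has not seen before. The sketch below is an assumption made for illustration, not the actual cdc_fuse_fs design; `ChunkCache`, its `fetch` callback and the string chunk IDs are hypothetical.

```python
from typing import Callable, Dict, Iterable


class ChunkCache:
    """Hypothetical content-addressed cache mapping chunk ID to chunk bytes."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # called only on cache misses
        self._store: Dict[str, bytes] = {}  # chunks already streamed
        self.misses = 0                     # number of chunks fetched so far

    def read(self, chunk_ids: Iterable[str]) -> bytes:
        """Assemble file content from chunk IDs, fetching only unknown chunks."""
        parts = []
        for cid in chunk_ids:
            if cid not in self._store:
                self._store[cid] = self._fetch(cid)
                self.misses += 1
            parts.append(self._store[cid])
        return b"".join(parts)
```

If a file's chunk list changes from `["a", "b"]` to `["a", "c"]`, a second `read` in this sketch fetches only chunk `"c"`; chunk `"a"` comes from the cache.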

To efficiently determine which parts of a file changed, the tool uses the same CDC-based diffing algorithm as CDC RSync. Changes to Windows files are almost immediately reflected on Linux, with a delay of roughly 0.5 s + 0.7 s × (total size of changed files in GB).
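
As a worked example of the delay estimate above, the following sketch evaluates the formula for a given amount of changed data. The function name `estimated_delay_seconds` is hypothetical, and the result is only the rough estimate stated in the text, not a measured value.

```python
def estimated_delay_seconds(changed_gb: float) -> float:
    """Rough delay estimate: 0.5 s fixed cost plus 0.7 s per GB of changed files."""
    if changed_gb < 0:
        raise ValueError("changed_gb must be non-negative")
    return 0.5 + 0.7 * changed_gb
```

For instance, 2 GB of changed files gives roughly 0.5 + 0.7 × 2 = 1.9 seconds, while a small edit of a few megabytes stays close to the 0.5 second base delay.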

The tool does not support writing files back from Linux to Windows; the Linux directory is read-only.

Getting Started

The project has to be built on both Windows and Linux.

Prerequisites

The following steps have to be executed on both Windows and Linux.

  • Download and install Bazel from https://bazel.build/install.
  • Clone the repository.
    git clone https://github.com/google/cdc-file-transfer
    
  • Initialize submodules.
    cd cdc-file-transfer
    git submodule update --init --recursive
    

Finally, install an SSH client on the Windows device if one is not already present. The file transfer tools require ssh.exe and scp.exe.

Building

The two tools can be built and used independently.

CDC RSync

  • Build Linux components
    bazel build --config linux --compilation_mode=opt //cdc_rsync_server
    
  • Build Windows components
    bazel build --config windows --compilation_mode=opt //cdc_rsync
    
  • Copy the Linux build output file cdc_rsync_server from bazel-bin/cdc_rsync_server on the Linux system to bazel-bin\cdc_rsync on the Windows machine.

CDC Stream

  • Build Linux components
    bazel build --config linux --compilation_mode=opt //cdc_fuse_fs
    
  • Build Windows components
    bazel build --config windows --compilation_mode=opt //asset_stream_manager
    
  • Copy the Linux build output files cdc_fuse_fs and libfuse.so from bazel-bin/cdc_fuse_fs on the Linux system to bazel-bin\asset_stream_manager on the Windows machine.

Usage

CDC RSync

To copy the contents of the Windows directory C:\path\to\assets to ~/assets on the Linux device linux.machine.com, run

cdc_rsync --ssh-command=C:\path\to\ssh.exe --scp-command=C:\path\to\scp.exe C:\path\to\assets\* user@linux.machine.com:~/assets -vr

Depending on your setup, you may have to specify additional arguments for the ssh and scp commands, including proper quoting, e.g.

cdc_rsync --ssh-command="\"C:\path with space\to\ssh.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" --scp-command="\"C:\path with space\to\scp.exe\" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile=\"\"\"known_hosts_file\"\"\"" C:\path\to\assets\* user@linux.machine.com:~/assets -vr

Lengthy ssh/scp commands that rarely change can also be put into environment variables CDC_SSH_COMMAND and CDC_SCP_COMMAND, e.g.

set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file"""

set CDC_SCP_COMMAND="C:\path with space\to\scp.exe" -F ssh_config_file -i id_rsa_file -oStrictHostKeyChecking=yes -oUserKnownHostsFile="""known_hosts_file"""

cdc_rsync C:\path\to\assets\* user@linux.machine.com:~/assets -vr

CDC Stream
