46 Commits

Author SHA1 Message Date
Lutz Justen
f8c10ce7bd [cdc_rsync] Enable local syncing (#75)
Adds support for local syncs of files and folders on the same Windows
machine, e.g. cdc_rsync C:\source C:\dest. The main changes are:

- Skip the check whether the port is available remotely with PortManager.
- Do not deploy cdc_rsync_server.
- Run cdc_rsync_server directly, not through an SSH tunnel.

The current implementation is not optimal as it starts
cdc_rsync_server as a separate process and communicates with it via a
TCP port.
2023-01-26 09:57:19 +01:00
Donovan Baarda
9cf71cae65 Fix #76 fastcdc chunk boundary off-by-one. (#78)
* Fix #76 fastcdc chunk boundary off-by-one.

This ensures that the last byte included in the gear hash that identified
the chunk boundary is included in the chunk, so that chunks are still
matched when the byte immediately after them is changed.

* Init gear hash to all 1's to prevent zero-length chunks with min_size=0.

Also change the `MaxChunkSize` test to use min_size=0 to test this works.
2023-01-23 14:39:02 +01:00
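To illustrate the two fixes in #78, here is a minimal sketch of a gear-hash boundary search. It assumes a single mask instead of FastCDC's normalized chunking stages and a made-up gear table (the real one is a fixed set of 256 random 64-bit values); MakeGearTable, FindChunkLength and ChunkLengths are hypothetical names, not the project's API. The hash is tested before byte i is mixed in, so returning i keeps the byte that completed the match inside the chunk, and starting from all 1's means the first test with min_size=0 cannot match on an empty prefix (assuming a nonzero mask).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gear table. Real FastCDC uses 256 fixed random 64-bit values;
// here they are derived from a splitmix64 sequence for illustration only.
std::array<uint64_t, 256> MakeGearTable() {
  std::array<uint64_t, 256> table{};
  uint64_t state = 0;
  for (auto& entry : table) {
    uint64_t z = (state += 0x9E3779B97F4A7C15ull);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    entry = z ^ (z >> 31);
  }
  return table;
}

// Returns the length of the chunk that starts at data[0].
// Assumes mask != 0 and min_size < max_size.
size_t FindChunkLength(const uint8_t* data, size_t len, size_t min_size,
                       size_t max_size, uint64_t mask) {
  static const std::array<uint64_t, 256> kGear = MakeGearTable();
  if (len <= min_size) return len;
  if (len > max_size) len = max_size;
  // All 1's: with min_size == 0 the first test below runs before any byte
  // has been hashed, and an initial hash of 0 would yield an empty chunk.
  uint64_t hash = ~uint64_t{0};
  size_t i = 0;
  // Mix in the first min_size bytes without testing for a boundary.
  for (; i < min_size; ++i) hash = (hash << 1) + kGear[data[i]];
  for (; i < len; ++i) {
    // The hash covers data[0..i-1] here, so returning i keeps the byte that
    // completed the match inside the chunk.
    if ((hash & mask) == 0) return i;
    hash = (hash << 1) + kGear[data[i]];
  }
  return len;
}

// Splits data into consecutive chunks and returns their lengths.
std::vector<size_t> ChunkLengths(const std::vector<uint8_t>& data,
                                 size_t min_size, size_t max_size,
                                 uint64_t mask) {
  std::vector<size_t> lengths;
  size_t pos = 0;
  while (pos < data.size()) {
    size_t n = FindChunkLength(data.data() + pos, data.size() - pos, min_size,
                               max_size, mask);
    lengths.push_back(n);
    pos += n;
  }
  return lengths;
}
```

Because the result depends only on the bytes before the boundary, changing the byte right after a chunk leaves that chunk's length unchanged, which is the property the commit restores.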
Lutz Justen
efca9855e7 [cdc_rsync] [cdc_stream] Switch from scp to sftp (#66)
Use sftp for deploying remote components instead of scp. sftp has the
advantage that it can also create directories, chmod files etc., so
that we can do everything in one call of sftp instead of mixing scp
and ssh calls.

The downside of sftp is that it can't switch to ~ (or %userprofile%
on Windows) on the remote side, so we have to assume that sftp starts
in the user's home dir. This is the default and works on my machines!

cdc_rsync and cdc_stream check the CDC_SFTP_COMMAND env var now and
accept --sftp-command flags. If they are not set, the corresponding
scp flag and env var are still used, with scp replaced by sftp. This is
most likely correct as sftp and scp usually reside in the same
directory and share largely identical parameters.
2023-01-18 17:49:52 +01:00
Lutz Justen
a8b948b323 [cdc_rsync] Add initial support for Windows (#51)
Adds a ServerArch class whose job it is to encapsulate differences
between Windows and Linux cdc_rsync_servers. It detects the type
based on a heuristic in the destination path. This is not foolproof
and will probably require further work, like falling back to the other
type if the detected one doesn't work.

Uses the ServerArch class to determine the different commands to start
the server and to deploy the server.

Note that the functionality is not well tested on Windows yet, but
copying plain files works.
2023-01-17 13:34:14 +01:00
Lutz Justen
af9038b4dd [RemoteUtil] Add support for sftp (#64)
In a future CL, we will switch from scp to sftp. This CL adds support
for calling sftp from RemoteUtil.

In order to maintain backwards compatibility where people still set
--scp-command or CDC_SCP_COMMAND instead of the sftp versions, this CL
also adds the helper method RemoteUtil::ScpToSftpCommand, which
attempts to convert an scp command to an sftp command. This is usually
possible since the args are almost the same. For instance, if the scp
command is
  C:\path\to\scp.exe -P 1234 -i <key_file> -oUserKnownHostsFile=known_hosts
then the corresponding sftp command is most likely
  C:\path\to\sftp.exe -P 1234 -i <key_file> -oUserKnownHostsFile=known_hosts
This works, for instance, for OpenSSH.
2023-01-17 12:05:17 +01:00
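A rough sketch of the kind of rewrite RemoteUtil::ScpToSftpCommand performs, assuming the executable is the first token (double-quoted if its path contains spaces) and that only its file name should change. ScpToSftpSketch is a hypothetical stand-in, not the actual implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not RemoteUtil::ScpToSftpCommand: replaces "scp" in
// the executable's file name with "sftp" and keeps all arguments as they are.
// Returns an empty string if the file name does not contain "scp".
std::string ScpToSftpSketch(const std::string& scp_command) {
  // The executable is the first token, double-quoted if its path has spaces.
  size_t exe_end = (!scp_command.empty() && scp_command[0] == '"')
                       ? scp_command.find('"', 1)
                       : scp_command.find(' ');
  if (exe_end == std::string::npos) exe_end = scp_command.size();
  if (exe_end < 3) return "";
  // Only look at the file name, not at directories such as C:\scp_tools\.
  size_t sep = scp_command.find_last_of("\\/", exe_end);
  size_t name_start = (sep == std::string::npos) ? 0 : sep + 1;
  // Last "scp" that ends before exe_end, e.g. the one in "scp.exe".
  size_t pos = scp_command.rfind("scp", exe_end - 3);
  if (pos == std::string::npos || pos < name_start) return "";
  std::string sftp_command = scp_command;
  sftp_command.replace(pos, 3, "sftp");
  return sftp_command;
}
```

Keeping the arguments untouched relies on the observation in the commit message that scp and sftp accept largely the same flags, such as -P, -i and -o.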
Lutz Justen
f2177969fe [common] Add a way to set the process startup directory (#63)
This will be needed later for switching to sftp, since calling lcd in
sftp is tricky to get right (e.g. may or may not require /cygwin/c on
Windows, depending on whether sftp is native or not).
2023-01-16 12:26:45 +01:00
Lutz Justen
42f5ee9b44 [cdc_rsync] Fix issue in UnzstdStream (#59)
Fixes an issue in UnzstdStream where the Read() method always tried to
read new input data if no input data was available, instead of first
trying to decompress. Since zstd maintains internal buffers,
decompression might succeed even without reading more input, so trying
it first is faster. This bug could lead to pipeline stalls in cdc_rsync.
2023-01-10 13:09:14 +01:00
Timo
14b750f674 Fix typo: because to this -> because of this (#57) 2023-01-09 17:58:43 +01:00
Lutz Justen
8c6deaac90 [common] Fix FileWatcherTest once and for all (#53)
But...

ONCE AND FOR ALL!

A recent change introduced WaitForWatching(), which was supposed to
block until the file watcher is actively monitoring the directory.
However, this always returned immediately: if the directory was deleted,
the watcher is in the kFailed state, which counts as watching
(IsStarted() returns true for both the kWatching and kFailed states).

This CL adds an IsWatching() helper function that returns true only for
the kWatching state, which means that the directory is actively being
watched.
2023-01-09 17:56:47 +01:00
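The distinction between IsStarted() and IsWatching() can be sketched as follows. The state names come from the commit message; the free functions and the polling WaitForWatching() with a get_state callback and a max_polls limit are hypothetical simplifications of the FileWatcher class.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// States named in the commit message; the rest of this sketch is hypothetical.
enum class WatcherState { kNotStarted, kWatching, kFailed };

// True once the watcher was started, even if watching has since failed
// (e.g. because the watched directory was deleted).
bool IsStarted(WatcherState state) {
  return state == WatcherState::kWatching || state == WatcherState::kFailed;
}

// True only while the directory is actively being watched.
bool IsWatching(WatcherState state) { return state == WatcherState::kWatching; }

// Polls get_state until IsWatching() holds; gives up after max_polls tries.
// Waiting on IsStarted() instead would return at once in the kFailed state.
bool WaitForWatching(const std::function<WatcherState()>& get_state,
                     int max_polls) {
  for (int i = 0; i < max_polls; ++i) {
    if (IsWatching(get_state())) return true;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return false;
}
```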
Ayush
edd0ab023b Fix typo synching -> syncing (#58) 2023-01-09 13:17:58 +01:00
Lutz Justen
9f8a7d21e6 [cdc_rsync] Improve README (#50)
Adds more info about how cdc_rsync works and why it's faster.

Fixes #49
2022-12-21 11:23:25 +01:00
Lutz Justen
a138fb55c4 [cdc_rsync] Add support for ServerSocket on Windows (#48)
Makes ServerSocket multi-platform, mainly by working around some small
API differences. The code is largely the same, there should be no
differences on Linux.

Also moves WSAStartup() and WSACleanup() up to the Socket level as
static methods because they're used by both ClientSocket and
ServerSocket, and because calling them per socket instance would
prevent one from using several sockets.
2022-12-19 23:02:36 +01:00
Lutz Justen
d8c2b5906e [cdc_stream] [cdc_rsync] Add --forward-port flag (#45)
Adds a flag to set the SSH forwarding port or port range used for
'cdc_stream start-service' and 'cdc_rsync'.

If a single number is passed, e.g. --forward-port 12345, then this
port is used without checking availability of local and remote ports.
If the port is taken, this results in an error when trying to connect.
Note that this restricts the number of connections that cdc_stream can
make to one.

If a range is passed, e.g. --forward-port 45000-46000, the tools
search for available ports locally and remotely in that range. This is
more robust, but a bit slower due to the extra overhead.

Optimizes port_manager_win as it was very slow for a large port range.
It's still not optimal, but scanning 30k ports now takes well under
1 second.

Fixes #12
2022-12-19 10:04:36 +01:00
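A minimal sketch of how a --forward-port value could be parsed into a single port or a range, assuming ports must lie in [1, 65535] and a range is written as first-last with first <= last. PortRange, ParsePort and ParseForwardPort are hypothetical names; the tools' actual parsing code may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical result type: first == last for a single port.
struct PortRange {
  int first;
  int last;
  bool is_range;  // True if the flag was given as first-last.
};

// Parses a decimal port number in [1, 65535]; returns nullopt otherwise.
std::optional<int> ParsePort(const std::string& s) {
  if (s.empty() || s.size() > 5) return std::nullopt;
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
  }
  if (value < 1 || value > 65535) return std::nullopt;
  return value;
}

// "12345" -> {12345, 12345, false}; "45000-46000" -> {45000, 46000, true}.
std::optional<PortRange> ParseForwardPort(const std::string& arg) {
  size_t dash = arg.find('-');
  if (dash == std::string::npos) {
    std::optional<int> port = ParsePort(arg);
    if (!port) return std::nullopt;
    // A single port is used as-is, without checking availability.
    return PortRange{*port, *port, false};
  }
  std::optional<int> first = ParsePort(arg.substr(0, dash));
  std::optional<int> last = ParsePort(arg.substr(dash + 1));
  if (!first || !last || *first > *last) return std::nullopt;
  // A range is searched for a port that is free locally and remotely.
  return PortRange{*first, *last, true};
}
```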
Lutz Justen
f8438aec66 [cdc_rsync] [cdc_stream] Remove SSH port argument (#41)
This CL removes the port arguments for both tools.

The port argument can also be specified via the ssh-command and
scp-command flags. In fact, if a port is specified by both port flags
and ssh/scp commands, they interfere with each other. For ssh, the one
specified in ssh-command wins. For scp, the one specified in
scp-command wins. To fix this, one would have to parse scp-command and
remove the port arg there. Or we could just remove the ssh-port arg.
This is what this CL does. Note that if you need a custom port, it's
very likely that you also have to define custom ssh and scp commands.
2022-12-12 10:58:33 +01:00
Lutz Justen
f0ef34db2f [cdc_stream] Add integration tests (#44)
This CL adds Python integration tests for cdc_stream. To run the
tests, you need to supply a Linux host and proper configuration for
cdc_stream to work:

  set CDC_SSH_COMMAND=C:\path\to\ssh.exe <args>
  set CDC_SCP_COMMAND=C:\path\to\scp.exe <args>
  C:\python38\python.exe -m integration_tests.cdc_stream.all_tests --binary_path=C:\full\path\to\cdc_stream.exe --user_host=user@host

Ran the tests and made sure they worked.
2022-12-08 15:12:14 +01:00
Lutz Justen
668c2ca8df [cdc_rsync] Add integration tests (#42)
This CL adds Python integration tests for cdc_rsync. To run the tests,
you need to supply a Linux host and proper configuration for cdc_rsync
to work:

  set CDC_SSH_COMMAND=C:\path\to\ssh.exe <args>
  set CDC_SCP_COMMAND=C:\path\to\scp.exe <args>
  C:\python38\python.exe -m integration_tests.cdc_rsync.all_tests --binary_path=C:\full\path\to\cdc_rsync.exe --user_host=user@host

Ran the tests and made sure they worked.
2022-12-08 08:39:43 +01:00
Lutz Justen
d2b594a41d Fix build caching (#43)
There were two problems:
- Writing the date on Windows used the wrong syntax. In PowerShell,
  env variables are addressed as $env:NAME, not $NAME.
- opt caches were used for fastbuild builds, which resulted in lots of
  cache misses. Now opt and fastbuild use different caches.
2022-12-08 08:38:23 +01:00
Lutz Justen
c21503d21b [cdc_stream] Fix issues found in tests (#40)
* [cdc_stream] Fix issues found in tests

Fixes a couple of issues found by integration testing:
- Unicode command line args in cdc_stream show up as question marks.
- Log is still named assets_stream_manager instead of cdc_stream.
- An error message contains stadia_assets_stream_manager_v3.exe.
- mount_dir was not the last arg as required by FUSE
- Promoted cache cleanup logs to INFO level since they're important
  for the proper workings of the system.
- Asset streaming cache dir is still %APPDATA%\GGP\asset_streaming.

* Address comments
2022-12-07 11:25:43 +01:00
Lutz Justen
c9e18b9e91 Reuse bazel cache folder between builds (#38)
Uses a bazel --disk_cache to cache build outputs between builds. Bazel
also has a local cache, e.g. in ~/.cache/bazel/_bazel_$USER/cache, but
that one can't be used as it won't reuse data across checkouts. A disk
cache is like a remote cache, except that it's on the local disk.

GitHub first looks for a cache with the given exact key in the current
branch, then in the main branch. If there's a cache hit, the cache
isn't updated (caches are read-only!). To prevent caches from becoming
stale, they are timestamped with the current year and month, so that
the cache is force-renewed every month. Bazel disk caches also only
grow, so this technique prevents the cache from growing indefinitely
and eventually thrashing.
2022-12-05 10:46:56 +01:00
Lutz Justen
6c48f939fc Do not quote ssh/scp commands (#35)
Quoting the commands prevents adding args to them, e.g.
set CDC_SCP_COMMAND=C:\path\to\scp.exe -i id_rsa.
2022-12-05 10:46:18 +01:00
Lutz Justen
1b8ad0e097 [cdc_stream] Add wildcard support to stop command (#30)
Adds support for commands like cdc_stream stop * or cdc_stream stop user*:dir*.
2022-12-05 10:09:37 +01:00
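Wildcard matching for the stop command can be sketched with the classic greedy backtracking algorithm, assuming '*' is the only special character and that patterns are matched against session identifiers of the form user@host:dir. WildcardMatch is a hypothetical helper, not necessarily what cdc_stream uses.

```cpp
#include <cassert>
#include <string>

// Returns true if text matches pattern, where '*' matches any (possibly
// empty) sequence of characters and every other character matches itself.
bool WildcardMatch(const std::string& pattern, const std::string& text) {
  size_t p = 0;
  size_t t = 0;
  size_t star = std::string::npos;  // Position of the last '*' in pattern.
  size_t mark = 0;                  // Text position where that '*' started.
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;
      mark = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      // Backtrack: let the last '*' absorb one more character.
      p = star + 1;
      t = ++mark;
    } else {
      return false;
    }
  }
  // Trailing '*'s can match the empty string.
  while (p < pattern.size() && pattern[p] == '*') ++p;
  return p == pattern.size();
}
```

Backtracking only to the most recent '*' keeps the match linear in practice and avoids the exponential blow-up of a naive recursive matcher.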
Lutz Justen
90717ce670 [cdc_stream] Implement stop-service command (#29)
Implements cdc_stream stop-service. Also fixes an issue in the
BackgroundService implementation where Exit() would deadlock since
server shutdown waits for all RPCs to exit.
2022-12-02 19:39:13 +01:00
Lutz Justen
1120dcbee0 [cdc_stream] Automatically start service (#28)
Starts the streaming service if it's not up and running. This required
adding the ability to run a detached process. By default, all child
processes are killed when the parent process exits. Since detached
child processes don't run with a console, they need to create
subprocesses with CREATE_NO_WINDOW since otherwise a new console pops up,
e.g. for every ssh command.

Polls for 20 seconds while the service starts up. For this purpose,
a BackgroundServiceClient is added. This will be reused in a future CL
by a new stop-service command to exit the service.

Also adds --service-port as additional argument to start-service.
2022-12-02 14:34:36 +01:00
Lutz Justen
6d63aa72d7 [common] Fix FileWatcherTest (#37)
There was a race condition in RecreateWatchedDir: there was a brief
period between the second dir change event and the moment the file
watcher was actually watching again. If the file was written during
that brief period, it would be missed. The issue could be reproduced
easily by adding a sleep here:

  // The watched directory exists and its handle is valid.
  if (!first_run) {
    ++dir_recreate_count_;
    if (dir_recreated_cb_) dir_recreated_cb_();
    Util::Sleep(1);
  }

This CL waits until the watcher is watching again.
2022-12-02 13:03:20 +01:00
Lutz Justen
01a60e2490 Rename asset_stream_manager to cdc_stream (#27)
Fixes #13
2022-12-01 16:14:56 +01:00
Lutz Justen
f0539226a2 Merge cdc_stream into asset_stream_manager (#26)
Adds start and stop commands to asset_stream_manager.
asset_stream_manager will be renamed to cdc_stream next.
2022-12-01 12:40:00 +01:00
Lutz Justen
876e59409f Add linter workflow (#33) 2022-12-01 10:38:14 +01:00
Lutz Justen
a381541d1b [cdc_stream] Switch asset_stream_manager to use Lyra (#25)
Lyra has a nice simple interface, but a few quirks that we work
around, mainly in the BaseCommand class:
- It does not support return values from running a command.
- It does not support return values from a custom arg parser.
- Lyra interprets --bad_arg as a positional argument.

Fixes #15
2022-12-01 10:36:48 +01:00
Lutz Justen
7d7fcc67b9 Optimize gifs (#32)
Reduces the size of the GIFs from 20ish MB to 3 MB with only minimal quality reduction. Also removes some unwanted frames.
2022-11-30 09:48:29 +01:00
Lutz Justen
491be234c6 "Proper" fix for "Input redirection is not supported" issue with timeout (#31)
The issue was consistently reproducible by adding a sleep right after starting the process.
Use ping instead of timeout now, because ping doesn't read user input.
2022-11-30 09:27:57 +01:00
Lutz Justen
b0f5403854 Fix possible indefinite loop in FileWatcherTest (#23)
The test
  FileWatcherTest/FileWatcherParameterizedTest.RecreateWatchedDir/ReadDirectoryChangesExW
is flaky. This CL doesn't fix the root cause, but it fixes the
indefinite spin in GetChangedFiles when there is no file change.
2022-11-28 11:04:02 +01:00
Lutz Justen
a97d16c4e9 Fix include paths in vcxproj files (#24)
This fixes some issues with IntelliSense.
2022-11-28 11:03:35 +01:00
Lutz Justen
8c4a0465e9 Expand path variables for sync destination (#18)
Running commands like cdc_rsync C:\assets\* host:~/assets -vr would create a directory called ~assets. This CL expands path variables properly.
2022-11-25 14:21:21 +01:00
Lutz Justen
991f61cc4d Improve create_release workflow (#20)
Modifies the create_release workflow in 2 ways:
- It only runs now if something is pushed to main.
- It creates a tagged release if a tag is pushed.

To create a tagged release, run e.g.
  git tag -a v0.1.0 -m "Release 0.1.0"
  git push origin v0.1.0
2022-11-25 12:02:54 +01:00
Lutz Justen
9d7eee35bd Convert src dir to full path (#21)
The asset stream manager requires a full path. With this CL, you can stream from e.g. ".".
2022-11-25 12:01:29 +01:00
Lutz Justen
fac559b1be Improve readme (#19)
This CL adds
- a history section with references to Stadia
- benchmarks
- animated gifs with demos
- a troubleshooting section
- and more info about cdc_stream
2022-11-24 13:33:29 +01:00
Lutz Justen
21a1b37787 Add workflow to create a release (#11)
Adds a workflow that creates a "latest" release with a description listing all changes and a zip with built binaries (for Windows and Ubuntu 20.04).
2022-11-22 12:14:14 +01:00
wurwunchik
b2c011cc0d Add new flags to asset stream manager (#10)
- Add --config-file option to specify a JSON configuration file for asset stream manager
- Add log_dir flag for log folder
- Remove unused functions from SdkUtils
- Fix build issue in cdc_fuse_fs
2022-11-22 12:05:48 +01:00
Lutz Justen
0252d51cc0 Add actions for building and testing (#8)
* Add a Github action for building and testing

On Windows, -- -//third_party/... doesn't seem to work, so add all test directories manually. Also run the tests_*. We run only fastbuild tests here, since the opt tests will be run in the release workflow.

Also fix a number of compilation and test issues found along the way.
2022-11-21 23:22:09 +01:00
Lutz Justen
bccc025945 [cdc_stream] Append errors from netstat to status (#9)
So far, errors from the remote netstat process would only be logged in
the asset stream service, for instance when SSH auth failed. However,
the errors were not shown to the client, and that's the most important
thing.

Also adds some feedback to cdc_stream in case of success.
2022-11-21 09:07:05 +01:00
Lutz Justen
269fb2be45 [cdc_stream] Add a CLI client to start/stop asset streaming sessions (#4)
Implements the cdc_stream client and adjusts asset streaming in
various places to work better outside of a GGP environment.

This CL tries to get quoting for SSH commands right. It also brings
back the ability to start a streaming session from
asset_stream_manager.

Also cleans up the Bazel target setup. Since the sln file is now in root,
it is no longer necessary to prepend ../ to relative filenames to
make clicking on errors work.
2022-11-18 10:59:42 +01:00
ljusten
ca84d3dd2e [cdc_fuse_fs] Fix various issues (#6)
Fixes a couple of issues with the FUSE:
- Creates the mount directory if it does not exist.
  This assumes the mount dir to be the last arg. Ideally, we'd parse the
  command line and then create the directory, but unfortunately
  fuse_parse_cmdline already verifies that the dir exists.
- Expands the cache_dir (e.g. ~).
- Fixes a compile issue in manifest_iterator.
2022-11-17 14:01:59 +01:00
chrschng
76bbdb01bb Merge dynamic manifest updates to Github (#7)
This change introduces dynamic manifest updates to asset streaming.

Asset streaming describes the directory to be streamed in a manifest, which is a proto definition of all content metadata. This information is sufficient to answer `stat` and `readdir` calls in the FUSE layer without additional round-trips to the workstation.

When a directory is streamed for the first time, the corresponding manifest is created in two steps:
1. The directory is traversed recursively and the inode information of all contained files and directories is written to the manifest.
2. The content of all identified files is processed to generate each file's chunk list. This list is part of the definition of a file in the manifest.
  * The chunk boundaries are identified using our implementation of the FastCDC algorithm.
  * The hash of each chunk is calculated using the BLAKE3 hash function.
  * The length and hash of each chunk is appended to the file's chunk list.

Prior to this change, when the user mounted a workstation directory on a client, the asset streaming server pushed an intermediate manifest to the gamelet as soon as step 1 was completed. At this point, the FUSE client started serving the virtual file system and was ready to answer `stat` and `readdir` calls. In case the FUSE client received any call that required file contents, such as `read`, it would block the caller until the server completed step 2 above and pushed the final manifest to the client. This works well for large directories (> 100GB) with a reasonable number of files (< 100k). But when dealing with millions of tiny files, creating the full manifest can take several minutes.

With this change, we introduce dynamic manifest updates. When the FUSE layer receives an `open` or `readdir` request for a file or directory that is incomplete, it sends an RPC to the workstation about what information is missing from the manifest. The workstation identifies the corresponding file chunker or directory scanner tasks and moves them to the front of the queue. As soon as the task is completed, the workstation pushes an updated intermediate manifest to the client which now includes the information to serve the FUSE request. The queued FUSE request is resumed and returns the result to the caller.

While this does not reduce the time required to build the final manifest, it splits up the work into smaller tasks. This allows us to interrupt the current work and prioritize the tasks required to handle an incoming request from the client. Although this still takes a round-trip to the workstation plus the processing time for the task, an updated manifest is received within a few seconds, which is much better than blocking for several minutes.

This latency is only visible when serving data while the manifest is still being created. The situation improves as the manifest creation on the workstation progresses. As soon as the final manifest is pushed, all metadata can be served directly without having to wait for pending tasks.
2022-11-16 11:20:32 +01:00
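The prioritization step described in this commit, moving the task that a FUSE request is waiting for to the front of the queue, can be sketched with a std::deque. Task, TaskQueue and Prioritize are hypothetical names; the real workstation side tracks its file chunker and directory scanner tasks in its own structures.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical manifest task: scan a directory or chunk a file.
struct Task {
  enum class Kind { kScanDir, kChunkFile } kind;
  std::string path;
};

class TaskQueue {
 public:
  void Push(Task task) { tasks_.push_back(std::move(task)); }

  // Moves the pending task for path to the front, keeping the relative order
  // of all other tasks. Returns false if no such task is pending.
  bool Prioritize(const std::string& path) {
    auto it = std::find_if(tasks_.begin(), tasks_.end(),
                           [&path](const Task& t) { return t.path == path; });
    if (it == tasks_.end()) return false;
    Task task = std::move(*it);
    tasks_.erase(it);
    tasks_.push_front(std::move(task));
    return true;
  }

  // Removes and returns the next task. Requires a non-empty queue.
  Task Pop() {
    Task task = std::move(tasks_.front());
    tasks_.pop_front();
    return task;
  }

  bool Empty() const { return tasks_.empty(); }

 private:
  std::deque<Task> tasks_;
};
```

A false return from Prioritize() would correspond to a task that has already completed, in which case the information is either in the last pushed manifest or on its way.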
ljusten
23fcd5ef1d Improve cdc_fuse_fs and path (#2)
Improves the error handling in path so that std::error_codes are not
assumed to be of system category, and so that their messages are
displayed. Also improves debug messages in GameletComponent.
2022-11-15 12:53:02 +01:00
ljusten
9fdccb3548 Remove GGP dependencies from CDC RSync (#1)
* Remove dependencies of cdc_rsync from GGP

Allows overriding the SSH and SCP commands via command line flags.
Hence, strict host checking, SSH config etc. can be removed since it
is passed in by command line flags for GGP. Also deploys
cdc_rsync_server to ~/.cache/cdc_file_transfer/ and creates that dir
if it does not exist.

* Tweak RemoteUtil

Replaces localhost: with //./ in the workaround for scp since localhost:
had two disadvantages: 1) It required 2 gnubby touches for gLinux and
2) it didn't work for ggp. //./ works for both. Also tweaks quoting,
which didn't quite work for ggp.

* Don't check remote ports in cdc_rsync

Turns off checking remote ports in PortManager. In the future, the
server should return available ports after failing to connect to the
provided port.

Since now the first remote connection is running cdc_rsync_server,
the timeout check has to be done when running that process.

* Remove now-unused kInstancePickerNotAvailableInQuietMode enum

* Add more details to the readme

* [cdc_rsync] Accept [user@]host:destination

Removes the --ip command line argument and assumes user/host are
passed in along with the destination, so it works in the same way as
other popular tools.

* [ggp_rsync] Combine server deploy commands

Combines two chmod and one mv command into one ssh command. This makes
deploy a bit quicker, especially if each ssh command involves touching
your gnubby.

* Remove GGP specific stuff from VS build commands

* [cdc_rsync] Get rid of cdc_rsync.dll

Compile the CDC RSync client as a static library instead. This removes
quite a bit of boilerplate and makes string handling easier since
we can now pass std::strings instead of const char pointers.

Also fixes an issue where we were sometimes trying to assign nullptr
to std::strings, which is forbidden.

* Allow specifying ssh/scp commands with env vars

* Rename GgpRsync* to CdcRsync*

* Merge ggp_rsync_cli into ggp_rsync

* [cdc_rsync] Refactor cdc_rsync.cc/h

Merges cdc_rsync.cc/h into main.cc and CdcRsyncClient so that the code
is closer to where it's used, which should make it more readable.
2022-11-15 12:48:09 +01:00
Christian Schneider
4326e972ac Releasing the former Stadia file transfer tools
The tools allow efficient and fast synchronization of large directory
trees from a Windows workstation to a Linux target machine.

cdc_rsync* support efficient copying of files by using content-defined
chunking (CDC) to identify chunks within files that can be reused.

asset_stream_manager + cdc_fuse_fs support efficient streaming of a
local directory to a remote virtual file system based on FUSE. It also
employs CDC to identify and reuse unchanged data chunks.
2022-11-03 10:39:10 +01:00