Commit Graph

63 Commits

Author SHA1 Message Date
Lutz Justen
c481b6a27f [common] Prevent command execution in ExpandPathVariables (#87)
Command execution is not something users would expect. Even though
there is no security issue (right now), it's probably better to turn
it off.
2023-03-06 15:25:49 +01:00
Lutz Justen
a8059e8572 [cdc_rsync] Use any available server port (#94)
Instead of calling netstat on the remote device to detect available
ports, simply call bind with port 0 to bind to any available port.
Since the port is not yet known when cdc_rsync_server.exe is called,
port forwarding needs to be started AFTER the server reports its port.
2023-03-06 14:16:21 +01:00
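The port-0 approach described in a8059e8572 can be sketched roughly as follows. This is a minimal POSIX illustration, not the cdc_rsync_server code; the function name BindToAnyPort is hypothetical, and the sketch assumes an IPv4 loopback socket.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Hypothetical helper: binds a listening TCP socket to any free port on the
// IPv4 loopback address. Returns the port picked by the OS, or 0 on failure.
// On success *out_fd holds the socket; the caller must close() it.
uint16_t BindToAnyPort(int* out_fd) {
  assert(out_fd != nullptr);
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return 0;

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(0);  // Port 0 asks the OS for any available port.

  socklen_t len = sizeof(addr);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      listen(fd, 1) != 0 ||
      // The chosen port is only known after bind(); query it here.
      getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    close(fd);
    return 0;
  }
  *out_fd = fd;
  return ntohs(addr.sin_port);
}
```

Because the port is only known after bind() returns, a caller would have to report it (for example on stdout) before any port forwarding can be set up, which matches the ordering constraint noted in the commit message.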
Lutz Justen
5fd86e4625 [cdc_rsync] Fix issue with IPV6 localhosts (#93)
Fixes an issue with port forwarding when localhost on the remote
system maps to the IPV6 localhost. In that case the server would time
out on accept() since it creates an IPV4 socket, so the connection
is never established.

Also allows passing in port 0, so that it will auto-detect an
available port. This will be used in a future CL to remove the
necessity of running netstat/ss.
2023-02-14 11:09:03 +01:00
Donovan Baarda
fcc4cbc3f3 Change fastcdc to a better and simpler algorithm. (#79)
This CL changes the chunking algorithm from "normalized chunking" to
simple "regression chunking", and changes the hash criteria from
'hash&mask' to 'hash<=threshold'. These are all ideas taken from
testing and analysis done at
  https://github.com/dbaarda/rollsum-chunking/blob/master/RESULTS.rst
Regression chunking was introduced in
  https://www.usenix.org/system/files/conference/atc12/atc12-final293.pdf

The algorithm uses an arbitrary number of regressions using power-of-2
regression target lengths. This means we can use a simple bitmask for
the regression hash criteria.

Regression chunking yields high deduplication rates even for lower max
chunk sizes, so that the cdc_stream max chunk can be reduced to 512K
from 1024K. This fixes potential latency spikes from large chunks.
2023-02-08 15:06:41 +01:00
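The idea behind fcc4cbc3f3 can be illustrated with a simplified sketch. This is not the fastcdc code from the repository: the class name, the gear table seeding and the exact regression bookkeeping are assumptions made for illustration. It uses 'hash<=threshold' as the primary criterion and, as a fallback at the maximum size, regresses to the latest position whose hash had the most leading zero bits, which can be tracked with a simple bitmask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified regression chunker (illustration only).
class RegressionChunker {
 public:
  RegressionChunker(size_t min_size, size_t avg_size, size_t max_size)
      : min_size_(min_size),
        max_size_(max_size),
        threshold_(UINT64_MAX / (avg_size - min_size + 1)) {
    assert(min_size < avg_size && avg_size <= max_size);
    // Deterministic pseudo-random gear table (splitmix64); an assumption.
    uint64_t state = 0;
    for (uint64_t& g : gear_) {
      state += 0x9E3779B97F4A7C15ull;
      uint64_t z = state;
      z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
      z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
      g = z ^ (z >> 31);
    }
  }

  // Returns the length of the first chunk of data[0..len).
  size_t NextChunk(const uint8_t* data, size_t len) const {
    if (len <= min_size_) return len;
    const size_t end = len < max_size_ ? len : max_size_;
    uint64_t hash = ~0ull;  // All 1's, so no boundary fires before real data.
    uint64_t rc_mask = 0;   // Top bits a regression candidate must have clear.
    size_t rc_len = end;    // Best regression candidate so far.
    for (size_t i = 0; i < end; ++i) {
      hash = (hash << 1) + gear_[data[i]];
      const size_t chunk_len = i + 1;  // Include the byte just hashed.
      if (chunk_len < min_size_ || (hash & rc_mask)) continue;
      if (hash <= threshold_) return chunk_len;  // Primary criterion.
      // At least as good as the previous candidate: remember it and require
      // the same number of leading zero bits from later candidates.
      rc_len = chunk_len;
      rc_mask = 0;
      for (uint64_t bit = 1ull << 63; !(hash & bit); bit >>= 1) rc_mask |= bit;
    }
    // Regress only when the max size was hit; otherwise take the remainder.
    return len < max_size_ ? len : rc_len;
  }

  // Splits the whole buffer and returns the chunk lengths.
  std::vector<size_t> ChunkAll(const uint8_t* data, size_t len) const {
    std::vector<size_t> sizes;
    for (size_t pos = 0; pos < len; pos += sizes.back())
      sizes.push_back(NextChunk(data + pos, len - pos));
    return sizes;
  }

 private:
  size_t min_size_;
  size_t max_size_;
  uint64_t threshold_;
  uint64_t gear_[256];
};
```

With this scheme, a chunk that reaches the maximum size is cut at a content-defined position rather than at an arbitrary offset, which is why deduplication can stay high even with a smaller maximum chunk size.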
Lutz Justen
24906eb36e [RemoteUtil] Fix output from Windows SSH commands (#90)
Adds an ArchType argument to many RemoteUtil methods, which is used to
replace -tt (forced pseudo-TTY allocation) by -T (no pseudo-TTY
allocation). The -tt option adds tons of ANSI escape sequences to the
output and makes it unparsable, even after removing the sequences, as
some sequences like "delete the last X characters" are not honoured.

An exception is BuildProcessStartInfoForSshPortForward, where
replacing -tt by -T would make the port forwarding process exit
immediately.
2023-02-06 18:42:00 +01:00
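A sketch of the option selection described in 24906eb36e, assuming that Linux targets keep -tt and only Windows targets switch to -T. The enum values and function names are hypothetical stand-ins, not the RemoteUtil API, and quoting of the remote command is omitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the ArchType mentioned in the commit message.
enum class ArchType { kLinux, kWindows };

// Picks the pseudo-TTY option for an ssh invocation. Port forwarding keeps
// -tt, since with -T the forwarding process would exit immediately.
std::string SshTtyOption(ArchType arch, bool is_port_forward) {
  if (arch == ArchType::kWindows && !is_port_forward) return "-T";
  return "-tt";
}

// Assembles a plain ssh command line for running remote_command.
std::string BuildSshCommand(const std::string& ssh_command,
                            const std::string& user_host,
                            const std::string& remote_command, ArchType arch) {
  assert(!user_host.empty());
  return ssh_command + " " + SshTtyOption(arch, /*is_port_forward=*/false) +
         " " + user_host + " " + remote_command;
}
```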
Levon Ter-Grigoryan
5b82722ec1 Merge pull request #88 from PatriosTheGreat/main
[cdc_rsync] Add build id to github workflow
2023-02-06 16:48:00 +01:00
Levon Ter-Grigoryan
84427155c7 [cdc_rsync] Add build id to github workflow 2023-02-06 16:31:50 +01:00
Lutz Justen
aab0b7ef33 [PortManager] Prefer ss over netstat on Linux (#91)
ss is a modern alternative to netstat. The flags we use and the way we
parse the output are compatible with netstat. Since netstat is no
longer installed on some Linux distributions, prefer ss, but fall back
to netstat if "which ss" fails.

Also tweaks some logging.

Fixes #65
2023-02-03 11:33:07 +01:00
Lutz Justen
185c2ee19b Add Natvis for absl::flat_hash_map (#92) 2023-02-03 10:42:11 +01:00
Lutz Justen
ee4118c6bf [cdc_rsync] Detect remote architecture (#86)
Improves ServerArch so that it can detect the remote architecture by
running uname and checking %PROCESSOR_ARCHITECTURE%. So far, only
x64 Linux and x64 Windows are supported, but in the future it is easy
to add support for others, e.g. aarch64, as well.

Before the detection is run, the remote architecture is guessed first
based on the destination. For instance, if the destination directory
starts with "C:\", it pretty much means Windows. If cdc_rsync_server
exists and runs fine, there's no need for detection.

Since PortManager also depends on the remote architecture, it has to
be adjusted as well. So far, PortManager assumed that "local" means
Windows and "remote" means Linux. This is no longer the case for
syncing to Windows devices, so this CL adds the necessary abstractions
to PortManager.

Also refactors ArchType into a separate class in common, since it is
used now from several places. It is also expanded to handle future
changes that add support for different processor architectures, e.g.
aarch64.
2023-02-01 11:51:20 +01:00
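The initial guess described in ee4118c6bf could look roughly like this. It is a sketch under assumptions: the names are hypothetical, any drive letter (not just "C:\") is treated as a Windows hint, and everything else is guessed to be Linux. The real ServerArch additionally verifies the guess by running uname or checking %PROCESSOR_ARCHITECTURE%, which is not shown here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type for the initial guess.
enum class ArchGuess { kLinux, kWindows };

// Guesses the remote OS from the destination directory alone. A leading
// drive letter such as "C:\" is taken as a strong hint for Windows.
ArchGuess GuessArchFromDestination(const std::string& destination_dir) {
  assert(!destination_dir.empty());
  const bool has_drive_prefix =
      destination_dir.size() >= 3 &&
      std::isalpha(static_cast<unsigned char>(destination_dir[0])) &&
      destination_dir[1] == ':' &&
      (destination_dir[2] == '\\' || destination_dir[2] == '/');
  return has_drive_prefix ? ArchGuess::kWindows : ArchGuess::kLinux;
}
```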
Lutz Justen
3194678007 [Release] Include docs and cdc_rsync_server.exe in zip (#70)
Also clarifies some unclear aspects in the readme, and adds a fix that
allows create_release.yml to be used for pull requests for testing.

Fixes #67
Fixes #55
2023-02-01 09:31:48 +01:00
Levon Ter-Grigoryan
bd43608799 Merge pull request #80 from PatriosTheGreat/main
[cdc_rsync] [cdc_rsync_server] Add build ID
2023-02-01 09:02:53 +01:00
Lutz Justen
5a909bb443 [cdc_rsync] Improve throughput for local copies (#74)
On Windows, fclose() seems to be very expensive for large files, where
closing a 1 GB file takes up to 5 seconds. This CL calls fclose() in
background threads. This tremendously improves local syncs, e.g.
copying a 4.5 GB, 300 files data set takes only 7 seconds instead of
30 seconds.

Also increases the buffer size for copying from 16K to 128K (better
throughput for local copies), and adds a timestamp to debug and
verbose console logs (useful when comparing client and server logs).
2023-01-31 16:33:03 +01:00
Levon Ter-Grigoryan
36f4dc9251 [cdc_rsync] [cdc_rsync_server] Add build ID
Build ID is an optional unique identifier specified during the cdc_rsync build via the CDC_BUILD_VERSION definition.
If a build ID is specified on both the client and server components, it is used to check the version of the server component instead of file size + modified time.
2023-01-31 16:28:58 +01:00
Lutz Justen
1200b34316 [common] Add ansi_filter (#73)
Adds a function to filter ANSI escape sequences from a string.
Executing SSH commands on Windows yields output that is full of ANSI
escape sequences if the "-tt" (forced TTY) argument is used. One
particular escape sequence sets the window title to
"c:\windows\system32\cmd.exe". This string is null terminated and
messes with parsing the actual output later in that string.
The filter function removes those escape sequences.

The output is still a bit messed up, even after removing escape
sequences. Some sequences delete rows and move the cursor. Without
properly interpreting these sequences it doesn't seem possible to
retrieve the proper output.

In a future CL the -tt argument is removed on Windows, which removes
the necessity to filter ANSI codes. However, sometimes the target
architecture is not known (yet), so that it is still useful to filter
ANSI codes in that case to print useful debug output.
2023-01-31 14:53:43 +01:00
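A filter along the lines of 1200b34316 might be sketched as below. This is not the ansi_filter code from common; the function name is hypothetical. It handles CSI sequences (ESC [ ... final byte), OSC sequences such as the window title (ESC ] ... terminated by BEL, ESC \ or, following the observation in the commit message, a null byte), and two-character escapes. As the commit notes, removing sequences does not interpret them, so cursor movements and deletions are simply dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of in with ANSI escape sequences removed.
std::string RemoveAnsiEscapes(const std::string& in) {
  std::string out;
  size_t i = 0;
  while (i < in.size()) {
    if (in[i] != '\x1b') {
      out += in[i++];
      continue;
    }
    ++i;  // Skip ESC.
    if (i >= in.size()) break;
    if (in[i] == '[') {
      // CSI: skip parameter bytes up to and including the final byte
      // (0x40..0x7E).
      ++i;
      while (i < in.size() && !(in[i] >= 0x40 && in[i] <= 0x7E)) ++i;
      if (i < in.size()) ++i;
    } else if (in[i] == ']') {
      // OSC: skip until BEL, a null byte, or the string terminator ESC '\'.
      ++i;
      while (i < in.size() && in[i] != '\a' && in[i] != '\0' &&
             !(in[i] == '\x1b' && i + 1 < in.size() && in[i + 1] == '\\')) {
        ++i;
      }
      if (i < in.size()) i += (in[i] == '\x1b') ? 2 : 1;
    } else {
      ++i;  // Two-character escape such as ESC c.
    }
  }
  assert(out.size() <= in.size());
  return out;
}
```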
pcc
1ebe48e6de Fix build on arm64 Linux. (#83) 2023-01-31 09:07:50 +01:00
Lutz Justen
d175f947c0 Fix minor issues with VS projects (#81) 2023-01-30 11:42:05 +01:00
Lutz Justen
f8c10ce7bd [cdc_rsync] Enable local syncing (#75)
Adds support for local syncs of files and folders on the same Windows
machine, e.g. cdc_rsync C:\source C:\dest. The main changes are:

- Skip the check whether the port is available remotely with PortManager.
- Do not deploy cdc_rsync_server.
- Run cdc_rsync_server directly, not through an SSH tunnel.

The current implementation is not optimal as it starts
cdc_rsync_server as a separate process and communicates to it via a
TCP port.
2023-01-26 09:57:19 +01:00
Donovan Baarda
9cf71cae65 Fix #76 fastcdc chunk boundary off-by-one. (#78)
* Fix #76 fastcdc chunk boundary off-by-one.

This ensures that the last byte included in the gear-hash that identified the
chunk boundary is included in the chunk. This ensures chunks are still matched
when the byte immediately after them is changed.

* Init gear hash to all 1's to prevent zero-length chunks with min_size=0.

Also change the `MaxChunkSize` test to use min_size=0 to test this works.
2023-01-23 14:39:02 +01:00
Lutz Justen
efca9855e7 [cdc_rsync] [cdc_stream] Switch from scp to sftp (#66)
Use sftp for deploying remote components instead of scp. sftp has the
advantage that it can also create directories, chmod files etc., so
that we can do everything in one call of sftp instead of mixing scp
and ssh calls.

The downside of sftp is that it can't switch to ~ resp. %userprofile%
for the remote side, and we have to assume that sftp starts in the
user's home dir. This is the default and works on my machines!

cdc_rsync and cdc_stream check the CDC_SFTP_COMMAND env var now and
accept --sftp-command flags. If they are not set, the corresponding
scp flag and env var is still used, with scp replaced by sftp. This is
most likely correct as sftp and scp usually reside in the same
directory and share largely identical parameters.
2023-01-18 17:49:52 +01:00
Lutz Justen
a8b948b323 [cdc_rsync] Add initial support for Windows (#51)
Adds a ServerArch class whose job it is to encapsulate differences
between Windows and Linux cdc_rsync_servers. It detects the type
based on a heuristic in the destination path. This is not foolproof
and will probably require further work, like falling back to the other
type if the detected one doesn't work.

Uses the ServerArch class to determine the different commands to start
the server and to deploy the server.

Note that the functionality is not well tested on Windows yet, but
copying plain files works.
2023-01-17 13:34:14 +01:00
Lutz Justen
af9038b4dd [RemoteUtil] Add support for sftp (#64)
In a future CL, we will switch from scp to sftp. This CL adds support
for calling sftp from RemoteUtil.

In order to maintain backwards compatibility where people still set
--scp-command or CDC_SCP_COMMAND instead of the sftp versions, this CL
also adds the helper method RemoteUtil::ScpToSftpCommand, which
attempts to convert an scp command to an sftp command. This is usually
possible since the args are almost the same. For instance, if the scp
command is
  C:\path\to\scp.exe -P 1234 -i <key_file> -oUserKnownHostsFile=known_hosts
then the corresponding sftp command is most likely
  C:\path\to\sftp.exe -P 1234 -i <key_file> -oUserKnownHostsFile=known_hosts
This works for instance for OpenSSH.
2023-01-17 12:05:17 +01:00
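The conversion described in af9038b4dd can be approximated with a small string rewrite. This sketch is not RemoteUtil::ScpToSftpCommand itself; the function name ScpToSftp is hypothetical, and it assumes the program path is unquoted and contains no spaces. It only renames the executable and leaves all arguments untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces the last "scp" in the program part of the
// command (everything before the first space) with "sftp". Returns the
// command unchanged if the program name does not contain "scp".
std::string ScpToSftp(const std::string& scp_command) {
  const size_t program_end = scp_command.find(' ');  // npos if no arguments.
  const std::string program = scp_command.substr(0, program_end);
  const size_t pos = program.rfind("scp");
  if (pos == std::string::npos) return scp_command;
  std::string result = scp_command;
  result.replace(pos, 3, "sftp");
  assert(result.size() == scp_command.size() + 1);
  return result;
}
```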
Lutz Justen
f2177969fe [common] Add a way to set the process startup directory (#63)
This will be needed later for switching to sftp, since calling lcd in
sftp is tricky to get right (e.g. may or may not require /cygwin/c on
Windows, depending on whether sftp is native or not).
2023-01-16 12:26:45 +01:00
Lutz Justen
42f5ee9b44 [cdc_rsync] Fix issue in UnzstdStream (#59)
Fixes an issue in UnzstdStream where the Read() method always tries to
read new input data if no input data is available, instead of first
trying to uncompress. Since zstd maintains internal buffers,
uncompression might succeed even without reading more input, so this
is faster. This bug can lead to pipeline stalls in cdc_rsync.
2023-01-10 13:09:14 +01:00
Timo
14b750f674 Fix typo: because to this -> because of this (#57) 2023-01-09 17:58:43 +01:00
Lutz Justen
8c6deaac90 [common] Fix FileWatcherTest once and for all (#53)
But...

ONCE AND FOR ALL!

A recent change introduced WaitForWatching(), which was supposed to
block until the file watcher is actively monitoring the directory.
However this always returned immediately since the watcher is in
kFailed state if the directory was deleted, which counts as watching
(IsStarted returns true for both kWatching and kFailed states).

This CL adds an IsWatching() helper function that returns true only for
the kWatching state, which means that the directory is actively being
watched.
2023-01-09 17:56:47 +01:00
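The distinction made in 8c6deaac90 can be shown with two predicates. The state enum below is a hypothetical reduction (kStopped is an assumed name; the real FileWatcher may have more states). Only the relationship between IsStarted and IsWatching is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical, reduced set of watcher states.
enum class WatcherState { kStopped, kWatching, kFailed };

// True once the watcher has been started, even if the watched directory was
// deleted and the watcher is in the failed state.
constexpr bool IsStarted(WatcherState state) {
  return state == WatcherState::kWatching || state == WatcherState::kFailed;
}

// True only while the directory is actively being watched.
constexpr bool IsWatching(WatcherState state) {
  return state == WatcherState::kWatching;
}

// A wait loop keyed on IsStarted would return immediately in kFailed;
// keying it on IsWatching avoids that.
static_assert(IsStarted(WatcherState::kFailed) &&
                  !IsWatching(WatcherState::kFailed),
              "kFailed counts as started but not as watching");
```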
Ayush
edd0ab023b Fix typo synching -> syncing (#58) 2023-01-09 13:17:58 +01:00
Lutz Justen
9f8a7d21e6 [cdc_rsync] Improve README (#50)
Adds more info about how cdc_rsync works and why it's faster.

Fixes #49
2022-12-21 11:23:25 +01:00
Lutz Justen
a138fb55c4 [cdc_rsync] Add support for ServerSocket on Windows (#48)
Makes ServerSocket multi-platform, mainly by working around some small
API differences. The code is largely the same, there should be no
differences on Linux.

Also moves WSAStartup() and WSACleanup() up to the Socket level as
static methods because it's used by both ClientSocket and ServerSocket,
and because it doesn't make sense to do that in the socket class as
that would prevent one from using several sockets.
2022-12-19 23:02:36 +01:00
Lutz Justen
d8c2b5906e [cdc_stream] [cdc_rsync] Add --forward-port flag (#45)
Adds a flag to set the SSH forwarding port or port range used for
'cdc_stream start-service' and 'cdc_rsync'.

If a single number is passed, e.g. --forward-port 12345, then this
port is used without checking availability of local and remote ports.
If the port is taken, this results in an error when trying to connect.
Note that this restricts the number of connections that stream can
make to one.

If a range is passed, e.g. --forward-port 45000-46000, the tools
search for available ports locally and remotely in that range. This is
more robust, but a bit slower due to the extra overhead.

Optimizes port_manager_win as it was very slow for a large port range.
It's still not optimal, but the time needed to scan 30k ports is
<< 1 second now.

Fixes #12
2022-12-19 10:04:36 +01:00
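Parsing the two accepted forms of --forward-port from d8c2b5906e (a single port or a first-last range) might look like the sketch below. The struct and function names are hypothetical, and the validation rules (ports 1 to 65535, first not greater than last) are assumptions rather than the tools' documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: first == last for a single port.
struct PortRange {
  int first = 0;
  int last = 0;
};

// Parses a decimal port in [1, 65535]. Returns false on any other input.
static bool ParsePort(const std::string& text, int* port) {
  if (text.empty() || text.size() > 5) return false;
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + (c - '0');
  }
  if (value < 1 || value > 65535) return false;
  *port = value;
  return true;
}

// Parses "12345" or "45000-46000" into *out. Returns false if the argument
// is malformed; *out is left untouched in that case.
bool ParsePortRange(const std::string& arg, PortRange* out) {
  assert(out != nullptr);
  PortRange range;
  const size_t dash = arg.find('-');
  if (dash == std::string::npos) {
    if (!ParsePort(arg, &range.first)) return false;
    range.last = range.first;
  } else if (!ParsePort(arg.substr(0, dash), &range.first) ||
             !ParsePort(arg.substr(dash + 1), &range.last) ||
             range.first > range.last) {
    return false;
  }
  *out = range;
  return true;
}
```

A caller could then skip the availability scan when first == last, which corresponds to the single-port behaviour described in the commit message.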
Lutz Justen
f8438aec66 [cdc_rsync] [cdc_stream] Remove SSH port argument (#41)
This CL removes the port arguments for both tools.

The port argument can also be specified via the ssh-command and
scp-command flags. In fact, if a port is specified by both port flags
and ssh/scp commands, they interfere with each other. For ssh, the one
specified in ssh-command wins. For scp, the one specified in
scp-command wins. To fix this, one would have to parse scp-command and
remove the port arg there. Or we could just remove the ssh-port arg.
This is what this CL does. Note that if you need a custom port, it's
very likely that you also have to define custom ssh and scp commands.
2022-12-12 10:58:33 +01:00
Lutz Justen
f0ef34db2f [cdc_stream] Add integration tests (#44)
This CL adds Python integration tests for cdc_stream. To run the
tests, you need to supply a Linux host and proper configuration for
cdc_stream to work:

set CDC_SSH_COMMAND=C:\path\to\ssh.exe <args>
set CDC_SCP_COMMAND=C:\path\to\scp.exe <args>
C:\python38\python.exe -m integration_tests.cdc_stream.all_tests --binary_path=C:\full\path\to\cdc_stream.exe --user_host=user@host

Ran the tests and made sure they worked.
2022-12-08 15:12:14 +01:00
Lutz Justen
668c2ca8df [cdc_rsync] Add integration tests (#42)
This CL adds Python integration tests for cdc_rsync. To run the tests,
you need to supply a Linux host and proper configuration for cdc_rsync
to work:

  set CDC_SSH_COMMAND=C:\path\to\ssh.exe <args>
  set CDC_SCP_COMMAND=C:\path\to\scp.exe <args>
  C:\python38\python.exe -m integration_tests.cdc_rsync.all_tests --binary_path=C:\full\path\to\cdc_rsync.exe --user_host=user@host

Ran the tests and made sure they worked.
2022-12-08 08:39:43 +01:00
Lutz Justen
d2b594a41d Fix build caching (#43)
There were two problems:
- Writing the date on Windows used the wrong syntax. In Powershell,
  env variables are addressed as $env:NAME, not $NAME.
- Use different caches for opt vs fastbuild. We are currently using
  opt caches for fastbuilds, which results in lots of cache misses.
2022-12-08 08:38:23 +01:00
Lutz Justen
c21503d21b [cdc_stream] Fix issues found in tests (#40)
* [cdc_stream] Fix issues found in tests

Fixes a couple of issues found by integration testing:
- Unicode command line args in cdc_stream show up as question marks.
- Log is still named assets_stream_manager instead of cdc_stream.
- An error message contains stadia_assets_stream_manager_v3.exe.
- mount_dir was not the last arg as required by FUSE
- Promoted cache cleanup logs to INFO level since they're important
  for the proper workings of the system.
- Asset streaming cache dir is still %APPDATA%\GGP\asset_streaming.

* Address comments
2022-12-07 11:25:43 +01:00
Lutz Justen
c9e18b9e91 Reuse bazel cache folder between builds (#38)
Uses a bazel --disk_cache to cache build outputs between builds. Bazel
also has a local cache, e.g. in ~/.cache/bazel/_bazel_$USER/cache, but
that one can't be used as it won't reuse data across checkouts. A disk
cache is like a remote cache, except that it's on the local disk.

Github first looks for a cache with the given exact key in the current
branch, then in the main branch. If there's a cache hit, the cache
isn't updated (they're read-only!). To prevent caches from becoming
stale, they are timestamped using the current year and month, so that
the cache is force-renewed every month. Bazel disk caches also just
grow, so this technique prevents the cache from growing indefinitely
and eventually causing cache thrashing.
2022-12-05 10:46:56 +01:00
Lutz Justen
6c48f939fc Do not quote ssh/scp commands (#35)
This prevents adding args to the commands, e.g.
set CDC_SCP_COMMAND=C:\path\to\scp.exe -i id_rsa.
2022-12-05 10:46:18 +01:00
Lutz Justen
1b8ad0e097 [cdc_stream] Add wildcard support to stop command (#30)
Adds support for stuff like cdc_stream stop * or cdc_stream stop user*:dir*.
2022-12-05 10:09:37 +01:00
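Wildcard matching of the kind described in 1b8ad0e097 can be sketched with a small backtracking matcher. This is an illustration, not the cdc_stream implementation; the function name is hypothetical and only '*' (any run of characters, including none) is supported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if text matches pattern, where '*' in
// the pattern matches any (possibly empty) sequence of characters.
bool WildcardMatch(const std::string& pattern, const std::string& text) {
  size_t p = 0, t = 0;
  size_t star = std::string::npos;  // Position of the last '*' seen.
  size_t mark = 0;                  // Text position where that '*' started.
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;
      mark = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      // Mismatch: let the last '*' absorb one more character and retry.
      p = star + 1;
      t = ++mark;
    } else {
      return false;
    }
  }
  // Any trailing '*' in the pattern can match the empty string.
  while (p < pattern.size() && pattern[p] == '*') ++p;
  return p == pattern.size();
}
```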
Lutz Justen
90717ce670 [cdc_stream] Implement stop-service command (#29)
Implements cdc_stream stop-service. Also fixes an issue in the
BackgroundService implementation where Exit() would deadlock since
server shutdown waits for all RPCs to exit.
2022-12-02 19:39:13 +01:00
Lutz Justen
1120dcbee0 [cdc_stream] Automatically start service (#28)
Starts the streaming service if it's not up and running. This required
adding the ability to run a detached process. By default, all child
processes are killed when the parent process exits. Since detached
child processes don't run with a console, they need to create sub-
processes with CREATE_NO_WINDOW since otherwise a new console pops up,
e.g. for every ssh command.

Polls for 20 seconds while the service starts up. For this purpose,
a BackgroundServiceClient is added. This will be reused in a future CL
by a new stop-service command to exit the service.

Also adds --service-port as additional argument to start-service.
2022-12-02 14:34:36 +01:00
Lutz Justen
6d63aa72d7 [common] Fix FileWatcherTest (#37)
There was a race condition in RecreateWatchedDir: there was a
brief period between the second dir change event and when the file
watcher was actually watching again. If the file was written during
that brief period, it would be missed. The issue could be reproduced
easily by adding a sleep here:

  // The watched directory exists and its handle is valid.
  if (!first_run) {
    ++dir_recreate_count_;
    if (dir_recreated_cb_) dir_recreated_cb_();
    Util::Sleep(1);
  }

This CL waits until the watcher is watching again.
2022-12-02 13:03:20 +01:00
Lutz Justen
01a60e2490 Rename asset_stream_manager to cdc_stream (#27)
Fixes #13
2022-12-01 16:14:56 +01:00
Lutz Justen
f0539226a2 Merge cdc_stream into asset_stream_manager (#26)
Adds start and stop commands to asset_stream_manager.
asset_stream_manager will be renamed to cdc_stream next.
2022-12-01 12:40:00 +01:00
Lutz Justen
876e59409f Add linter workflow (#33) 2022-12-01 10:38:14 +01:00
Lutz Justen
a381541d1b [cdc_stream] Switch asset_stream_manager to use Lyra (#25)
Lyra has a nice simple interface, but a few quirks that we work
around, mainly in the BaseCommand class:
- It does not support return values from running a command.
- It does not support return values from a custom arg parser.
- Lyra interprets --bad_arg as positional argument.

Fixes #15
2022-12-01 10:36:48 +01:00
Lutz Justen
7d7fcc67b9 Optimize gifs (#32)
Reduces the size of the GIFs from 20ish MB to 3 MB with only minimal quality reduction. Also removes some unwanted frames.
2022-11-30 09:48:29 +01:00
Lutz Justen
491be234c6 "Proper" fix for "Input redirection is not supported" issue with timeout (#31)
The issue was consistently reproducible by adding a sleep right after starting the process.
Use ping instead of timeout now, because ping doesn't read user input.
2022-11-30 09:27:57 +01:00
Lutz Justen
b0f5403854 Fix possible indefinite loop in FileWatcherTest (#23)
The test
  FileWatcherTest/FileWatcherParameterizedTest.RecreateWatchedDir/ReadDirectoryChangesExW
is flaky. This CL doesn't fix the root cause, but it fixes the
indefinite spin in GetChangedFiles when there is no file change.
2022-11-28 11:04:02 +01:00
Lutz Justen
a97d16c4e9 Fix include paths in vcxproj files (#24)
This fixes some issues with Intellisense.
2022-11-28 11:03:35 +01:00
Lutz Justen
8c4a0465e9 Expand path variables for sync destination (#18)
Running commands like cdc_rsync C:\assets\* host:~/assets -vr would create a directory called ~assets. This CL expands path variables properly.
2022-11-25 14:21:21 +01:00