Instead of running netstat/ss on local and remote systems, just bind
with port 0 to find an ephemeral port. This is much more robust,
simpler and a bit faster. Since the remote port is only known after
running cdc_fuse_fs, port forwarding has to be set up after running
cdc_fuse_fs.
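
As a rough illustration of the port 0 technique (not the actual
cdc_fuse_fs code; the helper name find_ephemeral_port is made up for
this sketch), binding to port 0 lets the OS pick a free ephemeral port,
which can then be read back from the socket:

```python
import socket


def find_ephemeral_port(host: str = "127.0.0.1") -> int:
    """Asks the OS for a free ephemeral port by binding to port 0.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 means "pick any free port for me".
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("OS assigned port", find_ephemeral_port())
```

Note that this sketch closes the socket before returning, so another
process could grab the port in the meantime. A real server would keep
the bound socket open and report the port it ended up with, which is
why the forwarding can only be set up afterwards.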
This CL changes the chunking algorithm from "normalized chunking" to
simple "regression chunking", and changes the hash criteria from
'hash&mask' to 'hash<=threshold'. These are all ideas taken from
testing and analysis done at
https://github.com/dbaarda/rollsum-chunking/blob/master/RESULTS.rst
Regression chunking was introduced in
https://www.usenix.org/system/files/conference/atc12/atc12-final293.pdf
The algorithm uses an arbitrary number of regressions using power-of-2
regression target lengths. This means we can use a simple bitmask for
the regression hash criteria.
Regression chunking yields high deduplication rates even for lower max
chunk sizes, so that the cdc_stream max chunk can be reduced to 512K
from 1024K. This fixes potential latency spikes from large chunks.
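
The following is a simplified sketch of the idea, not the cdc_stream
implementation. Assumptions: a toy gear-style rolling hash, made-up
names (find_cut, chunk, GEAR) and small illustrative sizes. For
readability, the relaxed regression criteria are written as shifted
thresholds (threshold << k) rather than the bitmask trick described
above. A cut is made where hash <= threshold; if max size is reached
without a match, the cut falls back to the last position that met the
strictest relaxed criterion seen.

```python
import random

# Toy gear table: 256 fixed pseudo-random 64-bit values (assumption).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def find_cut(data, start, min_size, target, max_size, regressions=4):
    """Returns the end offset of the chunk that begins at 'start'."""
    end = min(len(data), start + max_size)
    if end - start <= min_size:
        return end
    # P(hash <= threshold) is roughly 1 / target per position.
    threshold = MASK64 // target
    # fallback[k]: last offset that met the criterion relaxed by 2^(k+1),
    # recorded only if it did not meet a stricter one.
    fallback = [None] * regressions
    h = 0
    for i in range(start, end):
        h = ((h << 1) + GEAR[data[i]]) & MASK64
        if i + 1 - start < min_size:
            continue
        if h <= threshold:
            return i + 1  # Primary criterion met.
        for k in range(regressions):
            if h <= threshold << (k + 1):
                fallback[k] = i + 1
                break
    # Max size reached: regress to the strictest criterion that matched.
    for cut in fallback:
        if cut is not None:
            return cut
    return end


def chunk(data, min_size=256, target=1024, max_size=4096):
    """Splits 'data' into content-defined chunks."""
    chunks, start = [], 0
    while start < len(data):
        cut = find_cut(data, start, min_size, target, max_size)
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because a forced cut at max size is replaced by a content-defined
fallback cut, chunk boundaries stay stable under insertions even when
the max chunk size is small.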
Adds an ArchType argument to many RemoteUtil methods, which is used to
replace -tt (forced pseudo-TTY allocation) by -T (no pseudo-TTY
allocation). The -tt option adds tons of ANSI escape sequences to the
output and makes it unparsable, even after removing the sequences, as
some sequences like "delete the last X characters" are not honoured.
An exception is BuildProcessStartInfoForSshPortForward, where
replacing -tt by -T would make the port forwarding process exit
immediately.
Improves ServerArch so that it can detect the remote architecture by
running uname and checking %PROCESSOR_ARCHITECTURE%. So far, only
x64 Linux and x64 Windows are supported, but support for others, e.g.
aarch64, is easy to add later.
Before the detection is run, the remote architecture is guessed based
on the destination. For instance, if the destination directory
starts with "C:\", it pretty much means Windows. If cdc_rsync_server
exists and runs fine, there's no need for detection.
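
A minimal sketch of the guess-then-detect flow, assuming the detection
output comes from something like "uname -sm" on Linux and
"echo %PROCESSOR_ARCHITECTURE%" on Windows. The function names and
the returned strings are invented for this example and are not the
ServerArch API.

```python
import re
from typing import Optional

LINUX_X64 = "linux_x86_64"      # Illustrative labels only.
WINDOWS_X64 = "windows_x86_64"


def guess_arch_from_destination(directory: str) -> str:
    """Cheap first guess: a drive letter prefix suggests Windows."""
    if re.match(r"^[A-Za-z]:[\\/]", directory):
        return WINDOWS_X64
    return LINUX_X64


def parse_detected_arch(uname_output: str,
                        processor_architecture: str) -> Optional[str]:
    """Interprets the output of the two detection commands.

    Returns None if neither output matches a supported architecture.
    """
    tokens = uname_output.split()
    if tokens[:1] == ["Linux"] and "x86_64" in tokens:
        return LINUX_X64
    # On a Linux shell, the unexpanded variable name would be echoed
    # back, so only an exact "AMD64" counts as Windows x64.
    if processor_architecture.strip().upper() == "AMD64":
        return WINDOWS_X64
    return None
```

The guess is only a starting point: if the server binary for the
guessed architecture starts fine, the more expensive detection step
can be skipped.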
Since PortManager also depends on the remote architecture, it has to
be adjusted as well. So far, PortManager assumed that "local" means
Windows and "remote" means Linux. This is no longer the case for
syncing to Windows devices, so this CL adds the necessary abstractions
to PortManager.
Also refactors ArchType into a separate class in common, since it is
now used from several places. It is also expanded to handle future
changes that add support for different processor architectures, e.g.
aarch64.
Use sftp for deploying remote components instead of scp. sftp has the
advantage that it can also create directories, chmod files etc., so
that we can do everything in one call of sftp instead of mixing scp
and ssh calls.
The downside of sftp is that it can't switch to ~ or %userprofile%
for the remote side, and we have to assume that sftp starts in the
user's home dir. This is the default and works on my machines!
cdc_rsync and cdc_stream check the CDC_SFTP_COMMAND env var now and
accept --sftp-command flags. If they are not set, the corresponding
scp flag and env var are still used, with scp replaced by sftp. This is
most likely correct as sftp and scp usually reside in the same
directory and share largely identical parameters.
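
The fallback could look roughly like the sketch below. This is an
illustration, not the tools' code: the function name, the precedence
of flag over env var, the CDC_SCP_COMMAND variable name and the final
plain "sftp" default are assumptions.

```python
import os
import re
from typing import Mapping, Optional


def resolve_sftp_command(sftp_flag: Optional[str] = None,
                         scp_flag: Optional[str] = None,
                         env: Mapping[str, str] = os.environ) -> str:
    """Picks the sftp command, falling back to the scp settings."""
    explicit = sftp_flag or env.get("CDC_SFTP_COMMAND")
    if explicit:
        return explicit
    # CDC_SCP_COMMAND is an assumed name for "the scp env var".
    scp = scp_flag or env.get("CDC_SCP_COMMAND")
    if scp:
        # Replace the whole word "scp" only, so that "scp.exe" becomes
        # "sftp.exe" while other arguments are left alone.
        return re.sub(r"\bscp\b", "sftp", scp)
    return "sftp"
```

For example, an scp command of "C:\ssh\scp.exe -P 2222" would turn
into "C:\ssh\sftp.exe -P 2222", which works because both tools accept
-P for the port.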
Adds a flag to set the SSH forwarding port or port range used for
'cdc_stream start-service' and 'cdc_rsync'.
If a single number is passed, e.g. --forward-port 12345, then this
port is used without checking availability of local and remote ports.
If the port is taken, this results in an error when trying to connect.
Note that this restricts the number of connections that stream can
make to one.
If a range is passed, e.g. --forward-port 45000-46000, the tools
search for available ports locally and remotely in that range. This is
more robust, but a bit slower due to the extra overhead.
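
To make the two modes concrete, here is a small sketch of parsing the
flag value and picking a port. The names parse_forward_port and
pick_local_port are invented for this example, and only the local
side is checked here; the tools also have to check the remote side.

```python
import socket
from typing import Optional, Tuple


def parse_forward_port(value: str) -> Tuple[int, int]:
    """Parses "12345" or "45000-46000" into an inclusive (first, last)."""
    first, sep, last = value.partition("-")
    lo = int(first)
    hi = int(last) if sep else lo
    if not 0 < lo <= hi <= 65535:
        raise ValueError(f"invalid port or port range: {value!r}")
    return lo, hi


def pick_local_port(lo: int, hi: int,
                    host: str = "127.0.0.1") -> Optional[int]:
    """Returns a usable local port in [lo, hi], or None if all are taken.

    A single port (lo == hi) is returned without any availability check.
    """
    if lo == hi:
        return lo
    for port in range(lo, hi + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # Port is in use, try the next one.
            return port
    return None
```

Probing by bind() is just one way to test availability; it is used
here because it is short and portable.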
Optimizes port_manager_win as it was very slow for a large port range.
It's still not optimal, but the time needed to scan 30k ports is
well under 1 second now.
Fixes #12
This CL removes the port arguments for both tools.
The port argument can also be specified via the ssh-command and
scp-command flags. In fact, if a port is specified by both port flags
and ssh/scp commands, they interfere with each other. For ssh, the one
specified in ssh-command wins. For scp, the one specified in
scp-command wins. To fix this, one would have to parse scp-command and
remove the port arg there. Or we could just remove the ssh-port arg.
This is what this CL does. Note that if you need a custom port, it's
very likely that you also have to define custom ssh and scp commands.
* [cdc_stream] Fix issues found in tests
Fixes a couple of issues found by integration testing:
- Unicode command line args in cdc_stream show up as question marks.
- Log is still named assets_stream_manager instead of cdc_stream.
- An error message contains stadia_assets_stream_manager_v3.exe.
- mount_dir was not the last arg as required by FUSE.
- Promoted cache cleanup logs to INFO level since they're important
for the proper workings of the system.
- Asset streaming cache dir is still %APPDATA%\GGP\asset_streaming.
* Address comments
Implements cdc_stream stop-service. Also fixes an issue in the
BackgroundService implementation where Exit() would deadlock since
server shutdown waits for all RPCs to exit.
Starts the streaming service if it's not up and running. This required
adding the ability to run a detached process. By default, all child
processes are killed when the parent process exits. Since detached
child processes don't run with a console, they need to create sub-
processes with CREATE_NO_WINDOW since otherwise a new console pops up,
e.g. for every ssh command.
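
For readers unfamiliar with the Windows process flags involved, here
is a rough Python analogue (the actual code is not Python and the
function names are made up). DETACHED_PROCESS starts a child without
the parent's console, and CREATE_NO_WINDOW keeps console subprocesses
from opening a window of their own. On non-Windows systems the sketch
falls back to starting a new session.

```python
import subprocess
import sys


def spawn_detached(args):
    """Starts a process that is not tied to the parent's console."""
    kwargs = {}
    if sys.platform == "win32":
        kwargs["creationflags"] = (subprocess.DETACHED_PROCESS |
                                   subprocess.CREATE_NEW_PROCESS_GROUP)
    else:
        kwargs["start_new_session"] = True
    return subprocess.Popen(args,
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL,
                            **kwargs)


def run_without_console(args):
    """Runs a subprocess (e.g. ssh) without popping up a console."""
    flags = subprocess.CREATE_NO_WINDOW if sys.platform == "win32" else 0
    return subprocess.run(args, capture_output=True, text=True,
                          creationflags=flags)
```

This sketch does not cover the "children are killed when the parent
exits" behaviour mentioned above; it only shows the flags that control
console handling.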
Polls for 20 seconds while the service starts up. For this purpose,
a BackgroundServiceClient is added. This will be reused in a future CL
by a new stop-service command to exit the service.
Also adds --service-port as additional argument to start-service.
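
The startup polling can be pictured as a simple deadline loop. The
sketch below is generic: wait_until_ready and the is_ready callback
are placeholders for whatever check BackgroundServiceClient performs,
and the 0.5 second interval is an assumption. Only the 20 second
timeout comes from this CL.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_sec: float = 20.0,
                     interval_sec: float = 0.5,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Polls is_ready() until it returns True or the timeout expires."""
    deadline = clock() + timeout_sec
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_sec)
```

Using a monotonic clock avoids surprises if the wall clock is adjusted
while waiting. The clock and sleep parameters exist only to make the
loop easy to test.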
So far, errors from the remote netstat process would only be logged in
the asset stream service, for instance when SSH auth failed. However,
the errors were not shown to the client, and that's the most important
thing.
Also adds some feedback to cdc_stream in case of success.
Implements the cdc_stream client and adjusts asset streaming in
various places to work better outside of a GGP environment.
This CL tries to get quoting for SSH commands right. It also brings
back the ability to start a streaming session from
asset_stream_manager.
Also cleans up Bazel targets setup. Since the sln file is now in root,
it is no longer necessary to prepend ../ to relative filenames to
make clicking on errors work.