So recently I found myself in another SSH security debate on Twitter. This is a subject long debated in the security community and one where I feel like we go in circles at times. Here are my thoughts about running SSH services at scale and where things sometimes go amiss.
On the standard vs. non-standard port debate
First, while I agree that running SSH on a non-standard port is an example of "security by obscurity," this has some tactical advantages. Before you read further, I suggest you try this experiment:
1. Spin up two Linux instances, in a public cloud running SSH; one using keys, and one using passwords.
1a. Start two SSH listeners, one using the standard port 22 and one using a random port not included in any default portscan list (e.g. a default nmap scan.)
1b. Make the SSH ports generally accessible.
2. Instrument each instance with an intrusion detection tool capable of detecting brute force attempts. One (free) option is the OSSEC LIDS (log based intrusion detection) agent. Additional intrusion detection tools are available in the marketplaces.
3. Count the size of the message data structures, and number of associated alerts, each instance generates per day due to brute force activity on port 22 vs. the pseudorandom port.
4. Extrapolate this to a medium or large sized fleet of, say, ten or twenty thousand instances.
In the public clouds, brute force alerts, like many alerts produced by many forms of automated reconnaissance and attempted intrusion campaigns, are supernumerary. In a medium sized fleet, you can easily generate more than one hundred thousand such alerts per day. Assuming you collect and retain security alerts and associated log data for, say, three to six months, we are now burning money (in the form of compute and storage) processing brute force alerts. We're also creating alert fatigue and distracting security analyst / hunters who could be working on something more productive like hunting structured threats. I would argue that running SSH on a non-standard port, in order to manage alert fatigue, is a useful tactic.
Where resources permit, I would actually suggest running three SSH services:
1) A functional service on a nonstandard port which accepts root logins;
2) A functional service on a nonstandard port which accepts non-root logins, if use cases for this exist;
3) A decoy, non-functional service on port 22 that accepts no logins and has no shell.
With this combination, you can adjust your analytics to lower the priority of campaigns against port 22 by unstructured threats that may never realize they're not interrogating a working service. At the same time, alerts involving campaigns against the working services can be raised in priority as these tend to suggest a more determined human attacker who has taken the trouble to find the real SSH services. These are more interesting things to hunt, and time better spent.
On the Bastion Host Design Pattern
Few of us would argue that placing a Bastion in front of a production SSH service is not a useful tactic. Placing a bastion inline, like a firewall, feels safer and this makes everyone feel good. Where this goes wrong, too often, is in at least a couple of ways I can cite:
1. Assumptions may be made that the presence of the bastion has created sort of impenetrable condition that allows security to be relaxed in the environment behind the bastion. A bastion host is not a perfect defense, more than any other technology, and security needs to be applied systemically using the assumption that any single point, including a bastion, may fail at some time.
2. Ineffective or incoherent identity management and logging on SSH sessions or tunnels crossing the bastion. Figuring out exactly who did what - disambiguating user identity and / or context - too often proves to be infeasable in reality due to inadequate logging or user identity management across bastions.
Given a choice between a bastioned environment with weak logging where I cannot establish user context, and a non-bastioned environment with strong identity management and logging, where I an establish user identity, I'd actually tend to choose the latter. Given a choice, I think using a 2FA VPN with hardware tokens, as some are increasingly doing, with very strong identity management and logging, is often preferable to a simple bastion.
On SSH network security in the public cloud
Applying network access controls to SSH is a fine idea in principle that sometimes breaks down due to account sprawl. Organizations love to create lots of accounts in order to create safeguards against different product or application teams stepping on each other's work. Keeping a large and dynamic list of accounts connected to a "bastion" SSH VPC - via peering or VPN - can be harder than it sounds. Security groups don't cross accounts and there is no obvious way to manage security groups or network ACLS across the AWS account boundary. I sometimes wish that we could use account ID as a parameter in security groups in order to allow instances from any account, in the same billing entity organization, to reach certain network services. I've actually suggested this in the past but this its not a simple feature request.