If you live in an earthquake zone, it’s important to engineer buildings to survive an earthquake. You don’t know when an earthquake will happen, or where exactly, or how big it’s going to be, but you know that it will happen at some point during the lifetime of the building. And the consequences of not earthquake-proofing can be deadly.
The same goes for your network critical infrastructure. At some point, some part of your network will go down. The consequences are not usually deadly, but it can feel that way when it’s happening to you.
Maybe there will be a major power outage in your area.
Maybe a junior sysadmin will type the wrong thing into a terminal window in a moment of inattention.
Maybe an OS upgrade will break some library dependencies.
Maybe a ransomware attack will hold your sensitive information hostage.
Maybe a file somewhere will become corrupted.
Maybe someone will accidentally unplug the wrong thing in the server room.
You don’t know what will happen, or how bad it will be. But you know it will happen. If you think that it won’t happen to you, stay in the information technology business for a few more years.
When you are designing your RADIUS infrastructure, you should consider:
How bad would it be for your business if people couldn’t get onto the secure network?
For most businesses, the answer is somewhere between “really bad” and “catastrophic.”
The good news is that with just a little bit of planning and forethought, you can design your system to be a lot more resilient to failures, including cyber threats.
Most of these solutions cost nothing beyond some extra RADIUS hardware and some additional disk space. In our experience, the upfront investment in this critical infrastructure pales in comparison to what you can expect to pay in emergency rates for network specialists when disaster strikes, not to mention the losses to your business and reputation while your network is down.
Design strategy: Disaster-proof your RADIUS infrastructure
1) Put your RADIUS server on a virtual machine—by itself. When something goes wrong, all you have to do is revert to a previous snapshot and be up and running again in a few minutes. Using a VM for your RADIUS server is incredibly easy to do and has virtually no downside for remote access systems.
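Reverting to a snapshot is usually a single hypervisor operation. The sketch below assumes a KVM/QEMU host managed through libvirt and its Python bindings (libvirt-python). The article does not name a hypervisor, and the domain and snapshot names here are hypothetical. `pick_snapshot()` chooses the newest snapshot taken before the incident, and `revert_vm()` performs the revert.

```python
from datetime import datetime


def pick_snapshot(snapshots, incident_time):
    """Return the name of the newest snapshot taken before incident_time.

    snapshots: list of (name, taken_at) tuples, in any order.
    Returns None if no snapshot predates the incident.
    """
    candidates = [(taken, name) for name, taken in snapshots if taken < incident_time]
    if not candidates:
        return None
    return max(candidates)[1]


def revert_vm(domain_name, snapshot_name, uri="qemu:///system"):
    """Revert a libvirt-managed VM to a named snapshot (assumes libvirt-python is installed)."""
    import libvirt  # imported lazily so pick_snapshot() works without libvirt

    conn = libvirt.open(uri)
    try:
        dom = conn.lookupByName(domain_name)
        snap = dom.snapshotLookupByName(snapshot_name)
        dom.revertToSnapshot(snap)
    finally:
        conn.close()


# Hypothetical snapshot history for a VM named "radius-vm".
snaps = [
    ("radius-pre-upgrade", datetime(2024, 3, 1, 9, 0)),
    ("radius-nightly", datetime(2024, 3, 2, 2, 0)),
    ("radius-after-break", datetime(2024, 3, 2, 11, 0)),
]
print(pick_snapshot(snaps, datetime(2024, 3, 2, 10, 30)))  # radius-nightly
# revert_vm("radius-vm", "radius-nightly")
```

Picking the snapshot by timestamp rather than "most recent" matters when snapshots keep being taken automatically after the failure began.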
2) Give your RADIUS server enough resources to withstand unexpected surges in demand. In most organizations, authentication requests follow a fairly predictable pattern, until something bad happens and your network goes down. When you bring your network back up, all your users will try to authenticate at the same time. If you haven’t sufficiently resourced your RADIUS server, this surge can bring your network down again. As a rule of thumb, we recommend sizing the RADIUS VM so that it uses no more than 5–10% CPU in the “normal” case. Any more than that, and there might not be enough headroom to deal with spikes in traffic.
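A quick back-of-the-envelope check for step 2, assuming (as a simplification) that CPU use scales linearly with the authentication request rate. Under that assumption, a server at 5–10% CPU has roughly 10 to 20 times its normal request rate available before it saturates. The site numbers below are hypothetical, and `ceiling` is an assumed safety margin, not a recommended value.

```python
def projected_cpu(normal_cpu, normal_rate, storm_users, storm_window_s):
    """Estimate the CPU fraction during a reconnect storm.

    Assumes CPU use scales linearly with request rate (a simplification).
    normal_cpu: CPU fraction at normal peak, e.g. 0.08 for 8%.
    normal_rate: authentication requests per second at normal peak.
    storm_users / storm_window_s: users reconnecting, and how fast they do so.
    """
    storm_rate = storm_users / storm_window_s
    return normal_cpu * storm_rate / normal_rate


def survives_storm(normal_cpu, normal_rate, storm_users, storm_window_s, ceiling=0.9):
    """True if the projected storm load stays under the CPU ceiling."""
    return projected_cpu(normal_cpu, normal_rate, storm_users, storm_window_s) <= ceiling


# Hypothetical site: 20 auths/s at normal peak, 10,000 users reconnecting within 2 minutes.
print(round(projected_cpu(0.08, 20, 10_000, 120), 3))  # 0.333
print(survives_storm(0.08, 20, 10_000, 120))  # True
print(survives_storm(0.40, 20, 10_000, 120))  # False
```

Real authentication methods (EAP in particular) involve several round trips per login, so measure your own normal rate and CPU rather than relying on the linear model alone.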
3) Put your databases on separate hardware from your primary RADIUS server. Separate hardware for separate components means that a single hardware failure will be less catastrophic. Maintaining dedicated RADIUS hardware also means that authentication performance won’t be affected by resource-heavy database queries.
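One way to keep step 3 honest over time is to check a deployment inventory for components that share a physical machine. Two VMs on the same hypervisor still share one hardware failure, so the inventory should record the physical host. The inventory format and component names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: component name -> physical machine it runs on.
INVENTORY = {
    "radius-primary": "host-a",
    "user-db": "host-b",
    "accounting-db": "host-a",
}


def colocated(inventory):
    """Return {host: [components]} for every physical host running more than one component."""
    by_host = defaultdict(list)
    for component, host in inventory.items():
        by_host[host].append(component)
    return {host: sorted(comps) for host, comps in by_host.items() if len(comps) > 1}


print(colocated(INVENTORY))  # {'host-a': ['accounting-db', 'radius-primary']}
```

An empty result means no two listed components share hardware.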
4) In multi-site systems, secondary RADIUS servers should be simple clones of the primary one. When RADIUS policies and configuration files are cloned across all sites, a RADIUS server failure at any given satellite location is almost trivial to recover from. A new server can simply be cloned again from the primary RADIUS server within minutes. See our design blueprint for multi-site RADIUS systems for more detail.
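Clones only help in step 4 if they stay identical. A minimal sketch of a drift check follows: it hashes every file under two configuration directories and reports paths that differ or exist on only one side. The directory layout in the demo is hypothetical. In practice you would point it at a copy of the primary's configuration and at a secondary's configuration directory, wherever your installation keeps them.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root):
    """Map each file's path relative to root to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def config_drift(primary_root, secondary_root):
    """Return sorted relative paths whose contents differ, or that exist on only one side."""
    a, b = tree_digest(primary_root), tree_digest(secondary_root)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


# Demo with two throwaway directories standing in for primary and secondary.
with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as secondary:
    for root in (primary, secondary):
        Path(root, "mods-enabled").mkdir()
        Path(root, "radiusd.conf").write_text("# identical on both servers\n")
    Path(primary, "mods-enabled", "sql").write_text("# edited on primary\n")
    Path(secondary, "mods-enabled", "sql").write_text("# stale copy\n")
    demo_drift = config_drift(primary, secondary)

print(demo_drift)  # ['mods-enabled/sql']
```

An empty list means the secondary is a faithful clone; anything else is a file to re-copy before it surprises you during a failover.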
5) In multi-site systems, consider deploying two primary instances of the database. Losing access to the database is generally a catastrophic failure for your network. Our recommended design strategy to ensure redundancy for this critical component is to deploy two primary instances of your database. Bear in mind that some up-front engineering effort will be required to ensure that the two primary instances are kept in sync. However, the extra network resilience (and peace of mind!) gained by providing database redundancy far outweighs this additional effort.
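Keeping the two primaries in sync is database-specific engineering and is not shown here. What the sketch below illustrates for step 5 is the consuming side: try each primary in order and use the first one that answers. Your RADIUS server's own failover settings would normally do this job. The endpoint names and `fake_connect()` are hypothetical stand-ins for a real database driver.

```python
def connect_first_available(endpoints, connect):
    """Try each database primary in order and return (endpoint, connection) for the first that answers.

    connect: any callable that takes an endpoint string and returns a connection,
    raising an exception if the endpoint is unreachable.
    """
    errors = {}
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint)
        except Exception as exc:  # in real code, catch your database driver's error types
            errors[endpoint] = exc
    raise ConnectionError(f"no database primary reachable: {errors}")


def fake_connect(endpoint):
    """Stand-in for a real driver: primary A is down, primary B answers."""
    if endpoint == "db-primary-a":
        raise ConnectionError("site A unreachable")
    return f"connection to {endpoint}"


print(connect_first_available(["db-primary-a", "db-primary-b"], fake_connect))
# ('db-primary-b', 'connection to db-primary-b')
```

Because both instances are primaries, either can accept writes, which is why the sync engineering mentioned above has to be done before you rely on this kind of failover.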
By following these simple design best practices, you will set up your network critical infrastructure to recover much more quickly from unexpected failures, which are inevitable. None of these solutions is expensive in either time or money. They only require some forethought and planning when you first configure your RADIUS infrastructure.
Need help with risk management for network critical infrastructure?
InkBridge Networks has been at the forefront of network security for over two decades, tackling complex challenges across various protocols and infrastructures. Our team of seasoned experts has encountered and solved nearly every conceivable network security issue. If you're looking for insights from the architects behind some of the internet's most foundational authentication systems, you can request a quote for network security solutions here. 