eGlobaltech was contracted to support the hosting of a containerized application in AWS for a large federal agency program. The customer originally presented the task as a straightforward, low-effort engagement requiring only a couple of containers, a relational database, and an Elastic File System (EFS) volume for shared state. During our initial assessment and analysis, we quickly determined that the customer’s complex business rules and compliance requirements presented a more significant challenge than originally framed. In reality, the customer needed a top-down review of the existing application architecture to develop a responsive, secure, and compliant solution that leveraged AWS containerization, networking, and data services.
The customer’s security compliance requirements placed several constraints on the design, excluding some obvious options and forcing us toward more creative workarounds. Due to the system’s FISMA Moderate rating, we had to use AWS’s GovCloud region, which offered a reduced set of services and fewer Availability Zones (AZs). While two AZs met the minimum standard for resiliency, we preferred three to meet our performance and availability targets. When GovCloud West became available, we seamlessly incorporated the additional AZ into our design to ensure we met our availability requirements.
The lack of Route53 within GovCloud also presented a problem, as our customer did not permit hybrid networking models that would have let us use Route53 from the US East region. Several application components, including OpenShift, rely heavily on DNS, so we had to consider alternative DNS solutions. We initially planned to replicate the federal agency’s DNS within our VPC, but this added significant complexity: it required firewall changes within the agency’s edge network and coupled us to their change control process. We ultimately decided to configure an internal BIND server in the management VPC, with the intention of moving our record set to Route53 as soon as it becomes available in GovCloud.
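As a rough illustration, the internal BIND server’s configuration amounts to a handful of zone stanzas like the following sketch (the zone name and file path are hypothetical placeholders, not the agency’s actual DNS data):

```
// named.conf fragment -- illustrative only; the zone name and file path
// are hypothetical, not the agency's actual records.
zone "app.example.gov" IN {
    type master;
    file "/etc/named/zones/db.app.example.gov";
    allow-query { any; };
};
```

Keeping the records in a zone file like this makes the eventual migration to Route53 largely a matter of re-importing the same record set.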
Maintaining compliance with the Trusted Internet Connection (TIC) requirement presented another challenge during the design phase. This mandate forces all traffic to traverse the TIC for security review, which runs counter to the guidance of AWS’s Well-Architected Framework. We implemented a public/private VPC design that ensured our VPN connections routed through the agency’s TIC before reaching the system.
When we began the project, we had anticipated using one of the native AWS container services, such as Elastic Container Service (ECS). However, the customer mandated OpenShift as our container platform. This increased difficulty and complexity, since we had to maintain our own container ecosystem rather than rely on one of AWS’s managed offerings. It also limited our elasticity and resiliency through licensing constraints: we were forced to choose instance types that maximized memory while staying under two vCPUs. Had Elastic Kubernetes Service (EKS) been available in GovCloud and permitted for our use, the environment would have gained considerably more scalability and elasticity.
Finally, we were constrained by the limited number of IP addresses the agency provisioned for our use. This limitation stemmed, in part, from the agency’s insistence that endpoints sit in routable, agency-owned IP space. To work around it, we implemented a Spoke VPC model.
We used these Spoke VPCs as pseudo-public VPCs, mapping their IP addresses to the range the agency gave us. Within the VPCs, we placed load-balanced, auto-scaling proxy servers and bastion hosts. This limited our IP utilization while ensuring we did not sacrifice security, elasticity, or resiliency.
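To make the IP-budget constraint concrete, here is a small sketch, using Python’s standard ipaddress module, of how a constrained agency-routable block can be carved into per-AZ spoke subnets for the proxy and bastion tier (the CIDR range and AZ names are illustrative, not the actual allocation):

```python
import ipaddress

# Hypothetical agency-provided routable block -- only 64 addresses total.
agency_block = ipaddress.ip_network("10.10.0.0/26")

# Carve one small /28 per AZ for proxies and bastions; everything else
# (OpenShift nodes, databases) lives in private, non-routable space.
spoke_subnets = list(agency_block.subnets(new_prefix=28))  # four /28s, 16 addresses each

for az, subnet in zip(["us-gov-west-1a", "us-gov-west-1b", "us-gov-west-1c"],
                      spoke_subnets):
    print(az, subnet)
```

Because only the thin proxy/bastion tier consumes agency-routable addresses, the rest of the environment can grow freely in private address space.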
Our system implements several layers of security between the internet and our data. We built a highly elastic system in which the entire environment can scale up and down easily, allowing us to keep costs and performance consistently optimized. We used stateless auto-scaled instances and AWS-provided application services to optimize automatically for cost savings.
We leveraged Elastic Load Balancers (ELB) and horizontally auto-scaled, stateless proxy servers that scale up and down automatically according to load on the environment. This allows us to run two very small instances while QA/dev environments are idle, then scale up easily and automatically to handle production load.
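The scaling behavior can be sketched as a simple threshold policy. In practice this logic lives in CloudWatch alarms attached to an Auto Scaling group; the thresholds and bounds below are illustrative assumptions, not the production configuration:

```python
def desired_capacity(current: int, avg_cpu: float,
                     minimum: int = 2, maximum: int = 10,
                     scale_up_at: float = 70.0, scale_down_at: float = 20.0) -> int:
    """Return the next fleet size for the stateless proxy tier.

    Thresholds and bounds are hypothetical examples of what the
    CloudWatch alarms on the Auto Scaling group might encode.
    """
    if avg_cpu > scale_up_at:
        return min(current + 1, maximum)   # add an instance, up to the cap
    if avg_cpu < scale_down_at:
        return max(current - 1, minimum)   # shed an instance, never below the floor
    return current

# Idle QA/dev hours: the fleet floors at two small instances.
print(desired_capacity(current=2, avg_cpu=5.0))   # -> 2
# Production burst: capacity steps up toward the cap.
print(desired_capacity(current=4, avg_cpu=85.0))  # -> 5
```

Because the proxies are stateless, any instance the policy removes can be terminated without draining session state.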
We used Amazon Elasticsearch Service, ElastiCache, and the Relational Database Service (RDS) to meet the storage and caching requirements of an application originally intended to run on manually managed services. This spared us the significant overhead of hosting those services ourselves on individual EC2 instances and substantially reduced development and management effort. We achieved additional cost savings because all traffic to the agency is billed as “regional data transfer,” and the agency can consolidate all of its networking into a direct connection to its datacenter for further savings.
From day one we insisted that every piece of infrastructure be provisioned and orchestrated as code (Terraform and Ansible). This allowed us to modularize the spoke VPC, the application VPC, and all of the networking, which substantially reduced the overhead of managing two accounts and made redundant infrastructure components easy to manage.
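At a high level, the modular layout looks something like the following Terraform sketch (the module paths, variable names, and outputs here are hypothetical, not the project’s actual code):

```
# Illustrative only -- module paths and variables are hypothetical.
module "spoke_vpc" {
  source     = "./modules/spoke-vpc"
  cidr_block = var.agency_cidr        # agency-routable range for proxies/bastions
  azs        = var.availability_zones
}

module "app_vpc" {
  source      = "./modules/app-vpc"
  cidr_block  = var.private_cidr      # non-routable private address space
  peer_vpc_id = module.spoke_vpc.vpc_id
}
```

With the VPCs expressed as reusable modules, standing up the same topology in a second account is a matter of re-instantiating the modules with different variables.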