Intended for a system administrator audience, to assist them in building the data commons infrastructure, moving data, managing permissions, and implementing AWS security controls.
Documentation Goals
The Data Commons-Alpha (DCA) proof of concept was created on Amazon Web Services (AWS) cloud architecture. A future version of DCA may be hosted in the Microsoft Azure cloud, and this documentation will be updated accordingly. This documentation is divided into the following categories:
- Architecture: an overview of the DCA cloud architecture, along with recommended free training and installation guides from AWS and other resources.
- Tools: DCA provides three ways to interact with loaded data:
  - The Littlest JupyterHub provides a Jupyter notebook interface for general-purpose programming in the R and Python languages.
  - pgweb provides a cross-platform web client for PostgreSQL databases.
  - Elastic Stack (Elasticsearch, Kibana, Beats, and Logstash, also known as the ELK Stack) takes data from any source, in any format, then searches, analyzes, and visualizes it in real time.
- Extract, Transform, Load (ETL): CSV file output from the SAS ETL pipeline is conformed to a normalized data model, enriched with geolocation data, and programmatically loaded into relational and document databases.
- Security: security best practices, such as Secure Sockets Layer (SSL) encryption for data in transit, least-privilege user accounts, and network segmentation, among other technical security controls.
- Cost Analysis: Summary and itemized prices are provided for the AWS Virtual Private Cloud (VPC) infrastructure and DCA analysis tools.
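The ETL flow described above (conform CSV output to a normalized model, enrich with geolocation, load into relational and document stores) can be sketched as follows. This is a minimal illustration, not the actual DCA pipeline: the column names, the `GEO` lookup table, and the `load` function are hypothetical, and Python's built-in `sqlite3` and `json` modules stand in for the PostgreSQL and document-database loads.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw CSV as it might arrive from the SAS ETL pipeline
# (column names and values are illustrative only).
RAW_CSV = """Site Name,ZIP
Clinic A,30301
Clinic B,30309
"""

# Stand-in geolocation lookup; a real pipeline would call a geocoding service.
GEO = {"30301": (33.749, -84.388), "30309": (33.789, -84.383)}

def load(raw_csv):
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        # Conform to a normalized data model: snake_case column names.
        row = {k.strip().lower().replace(" ", "_"): v for k, v in rec.items()}
        # Enrich each record with geolocation data.
        row["lat"], row["lon"] = GEO.get(row["zip"], (None, None))
        rows.append(row)

    # Relational load (sqlite3 is used here as a stand-in for PostgreSQL).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE site (site_name TEXT, zip TEXT, lat REAL, lon REAL)")
    db.executemany("INSERT INTO site VALUES (:site_name, :zip, :lat, :lon)", rows)

    # Document load: serialize the same rows as JSON documents, as they
    # might be bulk-indexed into a document database such as Elasticsearch.
    docs = [json.dumps(r) for r in rows]
    return db, docs
```

The same normalized rows feed both targets, which keeps the relational tables and the search index consistent with one another.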