Condo Cluster Program

Overview

Sol is a heterogeneous computing cluster, built by Dell, that can be expanded further by investments from Lehigh Faculty, Departments, Centers and Colleges. The majority of Sol compute nodes are purchased by Lehigh Faculty known as Condo Investors. In addition, LTS has purchased 8 nodes that are available to Condo Investors and other researchers (called Hotel Investors) on a rental basis. All Condo and Hotel nodes have a minimum configuration of dual socket, 10 cores per socket, 128 GB RAM, 1 TB hard disk and 100 Gb/s EDR Infiniband interconnect. LTS also provides a head/login node for interactive access to the cluster for compiling applications, editing files, submitting job scripts and monitoring submitted jobs.

The model for sustaining high performance computing at Lehigh is premised on faculty purchasing compute nodes from their grants, which are then added to the cluster. In exchange, LTS will provide system administration and support for the nodes for a period of 5 years (the length of the hardware warranty; see the exception below). The advantage to Condo Investors, besides avoiding the cost of managing a cluster, is the ability to use the entire collection of Hotel and Condo nodes when the need arises. In return, Condo Investors allow their idle cycles to be used by other Condo and Hotel Investors. This provides Condo Investors with much greater flexibility than owning a standalone cluster.

If equipment is purchased with a different length of warranty, due either to budget constraints or grant requirements, then LTS will commit to support and maintain the equipment only for the duration of the hardware warranty. For example, if the length of the warranty is 3 years, then the support described in this document is provided for 3 years only.

 

 

Program Details

Compute nodes are purchased and maintained based on a 5-year warranty. The minimum purchase is one Base Compute node, with various upgrade options (processor, memory, GPUs and MICs) as described in the table below. Condo Investors will be provided with an annual allocation equivalent to the number of computing core-hours, or service units (SUs), that their investment provides, which may be expended on all available nodes on Sol. For example, this amounts to 175,200 SUs per year for a 20-core compute node (20 cores x 24 hours/day x 365 days/year). All investments must include a 5-year hardware warranty that allows LTS staff to initiate repair and replacement of equipment in the event of a hardware failure. This ensures a high quality of service with minimum disruption to users.

At the end of 5 years, the investor may donate the equipment to LTS or take possession of it to set up a cluster in their own lab. LTS will not provide infrastructure, system administration or support for out-of-warranty equipment in or out of the Data Center. Equipment donated to LTS will be used at the discretion of the Managers of Research Computing, Data Center Operations and Systems & Network Administration. They may decide either to dispose of the equipment or to repurpose it for infrastructure resources that may or may not be available to the Lehigh community.

 

How to become a Condo Investor?

Faculty, Departments, Centers or Colleges that are interested in investing in the Condo Program should refer to the Pricing Chart for available equipment and costs. Prospective Condo Investors should contact Research Computing Staff to procure the desired equipment. The Research Computing Staff will obtain current pricing and quotes that will be used for requisition.

Hardware Requirements

Condo Investors are expected to purchase the compute nodes, Infiniband adapters and cables, and a hardware warranty for each node. LTS will provide the underlying infrastructure: rack, power, cooling, and network switches. All nodes added to the cluster are standardized as 2U nodes, with 18 nodes per rack and one Infiniband leaf switch per rack.

BASE COMPUTE NODE
Processor: Dual-socket, 18-core, 2.1 GHz Intel Skylake Gold 6140 processors (36 cores/node)
Memory: 192 GB (6 x 32 GB) 2666 MHz DDR4 RDIMMs
Interconnect: 100 Gb/s Mellanox ConnectX-4 EDR Infiniband interconnect
Hard Drive: 1 TB 7.2K RPM SATA HDD (500 GB local /scratch, swap and log files)

 

UPGRADE/DOWNGRADE & ADD-ONS
Processor:
  • Dual-socket, 16-core, 2.1 GHz Intel Skylake Gold 6130 processors (32 cores/node)
  • Dual-socket, 20-core, 2.4 GHz Intel Skylake Gold 6148 processors (40 cores/node)

Please contact Research Computing staff for upgrade options.

Skylake CPUs have 6 memory channels, and all channels need to be filled for optimum memory performance, i.e. DIMMs need to be purchased in multiples of 6.

  • NVIDIA GPUs: consumer-grade GTX cards or compute-intensive Tesla K80 or P100, available on request

How are allocations calculated?

The annual allocation that a Condo Investor receives is calculated as:

Annual SU = Total Number of Invested Cores * 24 hours/day * 365 days/year

Cores that cannot be scheduled, i.e. CUDA cores or x86 cores on a MIC coprocessor, are not counted in the Annual SU allocation. A compute node therefore provides the same annual SUs with or without an add-on accelerator, e.g. about 175K SUs annually for a 20-core node.
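
As an illustration, the allocation formula above can be sketched in a few lines of Python. This is only a minimal sketch, not an official LTS tool: the function name annual_su and the constant HOURS_PER_YEAR are hypothetical, and the core counts are taken from the 20-core example and the processor options listed on this page. Only schedulable CPU cores are passed in, since accelerator cores are not counted.

```python
# Hypothetical helper illustrating the formula:
#   Annual SU = invested cores * 24 hours/day * 365 days/year
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap days are ignored, as in the formula


def annual_su(invested_cores: int) -> int:
    """Return the annual SU allocation for a number of schedulable CPU cores.

    CUDA cores and MIC coprocessor cores are not schedulable and must not
    be included in invested_cores.
    """
    if invested_cores < 0:
        raise ValueError("invested_cores must be non-negative")
    return invested_cores * HOURS_PER_YEAR


if __name__ == "__main__":
    # Core counts per node mentioned on this page: 20, 32, 36 and 40.
    for cores in (20, 32, 36, 40):
        print(f"{cores}-core node: {annual_su(cores):,} SUs/year")
```

For an investment of several nodes, the total number of invested cores is passed in, e.g. annual_su(2 * 36) for two 36-core nodes.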

 

Condo Contributors

 

Last Updated: December 7, 2017