Condo Cluster Program

Overview

Sol is a heterogeneous computing cluster, built by Dell, that can be expanded through investments from Lehigh Faculty, Departments, Centers and Colleges. The majority of Sol compute nodes are purchased by Lehigh Faculty known as Condo Investors. In addition, LTS has purchased 8 nodes that are available on a rental basis to Condo Investors and to other researchers (called Hotel Investors). All Condo and Hotel nodes have a minimum configuration of dual sockets with 10 cores per socket, 128 GB RAM, a 1 TB hard disk and a 100 Gb/s EDR InfiniBand interconnect. LTS also provides a head/login node for interactive access to the cluster: compiling applications, editing files, submitting job scripts and monitoring submitted jobs.

The model for sustaining high performance computing at Lehigh is premised on faculty purchasing compute nodes from their grants, which are then added to the cluster. In exchange, LTS will provide system administration and support for the nodes for a period of 4 years (the length of the hardware warranty). Besides avoiding the cost of managing a cluster, Condo Investors gain the ability to use the entire collection of Hotel and Condo nodes when the need arises. In return, Condo Investors allow their idle cycles to be used by other Condo and Hotel Investors. This gives Condo Investors much greater flexibility than owning a standalone cluster.

Program Details

Compute nodes are purchased and maintained based on a 4-year warranty. The minimum purchase is one Base Compute node, with various upgrade options (processor, memory, GPUs and MICs) as described in the table below. Condo Investors will be provided with an annual allocation equal to the number of computing core-hours, or service units (SU), that their investment provides, which may be spent on any available node on Sol. For the 20-core Base Compute node this amounts to 175,200 SUs per year. All investments must include a 4-year hardware warranty that allows LTS staff to initiate repair and replacement of equipment in the event of a hardware failure. This ensures a high quality of service with minimal disruption to users.
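As a rough illustration of how a core-hour allocation is spent, the sketch below assumes that 1 SU equals one core-hour and that a job is charged for every core it requests over its full wall-clock time. That charging rule, the function names and the sample jobs are hypothetical; Sol's actual accounting may differ.

```python
# Hypothetical accounting sketch: assumes 1 SU = 1 core-hour and that a job is
# charged for all requested cores over its full wall time. Not Sol's actual tooling.

BASE_NODE_ANNUAL_SU = 20 * 24 * 365  # 20 cores * 24 h/day * 365 days = 175,200 SU


def job_cost_su(cores: int, wall_hours: float) -> float:
    """Return the SUs consumed by a job using `cores` cores for `wall_hours` hours."""
    if cores <= 0 or wall_hours < 0:
        raise ValueError("cores must be positive and wall_hours non-negative")
    return cores * wall_hours


def remaining_su(allocation: float, jobs: list[tuple[int, float]]) -> float:
    """Subtract the cost of each (cores, wall_hours) job from the allocation."""
    return allocation - sum(job_cost_su(cores, hours) for cores, hours in jobs)


if __name__ == "__main__":
    # Two hypothetical jobs: 40 cores (two nodes) for 48 h, and 20 cores for 100 h.
    jobs = [(40, 48.0), (20, 100.0)]
    print(job_cost_su(40, 48.0))                    # 1920.0
    print(remaining_su(BASE_NODE_ANNUAL_SU, jobs))  # 171280.0
```

Because the allocation may be spent on any available node, a job spanning several nodes simply draws down the same pool faster under this assumed model.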

At the end of the 4-year period, investors may donate their equipment to LTS or take possession of it to set up a cluster in their own lab. LTS will not provide infrastructure, system administration or support for out-of-warranty equipment, whether in or out of the Data Center. Equipment donated to LTS will be used at the discretion of the Managers of Research Computing, Data Center Operations and Systems & Network Administration, who may decide either to dispose of the equipment or to repurpose it for infrastructure resources that may or may not be available to the Lehigh community.

How to become a Condo Investor?

Faculty, Departments, Centers or Colleges interested in investing in the Condo Program should refer to the Pricing Chart for available equipment and costs. Prospective Condo Investors should contact the Research Computing Staff to procure the desired equipment. The Research Computing Staff will obtain current pricing and quotes to be used for the requisition.

Hardware Requirements

Condo Investors are expected to purchase the compute nodes, InfiniBand adapters and cables, and a 4-year warranty for each node. LTS will provide the underlying infrastructure: rack, power, cooling, and network switches. All nodes added to the cluster are standardized as 2U nodes, with 18 nodes per rack and one InfiniBand leaf switch per rack.

BASE COMPUTE NODE
Processor: Dual-socket, 10-core, 2.3 GHz Intel Haswell Xeon E5-2650v3 processors (20 cores/node)
Memory: 128 GB (16 x 8 GB) 2133 MHz DDR4 RDIMMs
Interconnect: 100 Gb/s Mellanox ConnectX-4 EDR InfiniBand interconnect
Hard Drive: 1 TB 7.2K RPM SATA HDD (500 GB local /scratch, plus swap and log files)


UPGRADE & ADD-ONS
Processor:
  Dual-socket, 10-core, 2.5 GHz Intel Haswell Xeon E5-2660v3 processors (20 cores/node)
  Dual-socket, 12-core, 2.5 GHz Intel Haswell Xeon E5-2670v3 processors (24 cores/node)
Memory:
  256 GB (16 x 16 GB) 2133 MHz DDR4 RDIMMs
  512 GB 2133 MHz DDR4 RDIMMs
  1 TB 2133 MHz DDR4 RDIMMs
Accelerator:
  NVIDIA Tesla K80 “Kepler” M-Class GPU Accelerator
  Intel Xeon Phi 7120P Coprocessor


How are allocations calculated?

The annual allocation that a Condo Investor receives is calculated as

Annual SU = Total Number of Invested Cores * 24 hours/day * 365 days/year

Cores that cannot be scheduled, i.e. CUDA cores on a GPU or the x86 cores on a MIC coprocessor, are not counted in the Annual SU allocation. A Base Compute node and a Base Compute node with an add-on accelerator therefore both provide 175,200 SUs annually.
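The allocation formula can be written as a small Python function. This is a minimal sketch: the `Node` class and its fields are hypothetical, and accelerator cores are recorded only to show that they never contribute to the total. The 4,992 CUDA cores of a Tesla K80 are used as a sample value.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


@dataclass
class Node:
    """Hypothetical description of one invested node."""
    cpu_cores: int              # schedulable x86 cores on the host CPUs
    accelerator_cores: int = 0  # CUDA cores or MIC cores: not schedulable, never counted


def annual_su(nodes: list[Node]) -> int:
    """Annual SU = total invested (schedulable) cores * 24 h/day * 365 days/year."""
    invested_cores = sum(node.cpu_cores for node in nodes)
    return invested_cores * HOURS_PER_DAY * DAYS_PER_YEAR


if __name__ == "__main__":
    print(annual_su([Node(cpu_cores=20)]))                          # Base node: 175200
    print(annual_su([Node(cpu_cores=20, accelerator_cores=4992)]))  # Base + K80: 175200
    print(annual_su([Node(cpu_cores=24)]))                          # 24-core upgrade: 210240
    print(annual_su([Node(cpu_cores=20)] * 3))                      # three Base nodes: 525600
```

Under this formula, the 24-core processor upgrade is the only listed option that changes the allocation; memory and accelerator upgrades leave it unchanged.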


Condo Contributors


Last Updated: March 6, 2017