Call for users

Call for Proposals: Availability of 1000 Nodes for Systems Research Experiments

NSF's PRObE (www.nmc-probe.org) operates four clusters to support systems research at scale.  The largest is Kodiak (https://www.nmc-probe.org/wiki/Machines:Kodiak): 1000 nodes (two-core x86, 8 GB DRAM, two 1 TB disks, 1 GbE Ethernet and 8 Gbps InfiniBand) donated by Los Alamos National Laboratory.

Today Kodiak hosts researchers from Georgia Tech, Carnegie Mellon, and Los Alamos.  Princeton researchers have published results from Kodiak at the most recent NSDI (Wyatt Lloyd, "Stronger Semantics for Low-Latency Geo-Replicated Storage," NSDI 2013).  Researchers from U Central Florida, UT Austin, Georgia Tech, and Carnegie Mellon are using the PRObE staging clusters.

PRObE resources are intended for (infrastructure) systems researchers committed to public release of their research results, typically publishing in distributed systems (e.g., OSDI or SOSP), cloud computing (e.g., SoCC), supercomputing (e.g., SC or HPDC), storage (e.g., FAST), or networking (e.g., NSDI).

PRObE resources are managed by Emulab (www.emulab.org), a cluster manager for allocating physical nodes that has been in use for systems research for over a decade (Brian White et al., "An Integrated Experimental Environment for Distributed Systems and Networks," OSDI 2002).  Users start by porting and demonstrating their code on a 100-node staging cluster, such as Denali, built from the same Los Alamos equipment donation.  With demonstrated success on a staging cluster and a compelling research goal, Kodiak can be requested and allocated, possibly exclusively, for hours to days.

To start using PRObE resources:
- visit www.nmc-probe.org to learn about the resources
- visit portal.nmc-probe.org to request a PRObE-specific Emulab account
- have a research leader or faculty member get an account and define a project on portal.nmc-probe.org
- use the portal to get onto Denali: allocate a single-node experiment, log in to that node to customize and resave the OS image for your project, then launch a multi-node experiment to demonstrate your system at up-to-100-node scale
- use https://www.nmc-probe.org/request/ to request a large allocation on Kodiak (this is a HotCRP paper-review web site, where your "paper" is a short justification of your research, your preparedness for using Kodiak, and your credentials and appropriateness for using NSF resources)
- PRObE managers will review, approve and schedule your use of large allocations of Kodiak time
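Experiments on Emulab-managed clusters such as Denali are defined by an NS-2-style Tcl file submitted through the web interface.  The following is a rough sketch only; the node count and OS image name are placeholders, and the exact directives available depend on the testbed installation (see the Emulab documentation):

```tcl
# Minimal Emulab NS file (sketch): request 8 physical nodes on one LAN,
# each loaded with a project-specific OS image.
set ns [new Simulator]
source tb_compat.tcl          ;# Emulab's testbed extensions to NS

set nodelist ""
for {set i 0} {$i < 8} {incr i} {
    set node($i) [$ns node]
    # MYPROJ-CUSTOM is a placeholder for the image you saved on Denali
    tb-set-node-os $node($i) MYPROJ-CUSTOM
    append nodelist "$node($i) "
}

# Put all nodes on a single 1 Gbps LAN with no added delay
set lan [$ns make-lan $nodelist 1000Mb 0ms]

$ns rtproto Static
$ns run
```

When the experiment is swapped in, Emulab allocates physical nodes, loads the named image on each, and wires the requested topology; scaling the same file up is how a staged experiment grows toward a Kodiak-sized run.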

In a matter of weeks another style of large PRObE resource will come online.  Susitna is 34 nodes of 64-core x86 processors, for a total of more than 2000 x86 cores.  Susitna also has NVIDIA-donated K20 GPU coprocessors with 2496 CUDA cores each, for a total of 84,864 CUDA cores.  With 128 GB DRAM, a hard disk, and an SSD in each node, Susitna nodes are interconnected by 40 Gbps Ethernet, 40 Gbps InfiniBand, and 1 Gbps Ethernet.

NSF PRObE resources will be available for at least the next two years.

All users of PRObE resources are obligated to publish their results, either in conferences or on their web sites, and to acknowledge the NSF PRObE resources used in these publications.

See also our PRObE introduction article in USENIX ;login:, vol. 38, no. 3 (June 2013): www.cs.cmu.edu/~garth/papers/07_gibson_036-039_final.pdf.


Events

Upcoming and past events for PRObE.

Community Input

PRObE receives community input through committees that provide high level guidance for the project.

PRObE is targeted at the needs of systems researchers in at least three communities: high-end or high-performance computing, often publishing in the Supercomputing conference (SC); data-intensive scalable computing, recently publishing in the Operating Systems Design and Implementation conference (OSDI); and data and storage systems for both, mostly publishing in the File and Storage Technologies conference (FAST). Advisors, decision makers, and users for PRObE will be drawn from these communities, and annual open meetings will be held in each community at an appropriate academic conference, such as SC, OSDI/SOSP, and FAST.

PRObE is governed by four bodies:

  1. The PRObE Management Group, for day-to-day operations, made up of PRObE PIs and full-time senior staff;
  2. The PRObE Steering Committee, for strategic decisions and high-level planning, typically meeting twice a year, made up of six systems researchers plus the Management Group, mostly drawn from universities, with representatives also from government and industry;
  3. The PRObE Project Selection Committee, for allocating machine resources to proposals, teleconferencing monthly or as needed, made up of six systems researchers plus the Management Group;
  4. The PRObE User Environment Committee, for shaping the feature requests and service provisioning of the facility, interacting electronically, made up of six systems researchers plus the Management Group.

The latter three committees will be staffed using established merit-based selection processes, such as those used by academic special interest groups and top-conference program committees. PRObE governance will be revised as needed through consultation with the stakeholder communities. In the start-up phase the Steering Committee and the Project Selection Committee were merged into a single committee; they will be split when the need arises.

In addition, regular BoFs at these major conferences will provide a forum for broad community input to PRObE. A list of current and past events can be found on the events page.

Slide Show - Inauguration

What is PRObE

Parallel Reconfigurable Observational Environment


PRObE is an NSF-sponsored project aimed at providing a large-scale, low-level systems research facility. It is a collaborative effort by the New Mexico Consortium, Los Alamos National Laboratory, Carnegie Mellon University, the University of Utah, and the University of New Mexico. It is housed at NMC in the Los Alamos Research Park.

PRObE will provide a highly reconfigurable, remotely accessible, and controllable environment that researchers can use to perform experiments that are not possible at smaller scale. At full production scale, PRObE provides at least two 1024-node clusters, one 200-node cluster, and some smaller machines with extreme core counts and bleeding-edge technology. The machines are retired large clusters donated by DOE facilities.

The PRObE research environment is based on the successful Emulab testbed-management software, developed over the past decade by the Flux Research Group in the University of Utah's School of Computing. Emulab is a full-featured suite for testbed management, designed to provide the low-level access to testbeds that systems researchers require, as well as higher-level tools that enhance researcher productivity. Emulab is widely used in the systems research community: it powers over three dozen testbeds around the world, used by thousands of researchers and educators.

The bulk of the PRObE facilities are located at the New Mexico Consortium, with a smaller facility located at Carnegie Mellon University. Researchers will be able to access the facility remotely or visit Los Alamos to work at the facility.

PRObE is dedicated to systems research. The computer facility allows hands-on operation of very large computing resources. Researchers have complete control of the hardware while they are running experiments, and can inject both hardware and software failures while monitoring the system to see how it reacts. We envision this unique system supporting research in many systems-related fields, such as operating systems, storage, and high-end computing.

No other system at this scale in the world provides this ability.
