A Framework for Statistical Analysis of Datasets on Heterogeneous Clusters
Carino, R.L., & Banicescu, I. (2005). A Framework for Statistical Analysis of Datasets on Heterogeneous Clusters. Proceedings of the 2005 IEEE International Conference on Cluster Computing. Burlington, MA: IEEE Computer Society Press. (On CDROM).
Abstract
This paper proposes a framework for the statistical analysis of
multiple related datasets on heterogeneous clusters. The
analysis procedure, which is separate from the framework,
may have such a limited degree of concurrency
that only a small number of processors is needed to execute
the procedure. Further, the datasets may have a wide range of sizes,
leading to large differences in dataset analysis times.
The framework partitions the processors assigned to it by the
cluster scheduler into processor groups, the maximum size of
a group being chosen to match the
degree of concurrency in the analysis procedure. The framework
also employs dynamic loop scheduling to address the load imbalance
factors arising from the variability of the computational loads
of the datasets, as well as the unpredictable irregularities
of the cluster environment.
Results from preliminary tests of using the framework to fit
gamma-ray burst datasets with vector functional coefficient
autoregressive time series models on 64 processors of a
heterogeneous general-purpose Linux cluster demonstrate
the effectiveness of the framework.
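
The two mechanisms described above, processor grouping and dynamic scheduling of datasets, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `partition_processors` and `self_schedule` are hypothetical, the dataset costs are made-up numbers, and plain self-scheduling (one dataset per request) is used as a stand-in for whichever dynamic loop scheduling technique the framework actually employs.

```python
import heapq


def partition_processors(num_procs, max_group_size):
    """Split ranks 0..num_procs-1 into groups of at most max_group_size.

    max_group_size plays the role of the degree of concurrency of the
    analysis procedure (an assumed, user-supplied value).
    """
    if num_procs <= 0 or max_group_size <= 0:
        raise ValueError("num_procs and max_group_size must be positive")
    ranks = list(range(num_procs))
    return [ranks[i:i + max_group_size]
            for i in range(0, num_procs, max_group_size)]


def self_schedule(dataset_costs, num_groups):
    """Simulate dynamic assignment: the group that becomes idle first
    takes the next unprocessed dataset.

    dataset_costs holds hypothetical analysis times, one per dataset.
    Returns (assignment, makespan), where assignment maps a group index
    to the list of dataset indices it processed.
    """
    # Min-heap of (time at which the group becomes free, group index).
    free_at = [(0.0, g) for g in range(num_groups)]
    heapq.heapify(free_at)
    assignment = {g: [] for g in range(num_groups)}
    for idx, cost in enumerate(dataset_costs):
        t, g = heapq.heappop(free_at)
        assignment[g].append(idx)
        heapq.heappush(free_at, (t + cost, g))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan


if __name__ == "__main__":
    groups = partition_processors(64, 4)
    print(len(groups), "groups; first group:", groups[0])
    # One large dataset followed by five small ones, two groups.
    assignment, makespan = self_schedule([5, 1, 1, 1, 1, 1], 2)
    print(assignment, makespan)
```

In this toy run, one group is occupied by the large dataset while the other works through the five small ones, so both finish at time 5. A static split of the same list into two equal blocks would leave one group with a total load of 7, which is the kind of imbalance dynamic scheduling is meant to reduce.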