**Patent Number:** 8387064, issued on 2013/02/26

**Applied on** 2008/10/09, 12/248,568

**Inventor(s):** Charles Archer, Amanda Randles, Brian Smith

**Assignee:** International Business Machines Corporation

**Abstract:** Methods, apparatus, and products are disclosed for balancing a data processing load among a plurality of compute nodes in a parallel computer that include: partitioning application data for processing on the plurality of compute nodes into data chunks; receiving, by each compute node, at least one of the data chunks for processing; estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node.

**Claims:** 1. A method of balancing a data processing load among a plurality of compute nodes in a parallel computer, the method comprising: partitioning application data for processing on the plurality of compute nodes into data chunks; receiving, by each compute node, at least one of the data chunks for processing; estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing, including measuring, by each compute node, randomness of data in the data chunks received by that compute node, wherein the randomness of data in the same dataset is determined by the compression ration when compressing the sample dataset using a lossless data compression algorithm; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node.
2. The method of claim 1 wherein: the method further comprises exchanging, by each compute node, the processing time estimated by that compute node with each of the other compute nodes; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node further comprises redistributing the portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node and the processing times estimated by the other compute nodes.
3. The method of claim 1 wherein redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node further comprises redistributing the portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node and a predefined distribution threshold.
4. The method of claim 1 wherein estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing further comprises: selecting, by each compute node, a sample dataset from the data chunks received by that compute node for processing; and estimating, by each compute node, processing time involved in processing the sample dataset.
5. The method of claim 1 wherein the plurality of compute nodes are connected using a plurality of data communications networks, at least one of the data communications networks optimized for point to point operations, and at least one of the data communications networks optimized for collective operations.
6. A parallel computer for balancing a data processing load among a plurality of compute nodes in the parallel computer, the parallel computer comprising a plurality of computer processors and computer memory operatively coupled to the computer processors, the computer memory having disposed within it computer program instructions capable of: partitioning application data for processing on the plurality of compute nodes into data chunks; receiving, by each compute node, at least one of the data chunks for processing; estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing, including measuring, by each compute node, randomness of data in the data chunks received by that compute node, wherein the randomness of data in the same dataset is determined by the compression ration when compressing the sample dataset using a lossless data compression algorithm; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node.
7. The parallel computer of claim 6 wherein: the computer memory has disposed within it computer program instructions capable of exchanging, by each compute node, the processing time estimated by that compute node with each of the other compute nodes; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node further comprises redistributing the portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node and the processing times estimated by the other compute nodes.
8. The parallel computer of claim 6 wherein redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node further comprises redistributing the portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node and a predefined distribution threshold.
9. The parallel computer of claim 6 wherein estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing further comprises: selecting, by each compute node, a sample dataset from the data chunks received by that compute node for processing; and estimating, by each compute node, processing time involved in processing the sample dataset.
10. The parallel computer of claim 6 wherein the plurality of compute nodes are connected using a plurality of data communications networks, at least one of the data communications networks optimized for point to point operations, and at least one of the data communications networks optimized for collective operations.
11. A computer program product for balancing a data processing load among a plurality of compute nodes in a parallel computer, the computer program product disposed upon a computer readable non-transmission medium, the computer program product comprising computer program instructions capable of: partitioning application data for processing on the plurality of compute nodes into data chunks; receiving, by each compute node, at least one of the data chunks for processing; estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing, including measuring, by each compute node, randomness of data in the data chunks received by that compute node, wherein the randomness of data in the same dataset is determined by the compression ration when compressing the sample dataset using a lossless data compression algorithm; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node.
12. The computer program product of claim 11 wherein: the computer program product further comprises computer program instructions capable of exchanging, by each compute node, the processing time estimated by that compute node with each of the other compute nodes; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node further comprises redistributing the portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node and the processing times estimated by the other compute nodes.
13. The computer program product of claim 11 wherein redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node further comprises redistributing the portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node and a predefined distribution threshold.
14. The computer program product of claim 11 wherein estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing further comprises: selecting, by each compute node, a sample dataset from the data chunks received by that compute node for processing; and estimating, by each compute node, processing time involved in processing the sample dataset.
15. The computer program product of claim 11 wherein the plurality of compute nodes are connected using a plurality of data communications networks, at least one of the data communications networks optimized for point to point operations, and at least one of the data communications networks optimized for collective operations.