Scheduling jobs of a multi-node computer system based on environmental impact

Patent Number: 9015726, issued on 2015/04/21
Applied on 2009/04/03, Application No. 12/418,044
Inventor(s): Amanda Randles, David Darrington, John Santosuosso, Eric Barsness
Assignee: International Business Machines Corporation

Abstract: Embodiments of the invention provide techniques for scheduling jobs on a multi-node computing system based on the predicted environmental impact of executing the jobs. In one embodiment, a plurality of job plans may be generated for processing a requested job on the multi-node computing system. The environmental impacts resulting from executing each job plan may be estimated by matching the job plans to stored data based on standardized executions of job plans. Further, environmental impacts may be estimated by matching the job plans to stored data based on actual environmental measurements obtained during prior executions of the job plan on the multi-node computer system. The job may be executed using a job plan selected based on predicted environmental impacts and time performance.
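The estimation step described in the abstract matches a job plan to stored data. That data comes first from standardized executions and is then refined with measurements from prior runs on the same system. It can be pictured as a small lookup table. This is a minimal sketch under stated assumptions, not the patented implementation. The names `EnvironmentalIndex` and `IndexEntry`, the plan identifiers, and the kilojoule unit are all hypothetical. So is the rule that the average of past measurements overrides the standardized baseline. A fuller key could also include system settings and an input data set identifier, as claims 5 to 7 describe.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class IndexEntry:
    # Heat (kJ, hypothetical unit) recorded for a standardized execution of the plan.
    standardized_kj: float
    # Heat (kJ) measured during prior executions of the same plan on this system.
    measured_kj: list[float] = field(default_factory=list)


class EnvironmentalIndex:
    """Hypothetical store of environmental data keyed by a job-plan identifier."""

    def __init__(self) -> None:
        self._entries: dict[str, IndexEntry] = {}

    def add_baseline(self, plan_id: str, standardized_kj: float) -> None:
        # Seed the index from a standardized run of the job plan.
        self._entries[plan_id] = IndexEntry(standardized_kj)

    def record_actual(self, plan_id: str, measured_kj: float) -> None:
        # Store a measurement taken while the plan actually ran.
        # Raises KeyError if the plan has no baseline entry yet.
        self._entries[plan_id].measured_kj.append(measured_kj)

    def estimate(self, plan_id: str) -> float | None:
        entry = self._entries.get(plan_id)
        if entry is None:
            return None  # no stored data matches this plan
        if entry.measured_kj:
            # Prefer observed behavior on this system over the standardized baseline.
            return mean(entry.measured_kj)
        return entry.standardized_kj


index = EnvironmentalIndex()
index.add_baseline("plan-a", 1200.0)
print(index.estimate("plan-a"))  # 1200.0
index.record_actual("plan-a", 1000.0)
index.record_actual("plan-a", 1100.0)
print(index.estimate("plan-a"))  # 1050.0
```

Averaging all past measurements is only one choice. A weighted or windowed average would let recent runs count more if node hardware or cooling conditions change over time.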

Claims:
1. A computer-implemented method, comprising: receiving a request to execute a distributed computing task on a plurality of independent computing nodes of a multi-node computer system, each node having at least a processor and a memory; generating a plurality of job plans for processing the requested distributed computing task on the multi-node computer system, wherein each job plan specifies a set of one or more of the computing nodes to use to execute the distributed computing task; estimating a predicted environmental impact for executing each job plan based on an environmental index providing data describing environmental impacts associated with executing each of the job plans, wherein the predicted environmental impact estimates a predicted amount of thermal energy generated from executing each of the job plans; selecting, based at least in part on the estimated environmental impact predictions, one of the plurality of job plans; executing the distributed computing task according to the selected job plan; during the execution of the distributed computing task, monitoring the independent computing nodes to determine an actual environmental impact from executing the selected job plan; and updating the environmental index with the actual environmental impact associated with executing the selected job plan.
2. The computer-implemented method of claim 1, wherein the predicted environmental impact for each job plan is based on standardized executions of job plans on the specified sets of one or more computing nodes of the multi-node computer system.
3. The computer-implemented method of claim 1, wherein the predicted environmental impact for each job plan is based on measurements obtained during prior executions of the job plan on the specified sets of one or more computing nodes of the multi-node computer system.
4. The computer-implemented method of claim 1, wherein the predicted environmental impact further comprises: (i) a predicted amount of electrical power consumed by executing the job plan and (ii) a predicted amount of air conditioning load generated while executing the job plan.
5. The computer-implemented method of claim 1, wherein estimating the predicted environmental impact for each job plan comprises, for each of the plurality of job plans, matching an identifier associated with a respective job plan to stored data describing estimated environmental impacts of one or more predefined job plans.
6. The computer-implemented method of claim 5, wherein estimating the predicted environmental impact for each job plan further comprises, for each of the plurality of job plans, matching one or more system settings associated with the job plan to stored data describing estimated environmental impacts of one or more predefined job plans.
7. The computer-implemented method of claim 5, wherein estimating the predicted environmental impact for each job plan further comprises matching an identifier of an input data set associated with the job plan to stored data describing estimated environmental impacts of one or more predefined job plans.
8. The computer-implemented method of claim 1, wherein selecting one of the plurality of job plans is further based on time performance associated with executing each job plan.
9. The computer-implemented method of claim 1, wherein selecting one of the plurality of job plans is further based on a user-specified preference for a relative priority between environmental impact and time performance.
10. The computer-implemented method of claim 1, further comprising: monitoring the distributed computing task executed on the computing nodes specified by the selected job plan for an actual environmental impact experienced while executing the distributed computing task; and recording an indication of the actual environmental impact.
11. A non-transitory computer-readable storage medium containing a program which, when executed, performs an operation, comprising: receiving a request to execute a distributed computing task on a plurality of independent computing nodes of a multi-node computer system, each node having at least a processor and a memory; generating a plurality of job plans for processing the requested distributed computing task on the multi-node computer system, wherein each job plan specifies a set of one or more of the computing nodes to use to execute the distributed computing task; estimating a predicted environmental impact for executing each job plan based on an environmental index providing data describing environmental impacts associated with executing each of the job plans, wherein the predicted environmental impact estimates a predicted amount of thermal energy generated from executing each of the job plans; selecting, based at least in part on the estimated environmental impact predictions, one of the plurality of job plans; executing the distributed computing task according to the selected job plan; during the execution of the distributed computing task, monitoring the independent computing nodes to determine an actual environmental impact from executing the selected job plan; and updating the environmental index with the actual environmental impact associated with executing the selected job plan.
12. The non-transitory computer-readable storage medium of claim 11, wherein the predicted environmental impact for each job plan is based on standardized executions of job plans on the specified sets of one or more computing nodes of the multi-node computer system.
13. The non-transitory computer-readable storage medium of claim 11, wherein the predicted environmental impact for each job plan is based on measurements obtained during prior executions of the job plan on the specified sets of one or more computing nodes of the multi-node computer system.
14. The non-transitory computer-readable storage medium of claim 11, wherein the predicted environmental impact further comprises: (i) a predicted amount of electrical power consumed by executing the job plan and (ii) a predicted amount of air conditioning load generated while executing the job plan.
15. The non-transitory computer-readable storage medium of claim 11, wherein estimating the predicted environmental impact for each job plan comprises, for each of the plurality of job plans, matching an identifier associated with a respective job plan to stored data describing estimated environmental impacts of one or more predefined job plans.
16. The non-transitory computer-readable storage medium of claim 15, wherein estimating the predicted environmental impact for each job plan further comprises, for each of the plurality of job plans, matching one or more system settings associated with the job plan to stored data describing estimated environmental impacts of one or more predefined job plans.
17. The non-transitory computer-readable storage medium of claim 15, wherein estimating the predicted environmental impact for each job plan further comprises matching an identifier of an input data set associated with the job plan to stored data describing estimated environmental impacts of one or more predefined job plans.
18. The non-transitory computer-readable storage medium of claim 11, wherein selecting one of the plurality of job plans is further based on time performance associated with executing each job plan.
19. The non-transitory computer-readable storage medium of claim 11, wherein selecting one of the plurality of job plans is further based on a user-specified preference for a relative priority between environmental impact and time performance.
20. The non-transitory computer-readable storage medium of claim 11, wherein the operation further comprises: monitoring the distributed computing task executed on the computing nodes specified by the selected job plan for an actual environmental impact experienced while executing the distributed computing task; and recording an indication of the actual environmental impact.
21. A multi-node computing system, comprising: a plurality of computing nodes, each having a processor and a memory, wherein the plurality of computing nodes are available to perform a distributed computing task; and a master node having a processor and a memory, wherein the master node executes a job-scheduling application, wherein the job-scheduling application is configured to perform an operation, the operation comprising: receiving a request to process the distributed computing task on one or more of the plurality of computing nodes, generating a plurality of job plans for processing the requested distributed computing task on the one or more nodes of the plurality of computing nodes, estimating a predicted environmental impact for executing each job plan based on an environmental index providing data describing environmental impacts associated with executing each of the job plans, wherein the predicted environmental impact estimates a predicted amount of thermal energy generated from executing each of the job plans, selecting, based at least in part on the estimated environmental impact predictions, one of the plurality of job plans, executing the distributed computing task according to the selected job plan, during the execution of the distributed computing task, monitoring the independent computing nodes to determine an actual environmental impact from executing the selected job plan, and updating the environmental index with the actual environmental impact associated with executing the selected job plan.
22. The system of claim 21, wherein the predicted environmental impact for each job plan is based on standardized executions of job plans on the specified sets of one or more computing nodes of the multi-node computer system.
23. The system of claim 21, wherein the predicted environmental impact for each job plan is based on measurements obtained during prior executions of the job plan on the specified sets of one or more computing nodes of the multi-node computer system.
24. The system of claim 21, wherein the predicted environmental impact further comprises: (i) a predicted amount of electrical power consumed by executing the job plan and (ii) a predicted amount of air conditioning load generated while executing the job plan.
25. The system of claim 21, wherein estimating the predicted environmental impact for each job plan comprises, for each of the plurality of job plans, matching an identifier associated with a respective job plan to stored data describing estimated environmental impacts of one or more predefined job plans.
26. The system of claim 21, wherein the operation performed by the master node further comprises: monitoring the distributed computing task executed on the computing nodes specified by the selected job plan for an actual environmental impact experienced while executing the distributed computing task; and recording an indication of the actual environmental impact.
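Claims 1, 8 and 9 have the scheduler choose among job plans using three inputs: the predicted thermal energy, time performance, and a user-specified priority between the two. The claims do not say how these are weighed. The sketch below assumes a linear score over values normalized to the largest candidate, with `env_weight` standing in for the user preference. `JobPlan`, `select_plan`, the field names, the units, and the example numbers are hypothetical. The predictions are taken as given, for example from an environmental index.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class JobPlan:
    plan_id: str
    node_ids: tuple[int, ...]   # set of compute nodes the plan would use
    predicted_heat_kj: float    # predicted thermal energy (hypothetical unit: kJ)
    predicted_runtime_s: float  # predicted time performance (seconds)


def select_plan(plans: list[JobPlan], env_weight: float) -> JobPlan:
    """Return the plan with the lowest weighted score.

    env_weight is a hypothetical user preference in [0, 1]:
    1.0 ranks only by predicted heat, 0.0 only by predicted runtime.
    """
    if not plans:
        raise ValueError("no job plans to choose from")
    if not 0.0 <= env_weight <= 1.0:
        raise ValueError("env_weight must be between 0 and 1")

    # Normalize each metric by the largest candidate value so the units cancel;
    # fall back to 1.0 if every prediction is zero to avoid dividing by zero.
    max_heat = max(p.predicted_heat_kj for p in plans) or 1.0
    max_time = max(p.predicted_runtime_s for p in plans) or 1.0

    def score(p: JobPlan) -> float:
        return (env_weight * p.predicted_heat_kj / max_heat
                + (1.0 - env_weight) * p.predicted_runtime_s / max_time)

    return min(plans, key=score)


plans = [
    JobPlan("wide", tuple(range(64)), predicted_heat_kj=2000.0, predicted_runtime_s=100.0),
    JobPlan("narrow", tuple(range(16)), predicted_heat_kj=1200.0, predicted_runtime_s=300.0),
]
print(select_plan(plans, env_weight=0.8).plan_id)  # narrow
print(select_plan(plans, env_weight=0.2).plan_id)  # wide
```

Normalizing by the largest candidate keeps kilojoules and seconds comparable without fixing an exchange rate between them. A scheduler could instead impose a hard deadline and minimize heat among the plans that meet it. The chosen plan's measured heat would then be written back to the environmental index, as the final step of claim 1 requires.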