Machine learning models are growing exponentially in size, and so is the energy required to train them; only after training can a model accurately process images, text, or video. As the AI community grapples with its impact on the environment, some conferences now ask authors submitting papers to report their carbon dioxide emissions. New research provides a more accurate way to calculate those emissions. It also compares the factors that affect them and tests two ways to reduce them.
The researchers trained 11 machine learning models of varying sizes to process language or images. Training times ranged from 1 hour on a single GPU to 8 days on 256 GPUs. They recorded energy consumption every second. They also obtained the carbon emissions per kWh of energy, at five-minute intervals throughout 2020, for 16 geographic regions. With both data sets they could compare the carbon emissions of running different models in different regions and at different times.
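The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual code: the function name `training_emissions_grams`, the use of per-second power readings in watts, and the sample numbers are all hypothetical. Each one-second energy reading is multiplied by the grid's carbon intensity for the five-minute interval it falls in, and the products are summed.

```python
from typing import Sequence

SECONDS_PER_INTERVAL = 300  # carbon-intensity data comes in five-minute intervals


def training_emissions_grams(
    power_watts: Sequence[float],
    intensity_g_per_kwh: Sequence[float],
    start_second: int = 0,
) -> float:
    """Estimate grams of CO2 for one training run (hypothetical helper).

    power_watts: one power reading per second of training.
    intensity_g_per_kwh: grid carbon intensity, one value per five-minute interval.
    start_second: offset of the first reading from the start of the intensity series.
    """
    total = 0.0
    for offset, watts in enumerate(power_watts):
        interval = (start_second + offset) // SECONDS_PER_INTERVAL
        kwh = watts / 1000.0 / 3600.0  # energy used in one second, in kWh
        total += kwh * intensity_g_per_kwh[interval]
    return total


# Made-up example: a constant 1 kW draw for ten minutes, spanning one
# five-minute interval at 200 g/kWh and one at 755 g/kWh.
power = [1000.0] * 600
intensity = [200.0, 755.0]
print(training_emissions_grams(power, intensity))  # about 79.6 grams
```

Because the intensity is looked up per interval, running the same job in a different region or at a different time only requires swapping the intensity series or the start offset.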
The carbon footprint of powering a GPU to train the smallest model is roughly equivalent to charging a phone. The largest model contains 6 billion parameters, a measure of model size. Although its training run was only 13% complete, the GPUs' carbon footprint was already almost equivalent to that of the electricity an American household consumes in a year. And some deployed models, such as OpenAI's GPT-3, contain more than 100 billion parameters.
The biggest factor affecting carbon emissions is geographic region: emissions range from 200 to 755 grams of CO2 per kWh. Emissions per kilowatt-hour also fluctuate over time, in part because, lacking adequate energy storage, the grid must fall back on "dirty electricity" whenever intermittent clean sources such as wind and solar cannot meet demand. Besides changing locations, the researchers therefore tested two time-based ways to reduce carbon dioxide emissions, which the high temporal granularity of their data made possible. The first method is "flexible start," which delays the start of training by up to 24 hours. For the largest models, which take several days to train, a one-day delay typically reduces carbon emissions by less than 1%, but for much smaller models it can reduce emissions by 10% to 80%. The second method is "pause and resume," which pauses training during periods of high emissions, provided the total training time does not more than double. This approach benefits small models by only a few percentage points, but in half of the regions it benefits the largest models by 10% to 30%.
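The two scheduling methods can be illustrated with a short Python sketch. It rests on simplifying assumptions that are not from the research itself: the job draws constant power, future carbon intensity is known in advance, and pausing costs nothing. The function names and the sample intensity values are hypothetical. Under constant power, emissions are proportional to the sum of the intensities of the intervals in which the job runs, so the code compares those sums.

```python
from typing import List, Sequence, Tuple


def flexible_start(
    intensity: Sequence[float], job_len: int, max_delay: int
) -> Tuple[int, float]:
    """Pick the start delay (in intervals) that minimizes summed intensity.

    The job runs for job_len consecutive intervals and may be delayed by
    at most max_delay intervals (288 five-minute intervals equal 24 hours).
    """
    last_start = min(max_delay, len(intensity) - job_len)
    if last_start < 0:
        raise ValueError("intensity series is shorter than the job")
    costs = [sum(intensity[d:d + job_len]) for d in range(last_start + 1)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


def pause_and_resume(
    intensity: Sequence[float], job_len: int, max_stretch: float = 2.0
) -> Tuple[List[int], float]:
    """Run only in the cleanest intervals, within max_stretch times job_len.

    Returns the chosen interval indices and their summed intensity.
    """
    window = min(len(intensity), int(job_len * max_stretch))
    if window < job_len:
        raise ValueError("intensity series is shorter than the job")
    cleanest = sorted(range(window), key=lambda i: intensity[i])[:job_len]
    chosen = sorted(cleanest)
    return chosen, sum(intensity[i] for i in chosen)


# Made-up intensities (g CO2 per kWh) for six consecutive intervals.
grid = [500, 300, 200, 400, 600, 100]
baseline = sum(grid[:3])  # start immediately, run 3 intervals without pausing
print(baseline, flexible_start(grid, 3, 3), pause_and_resume(grid, 3))
```

In this toy series, delaying the start by one interval lowers the summed intensity from 1000 to 900, while pausing during the dirtiest intervals lowers it to 600. Which method helps more on a real grid depends on how long the job is relative to the grid's daily cycle.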
This article is reprinted from: https://www.solidot.org/story?sid=71960