Fix found for severe Linux 6.8 performance regression spotted by Torvalds


Before Linus Torvalds lost his internet and power to a snowstorm, which in turn disrupted the Linux 6.8 merge window, his weekend was already off to a rough start: a performance regression in new Linux 6.8 code was making his kernel builds take twice as long as on previous kernels. An AMD Linux engineer was able to reproduce the regression, and together with the upstream developers there is now a confirmed fix for the problem in the latest scheduler code.

In the mailing list thread about the significant performance regression that Linus Torvalds reported, which stems from scheduler changes in Linux 6.8, it was not immediately clear to the developer involved why the bisected commit caused the regression. In the ensuing discussion, AMD’s Wyes Karny mentioned that he could also reproduce it. Instead of a high-end AMD Ryzen Threadripper system like the one Torvalds uses, Wyes was using a modest AMD Ryzen 5600G desktop. One important point he raised is that the regression only reproduces if ACPI CPPC is disabled in the BIOS and ACPI CPUFreq is used with the Schedutil governor.

Most AMD Zen 2 and later systems support ACPI CPPC, so with modern kernels on the Ryzen side they typically use the newer AMD P-State driver. But for Zen 2 / Zen 3 and earlier systems (or those with CPPC disabled in the BIOS), the ACPI CPUFreq driver is still used, and the default CPU frequency scaling governor is usually “Schedutil”, which makes use of scheduler utilization data.
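For readers who want to check whether a machine matches the affected configuration, the active driver and governor can be read from the standard cpufreq sysfs files. The following is a minimal sketch, not something taken from the mailing list thread: the helper names `read_cpufreq_setup` and `matches_reported_setup` are hypothetical, and it assumes the ACPI CPUFreq driver reports itself as `acpi-cpufreq` and the governor as `schedutil`.

```python
from pathlib import Path


def read_cpufreq_setup(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the cpufreq driver and governor for one CPU (None if unreadable).

    sysfs_root is a parameter only so the function can be exercised against
    a fake directory tree; on a real Linux system the default applies.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    setup = {}
    for name in ("scaling_driver", "scaling_governor"):
        try:
            setup[name] = (base / name).read_text().strip()
        except OSError:
            # File missing (no cpufreq support, non-Linux host, ...).
            setup[name] = None
    return setup


def matches_reported_setup(setup):
    """True if the setup is ACPI CPUFreq + Schedutil, as in the report."""
    return (
        setup.get("scaling_driver") == "acpi-cpufreq"
        and setup.get("scaling_governor") == "schedutil"
    )


if __name__ == "__main__":
    current = read_cpufreq_setup()
    print(current, "affected configuration:", matches_reported_setup(current))
```

A system using the AMD P-State driver would report a different `scaling_driver` value and so would not match.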

In that mailing list thread, a fix was proposed and various details of the regression were discussed. Eventually, Vincent Guittot believed he had a fix for the regression, and Wyes was able to successfully test the patch.

Guittot has now sent out "sched/fair: Fix frequency selection for non invariant case" as the patch to fix this nasty regression in new Linux 6.8 code when using ACPI CPUFreq + Schedutil. He explained with the fix:

“When frequency invariance is not enabled, get_capacity_ref_freq(policy) returns the current frequency and the performance margin applied by map_util_perf(), enabling the utilization to go above the maximum compute capacity and to select a higher frequency than the current one.

The performance margin is now applied earlier in the path to take into account some utilization clampings and we can't get a utilization higher than the maximum compute capacity.

We must use a frequency above the current frequency to get a chance to select a higher OPP when the current one becomes fully used. Apply the same margin and return a frequency 25% higher than the current one in order to switch to the next OPP before we fully use the CPU at the current one.”
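The effect described in the patch message can be illustrated with a small userspace model. This is a simplified sketch and not the kernel code: the function names, the three-entry OPP table, and the capacity scale of 1024 are assumptions made for illustration. It only models the case without frequency invariance, where the requested frequency is the reference frequency scaled by utilization, and utilization is already clamped to the maximum capacity.

```python
MAX_CAPACITY = 1024  # assumed capacity scale for the model

# Hypothetical OPP table in kHz; real tables depend on the CPU.
OPPS_KHZ = [1_400_000, 2_800_000, 3_900_000]


def ref_freq_regressed(cur_khz):
    """Reference frequency is just the current frequency (regressed behavior)."""
    return cur_khz


def ref_freq_fixed(cur_khz):
    """Reference frequency is the current frequency plus a 25% margin."""
    return cur_khz + (cur_khz >> 2)


def requested_freq(util, ref_khz):
    """Scale the reference frequency by utilization, clamped to max capacity."""
    util = min(util, MAX_CAPACITY)
    return ref_khz * util // MAX_CAPACITY


def pick_opp(target_khz, opps=OPPS_KHZ):
    """Pick the lowest OPP that satisfies the target, else the highest OPP."""
    for opp in opps:
        if opp >= target_khz:
            return opp
    return opps[-1]


if __name__ == "__main__":
    cur = OPPS_KHZ[0]
    for util in (512, 820, 1024):
        stuck = pick_opp(requested_freq(util, ref_freq_regressed(cur)))
        fixed = pick_opp(requested_freq(util, ref_freq_fixed(cur)))
        print(f"util={util}: regressed -> {stuck} kHz, fixed -> {fixed} kHz")
```

In this model, a fully busy CPU at the lowest OPP never requests more than its current frequency without the margin, so it stays there. With the 25% margin, the request exceeds the current frequency once utilization passes roughly 80% of capacity, which selects the next OPP.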

Ultimately, it was a one-line code fix to address this performance degradation that caused Linus Torvalds’ empty kernel builds to increase from 22 seconds to 44 seconds.

Assuming everything continues to test well with the new patch, the fix should make its way into Linux 6.8 Git code once Linus Torvalds’ internet and electricity are restored.
