windows C++ 并行编程-使用 combinable 提高性能

时间:2025/7/28 15:57:35来源：https://blog.csdn.net/m0_72813396/article/details/141730189 浏览次数:0次

在某些情况下，并行映射 (concurrency::parallel_transform) 和化简 (concurrency:: parallel_reduce) 可以通过 combinable 实现性能改进。

示例 - accumulate

下面的示例使用 std::accumulate 函数来计算数组中质数元素的总和。在此示例中，a 是 array 对象，并且 is_prime 函数确定其输入值是否是质数。

prime_sum = accumulate(begin(a), end(a), 0, [&](int acc, int i) {return acc + (is_prime(i) ? i : 0);
});

示例 - parallel_for_each

下面的示例演示了并行化上一个示例的朴素方法。此示例使用 concurrency::parallel_for_each 算法并行处理数组，并使用 concurrency::critical_section 对象同步对 prime_sum 变量的访问。此示例不进行缩放，因为每个线程必须等待共享资源可用。

critical_section cs;
prime_sum = 0;
parallel_for_each(begin(a), end(a), [&](int i) {cs.lock();prime_sum += (is_prime(i) ? i : 0);cs.unlock();
});

示例 - combinable

下面的示例使用 combinable 对象来提高上一个示例的性能。此示例无需同步对象；该示例将缩放，因为 combinable 对象使每个线程能够独立执行其任务。

combinable 对象通常用于两个步骤。首先，通过并行执行工作来生成一系列精细计算。接下来，将计算合并(或化简)成最终结果。此示例使用 concurrency::combinable::local 方法获取对本地求和计算的引用。然后，它使用 concurrency::combinable::combine 方法和 std::plus 对象将本地计算合并成最终结果。

combinable<int> sum;
parallel_for_each(begin(a), end(a), [&](int i) {sum.local() += (is_prime(i) ? i : 0);
});
prime_sum = sum.combine(plus<int>());

示例 - 串行和并行

下面的完整示例以串行和并行方式计算质数之和。该示例向控制台输出了进行两种计算所需的时间。

// parallel-sum-of-primes.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <array>
#include <numeric>
#include <iostream>using namespace concurrency;
using namespace std;// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{__int64 begin = GetTickCount();f();return GetTickCount() - begin;
}// Determines whether the input value is prime.
bool is_prime(int n)
{if (n < 2)return false;for (int i = 2; i < n; ++i){if ((n % i) == 0)return false;}return true;
}int wmain()
{   // Create an array object that contains 200000 integers.array<int, 200000> a;// Initialize the array such that a[i] == i.iota(begin(a), end(a), 0);int prime_sum;__int64 elapsed;// Compute the sum of the numbers in the array that are prime.elapsed = time_call([&] {prime_sum = accumulate(begin(a), end(a), 0, [&](int acc, int i) {return acc + (is_prime(i) ? i : 0);});});   wcout << prime_sum << endl;   wcout << L"serial time: " << elapsed << L" ms" << endl << endl;// Now perform the same task in parallel.elapsed = time_call([&] {combinable<int> sum;parallel_for_each(begin(a), end(a), [&](int i) {sum.local() += (is_prime(i) ? i : 0);});prime_sum = sum.combine(plus<int>());});wcout << prime_sum << endl;wcout << L"parallel time: " << elapsed << L" ms" << endl << endl;
}

四核计算机上输出如下:

1709600813
serial time: 6178 ms1709600813
parallel time: 1638 ms

示例

以下示例计算质数集两次。每次计算都将结果存储在 std::bitset 对象中。该示例首先串行计算集合，然后并行计算集合。示例还控制台输出了进行两种计算所需的时间。

此示例使用 concurrency::parallel_for 算法和 combinable 对象来生成线程本地集。然后使用 concurrency::combinable::combine_each 方法将线程本地集组合成最终集。

// parallel-combine-primes.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <bitset>
#include <iostream>using namespace concurrency;
using namespace std;// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{__int64 begin = GetTickCount();f();return GetTickCount() - begin;
}// Determines whether the input value is prime.
bool is_prime(int n)
{if (n < 2)return false;for (int i = 2; i < n; ++i){if ((n % i) == 0)return false;}return true;
}const int limit = 40000;int wmain()
{// A set of prime numbers that is computed serially.bitset<limit> primes1;// A set of prime numbers that is computed in parallel.bitset<limit> primes2;__int64 elapsed;// Compute the set of prime numbers in a serial loop.elapsed = time_call([&] {for(int i = 0; i < limit; ++i) {if (is_prime(i))primes1.set(i);}});wcout << L"serial time: " << elapsed << L" ms" << endl << endl;// Compute the same set of numbers in parallel.elapsed = time_call([&] {// Use a parallel_for loop and a combinable object to compute // the set in parallel. // You do not need to synchronize access to the set because the // combinable object provides a separate bitset object to each thread.combinable<bitset<limit>> working;parallel_for(0, limit, [&](int i) {if (is_prime(i))working.local().set(i);});// Merge each thread-local computation into the final result.working.combine_each([&](bitset<limit>& local) {primes2 |= local;});});wcout << L"parallel time: " << elapsed << L" ms" << endl << endl;
}

在四核电脑上运行结果如下:

serial time: 312 msparallel time: 78 ms

关键字：windows C++ 并行编程-使用 combinable 提高性能

本网仅为发布的内容提供存储空间，不对发表、转载的内容提供任何形式的保证。凡本网注明“来源：XXX网络”的作品，均转载自其它媒体，著作权归作者所有，商业转载请联系作者获得授权，非商业转载请注明出处。

我们尊重并感谢每一位作者，均已注明文章来源和作者。如因作品内容、版权或其它问题，请及时与我们联系，联系邮箱：809451989@qq.com，投稿邮箱：809451989@qq.com

责任编辑：