Notes on a Java fork/join framework example

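Java 7 first introduced the fork/join framework, but I had never tried it directly. We rarely write fork/join code by hand in real projects, although we do come across it indirectly through third-party components: Akka has its fork-join-executor, and sbt also runs test cases concurrently with fork/join by default. Fork/join helps us break a computation into finer-grained tasks and make better use of multiple CPU cores.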

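Fork/join is somewhat similar to map-reduce, and in the Java 7 days I simply overlooked it. I am now learning about Java 8's parallelStream, and because its underlying implementation is also fork/join, I felt like trying it out. Put simply, the fork/join algorithm recursively halves a computation into smaller tasks; once a task cannot be split any further, the pieces are computed by multiple cores (threads), and the results are then merged on the way back up.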
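
As a point of comparison, here is a minimal sketch of the same kind of summation written with a parallel stream. The class and method names are made up for illustration; the parallel stream runs its work on the common ForkJoinPool.

```java
import java.util.stream.LongStream;

class ParallelStreamSum {
    // Sums the integers 1..n with a parallel stream.
    // The splitting and merging are handled by the stream implementation.
    static long sum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(100_000)); // prints 5000050000
    }
}
```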

The figure below, taken from "Java 8 in Action", illustrates how fork/join processes a task:

[Figure: fork-join-framework]

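A code demonstration makes it easier to understand how fork/join works.

A fork/join task extends RecursiveTask<T>, and its compute() method decides both how finely the task is split and how the results are merged.

leftTask.fork(); schedules the task for asynchronous execution in the pool, where an idle worker thread can pick it up
rightTask.compute(); reuses the current thread for the other half, since there is no point in releasing the current thread only to acquire another one. Writing rightTask.fork().join(); also produces the correct result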
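
The following is a minimal sketch of such a task, not the original listing. It assumes the input is an array holding 1 to 100000 and a split threshold of 10,000 elements, which is consistent with the output shown in this post; the class name ForkJoinSum and the helper sum() are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

class ForkJoinSum extends RecursiveTask<Long> {
    static final int THRESHOLD = 10_000; // assumed: stop splitting at or below this size

    private final long[] numbers;
    private final int start; // inclusive index
    private final int end;   // exclusive index

    ForkJoinSum(long[] numbers) {
        this(numbers, 0, numbers.length);
    }

    private ForkJoinSum(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            // Small enough: sum the slice directly on the current thread.
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            System.out.printf("Summation from %d to %d, calculated by thread %s%n",
                    start, end - 1, Thread.currentThread().getName());
            return sum;
        }
        int mid = start + length / 2;
        ForkJoinSum leftTask = new ForkJoinSum(numbers, start, mid);
        ForkJoinSum rightTask = new ForkJoinSum(numbers, mid, end);
        leftTask.fork();                        // schedule the left half asynchronously
        Long rightResult = rightTask.compute(); // work on the right half in this thread
        Long leftResult = leftTask.join();      // then wait for the left half
        return leftResult + rightResult;        // merge step
    }

    // Hypothetical helper: sums 1..n on a fresh pool with default parallelism.
    static long sum(int n) {
        long[] numbers = LongStream.rangeClosed(1, n).toArray();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ForkJoinSum(numbers));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Final result: " + sum(100_000)
                + ", CPU cores: " + Runtime.getRuntime().availableProcessors());
    }
}
```

The order in which the leaf lines are printed, and which worker prints each one, will vary from run to run.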

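Note: the demo code only illustrates the fork/join process; fork/join does not improve performance here. Each computation is cheap, so the supporting work of splitting tasks (fork), merging results (join), and creating and using multiple threads costs more than the actual calculation. The purpose of fork/join is to split time-consuming tasks so that multiple cores can complete the overall computation more efficiently.

Here is the output: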

Summation from 18750 to 24999, calculated by thread ForkJoinPool-1-worker-4
Summation from 6250 to 12499, calculated by thread ForkJoinPool-1-worker-0
Summation from 93750 to 99999, calculated by thread ForkJoinPool-1-worker-1
Summation from 87500 to 93749, calculated by thread ForkJoinPool-1-worker-7
Summation from 56250 to 62499, calculated by thread ForkJoinPool-1-worker-6
Summation from 43750 to 49999, calculated by thread ForkJoinPool-1-worker-2
Summation from 81250 to 87499, calculated by thread ForkJoinPool-1-worker-5
Summation from 68750 to 74999, calculated by thread ForkJoinPool-1-worker-3
Summation from 37500 to 43749, calculated by thread ForkJoinPool-1-worker-2
Summation from 75000 to 81249, calculated by thread ForkJoinPool-1-worker-1
Summation from 50000 to 56249, calculated by thread ForkJoinPool-1-worker-7
Summation from 0 to 6249, calculated by thread ForkJoinPool-1-worker-0
Summation from 12500 to 18749, calculated by thread ForkJoinPool-1-worker-4
Summation from 25000 to 31249, calculated by thread ForkJoinPool-1-worker-5
Summation from 31250 to 37499, calculated by thread ForkJoinPool-1-worker-2
Summation from 62500 to 68749, calculated by thread ForkJoinPool-1-worker-3
Final result: 5000050000, CPU cores: 8

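Fork/join runs on a ForkJoinPool thread pool whose default size is the number of logical cores on the machine, that is, the value of Runtime.getRuntime().availableProcessors(); my machine has 8 cores. The output shows that the task was split into chunks of no more than 10,000 numbers (16 chunks of 6,250 numbers each in this run), executed by the pool threads ForkJoinPool-1-worker-X (X from 0 to 7).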
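
A small sketch to check the pool size on your own machine. The class and method names are hypothetical; it relies on the no-argument ForkJoinPool constructor using the number of available processors as its parallelism, and on the int constructor for choosing a different value.

```java
import java.util.concurrent.ForkJoinPool;

class PoolSizeCheck {
    // Parallelism of a pool created with the no-argument constructor.
    static int defaultParallelism() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    // Parallelism of a pool created with an explicit target.
    static int explicitParallelism(int n) {
        ForkJoinPool pool = new ForkJoinPool(n);
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("default parallelism:  " + defaultParallelism());
        System.out.println("explicit parallelism: " + explicitParallelism(4));
    }
}
```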

The key to fork/join is how to split the task and how to merge the individual results.
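
The splitting half can be looked at in isolation, without any threads. The sketch below assumes 100,000 elements and a threshold of 10,000, and halves index ranges the same way a RecursiveTask would; SplitPlan and its methods are hypothetical names. For a summation the merge step is just adding the two partial results, so it is left out here.

```java
import java.util.ArrayList;
import java.util.List;

class SplitPlan {
    // Recursively halves [start, end) until a chunk holds at most `threshold` elements,
    // recording each leaf as {firstIndex, lastIndex}.
    static void split(int start, int end, int threshold, List<int[]> leaves) {
        int length = end - start;
        if (length <= threshold) {
            leaves.add(new int[] {start, end - 1});
            return;
        }
        int mid = start + length / 2;
        split(start, mid, threshold, leaves);
        split(mid, end, threshold, leaves);
    }

    static List<int[]> plan(int n, int threshold) {
        List<int[]> leaves = new ArrayList<>();
        split(0, n, threshold, leaves);
        return leaves;
    }

    public static void main(String[] args) {
        List<int[]> leaves = plan(100_000, 10_000);
        System.out.println(leaves.size() + " chunks"); // prints "16 chunks"
        for (int[] leaf : leaves) {
            System.out.println("from " + leaf[0] + " to " + leaf[1]);
        }
    }
}
```

Because 12,500 is still above the threshold, it is halved once more, which is why the leaves end up at 6,250 elements rather than 10,000.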

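In the example above, the commented-out code can be enabled instead.

It looks entirely equivalent, but the output after running it puzzled me somewhat: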

Summation from 0 to 6249, calculated by thread ForkJoinPool-1-worker-3
Summation from 6250 to 12499, calculated by thread ForkJoinPool-1-worker-1
Summation from 12500 to 18749, calculated by thread ForkJoinPool-1-worker-2
Summation from 18750 to 24999, calculated by thread ForkJoinPool-1-worker-2
Summation from 25000 to 31249, calculated by thread ForkJoinPool-1-worker-2
Summation from 31250 to 37499, calculated by thread ForkJoinPool-1-worker-1
Summation from 37500 to 43749, calculated by thread ForkJoinPool-1-worker-1
Summation from 43750 to 49999, calculated by thread ForkJoinPool-1-worker-1
Summation from 50000 to 56249, calculated by thread ForkJoinPool-1-worker-1
Summation from 56250 to 62499, calculated by thread ForkJoinPool-1-worker-1
Summation from 62500 to 68749, calculated by thread ForkJoinPool-1-worker-1
Summation from 68750 to 74999, calculated by thread ForkJoinPool-1-worker-1
Summation from 75000 to 81249, calculated by thread ForkJoinPool-1-worker-1
Summation from 81250 to 87499, calculated by thread ForkJoinPool-1-worker-1
Summation from 87500 to 93749, calculated by thread ForkJoinPool-1-worker-1
Summation from 93750 to 99999, calculated by thread ForkJoinPool-1-worker-1
Final result: 5000050000, CPU cores: 8

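Only two or three threads take part in the computation, rather than all of them as before. This comes down to ordering: if a task is joined before the other half has been started, the two halves cannot overlap. The order has to be fork first, then compute, then join, that is, the basic pattern is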

leftTask.fork();
rightTask.compute();
leftTask.join();
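
To see the effect of ordering for yourself, here is a sketch that runs both orders and records which threads computed the leaf chunks. The variant that produced the second output is not shown in this post, so the "join early" branch below is only an assumption about what such a variant might look like; the class name, the flag, and the thread-name set are all hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class OrderingDemo extends RecursiveTask<Long> {
    static final long THRESHOLD = 10_000; // assumed split threshold

    private final long from;           // inclusive
    private final long to;             // exclusive
    private final boolean joinEarly;   // true: join the left half before computing the right
    private final Set<String> threads; // names of the threads that computed a leaf

    OrderingDemo(long from, long to, boolean joinEarly, Set<String> threads) {
        this.from = from;
        this.to = to;
        this.joinEarly = joinEarly;
        this.threads = threads;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            threads.add(Thread.currentThread().getName());
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = from + (to - from) / 2;
        OrderingDemo left = new OrderingDemo(from, mid, joinEarly, threads);
        OrderingDemo right = new OrderingDemo(mid, to, joinEarly, threads);
        left.fork();
        if (joinEarly) {
            long leftResult = left.join();      // waits before the right half has started
            return leftResult + right.compute();
        }
        long rightResult = right.compute();     // fork, compute, then join
        return left.join() + rightResult;
    }

    // Sums 1..100000 with the chosen ordering and fills `threads` with the workers used.
    static long run(boolean joinEarly, Set<String> threads) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new OrderingDemo(1, 100_001, joinEarly, threads));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Set<String> late = ConcurrentHashMap.newKeySet();
        Set<String> early = ConcurrentHashMap.newKeySet();
        System.out.println("fork, compute, join: " + run(false, late) + " using " + late.size() + " thread(s)");
        System.out.println("fork, join, compute: " + run(true, early) + " using " + early.size() + " thread(s)");
    }
}
```

Both orders return the same sum; the thread counts depend on the machine and on scheduling, so they will differ between runs.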
