myten

How the JavaSparkPi Program Works


The Spark download ships with an example program called JavaSparkPi, which estimates pi using Spark's map and reduce operations.

 

The code is as follows:

 

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.examples;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

import java.util.ArrayList;
import java.util.List;

/** 
 * Computes an approximation to pi
 * Usage: JavaSparkPi [slices]
 */
public final class JavaSparkPi {

  public static void main(String[] args) throws Exception {
    SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
    sparkConf.setMaster("local");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
    int n = 3000000 * slices;
    List<Integer> l = new ArrayList<Integer>(n);
    for (int i = 0; i < n; i++) {
      l.add(i);
    }

    JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

    int count = dataSet.map(new Function<Integer, Integer>() {
      @Override
      public Integer call(Integer integer) {
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
      }
    }).reduce(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer integer, Integer integer2) {
        return integer + integer2;
      }
    });

    System.out.println("Pi is roughly " + 4.0 * count / n);

    jsc.stop();
  }
}

The code starts by building a very large collection, then iterates over it with map, sampling one random coordinate point per element.

 

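The geometry behind the implementation is roughly as follows.

Take the square that spans -1 to 1 on both the x and y axes, centered at the origin. Its area is 4.

Take the circle of radius 1 centered at (0, 0). Its area is 3.141..., which is pi.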

 

The code then samples random coordinate points and checks whether each one lies inside the circle.

    double x = Math.random() * 2 - 1;
    double y = Math.random() * 2 - 1;
    return (x * x + y * y < 1) ? 1 : 0;

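x and y are generated randomly. Math.random() only returns values that are at least 0 and less than 1, so multiplying by 2 and subtracting 1 gives a value in [-1, 1), which always falls inside the square.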

x*x + y*y = 1 describes the points that lie exactly on the circle, so x*x + y*y < 1 tests whether a point is inside the circle.

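If the point is inside the circle, the function returns 1; otherwise it returns 0. The reduce step that follows adds these values up, which gives the total number of points that landed inside the circle.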
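The same map-then-reduce idea can be tried without a Spark installation. The following is only a minimal sketch using plain Java streams, not the Spark example itself: the class name LocalPiSketch, the helper methods insideCircle and countInside, and the fixed seed are all assumptions introduced for illustration.

```java
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical stand-alone sketch; it does not use Spark.
public final class LocalPiSketch {

  // Returns 1 if the point (x, y) lies strictly inside the unit circle, else 0.
  static int insideCircle(double x, double y) {
    return (x * x + y * y < 1) ? 1 : 0;
  }

  // "Map" each of the n indices to 0 or 1, then "reduce" by summing.
  // The seed is an assumption that makes runs repeatable.
  static int countInside(int n, long seed) {
    Random random = new Random(seed);
    return IntStream.range(0, n)
        .map(i -> insideCircle(random.nextDouble() * 2 - 1,
                               random.nextDouble() * 2 - 1))
        .reduce(0, Integer::sum);
  }

  public static void main(String[] args) {
    int n = 1_000_000;
    int count = countInside(n, 42L);
    System.out.println("Pi is roughly " + 4.0 * count / n);
  }
}
```

The stream here is sequential and runs on a single machine. In the Spark version, the same two steps are spread across the slices of the RDD.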

 

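Suppose there are n sample points in total, and count of them lie inside the circle.

Then count/n is the fraction of all sample points that fell inside the circle. Because the points are spread uniformly over the square, this fraction approximates the ratio of the circle's area to the square's area. Multiplying it by the area of the square, 4, gives an approximation of pi.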
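Written out as a short derivation, assuming the sample points are uniformly distributed over the square (the symbols A_circle and A_square are introduced here only for illustration):

```latex
\[
\frac{\mathit{count}}{n}
  \approx \frac{A_{\text{circle}}}{A_{\text{square}}}
  = \frac{\pi \cdot 1^2}{2 \cdot 2}
  = \frac{\pi}{4}
\qquad\Longrightarrow\qquad
\pi \approx 4 \cdot \frac{\mathit{count}}{n}
\]
```

This matches the expression 4.0 * count / n that the program prints.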

 

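Conclusion: the more samples are taken, the more accurate the estimate of pi tends to be. Since the sampling is random, the improvement is statistical, with the typical error shrinking roughly in proportion to 1/sqrt(n).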
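One way to see this effect is to run the estimate at several sample sizes and compare each result with Math.PI. The following is only a rough sketch in plain Java, without Spark: the class name PiConvergenceSketch, the method estimatePi, and the seed value are assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; it does not use Spark.
public final class PiConvergenceSketch {

  // Estimates pi from n points sampled uniformly in the square [-1, 1) x [-1, 1).
  static double estimatePi(int n, Random random) {
    int count = 0;
    for (int i = 0; i < n; i++) {
      double x = random.nextDouble() * 2 - 1;
      double y = random.nextDouble() * 2 - 1;
      if (x * x + y * y < 1) {
        count++;
      }
    }
    return 4.0 * count / n;
  }

  public static void main(String[] args) {
    // Fixed seed (an assumption) so that repeated runs print the same numbers.
    Random random = new Random(7L);
    for (int n = 100; n <= 10_000_000; n *= 10) {
      double estimate = estimatePi(n, random);
      System.out.printf("n=%d estimate=%.6f error=%.6f%n",
          n, estimate, Math.abs(estimate - Math.PI));
    }
  }
}
```

Because the samples are random, the error does not have to fall at every single step, but it should trend downward as n grows.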

 

 
