Deploying a Big Data Cluster with Docker on a Mac M1 (ARM64)

Note: whenever a new shell environment is opened, the environment variables must be sourced manually.

I. Machine Dependencies (CentOS 7) [Important]

yum -y install \
vim \
sudo \
net-tools.aarch64 \
nmap-ncat.aarch64 \
telnet \
openssh-server \
openssh-clients

Initialize the sshd host key files; without them, the sshd service fails to start:

ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key
ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key

II. Basic Environment Planning

  1. Software versions

    Software      Package
    Hadoop        hadoop-2.10.1.tar.gz
    Hive          apache-hive-2.3.1-bin.tar.gz
    Kafka         kafka_2.12-2.0.0.tgz
    ZooKeeper     apache-zookeeper-3.5.7-bin.tar.gz
    HBase         hbase-2.4.9-bin.tar.gz
    Java          jdk-8u202-linux-arm64-vfp-hflt.tar.gz
    Scala         scala-2.12.15.tgz
    Spark         spark-3.3.2-bin-without-hadoop.tgz
    Hudi          hudi-release-0.13.0.zip
    Doris         apache-doris-1.2.6-bin-arm64.tar.xz
    Flink         flink-1.16.0-bin-scala_2.12.tgz
    ClickHouse    clickhouse-23.7.3.14-1
  2. Component distribution

    Node            Components
    hadoop-node1    namenode | datanode | resourcemanager | nodemanager | hive | spark | flink | doris fe/be | clickhouse
    hadoop-node2    datanode | nodemanager | mysql | zookeeper | hive | hive metastore | hbase | doris fe/be | clickhouse
    hadoop-node3    datanode | nodemanager | kafka | hive | doris fe/be | clickhouse
  3. Port registration

  4. Consolidated startup scripts for each node

    • hadoop-node1 startup script

      ## env 
      source ~/.bash_profile
      ## hdfs 
      sbin/hadoop-daemon.sh start namenode
      sbin/hadoop-daemon.sh start datanode
      ## yarn
      sbin/yarn-daemon.sh start resourcemanager
      sbin/yarn-daemon.sh start nodemanager
      ## doris
      fe/bin/start_fe.sh --daemon
      be/bin/start_be.sh --daemon
      
    • hadoop-node2 startup script

      ## env 
      source ~/.bash_profile
      ## hdfs 
      sbin/hadoop-daemon.sh start datanode
      ## yarn
      sbin/yarn-daemon.sh start nodemanager
      ## mysql 
      systemctl start mariadb
      ## doris
      fe/bin/start_fe.sh --daemon
      be/bin/start_be.sh --daemon
      ## hbase 
      bin/start-hbase.sh
      ## zookeeper 
      bin/zkServer.sh start
      ## hive-metastore 
      nohup hive --service metastore >> /opt/data/hive/hive-metastore.log &
      
      
    • hadoop-node3 startup script

      ## env 
      source ~/.bash_profile
      ## hdfs 
      sbin/hadoop-daemon.sh start datanode
      ## yarn
      sbin/yarn-daemon.sh start nodemanager
      ## doris
      fe/bin/start_fe.sh --daemon
      be/bin/start_be.sh --daemon
      ## kafka 
      kafka-server-start.sh  -daemon config/server.properties &
      
  5. Start the three Docker containers [Important] (the gawyn-bridge network must exist first; see the note after Node 3)

    Node 1:

    docker run -itd \
    -h hadoop-node1 \
    --name=hadoop-node1 \
    --privileged=true \
    --network=gawyn-bridge \
    -v /Users/chavinking/gawyn/hadoop-node1:/opt \
    centos:centos7 \
    /sbin/init
    

    Node 2:

    docker run -itd \
    -h hadoop-node2 \
    --name=hadoop-node2 \
    --privileged=true \
    --network=gawyn-bridge \
    -v /Users/chavinking/gawyn/hadoop-node2:/opt \
    centos:centos7 \
    /sbin/init
    

    Node 3:

    docker run -itd \
    -h hadoop-node3 \
    --name=hadoop-node3 \
    --privileged=true \
    --network=gawyn-bridge \
    -v /Users/chavinking/gawyn/hadoop-node3:/opt \
    centos:centos7 \
    /sbin/init
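
    The three docker run commands above assume that the user-defined bridge network gawyn-bridge already exists. If it does not, a minimal sketch of creating and checking it:

    # create the bridge network referenced by --network above
    docker network create --driver bridge gawyn-bridge
    # confirm it exists
    docker network ls | grep gawyn-bridge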
    

III. Configure Environment Variables

File location: /opt/runner/docker-env.sh

## Java
export JAVA_HOME=/opt/system/jdk1.8.0_202
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:.
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH

## scala
export SCALA_HOME=/opt/system/scala-2.12.15
export PATH=$SCALA_HOME/bin:$PATH:.

## maven profile
export MAVEN_HOME=/opt/system/maven-3.5.0
export PATH=$MAVEN_HOME/bin:$PATH:.

## Hadoop Env
export HADOOP_HOME=/opt/system/hadoop-2.10.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

## ZooKeeper Env
export ZOOKEEPER_HOME=/opt/system/zookeeper-3.5.7
export PATH=$PATH:$ZOOKEEPER_HOME/bin:.

## kafka Env
export KAFKA_HOME=/opt/system/kafka-2.12-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin:.

## Hive Env
export HIVE_HOME=/opt/system/hive-2.3.1
export PATH=$PATH:$HIVE_HOME/bin:.

## Hbase Env 
export HBASE_HOME=/opt/system/hbase-2.4.9
export PATH=$PATH:$HBASE_HOME/bin:.

## Spark Env 
export SPARK_HOME=/opt/system/spark-3.3.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:.

## Doris Env 
export DORIS_HOME=/opt/system/doris-1.2.6
export PATH=$PATH:$DORIS_HOME/fe/bin:$DORIS_HOME/be/bin:.

## Init Env
sysctl -w vm.max_map_count=2000000
ulimit -n 65536

Add source /opt/runner/docker-env.sh to the ~/.bash_profile file, as sketched below.
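
A minimal sketch of that wiring, assuming the env file path above:

## make every new login shell pick up the environment
echo 'source /opt/runner/docker-env.sh' >> ~/.bash_profile
source ~/.bash_profile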

IV. Configure SSH Trust

  1. Set a password for the root user on CentOS 7

    passwd root
    
  2. Install the SSH server and client dependencies with yum (see Section I)

  3. Start the sshd service

    /usr/sbin/sshd 
    

    Startup error:

    [root@hadoop-node2 ~]# /usr/sbin/sshd

    Could not load host key: /etc/ssh/ssh_host_rsa_key

    Could not load host key: /etc/ssh/ssh_host_ecdsa_key

    Could not load host key: /etc/ssh/ssh_host_ed25519_key

    sshd: no hostkeys available -- exiting.

    Solution:

    ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key

    ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key

    ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key

  4. Verify that sshd is running

    % ps -ef|grep sshd
    root       145     1  0 06:55 ?        00:00:00 /usr/sbin/sshd
    
    % telnet 127.0.0.1 22
    Trying 127.0.0.1...
    Connected to 127.0.0.1.
    Escape character is '^]'.
    SSH-2.0-OpenSSH_7.4
    
  5. Exchange SSH trust files between nodes; run the following on each node

    % ssh-keygen -t rsa
    % ssh-copy-id hadoop-node1
    % ssh hadoop-node1 date
    % ssh-copy-id hadoop-node2
    % ssh hadoop-node2 date
    % ssh-copy-id hadoop-node3
    % ssh hadoop-node3 date
    

V. Deploy the Hadoop Distributed Cluster

  1. Download and extract the package

  2. Edit the Hadoop configuration files; the files to edit are:

    HDFS configuration files:

    etc/hadoop/hadoop-env.sh

    etc/hadoop/core-site.xml

    etc/hadoop/hdfs-site.xml

    etc/hadoop/slaves

    YARN configuration files:

    etc/hadoop/yarn-env.sh

    etc/hadoop/yarn-site.xml

    etc/hadoop/slaves

    MapReduce configuration files:

    etc/hadoop/mapred-env.sh

    etc/hadoop/mapred-site.xml [not used in this setup]

  3. The edited configuration files are as follows:

    • etc/hadoop/core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop-node1:8020</value>
        </property>
    
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/data/hdfs</value>
        </property>
    
        <property>
            <name>fs.trash.interval</name>
            <value>7000</value>
        </property>
    </configuration>
    
    • etc/hadoop/hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop-node1:50090</value>
        </property>
    </configuration>
    
    • etc/hadoop/yarn-site.xml
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop-node1</value>
        </property>
    
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
    
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>600000</value>
        </property>
    </configuration>
    
    • etc/hadoop/slaves
    hadoop-node1
    hadoop-node2
    hadoop-node3
    
    • Configure the JAVA_HOME environment variable in:
    etc/hadoop/hadoop-env.sh
    etc/hadoop/yarn-env.sh
    etc/hadoop/mapred-env.sh
    
    • Create the data directory
    /opt/data/hdfs
    
    • Sync the files on hadoop-node1 to the other nodes
    % scp -r * hadoop-node2:/opt/
    % scp -r * hadoop-node3:/opt/
    
  4. Format HDFS

    % hdfs namenode -format
    
  5. Start the Hadoop services (a sanity check is sketched after the commands below)

    hdfs:
        sbin/hadoop-daemon.sh start|stop namenode
        sbin/hadoop-daemon.sh start|stop datanode
    yarn:
        sbin/yarn-daemon.sh start|stop resourcemanager
        sbin/yarn-daemon.sh start|stop nodemanager
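
    After the daemons are up, a quick sanity check with standard Hadoop commands (for this layout, 3 live DataNodes and 3 NodeManagers are expected):

    # list the Java daemons running on the current node
    jps
    # NameNode view of the HDFS cluster
    hdfs dfsadmin -report
    # ResourceManager view of the YARN cluster
    yarn node -list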
    

VI. Deploy the MySQL Service

  1. Simple installation via yum

    % yum -y install mariadb mariadb-server
    
  2. Start the MySQL service

    % systemctl start mariadb
    

    Starting the service inside the container fails:

    Error:

    systemctl status mariadb

    Failed to get D-Bus connection: No such file or directory

    Solution:

    vim ~/Library/Group\ Containers/group.com.docker/settings.json

    Change "deprecatedCgroupv1": false to "deprecatedCgroupv1": true

    Then restart Docker and start the service again.
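
    Optionally, enable MariaDB at boot and verify connectivity; this sketch assumes the root password has not been set yet:

    systemctl enable mariadb
    mysql -uroot -e "SELECT VERSION();"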

VII. Deploy the ZooKeeper Service

  1. Download and extract the package to the install directory, create the data directory /opt/data/zkdata, and configure the environment variables

  2. Edit the ZooKeeper configuration file

    # conf/zoo.cfg
    dataDir=/opt/data/zkdata
    clientPort=2181
    
  3. Start ZooKeeper

    bin/zkServer.sh start
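
    To confirm ZooKeeper is serving requests, a quick status check (zoo.cfg is assumed to be in place as configured above):

    bin/zkServer.sh status
    # or query it directly through the CLI
    bin/zkCli.sh -server hadoop-node2:2181 ls /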
    

VIII. Deploy the Kafka Service

  1. Download and extract the package to the install directory, create the data directory /opt/data/kafka-logs, and configure the environment variables

  2. Edit the Kafka configuration file

    # config/server.properties
    broker.id=0
    listeners=PLAINTEXT://hadoop-node3:9092
    log.dirs=/opt/data/kafka-logs
    zookeeper.connect=hadoop-node2:2181
    
  3. Start the Kafka service

    kafka-server-start.sh  -daemon config/server.properties &
    
  4. Common Kafka commands

    Create a topic:
    kafka-topics.sh --create --zookeeper hadoop-node2:2181 --replication-factor 1 --partitions 1 --topic test
    List all topics:
    kafka-topics.sh --list --zookeeper hadoop-node2:2181
    Start a producer:
    kafka-console-producer.sh --broker-list hadoop-node3:9092 --topic test
    Start a consumer:
    kafka-console-consumer.sh --bootstrap-server hadoop-node3:9092 --from-beginning --topic test
    kafka-console-consumer.sh --zookeeper hadoop-node2:2181 --topic test --from-beginning
    

IX. Deploy the Hive Service

  1. Download and extract the package to the install directory, and configure the environment variables

  2. Copy the MySQL JDBC driver jar into the lib/ directory of the Hive installation

  3. Edit the Hive configuration files

    • hive-env.sh

    cp hive-env.sh.template hive-env.sh

    HADOOP_HOME=/opt/system/hadoop-2.10.1
    export HIVE_CONF_DIR=/opt/system/hive-2.3.1/conf
    
    • hive-site.xml

    cp hive-default.xml.template hive-site.xml

    <!-- hive metastore config -->
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://hadoop-node2:3306/hive?createDatabaseIfNotExist=true</value>
            <description>JDBC connect string for a JDBC metastore</description>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>Driver class name for a JDBC metastore</description>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
            <description>username to use against metastore database</description>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>mysql</value>
            <description>password to use against metastore database</description>
        </property>
    
    <!-- hive warehouse dir -->
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/root/warehouse</value>
            <description>location of default database for the warehouse</description>
        </property>
    
    <!-- java.io.tmpdir -->
       <property>
            <name>system:java.io.tmpdir</name>
            <value>/opt/data/hive/tmp</value>
        </property>
    
        <property>
            <name>hive.exec.local.scratchdir</name>
            <value>/opt/data/hive/tmp/${user.name}</value>
            <description>Local scratch space for Hive jobs</description>
        </property>
    
    <!-- hive metastore config -->
        <property>
            <name>hive.metastore.port</name>
            <value>9083</value>
            <description>Hive metastore listener port</description>
        </property>
    
  4. Create the Hive warehouse directory in HDFS and the Hive metastore database, then initialize the metastore schema

    % hdfs dfs -mkdir -p /user/root/warehouse
    % mkdir -p /opt/data/hive/tmp
    
    mysql> create database hive;
    mysql> grant all privileges on *.* to 'root'@'hadoop-node2' identified by 'mysql' with grant option;
    mysql> flush privileges;
    # bin/schematool -initSchema -dbType mysql
    
  5. Start the Hive metastore service

    % nohup hive --service metastore >> /opt/data/hive/hive-metastore.log &
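
    A quick smoke test once the metastore is listening; the database name here is only an example:

    % hive -e "create database if not exists smoke_test; show databases;"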
    

X. Deploy the HBase Service

  1. Download and extract the package to the install directory, and configure the environment variables

  2. Edit the HBase configuration files

    • hbase-env.sh
    export JAVA_HOME=/opt/system/jdk1.8.0_202
    export HADOOP_HOME=/opt/system/hadoop-2.10.1
    export HBASE_MANAGES_ZK=false
    
    • hbase-site.xml
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hadoop-node1:8020/user/root/hbase</value>
        </property>
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hadoop-node2</value>
        </property>
        <property>
            <name>hbase.zookeeper.property.clientPort</name>
            <value>2181</value>
        </property>
        <property>
            <name>hbase.tmp.dir</name>
            <value>/opt/data/hbase/tmp</value>
        </property>
        <property>
            <name>zookeeper.znode.parent</name>
            <value>/hbase</value>
        </property>
        <property>
            <name>hbase.unsafe.stream.capability.enforce</name>
            <value>false</value>
        </property>
    
    • regionservers
    hadoop-node2
    
  3. Create the directories

    % hdfs dfs -mkdir /user/root/hbase
    % mkdir -p /opt/data/hbase/tmp
    
  4. Start the HBase service

    % bin/start-hbase.sh
    
  5. Test the HBase cluster

    % hbase shell
    hbase:001:0> status
    1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
    Took 1.3384 seconds
    hbase:002:0> create 'testtable','colfaml'
    Created table testtable
    Took 0.7568 seconds
    => Hbase::Table - testtable
    hbase:003:0>  list 'testtable'
    TABLE
    testtable
    1 row(s)
    Took 0.0390 seconds
    => ["testtable"]
    hbase:004:0> put 'testtable','myrow-1','colfaml:q1','value-1'
    Took 0.3415 seconds
    hbase:005:0> put 'testtable','myrow-2','colfaml:q2','value-2'
    Took 0.0067 seconds
    hbase:006:0> scan 'testtable'
    ROW                                                          COLUMN+CELL
     myrow-1                                                     column=colfaml:q1, timestamp=2023-08-08T06:14:14.685, value=value-1
     myrow-2                                                     column=colfaml:q2, timestamp=2023-08-08T06:14:19.278, value=value-2
    2 row(s)
    Took 0.0372 seconds
    hbase:007:0> get 'testtable','myrow-1'
    COLUMN                                                       CELL
     colfaml:q1                                                  timestamp=2023-08-08T06:14:14.685, value=value-1
    1 row(s)
    Took 0.0424 seconds
    

XI. Deploy the Spark Service

  1. Download the package, extract it to the install directory, and run the build

    mvn clean package -DskipTests -Pyarn -Phadoop-2 -Dhadoop.version=2.10.1 -Phive -Phive-thriftserver 
    
  2. Configure the environment variables and the spark-env.sh file [settings for the without-hadoop build]

    export JAVA_HOME=/opt/system/jdk1.8.0_202
    export HADOOP_CONF_DIR=/opt/system/hadoop-2.10.1/etc/hadoop
    export YARN_CONF_DIR=/opt/system/hadoop-2.10.1/etc/hadoop
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)
    export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
    
  3. Copy the Hive configuration file into the Spark conf directory

    cp /opt/system/hive-2.3.1/conf/hive-site.xml /opt/system/spark-3.3.2/conf/hive-site.xml
    
  4. Test Spark (a YARN smoke test is sketched below)

    % spark-shell
    
    % spark-sql
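
    Beyond the interactive shells, a minimal Spark-on-YARN smoke test; the examples jar name below assumes the standard Spark 3.3.2 layout:

    % spark-submit \
      --master yarn \
      --deploy-mode client \
      --class org.apache.spark.examples.SparkPi \
      $SPARK_HOME/examples/jars/spark-examples_2.12-3.3.2.jar 100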
    
  • Build issues encountered

    • Issue 1

      [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]

      The local Maven version is lower than the version required by the project's pom.xml; adjust the Maven version requirement in the pom file and rebuild.

XII. Deploy the Flink Service

  1. Download and extract the package to the install directory, and configure the environment variables

  2. Edit the Flink configuration file

    taskmanager.numberOfTaskSlots: 4
    
  3. Test Flink
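
    A minimal standalone smoke test, using the example job bundled with the Flink distribution:

    # start a local standalone cluster, run the bundled WordCount job, then stop the cluster
    bin/start-cluster.sh
    bin/flink run examples/streaming/WordCount.jar
    bin/stop-cluster.sh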

XIII. Deploy the Doris Service

  1. Download and extract the package to the install directory, and configure the environment variables

  2. FE configuration

    • Create the metadata directories

      % mkdir -p /opt/data/doris/doris-meta
      % mkdir -p /opt/data/doris/fe-log
      
    • Edit the configuration file [sync to every node]

      LOG_DIR = /opt/data/doris/fe-log
      meta_dir = /opt/data/doris/doris-meta
      priority_networks = hadoop-node1-ip/24
      ## FE ports
      http_port = 18030
      edit_log_port = 19010
      rpc_port = 19020
      query_port = 19030
      
    • Start FE

      % fe/bin/start_fe.sh --daemon
      
    • Register the BE nodes with FE

      % mysql -hhadoop-node1 -uroot -p -P19030
      mysql > ALTER SYSTEM ADD BACKEND "hadoop-node1:19050";
      mysql > ALTER SYSTEM ADD BACKEND "hadoop-node2:19050";
      mysql > ALTER SYSTEM ADD BACKEND "hadoop-node3:19050";
      mysql > SHOW PROC '/frontends';
      mysql > SHOW PROC '/backends';
      
  3. BE configuration

    • Create the data directories

      % mkdir -p /opt/data/doris/doris-data
      % mkdir -p /opt/data/doris/be-log
      
    • Edit the configuration file [sync to every node]

      PPROF_TMPDIR="/opt/data/doris/be-log/"
      priority_networks = hadoop-node1-ip/24
      storage_root_path = /opt/data/doris/doris-data
      ## BE ports
      be_port = 19060
      webserver_port = 18040
      heartbeat_service_port = 19050
      brpc_port = 18060
      
    • Start BE

      • Set the required system parameters

      sysctl -w vm.max_map_count=2000000

      ulimit -n 65536

      • Start command
      % be/bin/start_be.sh --daemon
      
  4. Scale FE out to three nodes

    • Register the nodes

      mysql > ALTER SYSTEM ADD FOLLOWER "hadoop-node2:19010";
      mysql > ALTER SYSTEM ADD FOLLOWER "hadoop-node3:19010";
      
    • Start the nodes

      % fe/bin/start_fe.sh --helper hadoop-node1:19010 --daemon
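
      With the two new FOLLOWER FEs started via --helper and the BEs added earlier, a quick check over the MySQL protocol that every FE and BE is alive:

      % mysql -hhadoop-node1 -uroot -p -P19030 -e "SHOW FRONTENDS; SHOW BACKENDS;"

      All three frontends and all three backends should report Alive: true.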
      

XIV. Deploy the ClickHouse Service

  1. Install ClickHouse

    yum install -y yum-utils
    yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
    yum clean all 
    yum makecache
    yum install -y clickhouse-server clickhouse-client
    
  2. Start single-node ClickHouse

    sudo /etc/init.d/clickhouse-server start
    clickhouse-client # or "clickhouse-client --password" if you set up a password.
    
    sudo -u 'clickhouse' /usr/bin/clickhouse-server \
    --config-file /etc/clickhouse-server/config.xml \
    --pid-file /var/run/clickhouse-server/clickhouse-server.pid \
    --daemon
    
  3. ClickHouse cluster configuration

    • 3.1 Edit the configuration file config.xml on every machine in the cluster

      vim /etc/clickhouse-server/config.xml

      <http_port>8123</http_port>
      <tcp_port>9000</tcp_port>
      <mysql_port>9004</mysql_port>
      <postgresql_port>9005</postgresql_port>
      <interserver_http_port>9009</interserver_http_port>
      
      <listen_host>::1</listen_host>
      <path>/opt/data/clickhouse/</path>
      <tmp_path>/opt/data/clickhouse/tmp/</tmp_path>
      
    • 3.2 On each machine, create a metrika.xml file under /etc with the following content:

      <yandex>
      
          <!-- cluster config: single-replica setup -->
          <clickhouse_remote_servers>
              <gawyn_cluster>
                  <!-- data shard one -->
                  <shard>
                      <internal_replication>true</internal_replication>
                      <replica>
                          <host>hadoop-node1</host>
                          <port>9123</port>
                          <user>default</user>
                          <password>clickhouse</password>
                      </replica>
                  </shard>
                  
                  <!-- data shard two -->
                  <shard>
                      <internal_replication>true</internal_replication>
                      <replica>
                          <host>hadoop-node2</host>
                          <port>9123</port>
                          <user>default</user>
                          <password>clickhouse</password>
                      </replica>
                  </shard>
                  
                  <!-- data shard three -->
                  <shard>
                      <internal_replication>true</internal_replication>
                      <replica>
                          <host>hadoop-node3</host>
                          <port>9123</port>
                          <user>default</user>
                          <password>clickhouse</password>
                      </replica>
                  </shard>
              </gawyn_cluster>
          </clickhouse_remote_servers>
      
          <!-- zookeeper config -->
          <zookeeper-servers>
              <node index="1">
                  <host>hadoop-node2</host>
                  <port>2181</port>
              </node>
          </zookeeper-servers>
          
          <!-- local node replica name -->
          <macros>
              <layer>gawyn_cluster</layer>
              <!-- gawyn_cluster_node1/gawyn_cluster_node2/gawyn_cluster_node3 -->
              <replica>gawyn_cluster_nodeX</replica>
          </macros>
      
          <!-- listening networks: allow access from any address -->
          <networks>
             <ip>::/0</ip>
          </networks>
      
          <!-- data compression settings -->
          <clickhouse_compression>
              <case>
                  <min_part_size>10000000000</min_part_size>
                  <min_part_size_ratio>0.01</min_part_size_ratio>                                                                                                                                       
                  <method>lz4</method>
              </case>
          </clickhouse_compression>
      
      </yandex>
      
    • 3.3 Configure users.xml on each machine

      <clickhouse>
          <!-- See also the files in users.d directory where the settings can be overridden. -->
      
          <!-- Profiles of settings. -->
          <profiles>
              <!-- Default settings. -->
              <default>
              </default>
      
              <!-- Profile that allows only read queries. -->
              <readonly>
                  <readonly>1</readonly>
              </readonly>
          </profiles>
      
          <!-- Users and ACL. -->
          <users>
              <!-- If user name was not specified, 'default' user is used. -->
              <default>
                  <!-- See also the files in users.d directory where the password can be overridden.
      
                       Password could be specified in plaintext or in SHA256 (in hex format).
      
                       If you want to specify password in plaintext (not recommended), place it in 'password' element.
                       Example: <password>qwerty</password>.
                       Password could be empty.
      
                       If you want to specify SHA256, place it in 'password_sha256_hex' element.
                       Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
                       Restrictions of SHA256: impossibility to connect to ClickHouse using MySQL JS client (as of July 2019).
      
                       If you want to specify double SHA1, place it in 'password_double_sha1_hex' element.
                       Example: <password_double_sha1_hex>e395796d6546b1b65db9d665cd43f0e858dd4303</password_double_sha1_hex>
      
                       If you want to specify a previously defined LDAP server (see 'ldap_servers' in the main config) for authentication,
                        place its name in 'server' element inside 'ldap' element.
                       Example: <ldap><server>my_ldap_server</server></ldap>
      
                       If you want to authenticate the user via Kerberos (assuming Kerberos is enabled, see 'kerberos' in the main config),
                        place 'kerberos' element instead of 'password' (and similar) elements.
                       The name part of the canonical principal name of the initiator must match the user name for authentication to succeed.
                       You can also place 'realm' element inside 'kerberos' element to further restrict authentication to only those requests
                        whose initiator's realm matches it.
                       Example: <kerberos />
                       Example: <kerberos><realm>EXAMPLE.COM</realm></kerberos>
      
                       How to generate decent password:
                       Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
                       In first line will be password and in second - corresponding SHA256.
      
                       How to generate double SHA1:
                       Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
                       In first line will be password and in second - corresponding double SHA1.
                  -->
                  <password></password>
      
                  <!-- List of networks with open access.
      
                       To open access from everywhere, specify:
                          <ip>::/0</ip>
      
                       To open access only from localhost, specify:
                          <ip>::1</ip>
                          <ip>127.0.0.1</ip>
      
                       Each element of list has one of the following forms:
                       <ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
                           2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
                       <host> Hostname. Example: server01.clickhouse.com.
                           To check access, DNS query is performed, and all received addresses compared to peer address.
                       <host_regexp> Regular expression for host names. Example, ^server\d\d-\d\d-\d\.clickhouse\.com$
                           To check access, DNS PTR query is performed for peer address and then regexp is applied.
                           Then, for result of PTR query, another DNS query is performed and all received addresses compared to peer address.
                           Strongly recommended that regexp is ends with $
                       All results of DNS requests are cached till server restart.
                  -->
                  <networks>
                      <ip>::/0</ip>
                  </networks>
      
                  <!-- Settings profile for user. -->
                  <profile>default</profile>
      
                  <!-- Quota for user. -->
                  <quota>default</quota>
      
                  <!-- User can create other users and grant rights to them. -->
                  <!-- <access_management>1</access_management> -->
              </default>
          </users>
      
          <!-- Quotas. -->
          <quotas>
              <!-- Name of quota. -->
              <default>
                  <!-- Limits for time interval. You could specify many intervals with different limits. -->
                  <interval>
                      <!-- Length of interval. -->
                      <duration>3600</duration>
      
                      <!-- No limits. Just calculate resource usage for time interval. -->
                      <queries>0</queries>
                      <errors>0</errors>
                      <result_rows>0</result_rows>
                      <read_rows>0</read_rows>
                      <execution_time>0</execution_time>
                  </interval>
              </default>
          </quotas>
      </clickhouse>
      
    • 3.4 Start the ClickHouse service

      Create the directories:

      mkdir -p /opt/data/clickhouse/tmp
      

      Start the service:

      sudo /etc/init.d/clickhouse-server start
      
    • 3.5 Log in with the client

      clickhouse-client
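
      To verify that the cluster topology from metrika.xml has been loaded, query system.clusters; the cluster name matches gawyn_cluster defined above:

      clickhouse-client --query "SELECT cluster, shard_num, host_name, port FROM system.clusters WHERE cluster = 'gawyn_cluster'"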
      

XV. Build Hudi

  1. Download the Hudi source code from GitHub

  2. Environment preparation

    • Java & Maven environment

      % echo $JAVA_HOME
      /opt/system/jdk1.8.0_202
      % echo $MAVEN_HOME
      /opt/system/maven-3.5.0

    • Add the following Kafka dependencies to the local Maven repository (installing them by hand is sketched after the list)

      common-config-5.3.4.jar
      common-utils-5.3.4.jar
      kafka-avro-serializer-5.3.4.jar
      kafka-schema-registry-client-5.3.4.jar
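
      These Confluent jars are typically not in Maven Central, so they have to be installed into the local repository by hand. A minimal sketch for one of them (repeat per jar; the groupId/artifactId follow the usual Confluent naming, and the file path is an assumption):

      mvn install:install-file \
        -Dfile=./kafka-schema-registry-client-5.3.4.jar \
        -DgroupId=io.confluent \
        -DartifactId=kafka-schema-registry-client \
        -Dversion=5.3.4 \
        -Dpackaging=jar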

  3. Build Hudi and verify the build

    mvn clean package -DskipTests \
    -Dspark3.3 -Dscala-2.12	\
    -Dflink1.16 -Dscala-2.12 \
    -Dhadoop.version=2.10.1 \
    -Pflink-bundle-shade-hive3
    

    Run the following command to verify the build succeeded:

    # hudi-cli/hudi-cli.sh
    
