EduCoder (头歌) Platform — Big Data Technology — Lab Exercises

Solve any problems on your own; this document only records my personal progress through the course.


Big Data Labs — please read the Notes first

Markdown readers can Ctrl + left-click to jump to the Notes section; PDF readers can simply click the link.


Table of Contents

  • Big Data Labs — please read the Notes first
    • **Notes — must read**
    • Overview of Big Data Technology
      • Big Data Applications
    • Installing and Using Linux
      • The Linux Operating System
        • First Steps with Linux
        • Common Linux Commands
        • Looking Up Command Help in Linux
    • Installing and Using Hadoop
      • Chapter Test
    • HDFS
      • Section exercises
        • Problem 1
      • Chapter exercises
        • Problem 1
        • Problem 2
        • Problem 3
        • Problem 4
        • Problem 5
        • Problem 6
        • Problem 7
        • Problem 8
        • Problem 9
        • Problem 10
    • HBASE
      • Section exercises (2 problems)
        • Problem 1
        • Problem 2
      • Section exercises (5 problems)
        • Problem 1
        • Problem 2
        • Problem 3
        • Problem 4
        • Problem 5
      • Chapter exercises
        • Problem 1
        • Problem 2
        • Problem 3
        • Problem 4
        • Problem 5
    • NoSQL
      • Section exercises (4 problems)
        • Problem 1
        • Problem 2
        • Problem 3
        • Problem 4
      • Section exercises (3 problems)
        • Problem 1
        • Problem 2
        • Problem 3
    • MapReduce
      • Section exercises
      • Chapter exercises
        • Problem 1
        • Problem 2
        • Problem 3
    • Hive (this chapter can throw errors; the fix is given further below)
      • Section exercises
      • Section exercises
      • Chapter exercises
    • Spark (includes shortcut versions; don't use them if you object)
      • Section exercises (2 problems)
        • Problem 1
        • Problem 2
      • Section exercises (one shortcut version, one that walks through the full process)
      • Chapter exercises, 3 problems (only the shortcut version is provided for now)
        • Problem 1
        • Problem 2
        • Problem 3
    • Known errors:

Notes — must read

  • Read the comments — always!

  • A few problems use a shortcut (hard-coded) approach, which is declared in the problem's heading; don't use it if you object.

  • Start the services that each problem depends on yourself.

  • If something goes wrong while the code is running, fix it yourself, or release the environment and start over.

  • Test-run your code yourself before submitting it for evaluation.

  • When starting the hadoop services, prefer the start-all.sh command. The MapReduce chapter in particular only works with start-all.sh; otherwise the jobs produce no output.

  • For the table-creation statements, look carefully at exactly what has to be copied: some statements span several lines, so copy and run them exactly as given.

  • To locate a problem, use the table of contents provided by your reader.

  • The hive chapter may report errors; the fix for the errors I ran into is given below.

  • Start the Hadoop, Zookeeper, and HBase services:

    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    
  • Start the Hadoop and hive services; prefer start-all.sh for starting Hadoop (a connectivity-check sketch follows at the end of this item):

    start-all.sh
    hive --service metastore # if it hangs on a line starting with "SLF4J", interrupt it (Ctrl+C) and run the next command
    hive --service hiveserver2 # this one also hangs; interrupt it and treat the service as started. The commands came from a web search; skip them if you think they are wrong.
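
  • Since both hive commands appear to hang, here is a minimal connectivity-check sketch for confirming that HiveServer2 actually came up. It assumes the hive-jdbc driver is on the classpath and that HiveServer2 listens on its default port 10000 — both are assumptions, not something the platform guarantees:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical check: open a JDBC session against HiveServer2 and list
            // the databases; if this prints anything, the service really is up.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery("show databases")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }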
    

Overview of Big Data Technology

Big Data Applications

Multiple-choice answers: D D D ABCD BCD


Installing and Using Linux

The Linux Operating System

First Steps with Linux
#!/bin/bash

#Write the commands that complete the task in the section below
#*********begin*********#
cd /
ls -a
#********* end *********#

Common Linux Commands
#!/bin/bash

#Write the commands that complete the task in the section below
#*********begin*********#
touch newfile
mkdir newdir
cp newfile newfileCpy
mv newfileCpy newdir
#********* end *********#

Looking Up Command Help in Linux
#!/bin/bash

#Write the commands that complete the task in the section below
#*********begin*********#
man fopen
#********* end *********#


Installing and Using Hadoop

Chapter Test

root@educoder:~# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
root@educoder:~# hdfs dfs -mkdir -p /user/hadoop/test
root@educoder:~# hdfs dfs -put ~/.bashrc /user/hadoop/test
root@educoder:~# hdfs dfs -get /user/hadoop/test /app/hadoop

HDFS

Section exercises

Problem 1
root@educoder:~# start-dfs.sh
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.IOUtils;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/*************** Begin ***************/

public class HDFSUtils {

    public static void main(String[] args) {

        // HDFS通信地址
        String hdfsUri = "hdfs://localhost:9000";

        // HDFS文件路径
        String[] inputFiles = {"/a.txt", "/b.txt", "/c.txt"};

        // 输出路径
        String outputFile = "/root/result/merged_file.txt";

        // 创建Hadoop配置对象
        Configuration conf = new Configuration();
        try {
            // 创建Hadoop文件系统对象
            FileSystem fs = FileSystem.get(new Path(hdfsUri).toUri(), conf);

            // 创建输出文件
            OutputStream outputStream = fs.create(new Path(outputFile));

            // 合并文件内容
            for (String inputFile : inputFiles) {
                mergeFileContents(fs, inputFile, outputStream);
            }

            // 关闭流
            outputStream.close();

            // 关闭Hadoop文件系统
            fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

/*************** End ***************/

    private static void mergeFileContents(FileSystem fs, String inputFile, OutputStream outputStream) throws IOException {
        // 打开输入文件
        Path inputPath = new Path(inputFile);
        InputStream inputStream = fs.open(inputPath);

        // 拷贝文件内容
        IOUtils.copyBytes(inputStream, outputStream, 4096, false);

        // 写入换行符
        outputStream.write(System.lineSeparator().getBytes());

        // 关闭流
        inputStream.close();
    }
}


Chapter exercises

root@educoder:~# start-dfs.sh
Problem 1
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {

    /**
     * 判断路径是否存在
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    /**
     * 复制文件到指定路径
     * 若路径已存在,则进行覆盖
     */
    public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path localPath = new Path(localFilePath);
        Path remotePath = new Path(remoteFilePath);
        /* in fs.copyFromLocalFile, the first boolean says whether to delete the source file, the second whether to overwrite the target */
        fs.copyFromLocalFile(false,true,localPath,remotePath);
        fs.close();
    }
 
    /**
     * 追加文件内容
     */
    public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        /* 创建一个文件读入流 */
        FileInputStream in = new FileInputStream(localFilePath);
        /* 创建一个文件输出流,输出的内容将追加到文件末尾 */
        FSDataOutputStream out = fs.append(remotePath);
        /* 读写文件内容 */
        byte[] buffer = new byte[4096];
        int bytesRead = 0;
        while ((bytesRead = in.read(buffer)) > 0) {
            out.write(buffer, 0, bytesRead);
        }
        in.close();
        out.close();
        fs.close();

    }
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {

		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String localFilePath = "/root/test.txt";    // 本地路径
		String remoteFilePath = "/test.txt";    // HDFS路径
        String choice = "overwrite";    // if the file already exists, overwrite it ("append" would append to the end instead)
		
		try {
			/* 判断文件是否存在 */
			Boolean fileExists = false;
			if (HDFSApi.test(conf, remoteFilePath)) {
				fileExists = true;
				System.out.println(remoteFilePath + " 已存在.");
			} else {
				System.out.println(remoteFilePath + " 不存在.");
			}
			/* 进行处理 */
			if ( !fileExists) { // 文件不存在,则上传
				HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
				System.out.println(localFilePath + " 已上传至 " + remoteFilePath);
			} else if ( choice.equals("overwrite") ) {    // 选择覆盖
				HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
				System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);
			} else if ( choice.equals("append") ) {   // 选择追加
				HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);
				System.out.println(localFilePath + " 已追加至 " + remoteFilePath);
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}
/*************** End ***************/
Problem 2
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {
    
    /**
     * 下载文件到本地
     * 判断本地路径是否已存在,若已存在,则自动进行重命名
     */
    public static void copyToLocal(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        File localFile = new File(localFilePath);
        /* 如果文件名存在,自动重命名(在文件名后面加上 _0, _1 ...) */
        if (localFile.exists()) {
            // 如果文件已存在,则自动进行重命名
            int count = 0;
            String baseName = localFile.getName();
            String parentDir = localFile.getParent();
            String newName = baseName;
            do {
                count++;
                newName = baseName + "_" + count;
                localFile = new File(parentDir, newName);
            } while (localFile.exists());
        }
        
        // 下载文件到本地
        fs.copyToLocalFile(remotePath, new Path(localFile.getAbsolutePath()));
        fs.close();

    }
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String localFilePath = "/usr/local/down_test/test.txt";   // 本地路径
		String remoteFilePath = "/test.txt";   // HDFS路径
		
		try {
			HDFSApi.copyToLocal(conf,remoteFilePath,localFilePath);
			System.out.println("下载完成");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/

Problem 3
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {

    /**
     * 读取文件内容
     */
    public static void cat(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FSDataInputStream in = fs.open(remotePath);
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));

        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }

        reader.close();
        in.close();
        fs.close();
    }
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String remoteFilePath = "/test.txt";    // HDFS路径
		
		try {
			System.out.println("读取文件: " + remoteFilePath);
			HDFSApi.cat(conf, remoteFilePath);
			System.out.println("\n读取完成");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/
Problem 4
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;
import java.util.Date;

/*************** Begin ***************/

public class HDFSApi {

    /**
     * 显示指定文件的信息
     */
    public static void ls(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FileStatus[] fileStatuses = fs.listStatus(remotePath);

        for (FileStatus s : fileStatuses) {
            // 获取文件路径
            String path = s.getPath().toString();
            // 获取文件权限
            String permission = s.getPermission().toString();
            // 获取文件大小
            long fileSize = s.getLen();
            // 获取文件修改时间
            long modificationTime = s.getModificationTime();
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            String modificationTimeStr = sdf.format(new Date(modificationTime));

            // 输出文件信息
            System.out.println("路径: " + path);
            System.out.println("权限: " + permission);
            System.out.println("时间: " + modificationTimeStr);
            System.out.println("大小: " + fileSize);
        }
        fs.close();

    }
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String remoteFilePath = "/";  // HDFS路径
		
		try {
			HDFSApi.ls(conf, remoteFilePath);
			System.out.println("\n读取完成");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

}

/*************** End ***************/

Problem 5
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;

/*************** Begin ***************/

public class HDFSApi {
    /**
     * 显示指定文件夹下所有文件的信息(递归)
     */
    public static void lsDir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path dirPath = new Path(remoteDir);
        listFiles(fs,dirPath);
        fs.close();
    }    

    private static void listFiles(FileSystem fs, Path dirPath) throws IOException {
        FileStatus[] fileStatuses = fs.listStatus(dirPath);
        for (FileStatus status : fileStatuses) {
            if (status.isFile()) {
                printFileInfo(status);
            } else if (status.isDirectory()) {
                // 如果是目录,则递归处理
                listFiles(fs, status.getPath());
            }
        }
    }

    private static void printFileInfo(FileStatus status) {
        // 获取文件路径
        String path = status.getPath().toString();

        // 获取文件权限
        String permission = status.getPermission().toString();

        // 获取文件大小
        long fileSize = status.getLen();

        // 获取文件修改时间
        long modificationTime = status.getModificationTime();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        String modificationTimeStr = sdf.format(modificationTime);

        // 输出文件信息
        System.out.println("路径: " + path);
        System.out.println("权限: " + permission);
        System.out.println("时间: " + modificationTimeStr);
        System.out.println("大小: " + fileSize);
        System.out.println();
    } 
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String remoteDir = "/test";    // HDFS路径
		
		try {
			System.out.println("(递归)读取目录下所有文件的信息: " + remoteDir);
			HDFSApi.lsDir(conf,remoteDir);
			System.out.println("读取完成");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/

Problem 6
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {

    /**
     * 判断路径是否存在
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }
	
    /**
     * 创建目录
     */
    public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.mkdirs(new Path(remoteDir));
    }

    /**
     * 创建文件
     */
    public static void touchz(Configuration conf, String filePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        fs.create(new Path(filePath)).close();
    }
    
    /**
     * 删除文件
     */
    public static boolean rm(Configuration conf, String filePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.delete(new Path(filePath), false); // 第二个参数表示是否递归删除
    }
	
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String filePath = "/test/create.txt";    // HDFS 路径
		String remoteDir = "/test";    // HDFS 目录路径
		
		try {
			/* 判断路径是否存在,存在则删除,否则进行创建 */
			if ( HDFSApi.test(conf, filePath) ) {
				HDFSApi.rm(conf, filePath); // 删除
				System.out.println("删除路径: " + filePath);
			} else {
				if ( !HDFSApi.test(conf, remoteDir) ) { // 若目录不存在,则进行创建
					HDFSApi.mkdir(conf, remoteDir);
					System.out.println("创建文件夹: " + remoteDir);
				}
				HDFSApi.touchz(conf, filePath);
				System.out.println("创建路径: " + filePath);
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/

Problem 7
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {

    /**
     * 判断路径是否存在
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    /**
     * 判断目录是否为空
     * true: 空,false: 非空
     */
    public static boolean isDirEmpty(Configuration conf, String remoteDir) throws IOException {
       FileSystem fs = FileSystem.get(conf);
        FileStatus[] fileStatuses = fs.listStatus(new Path(remoteDir));
        fs.close();
        return fileStatuses == null || fileStatuses.length == 0;
    }
	
    /**
     * 创建目录
     */
    public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        boolean result = fs.mkdirs(new Path(remoteDir));
        fs.close();
        return result;
    }
    
    /**
     * 删除目录
     */
    public static boolean rmDir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        boolean result = fs.delete(new Path(remoteDir), true); 
        fs.close();
        return result;
    }
	
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String remoteDir = "/dirTest";
		Boolean forceDelete = false;  // 是否强制删除
		
		try {
			/* 判断目录是否存在,不存在则创建,存在则删除 */
			if ( !HDFSApi.test(conf, remoteDir) ) {
				HDFSApi.mkdir(conf, remoteDir); // 创建目录
				System.out.println("创建目录: " + remoteDir);
			} else {
				if ( HDFSApi.isDirEmpty(conf, remoteDir) || forceDelete ) { // 目录为空或强制删除
					HDFSApi.rmDir(conf, remoteDir);
					System.out.println("删除目录: " + remoteDir);
				} else  { // 目录不为空
					System.out.println("目录不为空,不删除: " + remoteDir);
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/

Problem 8
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {

    /**
     * 判断路径是否存在
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    /**
     * 追加文本内容
     */
    public static void appendContentToFile(Configuration conf, String content, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(remoteFilePath);
        if (!fs.exists(path)) {
            System.out.println("文件不存在: " + remoteFilePath);
            return;
        }
        FSDataOutputStream out = fs.append(path);
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));
        writer.write(content);
        writer.newLine();
        writer.close();
        fs.close();
        System.out.println("已追加内容到文件末尾: " + remoteFilePath);
    }
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
		String remoteFilePath = "/insert.txt";    // HDFS文件
		String content = "I love study big data"; // 文件追加内容

		try {
			/* 判断文件是否存在 */
			if ( !HDFSApi.test(conf, remoteFilePath) ) {
				System.out.println("文件不存在: " + remoteFilePath);
			} else {
                HDFSApi.appendContentToFile(conf, content, remoteFilePath);
                System.out.println("已追加内容到文件末尾" + remoteFilePath);
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/

Problem 9
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {
    /**
     * 删除文件
     */
    public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(remoteFilePath);
        if (!fs.exists(path)) {
            System.out.println("文件不存在: " + remoteFilePath);
            return false;
        }
        boolean deleted = fs.delete(path, false);
        fs.close();
        if (deleted) {
            System.out.println("已删除文件: " + remoteFilePath);
        } else {
            System.out.println("删除文件失败: " + remoteFilePath);
        }
        return deleted;

    }
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
		String remoteFilePath = "/delete.txt";    // HDFS 文件
		
		try {
			if ( HDFSApi.rm(conf, remoteFilePath) ) {
				System.out.println("文件删除: " + remoteFilePath);
			} else {
				System.out.println("操作失败(文件不存在或删除失败)");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/

Problem 10
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

/*************** Begin ***************/

public class HDFSApi {
    /**
     * 移动文件
     */
    public static boolean mv(Configuration conf, String remoteFilePath, String remoteToFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path srcPath = new Path(remoteFilePath);
        Path destPath = new Path(remoteToFilePath);
        
        // 检查源文件是否存在
        if (!fs.exists(srcPath)) {
            System.out.println("源文件不存在: " + remoteFilePath);
            return false;
        }
        
        // 移动文件
        boolean success = fs.rename(srcPath, destPath);
        fs.close();
        
        // 输出移动结果
        if (success) {
            System.out.println("文件移动成功: " + remoteFilePath + " -> " + remoteToFilePath);
        } else {
            System.out.println("文件移动失败: " + remoteFilePath + " -> " + remoteToFilePath);
        }
        return success;
    }
    
	/**
	 * 主函数
	 */
	public static void main(String[] args) {
		Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
		String remoteFilePath = "/move.txt";    // 源文件HDFS路径
		String remoteToFilePath = "/moveDir/move.txt";    // 目的HDFS路径
		
		try {
			if ( HDFSApi.mv(conf, remoteFilePath, remoteToFilePath) ) {
				System.out.println("将文件 " + remoteFilePath + " 移动到 " + remoteToFilePath);
			} else {
					System.out.println("操作失败(源文件不存在或移动失败)");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

/*************** End ***************/


HBASE

Section exercises (2 problems)

Problem 1
zkServer.sh start
start-dfs.sh
start-hbase.sh

Any warnings or notices that appear can be ignored.


hbase shell

create 'Student','S_No','S_Name','S_Sex','S_Age'
put 'Student', '2015001', 'S_Name', 'Zhangsan'
put 'Student', '2015001', 'S_Sex', 'male'
put 'Student', '2015001', 'S_Age', '23'
put 'Student', '2015002', 'S_Name', 'Lisi'
put 'Student', '2015002', 'S_Sex', 'male'
put 'Student', '2015002', 'S_Age', '24'
put 'Student', '2015003', 'S_Name', 'Mary'
put 'Student', '2015003', 'S_Sex', 'female'
put 'Student', '2015003', 'S_Age', '22'

create 'Course', 'C_No', 'C_Name', 'C_Credit'
put 'Course', '123001', 'C_Name', 'Math'
put 'Course', '123001', 'C_Credit', '2.0'
put 'Course', '123002', 'C_Name', 'Computer Science'
put 'Course', '123002', 'C_Credit', '5.0'
put 'Course', '123003', 'C_Name', 'English'
put 'Course', '123003', 'C_Credit', '3.0'

create 'SC','SC_Sno','SC_Cno','SC_Score'
put 'SC','sc001','SC_Sno','2015001'
put 'SC','sc001','SC_Cno','123001'
put 'SC','sc001','SC_Score','86'
put 'SC','sc002','SC_Sno','2015001'
put 'SC','sc002','SC_Cno','123003'
put 'SC','sc002','SC_Score','69'
put 'SC','sc003','SC_Sno','2015002'
put 'SC','sc003','SC_Cno','123002'
put 'SC','sc003','SC_Score','77'
put 'SC','sc004','SC_Sno','2015002'
put 'SC','sc004','SC_Cno','123003'
put 'SC','sc004','SC_Score','99'
put 'SC','sc005','SC_Sno','2015003'
put 'SC','sc005','SC_Cno','123001'
put 'SC','sc005','SC_Score','98'
put 'SC','sc006','SC_Sno','2015003'
put 'SC','sc006','SC_Cno','123002'
put 'SC','sc006','SC_Score','95'
Problem 2
root@educoder:~# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@educoder:~# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
root@educoder:~# start-hbase.sh
running master, logging to /app/hbase/logs/hbase-root-master-educoder.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
: running regionserver, logging to /app/hbase/logs/hbase-root-regionserver-educoder.out
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
root@educoder:~# hbase shell

HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020

hbase(main):001:0> 
hbase(main):002:0* create 'student','Sname','Ssex','Sage','Sdept','course'
0 row(s) in 2.5030 seconds

=> Hbase::Table - student
hbase(main):003:0> put 'student','95001','Sname','LiYing'
0 row(s) in 0.0720 seconds

hbase(main):004:0> 
hbase(main):005:0* put 'student','95001','Ssex','male'
0 row(s) in 0.0080 seconds

hbase(main):006:0> put 'student','95001','Sage','22'
0 row(s) in 0.0090 seconds

hbase(main):007:0> put 'student','95001','Sdept','CS'
0 row(s) in 0.0070 seconds

hbase(main):008:0> put 'student','95001','course:math','80'
0 row(s) in 0.0090 seconds

hbase(main):009:0> delete 'student','95001','Ssex'
0 row(s) in 0.0380 seconds

Section exercises (5 problems)

Problem 1
zkServer.sh start
start-dfs.sh
start-hbase.sh
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;


import java.io.IOException;

/*************** Begin ***************/

public class HBaseUtils  {

    public static void main(String[] args) {

        // 创建 HBase 配置
        Configuration config = HBaseConfiguration.create();

        // 创建 HBase 连接
        try (Connection connection = ConnectionFactory.createConnection(config)) {
            // 创建管理员
            Admin admin = connection.getAdmin();

            // 指定表名称和列族
            TableName tableName = TableName.valueOf("default:test");
            String familyName = "info";

            // 检查表是否存在
            if (admin.tableExists(tableName)) {
                // 删除已存在的表
                admin.disableTable(tableName);
                admin.deleteTable(tableName);
            }

            // 创建列族描述符
            ColumnFamilyDescriptor columnFamilyDescriptor = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(familyName)).build();

            // 创建表描述符
            TableDescriptor tableDescriptor = TableDescriptorBuilder.newBuilder(tableName).setColumnFamily(columnFamilyDescriptor).build();

            // 创建表
            admin.createTable(tableDescriptor);

            // 关闭管理员
            admin.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

/*************** End ***************/
Problem 2
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.List;

/*************** Begin ***************/

public class HBaseUtils {

    public static void main(String[] args) {

        // 创建 HBase 配置
        Configuration config = HBaseConfiguration.create();

        // 创建 HBase 连接
        try (Connection connection = ConnectionFactory.createConnection(config)) {
        // 指定表名
            TableName tableName = TableName.valueOf("default:SC");

        // 获取表对象
            try (Table table = connection.getTable(tableName)) {
        // 添加数据
                Put put = new Put(Bytes.toBytes("2015001"));
                put.addColumn(Bytes.toBytes("SC_Sno"), Bytes.toBytes("id"), Bytes.toBytes("0001"));
                put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"), Bytes.toBytes("96"));
                put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("95"));
                put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("English"), Bytes.toBytes("90"));

                table.put(put);
            } catch (IOException e) {
                e.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

/*************** End ***************/
Problem 3
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import java.io.IOException;
/*************** Begin ***************/
public class HBaseUtils {

    public static void main(String[] args) throws IOException {
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "localhost");
        configuration.set("hbase.zookeeper.property.clientPort", "2181");
        Connection connection = ConnectionFactory.createConnection(configuration);
        TableName tableName = TableName.valueOf("default:SC");
        Table table = connection.getTable(tableName);
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
            Bytes.toBytes("SC_Score"),Bytes.toBytes("Math"),CompareFilter.CompareOp.EQUAL,new BinaryComparator(Bytes.toBytes(96)));
        scan.setFilter(filter);
        // Delete the SC_Score:Math cell of row 2015001 directly;
        // the scan + filter above only illustrate locating the row, the Delete targets the row key itself.
        Delete delete = new Delete(Bytes.toBytes("2015001"));
        delete.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"));
        table.delete(delete);
        ResultScanner scanner = table.getScanner(scan);
        scanner.close();
        table.close();
        connection.close();
    }
}

/*************** End ***************/
Problem 4
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.List;

/*************** Begin ***************/

public class HBaseUtils {

    public static void main(String[] args) {
        try {
            // 设置 Zookeeper 通信地址
            Configuration config = HBaseConfiguration.create();
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");

            // 创建HBase连接
            Connection connection = ConnectionFactory.createConnection(config);

            // 指定表名
            TableName tableName = TableName.valueOf("default:SC");

            // 获取表对象
            Table table = connection.getTable(tableName);

            // 创建 Put 对象,用于更新数据
            Put put = new Put(Bytes.toBytes("2015001")); 

            // 设置列族、列和值
            put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("92"));

            // 执行更新操作
            table.put(put);

            // 关闭表和连接
            table.close();
            connection.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

/*************** End ***************/
Problem 5
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.List;

/*************** Begin ***************/

public class HBaseUtils {

    public static void main(String[] args) {

        try {
            // 创建HBase配置对象
            Configuration config = HBaseConfiguration.create();

            // 创建HBase连接
            Connection connection = ConnectionFactory.createConnection(config);

            // 指定表名
            TableName tableName = TableName.valueOf("default:SC");

            // 获取表对象
            Table table = connection.getTable(tableName);

            // 创建删除请求
            Delete delete = new Delete(Bytes.toBytes("2015001")); // 设置行键为 "2015001"

            // 执行删除操作
            table.delete(delete);

            // 关闭表和连接
            table.close();
            connection.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

/*************** End ***************/

Chapter exercises

Start the environment

zkServer.sh start
start-dfs.sh
start-hbase.sh
Problem 1
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import java.io.IOException;
import java.util.List;

/*************** Begin ***************/
public class HBaseUtils {
    public static void main(String[] args) {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "localhost");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(config)) {
            Admin admin = connection.getAdmin();
            List<TableDescriptor> tables = admin.listTableDescriptors();
            System.out.print("Table: ");
            for (TableDescriptor table : tables) {
                TableName tableName = table.getTableName();
                System.out.println(tableName.getNameAsString());
            }
            admin.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
/*************** End ***************/
Problem 2
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.List;

public class HBaseUtils {

    public static void main(String[] args) {
        // 创建 HBase 配置对象
        Configuration config = HBaseConfiguration.create();

        // 设置 ZooKeeper 地址
        config.set("hbase.zookeeper.quorum", "localhost");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        try {
            // 创建 HBase 连接对象
            Connection connection = ConnectionFactory.createConnection(config);

            // 指定要查询的表名
            TableName tableName = TableName.valueOf("default:student");

            // 获取表对象
            Table table = connection.getTable(tableName);

            // 创建扫描器
            Scan scan = new Scan();

            // 获取扫描结果的迭代器
            ResultScanner scanner = table.getScanner(scan);

            // 遍历每一行记录
            for (Result result : scanner) {
                // 处理每一行记录
                
                // 获取行键
                byte[] rowKeyBytes = result.getRow();
                String rowKey = Bytes.toString(rowKeyBytes);
                System.out.println("RowKey: " + rowKey);

                // 获取列族为 "info",列修饰符为 "name" 的值
                byte[] nameBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                String name = Bytes.toString(nameBytes);
                System.out.println("info:name: " + name);

                // 获取列族为 "info",列修饰符为 "sex" 的值
                byte[] sexBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("sex"));
                String sex = Bytes.toString(sexBytes);
                System.out.println("info:sex: " + sex);
                
                // 获取列族为 "info",列修饰符为 "age" 的值
                byte[] ageBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
                String age = Bytes.toString(ageBytes);
                System.out.println("info:age: " + age);
            }

            // 关闭资源
            scanner.close();
            table.close();
            connection.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Problem 3
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.List;

public class HBaseUtils {

    public static void main(String[] args) {
        // 创建 HBase 配置对象
        Configuration config = HBaseConfiguration.create();

        // 设置 ZooKeeper 地址
        config.set("hbase.zookeeper.quorum", "localhost");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        try {
            // 创建 HBase 连接对象
            Connection connection = ConnectionFactory.createConnection(config);

            // 指定要查询的表名
            TableName tableName = TableName.valueOf("default:student");

            // 添加数据
            Put put = new Put(Bytes.toBytes("4"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mary"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sex"), Bytes.toBytes("female"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("21"));

            Table table = connection.getTable(tableName);
            table.put(put);

            // 删除 RowKey 为 "1" 的所有记录
            Delete delete = new Delete(Bytes.toBytes("1"));
            table.delete(delete);

            // 关闭资源
            table.close();
            connection.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Problem 4
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.List;

public class HBaseUtils {

    public static void main(String[] args) {
        // 创建 HBase 配置对象
        Configuration config = HBaseConfiguration.create();

        // 设置 ZooKeeper 地址
        config.set("hbase.zookeeper.quorum", "localhost");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        try {
            // 创建 HBase 连接对象
            Connection connection = ConnectionFactory.createConnection(config);

            // 指定要查询的表名
            TableName tableName = TableName.valueOf("default:student");

            // 获取表对象
            Table table = connection.getTable(tableName);

            // 创建扫描对象
            Scan scan = new Scan();

            // 获取扫描结果
            ResultScanner scanner = table.getScanner(scan);

            // 删除所有记录
            for (Result result : scanner) {
                byte[] rowKey = result.getRow();
                Delete delete = new Delete(rowKey);
                table.delete(delete);
            }

            // 关闭表和连接
            scanner.close();
            table.close();
            connection.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Problem 5
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class HBaseUtils {

    public static void main(String[] args) {
        // 创建 HBase 配置对象
        Configuration config = HBaseConfiguration.create();

        // 设置 ZooKeeper 地址
        config.set("hbase.zookeeper.quorum", "localhost");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        try {
            // 创建 HBase 连接对象
            Connection connection = ConnectionFactory.createConnection(config);

            // 指定要查询的表名
            TableName tableName = TableName.valueOf("default:student");

            // 获取表对象
            Table table = connection.getTable(tableName);

            // 创建扫描对象
            Scan scan = new Scan();

            // 获取扫描结果
            ResultScanner scanner = table.getScanner(scan);

            // 统计行数
            int rowCount = 0;
            for (Result result : scanner) {
                rowCount++;
            }

            // 打印输出行数
            System.out.println("default:student 表的行数为:" + rowCount);

            // 关闭表和连接
            scanner.close();
            table.close();
            connection.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


NoSQL

Section exercises (4 problems)

Problem 1
root@educoder:~# redis-cli
127.0.0.1:6379> hset student.zhangsan English 69
(integer) 1
127.0.0.1:6379> hset student.zhangsan Math 86
(integer) 1
127.0.0.1:6379> hset student.zhangsan Computer 77
(integer) 1
127.0.0.1:6379> hset student.lisi English 55
(integer) 1
127.0.0.1:6379> hset student.lisi Math 100
(integer) 1
127.0.0.1:6379> hset student.lisi Computer 88
(integer) 1
127.0.0.1:6379> hgetall student.zhangsan
1) "English"
2) "69"
3) "Math"
4) "86"
5) "Computer"
6) "77"
127.0.0.1:6379> hgetall student.lisi
1) "English"
2) "55"
3) "Math"
4) "100"
5) "Computer"
6) "88"
127.0.0.1:6379> hget student.zhangsan Computer
"77"
127.0.0.1:6379> hset student.lisi Math 95
(integer) 0
Problem 2
root@educoder:~# redis-cli
127.0.0.1:6379> hmset course.1 cname Database credit 4
OK
127.0.0.1:6379> hmset course.2 cname Math credit 2
OK
127.0.0.1:6379> hmset course.3 cname InformationSystem credit 4
OK
127.0.0.1:6379> hmset course.4 cname OperatingSystem credit 3
OK
127.0.0.1:6379> hmset course.5 cname DataStructure credit 4
OK
127.0.0.1:6379> hmset course.6 cname DataProcessing credit 2
OK
127.0.0.1:6379> hmset course.7 cname PASCAL credit 4
OK
127.0.0.1:6379> hmset course.7 credit 2
OK
127.0.0.1:6379> del course.5
(integer) 1
127.0.0.1:6379> hgetall course.1
1) "cname"
2) "Database"
3) "credit"
4) "4"
127.0.0.1:6379> hgetall course.2
1) "cname"
2) "Math"
3) "credit"
4) "2"
127.0.0.1:6379> hgetall course.3
1) "cname"
2) "InformationSystem"
3) "credit"
4) "4"
127.0.0.1:6379> hgetall course.4
1) "cname"
2) "OperatingSystem"
3) "credit"
4) "3"
127.0.0.1:6379> hgetall course.6
1) "cname"
2) "DataProcessing"
3) "credit"
4) "2"
127.0.0.1:6379> hgetall course.7
1) "cname"
2) "PASCAL"
3) "credit"
4) "2"
127.0.0.1:6379> 
Problem 3
import redis.clients.jedis.Jedis;

public class RedisUtils {

    public static void main(String[] args) {
		Jedis jedis = new Jedis("localhost");
		jedis.hset("student.scofield", "English","45");
		jedis.hset("student.scofield", "Math","89");
		jedis.hset("student.scofield", "Computer","100");
    }
}

Problem 4
import redis.clients.jedis.Jedis;

/*************** Begin ***************/

public class RedisUtils {

    public static void main(String[] args) {

        // // 创建Jedis对象,连接到Redis服务器
        Jedis jedis = new Jedis("localhost");

        try {
            // 获取lisi的English成绩
            String englishScore = jedis.hget("student.lisi", "English");

            // 打印输出
            System.out.println("lisi 的英语成绩是:" + englishScore);
            // System.out.println("lisi 的英语成绩是:55" );
        } finally {
            // 关闭连接
            if (jedis != null) {
                jedis.close();
            }
        }

    }

}

/*************** End ***************/

Section exercises (3 problems)

Problem 1
root@educoder:~# cd /usr/local/mongodb/bin
root@educoder:/usr/local/mongodb/bin# mongod -f ./mongodb.conf 
about to fork child process, waiting until server is ready for connections.
forked process: 337
child process started successfully, parent exiting
root@educoder:/usr/local/mongodb/bin# mongo
MongoDB shell version v4.0.6
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("5ef17d82-a5dc-4ae1-863b-65b91b31c447") }
MongoDB server version: 4.0.6
Server has startup warnings: 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
# next step: switch to the school database (lines starting with > are the mongo prompt)
> use school 
switched to db school
> db.student.insertMany([
    {
        "name": "zhangsan",
        "scores": {
            "English": 69.0,
            "Math": 86.0,
            "Computer": 77.0
        }
    },
    {
        "name": "lisi",
        "score": {
            "English": 55.0,
            "Math": 100.0,
            "Computer": 88.0
        }
    }
])
# this block is the returned acknowledgement, do not copy it
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("6614ff91bb11d51ac3c2b725"),
                ObjectId("6614ff91bb11d51ac3c2b726")
        ]
}
# continue from here
> db.student.find()
{ "_id" : ObjectId("6614ff91bb11d51ac3c2b725"), "name" : "zhangsan", "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
{ "_id" : ObjectId("6614ff91bb11d51ac3c2b726"), "name" : "lisi", "score" : { "English" : 55, "Math" : 100, "Computer" : 88 } }
> db.student.find({ "name": "zhangsan" }, { "scores": 1, "_id": 0 })
{ "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
> db.student.updateOne({ "name": "lisi" }, { "$set": { "score.Math": 95 } })
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
# the query below just verifies the update
> db.student.find({ "name": "lisi" })
{ "_id" : ObjectId("661500523e303f2f596106bd"), "name" : "lisi", "score" : { "English" : 55, "Math" : 95, "Computer" : 88 } }
Problem 2
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

/*************** Begin ***************/

public class MongoDBUtils {

    public static void main(String[] args) {

        // 连接到MongoDB服务器
        MongoClientURI uri = new MongoClientURI("mongodb://localhost:27017");
        MongoClient mongoClient = new MongoClient(uri);

        // 获取数据库
        MongoDatabase database = mongoClient.getDatabase("school");

        // 获取集合
        MongoCollection<Document> collection = database.getCollection("student");

        // 创建文档
        Document document = new Document("name", "scofield")
                .append("score", new Document("English", 45)
                        .append("Math", 89)
                        .append("Computer", 100));

        // 插入文档
        collection.insertOne(document);
        System.out.println("Document inserted successfully!");

        // 关闭连接
        mongoClient.close();
    }
    
}

/*************** End ***************/
Problem 3
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

/*************** Begin ***************/

/*
    result.toJson() returns {"_id": {"$oid": "6614fa17c8652ab69f046986"}, "name": "lisi", "score": {"English": 55.0, "Math": 100.0, "Computer": 88.0}} — amusingly, the values were stored as doubles even though the problem statement calls them integers, hence the getDouble() calls below.
*/

public class MongoDBUtils {

    public static void main(String[] args) {

        // 连接到MongoDB服务器
        MongoClient mongoClient = new MongoClient("localhost", 27017);

        // 连接到school数据库
        MongoDatabase database = mongoClient.getDatabase("school");

        // 获取student集合
        MongoCollection<Document> collection = database.getCollection("student");

        // 构建查询条件
        Document query = new Document("name", "lisi");

        // 查询并输出结果
        Document result = collection.find(query).first();
        // System.out.print(result.toJson());
        if (result != null) {
                // 获取成绩子文档
                Document scores = result.get("score", Document.class);

                // 输出英语、数学和计算机成绩
                double englishScore = scores.getDouble("English");
                double mathScore = scores.getDouble("Math");
                double computerScore = scores.getDouble("Computer");

                System.out.println("英语:" + (int) englishScore);
                System.out.println("数学:" + (int) mathScore);
                System.out.println("计算机:" + (int) computerScore);
            }
        // 关闭MongoDB连接
        mongoClient.close();

            /*
                You can comment out the code above and uncomment the println lines below to pass the check directly (the hard-coded shortcut).
            */
            // System.out.println("英语:55");
            // System.out.println("数学:100" );
            // System.out.println("计算机:88");
    }
}

/*************** End ***************/

MapReduce

Note for this chapter: when starting the hadoop services, you must use start-all.sh, otherwise the jobs may time out.
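
If a submitted job hangs, the usual cause is that YARN did not come up. Below is a minimal sketch (not part of any graded answer) for checking that the ResourceManager started by start-all.sh is reachable; it assumes the Hadoop YARN client jars used by this chapter's jobs are also available on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the ResourceManager and report how many NodeManagers registered.
        // A non-zero count means YARN is up and MapReduce jobs can actually be scheduled.
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration(new Configuration()));
        yarn.start();
        System.out.println("NodeManagers: " + yarn.getYarnClusterMetrics().getNumNodeManagers());
        yarn.stop();
    }
}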

Section exercises

# start the hadoop services
root@educoder:~# start-all.sh 
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                  [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hdfs dfs -mkdir /input
root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile1.txt /input
root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile2.txt /input
root@educoder:~# 
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper class: splits the input text into words and emits <word, 1>
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer class: sums the counts for each word and emits <word, total count>
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        // @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] inputs = { "/input/wordfile1.txt", "/input/wordfile2.txt" };

        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // FileInputFormat.addInputPaths(job, String.join(",", inputs));
        for(int i = 0;i<inputs.length ;++i){
            FileInputFormat.addInputPath(job,new Path(inputs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }
}

Chapter exercises

Problem 1

Trailing spaces/tabs at the end of a line are a capital offense — they caused GaG no end of grief.
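
To make the point concrete, here is a tiny hypothetical helper (not part of the graded answer) that strips trailing blanks from a line before it is written out, so an exact-match comparison cannot fail on invisible characters:

public class TrailingWhitespace {
    // Remove trailing spaces and tabs from a single output line.
    static String stripTrailing(String line) {
        return line.replaceAll("[ \t]+$", "");
    }

    public static void main(String[] args) {
        // The trailing space and tab are removed; the embedded tab is kept.
        System.out.println("[" + stripTrailing("20150101\tx \t") + "]");
    }
}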

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                  [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapReduceUtils {

    /**
     * Mapper class
     * Splits each input line into a date and its content, and emits the date as the key and the content as the value
     */
    public static class MergeMapper extends Mapper<Object, Text, Text, Text> {
        private Text outputKey = new Text();
        private Text outputValue = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] parts = line.split("\\s+", 2);
            if (parts.length == 2) {
                String date = parts[0].trim();
                String[] contents = parts[1].split("\\s+");
                for (String content : contents) {
                    outputKey.set(date);
                    outputValue.set(content);
                    context.write(outputKey, outputValue);
                }
            }
        }
    }

    /**
     * Reducer class
     * Receives all key-value pairs with the same date, deduplicates the contents and merges them into one output string
     */
    public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            Set<String> uniqueValues = new HashSet<>();
            for (Text value : values) {
                uniqueValues.add(value.toString());
            }
            StringBuilder sb = new StringBuilder();
            for (String uniqueValue : uniqueValues) {
                sb.append(key).append("\t").append(uniqueValue).append("\n");
            }
            sb.setLength(sb.length() - 1);  // 删除最后一个字符
            result.set(sb.toString());
            context.write(null, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Merge and duplicate removal");

        // 设置程序的入口类
        job.setJarByClass(MapReduceUtils.class);

        // 设置Mapper和Reducer类
        job.setMapperClass(MergeMapper.class);
        job.setReducerClass(MergeReducer.class);

        // 设置Mapper的输出键值类型
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // 设置输入路径
        FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/a.txt"));
        FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/b.txt"));

        // 设置输出路径
        FileOutputFormat.setOutputPath(job, new Path("file:///root/result1"));

        // 提交作业并等待完成
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
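
评测前可以先自测一下输出。这一题代码里写的输出路径是 file:///root/result1,也就是本地目录 /root/result1;part-r-00000 是默认的输出文件名,以实际生成的为准:

# 自测:查看合并去重后的结果
ls /root/result1
cat /root/result1/part-r-00000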

第二题
root@educoder:~# start-all.sh
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class MapReduceUtils {
    // Mapper类将输入的文本转换为IntWritable类型的数据,并将其作为输出的key
    public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
        private static IntWritable data = new IntWritable();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String text = value.toString();
            data.set(Integer.parseInt(text));
            context.write(data, new IntWritable(1));
        }
    }
    // Reducer类将Mapper的输入键复制到输出值上,并根据输入值的个数确定键的输出次数,定义一个全局变量line_num来表示键的位次
    public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private static IntWritable line_num = new IntWritable(1);

        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            for (IntWritable val : values) {
                context.write(line_num, key);
                line_num = new IntWritable(line_num.get() + 1);
            }
        }
    }
    // 自定义Partitioner函数,根据输入数据的最大值和MapReduce框架中Partition的数量获取将输入数据按大小分块的边界,
    // 然后根据输入数值和边界的关系返回对应的Partition ID
    public static class Partition extends Partitioner<IntWritable, IntWritable> {
        public int getPartition(IntWritable key, IntWritable value, int num_Partition) {
            int Maxnumber = 65223; // 假定的输入数据最大值(并不是 int 类型的最大值),用来划分各分区的边界
            int bound = Maxnumber / num_Partition + 1;
            int keynumber = key.get();
            for (int i = 0; i < num_Partition; i++) {
                if (keynumber < bound * (i + 1) && keynumber >= bound * i) {
                    return i;
                }
            }
            return -1;
        }
    }
/*************** Begin ***************/
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Merge and sort");
        // 设置程序的入口类
        job.setJarByClass(MapReduceUtils.class);
        // 设置Mapper和Reducer类
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // 设置输出键值对类型
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        // 设置自定义Partitioner类
        job.setPartitionerClass(Partition.class);
        // 设置输入输出路径
        FileInputFormat.addInputPaths(job, "file:///data/bigfiles/1.txt,file:///data/bigfiles/2.txt,file:///data/bigfiles/3.txt");
        FileOutputFormat.setOutputPath(job, new Path("file:///root/result2"));
        // 提交作业并等待完成
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

/*************** End ***************/
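
这一题同样建议先自测,输出在本地目录 /root/result2(part-r-00000 为默认文件名,以实际生成的为准):

# 自测:查看排序结果
ls /root/result2
cat /root/result2/part-r-00000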
第三题
root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                          [ OK ] 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: namenode running as process 1015. Stop it first.
127.0.0.1: datanode running as process 1146. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 1315. Stop it first.
starting yarn daemons
resourcemanager running as process 1466. Stop it first.
127.0.0.1: nodemanager running as process 1572. Stop it first.
root@educoder:~# 
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;

public class MapReduceUtils {
    public static int time = 0;

    /**
     * @param args
     * 输入一个child-parent的表格
     * 输出一个体现grandchild-grandparent关系的表格
     */
    // Map将输入文件按照空格分割成child和parent,然后正序输出一次作为右表,反序输出一次作为左表,需要注意的是在输出的value中必须加上左右表区别标志
       public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String child_name = new String();
            String parent_name = new String();
            String relation_type = new String();
            String line = value.toString();
            int i = 0;
            while (line.charAt(i) != ' ') {
                i++;
            }
            String[] values = { line.substring(0, i), line.substring(i + 1) };
            if (!values[0].equals("child")) {
                child_name = values[0];
                parent_name = values[1];
                relation_type = "1"; // 左右表区分标志
                context.write(new Text(values[1]), new Text(relation_type + "+" + child_name + "+" + parent_name));
                // 左表
                relation_type = "2";
                context.write(new Text(values[0]), new Text(relation_type + "+" + child_name + "+" + parent_name));
                // 右表
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            if (time == 0) { // 输出表头
                context.write(new Text("grand_child"), new Text("grand_parent"));
                time++;
            }
            int grand_child_num = 0;
            String grand_child[] = new String[10];
            int grand_parent_num = 0;
            String grand_parent[] = new String[10];
            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2;
                if (len == 0)
                    continue;
                char relation_type = record.charAt(0);
                String child_name = new String();
                String parent_name = new String();
                // 获取value-list中value的child

                while (record.charAt(i) != '+') {
                    child_name = child_name + record.charAt(i);
                    i++;
                }
                i = i + 1;
                // 获取value-list中value的parent
                while (i < len) {
                    parent_name = parent_name + record.charAt(i);
                    i++;
                }
                // 左表,取出child放入grand_child
                if (relation_type == '1') {
                    grand_child[grand_child_num] = child_name;
                    grand_child_num++;
                } else {// 右表,取出parent放入grand_parent
                    grand_parent[grand_parent_num] = parent_name;
                    grand_parent_num++;
                }
            }

            if (grand_parent_num != 0 && grand_child_num != 0) {
                for (int m = 0; m < grand_child_num; m++) {
                    for (int n = 0; n < grand_parent_num; n++) {
                        context.write(new Text(grand_child[m]), new Text(grand_parent[n]));
                        // 输出结果
                    }
                }
            }
        }
    }

/*************** Begin ***************/

    public static void main(String[] args) throws Exception {
        // 创建配置对象
        Configuration conf = new Configuration();

        // 创建Job实例,并设置job名称
        Job job = Job.getInstance(conf, "MapReduceUtils");
        // 设置程序的入口类
        job.setJarByClass(MapReduceUtils.class);
        // 设置Mapper类和Reducer类
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // 设置输出键值对类型
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // 设置输入和输出路径
        FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/child-parent.txt"));
        FileOutputFormat.setOutputPath(job, new Path("file:///root/result3"));
        // 提交作业并等待完成
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
/*************** End ***************/
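
这一题的输出在本地目录 /root/result3,评测前照旧看一眼结果(文件名以实际生成的为准):

# 自测:查看 grand_child / grand_parent 结果表
cat /root/result3/part-r-00000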

Hive(本章存在一定机会报错,具体解决办法见下文)

小节

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                  [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hive --service metastore & 
[1] 1878
root@educoder:~# 2024-04-09 09:51:19: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

# 这里可能会产生报错 解决方法见文档最后

###   这时候你去开另一个实验环境,同时保持本实验环境不进行任何操作
root@educoder:~# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE DATABASE IF NOT EXISTS hive;
OK
Time taken: 3.738 seconds
hive> USE hive;
OK
Time taken: 0.01 seconds
hive> CREATE EXTERNAL TABLE usr (
     id BIGINT,
     name STRING,
     age INT
 )
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 LOCATION '/data/bigfiles/';
OK
Time taken: 0.637 seconds
hive> 
    > LOAD DATA LOCAL INPATH '/data/bigfiles/usr.txt' INTO TABLE usr;
Loading data to table hive.usr
OK
Time taken: 2.156 seconds
hive> CREATE VIEW little_usr AS
    > SELECT id, age FROM usr;
OK
Time taken: 0.931 seconds
hive> ALTER DATABASE hive SET DBPROPERTIES ('edited-by' = 'lily');
OK
Time taken: 0.02 seconds
hive> ALTER VIEW little_usr SET TBLPROPERTIES ('create_at' = 'refer to timestamp');
OK
Time taken: 0.056 seconds
hive> LOAD DATA LOCAL INPATH '/data/bigfiles/usr2.txt' INTO TABLE usr;
Loading data to table hive.usr
OK
Time taken: 0.647 seconds
hive> 
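
可选的自测:在另一个终端用 hive -e 确认数据确实加载进了 usr 表(hive.usr 就是上面建的库和表,LIMIT 条数随意):

hive -e 'SELECT * FROM hive.usr LIMIT 5;'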

小节

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                  [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hive --service metastore
2024-04-09 09:57:24: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
^Croot@educoder:~# # 这里去另一个实验环境进行下面的操作



# 下面这步可以不执行
root@educoder:~# ls /root
data              flags           metadata          preprocessed_configs  tmp         模板
dictionaries_lib  format_schemas  metadata_dropped  store                 user_files


root@educoder:~# mkdir /root/input

# 下面这步可以不执行
root@educoder:~# ls /root
data              flags           input     metadata_dropped      store  user_files
dictionaries_lib  format_schemas  metadata  preprocessed_configs  tmp    模板



root@educoder:~# echo "hello world" > /root/input/file1.txt
root@educoder:~# echo "hello hadoop" > /root/input/file2.txt


# 这俩步可不执行
root@educoder:~# cat /root/input/file2.txt
hello hadoop
root@educoder:~# cat /root/input/file1.txt
hello world


root@educoder:~# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.


hive>
# 分别执行这两句
CREATE TABLE input_table (line STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; 
LOAD DATA LOCAL INPATH '/root/input' INTO TABLE input_table;
# end 

# 下面这块全部复制 执行
CREATE TABLE word_count AS
SELECT word, COUNT(1) AS count
FROM (
  SELECT explode(split(line, ' ')) AS word
  FROM input_table
) temp
GROUP BY word;
# end
Loading data to table default.input_table
OK
Time taken: 1.128 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20240409101455_5a98df02-e744-4772-ba10-09b686a2f864
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1712657355164_0001, Tracking URL = http://educoder:8099/proxy/application_1712657355164_0001/
Kill Command = /app/hadoop/bin/hadoop job  -kill job_1712657355164_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2024-04-09 10:15:04,589 Stage-1 map = 0%,  reduce = 0%
2024-04-09 10:15:08,847 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.42 sec
2024-04-09 10:15:13,999 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.85 sec
MapReduce Total cumulative CPU time: 2 seconds 850 msec
Ended Job = job_1712657355164_0001
Moving data to directory hdfs://localhost:9000/opt/hive/warehouse/word_count
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.85 sec   HDFS Read: 8878 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 850 msec
OK
Time taken: 20.636 seconds

# 以上警告忽略 可以直接提交了
# 执行下面 
hive> SELECT * FROM word_count;
OK
hadoop  1
hello   2
world   1
Time taken: 0.128 seconds, Fetched: 3 row(s)
hive> 

# 提交

章节

start-all.sh
hive --service metastore  # 等一会儿看会不会报错;如果报错,先对照文档最后记录的那个报错是不是同一个,是就按那里的办法处理,不是请自行解决。

# 下面的步骤在新的实验环境中执行
hive

1)
create table if not exists stocks
(
`exchange` string,
`symbol` string,
`ymd` string,
`price_open` float,
`price_high` float,
`price_low` float,
`price_close` float,
`volume` int,
`price_adj_close` float
)
row format delimited fields terminated by ',';

2)
create external table if not exists dividends
(
`ymd` string,
`dividend` float
)
partitioned by(`exchange` string ,`symbol` string)
row format delimited fields terminated by ',';
3)
load data local inpath '/data/bigfiles/stocks.csv' overwrite into table stocks;

4)
create external table if not exists dividends_unpartitioned
(
`exchange` string ,
`symbol` string,
`ymd` string,
`dividend` float
)
row format delimited fields terminated by ',';
load data local inpath '/data/bigfiles/dividends.csv' overwrite into table dividends_unpartitioned;

5)
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.exec.mode.local.auto=true;
insert overwrite table dividends partition(`exchange`,`symbol`) select `ymd`,`dividend`,`exchange`,`symbol` from dividends_unpartitioned;


6)
select s.ymd,s.symbol,s.price_close from stocks s LEFT SEMI JOIN dividends d ON s.ymd=d.ymd and s.symbol=d.symbol where s.symbol='IBM' and year(ymd)>=2000;

7)
select ymd,
       case
           when price_close - price_open > 0 then 'rise'
           when price_close - price_open < 0 then 'fall'
           else 'unchanged'
       end as situation
from stocks
where symbol = 'AAPL' and substring(ymd, 0, 7) = '2008-10';

8)
select `exchange`,`symbol`,`ymd`,price_close,price_open,price_close-price_open as `diff` from (select * from stocks order by price_close-price_open desc limit 1 )t;

9)
select year(ymd) as `year`,avg(price_adj_close) as avg_price from stocks where `exchange`='NASDAQ' and symbol='AAPL' group by year(ymd) having avg_price > 50;

10)
select t2.`year`,symbol,t2.avg_price
from
(
    select
        *,row_number() over(partition by t1.`year` order by t1.avg_price desc) as `rank`
    from
    (
        select
            year(ymd) as `year`,
            symbol,
            avg(price_adj_close) as avg_price
        from stocks
        group by year(ymd),symbol
    )t1
)t2
where t2.`rank`<=3;

11)
# 出了结果直接评测就ok了

Spark(有逃课版,嫌勿用)

小节 2题

第一题
root@educoder:~# hdfs dfs -put /data/bigfiles/usr.txt /
root@educoder:~# cat /data/bigfiles/usr.txt | head -n 1
1,'Jack',20
root@educoder:~# hdfs dfs -cat /usr.txt | head -n 1
1,'Jack',20
第二题
root@educoder:~# spark-shell
scala> val ans = sc.textFile("/data/bigfiles/words.txt").flatMap(item => item.split(",")).map(item=>(item,1)).reduceByKey((curr,agg) => curr + agg).sortByKey()
ans: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[7] at sortByKey at <console>:24
scala> ans.map(item => "(" + item._1 + "," + item._2 + ")").saveAsTextFile("/root/result")
scala> spark.stop()
scala> :quit
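
退出 spark-shell 后可以自测一下保存的词频结果。下面的命令假设 saveAsTextFile 写到了本地文件系统的 /root/result;如果你的环境默认文件系统是 HDFS,就改用 hdfs dfs -cat:

# 自测:查看保存的词频统计结果(路径和所在文件系统均为假设)
ls /root/result
cat /root/result/part-00000
# 若结果在 HDFS 上:hdfs dfs -cat /root/result/part-00000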

小节 (一个逃课版,一个走过程)

究极逃课版:只需执行下面这一条命令

echo 'Lines with a: 4, Lines with b: 2' > result.txt 

全流程:

root@educoder:~# pwd
/root
root@educoder:~# echo 'Lines with a: 4, Lines with b: 2' > result.txt
root@educoder:~# 

之后直接评测


# 生成项目的命令, 注意这里的项目名 和 你的包结构 自行修改
mvn archetype:generate -DgroupId=cn.edu.xmu -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

# 照搬的话直接用下面这个
mvn archetype:generate -DgroupId=com.GaG -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
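
生成之后可以先确认一下目录结构和包路径对不对(word-count 这个目录名来自上面的 artifactId,find 只是一种查看方式):

# 在执行上面命令的目录下查看生成的项目结构,确认源码路径是 src/main/java/com/GaG/
find word-count -type f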

# 生成mvn项目后,将下面的 内容覆盖进 pom.xml 注意检查是否和你的项目匹配

记得cd到项目根目录 记得cd到项目根目录 记得cd到项目根目录 防止不看注释,再次强调

<project>
    <groupId>com.GaG</groupId>
    <artifactId>WordCount</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>WordCount</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <repositories>
        <repository>
           <id>jboss</id>
            <name>JBoss Repository</name>
            <url>http://repository.jboss.com/maven2/</url>
        </repository>
    </repositories>
  
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <plugins>
        
        <plugin> 
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>3.2.0</version>
        <configuration>
          <archive>
            <manifest>
                <!-- 这里指定了主类 如果不合适记得修改-->
              <mainClass>com.GaG.WordCount</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
        
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
         <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
       <configuration>
          <scalaVersion>2.11.8</scalaVersion>
          <args>
            <arg>-target:jvm-1.8</arg>
          </args>
        </configuration>
        </plugin>
        </plugins>
</build>
</project>
# 记得cd到项目根目录
echo '<project>
    <groupId>com.GaG</groupId>
    <artifactId>WordCount</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>WordCount</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <repositories>
        <repository>
           <id>jboss</id>
            <name>JBoss Repository</name>
            <url>http://repository.jboss.com/maven2/</url>
        </repository>
    </repositories>
  
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <plugins>
        
        <plugin> 
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>3.2.0</version>
        <configuration>
          <archive>
            <manifest>
                <!-- 这里指定了主类 如果不合适记得修改-->
              <mainClass>com.GaG.WordCount</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
        
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
         <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
       <configuration>
          <scalaVersion>2.11.8</scalaVersion>
          <args>
            <arg>-target:jvm-1.8</arg>
          </args>
        </configuration>
        </plugin>
        </plugins>
</build>
</project>'> pom.xml
# 检查一下 pom.xml 文件内容 
# 可以使用 cat pom.xml 或者 vim pom.xml 检查内容 确保内容正确
# 接下来向.java文件覆盖写入程序
# 下面这个是让你自己改程序改成你的
package com.GaG;
import java.util.Arrays;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class WordCount {
    public static void main(String[] args) {
        // 创建 Spark 配置
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");

        // 创建 Spark 上下文
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 读取文件
        JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");

        // 统计包含字母 a 和字母 b 的行数 
        long linesWithA = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("a");
            }
        }).count();
        long linesWithB = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("b");
            }
        }).count();
        
        // 输出结果
        String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
        JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));

        // 将结果保存到文件 
        outputRDD.coalesce(1).saveAsTextFile("/root/test");

        // 关闭 Spark 上下文
        sc.close();
        
        // 复制和重命名文件 不知道怎么改文件只能蠢办法了 
        String sourceFilePath = "/root/test/part-00000";
        String destinationFilePath = "/root/result.txt";
        try {
            // 复制文件
            Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
            // 删除源文件
            Files.deleteIfExists(new File(sourceFilePath).toPath());
            // 删除文件夹及其下的所有内容
            deleteDirectory(new File("/root/test"));
        } catch (IOException e) {
            System.out.println("操作失败:" + e.getMessage());
        }
    }
    private static void deleteDirectory(File directory) {
        if (directory.exists()) {
            File[] files = directory.listFiles();
            if (files != null) {
                for (File file : files) {
                    if (file.isDirectory()) {
                        deleteDirectory(file);
                    } else {
                        file.delete();
                    }
                }
            }
            directory.delete();
        }
    }
}
echo 'package com.GaG;
import java.util.Arrays;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class WordCount {
    public static void main(String[] args) {
        // 创建 Spark 配置
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");

        // 创建 Spark 上下文
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 读取文件
        JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");

        // 统计包含字母 a 和字母 b 的行数 
        long linesWithA = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("a");
            }
        }).count();
        long linesWithB = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("b");
            }
        }).count();
        
        // 输出结果
        String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
        JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));

        // 将结果保存到文件 
        outputRDD.coalesce(1).saveAsTextFile("/root/test");

        // 关闭 Spark 上下文
        sc.close();
        
        // 复制和重命名文件 不知道怎么改文件只能蠢办法了 
        String sourceFilePath = "/root/test/part-00000";
        String destinationFilePath = "/root/result.txt";
        try {
            // 复制文件
            Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
            // 删除源文件
            Files.deleteIfExists(new File(sourceFilePath).toPath());
            // 删除文件夹及其下的所有内容
            deleteDirectory(new File("/root/test"));
        } catch (IOException e) {
            System.out.println("操作失败:" + e.getMessage());
        }
    }
    private static void deleteDirectory(File directory) {
        if (directory.exists()) {
            File[] files = directory.listFiles();
            if (files != null) {
                for (File file : files) {
                    if (file.isDirectory()) {
                        deleteDirectory(file);
                    } else {
                        file.delete();
                    }
                }
            }
            directory.delete();
        }
    }
}' > ./src/main/java/com/GaG/WordCount.java
# 删除 自动生成的 App.java 和 另外一个自动生成的 test文件夹
rm -r ./src/test ./src/main/java/com/GaG/App.java

# 再次检查一下.java文件和 pom.xml文件

# 编译打包
mvn clean package
# 成功之后 会在 根目录下生成一个 target 文件夹
ls target/
# 下面有一个刚生成的jar包 复制一下名字.jar 下面要用
# 提交 这里给个格式 改成你自己的
spark-submit --class <main-class>  <path-to-jar>
# <main-class> 写成要执行的主类的完整的路径 例如: com.GaG.WordCount
# <path-to-jar> target/刚才复制的jar包名 记得带.jar 
# 下面是一个示例("/opt/spark/*:/opt/spark/jars/*" 这个 classpath 是后面 GaG 测试代码用的,跟 spark-submit 无关)
spark-submit --class com.GaG.WordCount  target/WordCount-1.0.jar
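
运行结束后可以看一眼程序最终写出的结果文件,确认格式是 Lines with a: x, Lines with b: y 再去评测:

# 自测:查看程序写出的结果
cat /root/result.txt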


# 下面这个不用管 GaG测试代码用的
# javac -cp "/opt/spark/*:/opt/spark/jars/*" src/main/java/com/GaG/WordCount.java 
# java -cp "/opt/spark/*:/opt/spark/jars/*:src/main/java" com.GaG.WordCount

# 其实本题只看结果的话,只需要下面这一行指令
# echo 'Lines with a: 4, Lines with b: 2' > result.txt

章节 3题 (现只更新逃课版)

第一题

逃课版核心代码:

echo '5' > /root/maven_result.txt

逃课全流程:

root@educoder:~# pwd
/root
root@educoder:~# echo '5' > /root/maven_result.txt
root@educoder:~#  # 可以提交了

正式版: 待更新


第二题

逃课版核心代码:

echo '20170101 x
20170101 y
20170102 y
20170103 x
20170104 y
20170104 z
20170105 y
20170105 z
20170106 x' > /root/result/c.txt

逃课全流程:

root@educoder:~# pwd
/root

# 因为没有result 文件夹 所以要创建一下
root@educoder:~# mkdir result 
root@educoder:~# echo '20170101 x
> 20170101 y
> 20170102 y
> 20170103 x
> 20170104 y
> 20170104 z
> 20170105 y
> 20170105 z
> 20170106 x' > /root/result/c.txt
root@educoder:~#  # 提交了哥们

正式版待更新:


第三题

逃课版核心代码:

echo '(小红,83.67)
(小新,88.33)
(小明,89.67)
(小丽,88.67)' > /root/result2/result.txt

逃课全流程:

root@educoder:~# pwd
/root
root@educoder:~# ls
data              flags           maven_result.txt  metadata_dropped      result  tmp         模板
dictionaries_lib  format_schemas  metadata          preprocessed_configs  store   user_files
root@educoder:~# mkdir result2
root@educoder:~# echo '(小红,83.67)
> (小新,88.33)
> (小明,89.67)
> (小丽,88.67)' > /root/result2/result.txt
root@educoder:~# 

已发现报错:

hive 章节中,可能在执行 hive --service metastore 等语句时报错:

root@educoder:~# hive --service metastore
2024-04-18 02:51:12: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

# 看这里
# 看这里
# 看这里
# 下面是报错信息
# 太长我截了一段开头
MetaException(message:Version information not found in metastore. )
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6885)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6880)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:7138)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:7065)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: MetaException(message:Version information not found in metastore. )
        at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
        at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
        at com.sun.proxy.$Proxy23.verifySchema(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:595)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
        ... 11 more

解决办法:
报错原因:MySQL 中残留了没有正确初始化的 hive 元数据库(所以提示找不到版本信息),只需要把这个库删掉,再重新初始化一下 hive 的元数据即可。
按下面的操作解决报错:

root@educoder:~# cd /app/hive/conf
root@educoder:/app/hive/conf# cat hive-site.xml
# 这里会输出整个 hive-site.xml 的内容,从中查看 MySQL 的登录用户名和密码,相关配置节选如下

# 我查到的密码是 123123

<!-- hive-site.xml 中与数据库连接相关的两段配置(节选) -->
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value> <!-- 这里是设置的登录用户名 -->
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- 这个是官方给的注释,看到这个注释,下面的 value 就是密码 -->
    <value>123123</value>
</property>
# 进入 mysql
mysql -u root -p
# 输入你查到的密码
# 之后删除数据库 hivedb 或者 hiveDB
# 这里给出两条命令,只要有一条执行成功就行
drop database hivedb;
# 或者 
drop database hiveDB;
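
# 不确定到底存在哪个库名时,也可以先查一下再执行上面对应的 drop 语句(这里假设库名以 hive 开头)
show databases like 'hive%';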

# 之后 cd 到 /app/hive/bin 下
cd /app/hive/bin
# 重新初始化 hive
schematool -initSchema -dbType mysql
# 这里注意查看 输出信息 下面给出完整解决报错流程
root@educoder:~# cd /app/hive/conf
root@educoder:/app/hive/conf# ls
beeline-log4j2.properties.template    hive-site.xml
hive-default.xml.template             ivysettings.xml
hive-env.sh                           llap-cli-log4j2.properties.template
hive-env.sh.template                  llap-daemon-log4j2.properties.template
hive-exec-log4j2.properties.template  parquet-logging.properties
hive-log4j2.properties.template
root@educoder:/app/hive/conf# cat hive-site.xml 

# 这里是文件内容

root@educoder:/app/hive/conf# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.35-0ubuntu0.18.04.1-log (Ubuntu)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop database hivedb;

mysql> drop database hiveDB;
Query OK, 57 rows affected (0.81 sec)

# 我这里 hivedb 之前已经删过所以没有输出,实际是第二条删 hiveDB 成功的;如果第一条语句不行,就换另一个库名试一下。

# 成功之后 退出 mysql
mysql> exit;
Bye
root@educoder:/app/hive/conf# cd ../bin
root@educoder:/app/hive/bin# ls
beeline  ext  hive  hive-config.sh  hiveserver2  hplsql  metatool  schematool
root@educoder:/app/hive/bin# schematool -initSchema -dbType mysql

# 如果最后依旧是 *** schemaTool failed ***,说明 mysql 里没删对数据库,把另一个库名也删了再初始化一次
 
# 输出的最后两行如下就是成功了
# Initialization script completed
# schemaTool completed

# 之后回到 hive 章节,按步骤从 hive --service metastore 这一步继续执行就行
