YOLO Dataset Format Conversion in Practice: Converting Between PASCAL VOC XML and YOLO TXT

📅 2026/7/5 8:26:09 📝 Programming Learning

1. Why Dataset Format Conversion Is Needed

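The first time I trained a model with YOLO, the variety of dataset formats left me confused. I had annotated more than 2,000 images with LabelImg, which by default produces PASCAL VOC XML files, but YOLO expects TXT files. Only later did I learn that these are the two most common annotation formats in computer vision: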

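PASCAL VOC XML: one XML file per image, recording the image size, the object classes, and the bounding boxes as absolute coordinates (pixel values)
YOLO TXT: one TXT file per image, with one line per object holding the class ID and the normalized center coordinates and width/height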

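The biggest difference between the two is how coordinates are expressed. VOC uses absolute pixel values, e.g. <xmin>100</xmin>, while YOLO uses relative values, e.g. 0.45 means the box center lies at 45% of the image width. In my first conversion I overlooked this difference, and the model failed to learn anything.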

2. Format Details and Comparison

2.1 PASCAL VOC XML Structure

A typical XML file produced by LabelImg looks like this:

```xml
<annotation>
    <filename>IMG_001.jpg</filename>
    <size>
        <width>800</width>
        <height>600</height>
        <depth>3</depth>
    </size>
    <object>
        <name>cat</name>
        <bndbox>
            <xmin>100</xmin>
            <ymin>200</ymin>
            <xmax>300</xmax>
            <ymax>400</ymax>
        </bndbox>
    </object>
</annotation>
```

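Key fields:

size: image width, height, and number of channels
object: one node per annotated object
bndbox: the bounding box, given by its top-left (xmin, ymin) and bottom-right (xmax, ymax) corners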
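As a sketch of how these fields are read in practice, the standard library's xml.etree.ElementTree is enough. The sample annotation above is embedded as a string; the function name parse_voc is hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<annotation>
  <filename>IMG_001.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return (width, height, [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    objects = []
    for obj in root.iter('object'):  # one <object> node per annotated target
        box = obj.find('bndbox')
        coords = [float(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        objects.append((obj.find('name').text, *coords))
    return width, height, objects

print(parse_voc(SAMPLE))  # → (800, 600, [('cat', 100.0, 200.0, 300.0, 400.0)])
```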

2.2 YOLO TXT Format

The corresponding YOLO TXT file contains:

0 0.25 0.5 0.25 0.333

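This line means:

0: class ID (here, cat)
0.25: box center x / image width
0.5: box center y / image height
0.25: box width / image width
0.333: box height / image height (200/600, rounded)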
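To check these numbers, here is the arithmetic for the sample box from section 2.1 (an 800×600 image, corners (100, 200) and (300, 400)). The helper name voc_box_to_yolo is hypothetical.

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Absolute corner coordinates -> normalized (x_center, y_center, w, h)."""
    x_center = (xmin + xmax) / 2 / img_w  # (100 + 300) / 2 / 800 = 0.25
    y_center = (ymin + ymax) / 2 / img_h  # (200 + 400) / 2 / 600 = 0.5
    w = (xmax - xmin) / img_w             # 200 / 800 = 0.25
    h = (ymax - ymin) / img_h             # 200 / 600 = 0.333...
    return x_center, y_center, w, h

xc, yc, w, h = voc_box_to_yolo(100, 200, 300, 400, 800, 600)
print(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")  # → 0 0.250000 0.500000 0.250000 0.333333
```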

2.3 Key Differences

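| Feature | PASCAL VOC | YOLO |
|---|---|---|
| File format | XML | TXT |
| Coordinate type | Absolute pixel values | Normalized relative values |
| Box representation | Top-left + bottom-right corners | Center point + width/height |
| Multiple objects | Multiple object nodes | One object per line |
| Visualization | Easy (image size is included) | Requires the original image size |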

3. Converting XML to TXT

3.1 Conversion Principle and Formulas

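The core of the conversion is coordinate normalization: turn the two corners into a center point plus width/height, then divide by the image width and height: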

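```python
def convert(size, box):
    # size = (width, height)
    # box = (xmin, xmax, ymin, ymax); note this order differs from the XML's
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0  # center x
    y = (box[2] + box[3]) / 2.0  # center y
    w = box[1] - box[0]          # box width
    h = box[3] - box[2]          # box height
    x = x * dw  # normalize by image width
    w = w * dw
    y = y * dh  # normalize by image height
    h = h * dh
    return (x, y, w, h)
```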
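Written as formulas, with W and H the image width and height, convert() computes the following. Keep in mind that its box argument is ordered (xmin, xmax, ymin, ymax), so box[1] is x_max rather than y_min.

```latex
x = \frac{x_{\min} + x_{\max}}{2W}, \qquad
y = \frac{y_{\min} + y_{\max}}{2H}, \qquad
w = \frac{x_{\max} - x_{\min}}{W}, \qquad
h = \frac{y_{\max} - y_{\min}}{H}
```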

3.2 Complete Conversion Code

```python
import os
import xml.etree.ElementTree as ET

classes = ["cat", "dog"]  # your class list; list index = YOLO class ID

def convert_annotation(xml_path, txt_path):
    # Relies on convert() from section 3.1.
    tree = ET.parse(xml_path)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    with open(txt_path, 'w') as f:
        for obj in root.iter('object'):
            cls = obj.find('name').text
            if cls not in classes:
                continue  # objects whose class is not in the list are skipped
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            # Order expected by convert(): (xmin, xmax, ymin, ymax)
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            f.write(f"{cls_id} {' '.join([str(a) for a in bb])}\n")

# Batch conversion
xml_dir = "Annotations/"
txt_dir = "labels/"
os.makedirs(txt_dir, exist_ok=True)
for xml_file in os.listdir(xml_dir):
    if xml_file.endswith('.xml'):
        # splitext only swaps the extension, even if ".xml" appears elsewhere in the name
        txt_file = os.path.splitext(xml_file)[0] + '.txt'
        convert_annotation(os.path.join(xml_dir, xml_file),
                           os.path.join(txt_dir, txt_file))
```

3.3 Troubleshooting

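Class mismatch: make sure every class name in the XML is in the classes list; otherwise that object is skipped
Out-of-range coordinates: after conversion, check that every value lies between 0 and 1
Missing image size: some annotation tools do not write a size node, which then has to be added manually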
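Out-of-range coordinates and bad class IDs can be caught with a quick pass over each generated label file. This is a minimal sketch; the function name check_yolo_lines and the exact set of checks are my assumptions, not part of the original script.

```python
def check_yolo_lines(lines, num_classes):
    """Return a list of (line_no, message) problems found in YOLO label lines."""
    problems = []
    for i, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # blank lines are harmless
        if len(parts) != 5:
            problems.append((i, f"expected 5 fields, got {len(parts)}"))
            continue
        try:
            cls_id = int(parts[0])
            values = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append((i, "non-numeric field"))
            continue
        if not 0 <= cls_id < num_classes:
            problems.append((i, f"class id {cls_id} out of range"))
        if any(v < 0 or v > 1 for v in values):
            problems.append((i, "coordinate outside [0, 1]"))
    return problems

labels = ["0 0.25 0.5 0.25 0.333", "2 0.5 0.5 0.1 0.1", "1 1.2 0.5 0.1 0.1"]
print(check_yolo_lines(labels, num_classes=2))
# → [(2, 'class id 2 out of range'), (3, 'coordinate outside [0, 1]')]
```

For a real dataset you would feed it open(txt_path).read().splitlines() for every file in labels/.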

4. Converting TXT to XML

4.1 Reverse Conversion Principle

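When data augmentation or visualization calls for VOC annotations, YOLO labels have to be converted back to VOC format. The key steps:

Read the image to get its original size

Convert the normalized coordinates back to absolute coordinates: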

```python
# parts = line.split() for one label line; width, height = original image size
x_center = float(parts[1]) * width
y_center = float(parts[2]) * height
box_width = float(parts[3]) * width
box_height = float(parts[4]) * height
xmin = int(x_center - box_width / 2)
xmax = int(x_center + box_width / 2)
ymin = int(y_center - box_height / 2)
ymax = int(y_center + box_height / 2)
```
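One detail worth checking in the fragment above: int() truncates, so a value like 399.9 becomes 399 and the box loses a pixel. A minimal alternative that rounds and clamps to the image bounds is sketched below; the helper name yolo_box_to_voc and the clamping are my additions. It is applied to the 0 0.25 0.5 0.25 0.333 line from section 2.2.

```python
def yolo_box_to_voc(xc, yc, bw, bh, img_w, img_h):
    """Normalized (x_center, y_center, w, h) -> integer (xmin, ymin, xmax, ymax)."""
    x_center, y_center = xc * img_w, yc * img_h
    box_w, box_h = bw * img_w, bh * img_h
    # round() instead of int(), then clamp to the image bounds
    xmin = max(0, round(x_center - box_w / 2))
    ymin = max(0, round(y_center - box_h / 2))
    xmax = min(img_w, round(x_center + box_w / 2))
    ymax = min(img_h, round(y_center + box_h / 2))
    return xmin, ymin, xmax, ymax

print(yolo_box_to_voc(0.25, 0.5, 0.25, 0.333, 800, 600))  # → (100, 200, 300, 400)
print(int(0.5 * 600 + 0.333 * 600 / 2))                   # → 399 with truncation
```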

4.2 Complete Conversion Code

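```python
import os
import cv2
from xml.dom.minidom import Document

def convert_yolo_to_voc(w, h, box):
    """box = (x_center, y_center, box_w, box_h), normalized to [0, 1]."""
    xc, yc, bw, bh = box[0] * w, box[1] * h, box[2] * w, box[3] * h
    # Round to the nearest pixel and clamp to the image bounds
    xmin = max(0, round(xc - bw / 2))
    ymin = max(0, round(yc - bh / 2))
    xmax = min(w, round(xc + bw / 2))
    ymax = min(h, round(yc + bh / 2))
    return xmin, ymin, xmax, ymax

def txt_to_xml(txt_path, img_path, xml_path, class_list):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"cannot read image: {img_path}")
    h, w, d = img.shape  # imread's default flag loads a 3-channel image

    doc = Document()
    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)

    # File info
    filename = doc.createElement("filename")
    filename.appendChild(doc.createTextNode(os.path.basename(img_path)))
    annotation.appendChild(filename)

    # Size info
    size = doc.createElement("size")
    for tag, val in [('width', w), ('height', h), ('depth', d)]:
        elem = doc.createElement(tag)
        elem.appendChild(doc.createTextNode(str(val)))
        size.appendChild(elem)
    annotation.appendChild(size)

    # One <object> per label line
    with open(txt_path, encoding='utf-8') as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) < 5:
                continue  # skip blank or malformed lines
            obj = doc.createElement("object")
            # Class name
            name = doc.createElement("name")
            name.appendChild(doc.createTextNode(class_list[int(parts[0])]))
            obj.appendChild(name)
            # Bounding box
            bndbox = doc.createElement("bndbox")
            values = convert_yolo_to_voc(w, h, list(map(float, parts[1:5])))
            for c, v in zip(['xmin', 'ymin', 'xmax', 'ymax'], values):
                elem = doc.createElement(c)
                elem.appendChild(doc.createTextNode(str(int(v))))
                bndbox.appendChild(elem)
            obj.appendChild(bndbox)
            annotation.appendChild(obj)

    # Save the XML; the file encoding matches the declared UTF-8
    with open(xml_path, 'w', encoding='utf-8') as f:
        doc.writexml(f, indent='', addindent='\t', newl='\n', encoding='UTF-8')
```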

4.3 Verifying the Result

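Open the generated XML in LabelImg and check that the boxes are in the right place
Compare the number of objects before and after conversion
Pay special attention to boxes at the image edges (e.g. xmin=0 or xmax=width)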
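Object counts can be compared with a simple len() on both sides; coordinate drift can be checked with a round trip, converting each VOC box to YOLO (rounded to the number of decimals written to the label file) and back. This is a sketch under assumptions: the function names and the simulated 6-decimal precision are mine.

```python
def voc_to_yolo(box, w, h):
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h,
            (xmax - xmin) / w, (ymax - ymin) / h)

def yolo_to_voc(box, w, h):
    xc, yc, bw, bh = box
    return (round((xc - bw / 2) * w), round((yc - bh / 2) * h),
            round((xc + bw / 2) * w), round((yc + bh / 2) * h))

def max_round_trip_error(voc_boxes, w, h, decimals=6):
    """Largest pixel difference after VOC -> YOLO (rounded) -> VOC."""
    worst = 0
    for box in voc_boxes:
        # Simulate writing the label file with a fixed number of decimals
        yolo = [round(v, decimals) for v in voc_to_yolo(box, w, h)]
        back = yolo_to_voc(yolo, w, h)
        worst = max(worst, max(abs(a - b) for a, b in zip(box, back)))
    return worst

boxes = [(100, 200, 300, 400), (0, 0, 800, 600)]  # includes an edge-touching box
print(max_round_trip_error(boxes, 800, 600))  # → 0
```

Writing too few decimals shows up immediately: with decimals=2 the sample box drifts by one pixel.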

5. Lessons from a Real Project

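Last year, on an industrial quality-inspection project, the client supplied a dataset in VOC format, but we needed to train with YOLOv5. I ran into several pitfalls during the conversion: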

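Inconsistent class labels: the client's classes were ["OK", "NG", "Critical"], but the annotation files used 0/1/2 directly as class names. The fix was an explicit class mapping table.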
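A minimal sketch of such a mapping table, assuming the name field may contain either the class name or its index written as text. CLASS_MAP and to_class_id are hypothetical names; unlike the classes.index() plus continue pattern in section 3.2, unknown labels raise instead of being skipped silently.

```python
CLASSES = ["OK", "NG", "Critical"]

# Accept both the class name and its index written as text, e.g. "NG" or "1"
CLASS_MAP = {name: i for i, name in enumerate(CLASSES)}
CLASS_MAP.update({str(i): i for i in range(len(CLASSES))})

def to_class_id(raw_name):
    key = raw_name.strip()
    if key not in CLASS_MAP:
        raise KeyError(f"unknown class label: {raw_name!r}")
    return CLASS_MAP[key]

print([to_class_id(n) for n in ["OK", "1", "Critical", " 2 "]])  # → [0, 1, 2, 2]
```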
Inconsistent image sizes: for some images, the actual dimensions did not match the size recorded in the XML, so I added a check:

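```python
import cv2

img = cv2.imread(img_path)
assert img is not None, f"cannot read image: {img_path}"
# h, w: height and width read from the XML <size> node
assert img.shape[0] == h and img.shape[1] == w, "size mismatch"
```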

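Special characters: an unescaped & or < in an XML file makes parsing fail. The fix belongs on the writing side: whatever generates the XML must escape them (&amp;, &lt;). ElementTree and minidom do this automatically when writing, and turn the entities back into the original characters when parsing, so the text returned by find('name').text must not be escaped again. When I build XML by string concatenation, I escape text explicitly:

```python
from xml.sax.saxutils import escape

# Only needed for hand-built XML strings, not for values read back via ElementTree
name_node = f"<name>{escape(class_name)}</name>"
```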
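The sketch below shows where escaping belongs, using the illustrative class name "R&D <tool>": escape() when writing XML as a string, and nothing extra when reading, because ElementTree already unescapes entities on parse.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

class_name = "R&D <tool>"
# Writing: escape before inserting text into hand-built XML
xml_text = f"<object><name>{escape(class_name)}</name></object>"
print(xml_text)  # → <object><name>R&amp;D &lt;tool&gt;</name></object>

# Reading: ElementTree unescapes entities, so .text is already the original value
parsed = ET.fromstring(xml_text).find('name').text
print(parsed == class_name)  # → True

# Escaping the parsed value again would corrupt the class name
print(escape(parsed))  # → R&amp;D &lt;tool&gt;
```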

After conversion, I recommend a quick visual check of data quality with this script:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def visualize(img_path, txt_path, class_names):
    img = plt.imread(img_path)
    img_h, img_w = img.shape[:2]
    fig, ax = plt.subplots(1)
    ax.imshow(img)
    with open(txt_path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            cls_id, x, y, w, h = map(float, line.split())
            # Normalized center/size -> top-left corner and size in pixels
            left, top = (x - w / 2) * img_w, (y - h / 2) * img_h
            rect = patches.Rectangle((left, top), w * img_w, h * img_h,
                                     linewidth=2, edgecolor='r', facecolor='none')
            ax.add_patch(rect)
            ax.text(left, top - 10, class_names[int(cls_id)], color='white',
                    bbox=dict(facecolor='red', alpha=0.5))
    plt.show()
```
