Prerequisites:
A working YOLOv3 environment; for setup see the official guide at https://pjreddie.com/darknet/yolo/
Goal:
Train YOLOv3 on your own dataset (in VOC format).
Method:
1) Build the VOC dataset
* Convert your dataset's annotation txt into a VOC-style txt, one object per line, like this:
000002.jpg car 44 28 132 121
000003.jpg car 54 19 243 178
000004.jpg car 168 6 298 164
The first column is the image name, the second is the object class, and the last four values are the bounding-box coordinates (top-left and bottom-right corners).
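As a quick sanity check that a converted file follows this layout, a minimal Python sketch (the file name train.txt and the field names are illustrative, not part of the original scripts):

# Parse "name class xmin ymin xmax ymax" lines and check the boxes are well-formed.
with open("train.txt") as f:
    for line in f:
        if not line.strip():
            continue
        name, cls, xmin, ymin, xmax, ymax = line.split()
        assert int(xmin) < int(xmax) and int(ymin) < int(ymax), line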
Python code to batch-rename the image files:
import os

pic_path = "D:/VOCdevkit/VOC2007/JPEGImages/"
piclist = os.listdir(pic_path)
total_num = len(piclist)
i = 1
for pic in piclist:
    if pic.endswith(".jpg"):
        old_path = os.path.join(os.path.abspath(pic_path), pic)
        new_path = os.path.join(os.path.abspath(pic_path), '000' + format(str(i), '0>5') + '.jpg')
        os.renames(old_path, new_path)
        i = i + 1
Python code to merge all txt files in a folder into one train.txt:
import os

filedir = "D:/DET/"
filenames = os.listdir(filedir)
f = open('train.txt', 'w')
for filename in filenames:
    filepath = filedir + '/' + filename
    for line in open(filepath):
        f.writelines(line)
f.close()
* Convert this train.txt into the xml files required by the VOC dataset. The MATLAB code is as follows:
clc;
clear;
imgpath = 'D:/VOCdevkit/VOC2007/JPEGImages/';       % folder containing the images
txtpath = 'D:/train.txt';                           % the merged txt file
xmlpath_new = 'D:/VOCdevkit/VOC2007/Annotations/';  % folder where the cleaned-up xml files are saved
foldername = 'JPEGImages';
path = '/home/zhangzhi/darknet/scripts/VOCdevkit/VOC2007/JPEGImages/';  % path on the training machine, written into the <path> element
fidin = fopen(txtpath, 'r');
lastname = 'begin';
while ~feof(fidin)
    tline = fgetl(fidin);
    str = regexp(tline, ' ', 'split');
    filepath = [imgpath, str{1}];
    ppath = [path, str{1}];
    img = imread(filepath);
    [h, w, d] = size(img);
    % imshow(img);
    % rectangle('Position',[str2double(str{3}),str2double(str{4}),str2double(str{5})-str2double(str{3}),str2double(str{6})-str2double(str{4})],'LineWidth',4,'EdgeColor','r');
    pause(0.1);
    if strcmp(str{1}, lastname)  % same image as the previous line: just append another object node
        object_node = Createnode.createElement('object');
        Root.appendChild(object_node);
        node = Createnode.createElement('name');
        node.appendChild(Createnode.createTextNode(sprintf('%s', str{2})));
        object_node.appendChild(node);
        node = Createnode.createElement('pose');
        node.appendChild(Createnode.createTextNode(sprintf('%s', 'Unspecified')));
        object_node.appendChild(node);
        node = Createnode.createElement('truncated');
        node.appendChild(Createnode.createTextNode(sprintf('%s', '0')));
        object_node.appendChild(node);
        node = Createnode.createElement('difficult');
        node.appendChild(Createnode.createTextNode(sprintf('%s', '0')));
        object_node.appendChild(node);
        bndbox_node = Createnode.createElement('bndbox');
        object_node.appendChild(bndbox_node);
        node = Createnode.createElement('xmin');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{3}))));
        bndbox_node.appendChild(node);
        node = Createnode.createElement('ymin');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{4}))));
        bndbox_node.appendChild(node);
        node = Createnode.createElement('xmax');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{5}))));
        bndbox_node.appendChild(node);
        node = Createnode.createElement('ymax');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{6}))));
        bndbox_node.appendChild(node);
    else  % new image: write out the previous xml (if any) and start a new document
        copyfile(filepath, 'JPEGImages');
        if exist('Createnode', 'var')
            tempname = lastname;
            tempname = strrep(tempname, '.jpg', '.xml');
            xmlwrite(tempname, Createnode);
        end
        Createnode = com.mathworks.xml.XMLUtils.createDocument('annotation');
        Root = Createnode.getDocumentElement;
        node = Createnode.createElement('folder');
        node.appendChild(Createnode.createTextNode(sprintf('%s', foldername)));
        Root.appendChild(node);
        node = Createnode.createElement('filename');
        node.appendChild(Createnode.createTextNode(sprintf('%s', str{1})));
        Root.appendChild(node);
        node = Createnode.createElement('path');
        node.appendChild(Createnode.createTextNode(sprintf('%s', ppath)));
        Root.appendChild(node);
        source_node = Createnode.createElement('source');
        Root.appendChild(source_node);
        node = Createnode.createElement('database');
        node.appendChild(Createnode.createTextNode(sprintf('My Database')));
        source_node.appendChild(node);
        size_node = Createnode.createElement('size');
        Root.appendChild(size_node);
        node = Createnode.createElement('width');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(w))));
        size_node.appendChild(node);
        node = Createnode.createElement('height');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(h))));
        size_node.appendChild(node);
        node = Createnode.createElement('depth');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(d))));
        size_node.appendChild(node);
        node = Createnode.createElement('segmented');
        node.appendChild(Createnode.createTextNode(sprintf('%s', '0')));
        Root.appendChild(node);
        object_node = Createnode.createElement('object');
        Root.appendChild(object_node);
        node = Createnode.createElement('name');
        node.appendChild(Createnode.createTextNode(sprintf('%s', str{2})));
        object_node.appendChild(node);
        node = Createnode.createElement('pose');
        node.appendChild(Createnode.createTextNode(sprintf('%s', 'Unspecified')));
        object_node.appendChild(node);
        node = Createnode.createElement('truncated');
        node.appendChild(Createnode.createTextNode(sprintf('%s', '0')));
        object_node.appendChild(node);
        node = Createnode.createElement('difficult');
        node.appendChild(Createnode.createTextNode(sprintf('%s', '0')));
        object_node.appendChild(node);
        bndbox_node = Createnode.createElement('bndbox');
        object_node.appendChild(bndbox_node);
        node = Createnode.createElement('xmin');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{3}))));
        bndbox_node.appendChild(node);
        node = Createnode.createElement('ymin');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{4}))));
        bndbox_node.appendChild(node);
        node = Createnode.createElement('xmax');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{5}))));
        bndbox_node.appendChild(node);
        node = Createnode.createElement('ymax');
        node.appendChild(Createnode.createTextNode(sprintf('%s', num2str(str{6}))));
        bndbox_node.appendChild(node);
        lastname = str{1};
    end
    if feof(fidin)  % last line reached: write out the final xml
        tempname = lastname;
        tempname = strrep(tempname, '.jpg', '.xml');
        xmlwrite(tempname, Createnode);
    end
end
fclose(fidin);
% post-process the generated xml: drop the declaration line, replace spaces with tabs, move into Annotations
file = dir(pwd);
for i = 1:length(file)
    if length(file(i).name) >= 4 && strcmp(file(i).name(end-3:end), '.xml')
        fold = fopen(file(i).name, 'r');
        fnew = fopen([xmlpath_new file(i).name], 'w');
        line = 1;
        while ~feof(fold)
            tline = fgetl(fold);
            if line == 1
                line = 2;
                continue;
            end
            expression = ' ';
            replace = char(9);
            newStr = regexprep(tline, expression, replace);
            fprintf(fnew, '%s\n', newStr);
        end
        fprintf('processed %s\n', file(i).name);
        fclose(fold);
        fclose(fnew);
        delete(file(i).name);
    end
end

The generated xml looks like this:
<annotation>
    <folder>JPEGImages</folder>
    <filename>00000000.jpg</filename>
    <path>/home/zhangzhi/darknet/scripts/VOCdevkit/VOC2007/JPEGImages/00000000.jpg</path>
    <source>
        <database>My Database</database>
    </source>
    <size>
        <width>512</width>
        <height>512</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>277</xmin>
            <ymin>498</ymin>
            <xmax>304</xmax>
            <ymax>511</ymax>
        </bndbox>
    </object>
</annotation>
* Generate the four txt files in Main (train.txt, val.txt, test.txt, trainval.txt).
Each txt contains image names without the file extension, one per line:
000005
000027
000028
000033
000042
000045
000048
000058
test.txt is the test set, train.txt the training set, val.txt the validation set, and trainval.txt the union of train and val. In VOC2007, trainval is roughly 50% of the whole dataset and test is the other 50%; train is roughly 50% of trainval and val the other 50%. The following MATLAB code can be used as a reference (a Python alternative is sketched after it):
%% Generate trainval.txt, train.txt, test.txt and val.txt for VOC2007 from the xml files already created.
% trainval is 50% of the whole dataset and test is the other 50%; train is 50% of trainval and val is the other 50%.
% Adjust these percentages for your own dataset; if the dataset is small, test and val can be smaller.
% Modify the following four values:
xmlfilepath = 'F:/VOCdevkit/VOC2007/Annotations/';
txtsavepath = 'F:/VOCdevkit/VOC2007/ImageSets/Main/';
trainval_percent = 0.5;  % fraction of the whole dataset used as trainval; the rest is test
train_percent = 0.5;     % fraction of trainval used as train; the rest is val
%%
xmlfile = dir(xmlfilepath);
numOfxml = length(xmlfile) - 2;  % subtract . and .. to get the total dataset size
trainval = sort(randperm(numOfxml, floor(numOfxml * trainval_percent)));
test = sort(setdiff(1:numOfxml, trainval));
trainvalsize = length(trainval);  % size of trainval
train = sort(trainval(randperm(trainvalsize, floor(trainvalsize * train_percent))));
val = sort(setdiff(trainval, train));
ftrainval = fopen([txtsavepath 'trainval.txt'], 'w');
ftest = fopen([txtsavepath 'test.txt'], 'w');
ftrain = fopen([txtsavepath 'train.txt'], 'w');
fval = fopen([txtsavepath 'val.txt'], 'w');
for i = 1:numOfxml
    if ismember(i, trainval)
        fprintf(ftrainval, '%s\n', xmlfile(i+2).name(1:end-4));
        if ismember(i, train)
            fprintf(ftrain, '%s\n', xmlfile(i+2).name(1:end-4));
        else
            fprintf(fval, '%s\n', xmlfile(i+2).name(1:end-4));
        end
    else
        fprintf(ftest, '%s\n', xmlfile(i+2).name(1:end-4));
    end
end
fclose(ftrainval);
fclose(ftrain);
fclose(fval);
fclose(ftest);
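If MATLAB is not available, a rough Python equivalent of the split above might look like the following sketch (same 50/50 ratios; the paths are placeholders to adjust):

import os
import random

# Placeholder paths -- point these at your own Annotations and ImageSets/Main folders.
xml_dir = 'F:/VOCdevkit/VOC2007/Annotations/'
txt_dir = 'F:/VOCdevkit/VOC2007/ImageSets/Main/'
trainval_percent = 0.5  # fraction of the dataset used as trainval; the rest is test
train_percent = 0.5     # fraction of trainval used as train; the rest is val

names = [f[:-4] for f in os.listdir(xml_dir) if f.endswith('.xml')]
random.shuffle(names)

n_trainval = int(len(names) * trainval_percent)
trainval, test = names[:n_trainval], names[n_trainval:]
n_train = int(len(trainval) * train_percent)
train, val = trainval[:n_train], trainval[n_train:]

for fname, split in [('trainval.txt', trainval), ('train.txt', train),
                     ('val.txt', val), ('test.txt', test)]:
    with open(os.path.join(txt_dir, fname), 'w') as f:
        f.write('\n'.join(sorted(split)) + '\n')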
* Organize the files.
Create a new VOC2007 folder and, inside it, the subfolders JPEGImages, Annotations, labels and ImageSets. Put all training images into JPEGImages, put the xml files generated in the second step into Annotations, create a Main folder under ImageSets and put the four txt files from the third step into it, and put the files generated in the following step into labels. A small sketch that creates this layout is shown below.
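A minimal Python sketch of this layout, assuming the VOC2007 folder lives under a VOCdevkit directory in the darknet root (that parent folder matches the relative paths used by the label-generation script below):

import os

# Create the assumed folder layout: VOCdevkit/VOC2007/{JPEGImages, Annotations, labels, ImageSets/Main}
for sub in ['JPEGImages', 'Annotations', 'labels', 'ImageSets/Main']:
    os.makedirs(os.path.join('VOCdevkit', 'VOC2007', sub), exist_ok=True)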
* The code in the steps above is run on Windows; the code below is run on Ubuntu. Generate the label files:
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

# modify as needed
# sets = [('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
sets = [('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
# modify as needed; the default VOC classes would be:
# ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
#  "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
classes = ["car", "van", "truck", "bus"]

def convert(size, box):
    # convert a VOC box (xmin, xmax, ymin, ymax) into YOLO's normalized (x_center, y_center, w, h)
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x, y, w, h)

def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()
for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/' % (year)):
        os.makedirs('VOCdevkit/VOC%s/labels/' % (year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt' % (year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()

# to train on the train and val data together, merge the lists:
os.system("cat 2007_train.txt 2007_val.txt > train.txt")
os.system("cat 2007_train.txt 2007_val.txt 2007_test.txt > train.all.txt")
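Each line of the generated labels/*.txt files therefore contains class_id x_center y_center width height, all normalized to the image size. As a quick hand check on the sample box from the xml shown earlier (a 512x512 image with a car at 277, 498, 304, 511), a throwaway snippet reusing the same convert logic:

# Standalone copy of the convert() logic above, applied to the sample annotation.
def convert(size, box):
    dw, dh = 1. / size[0], 1. / size[1]
    x = (box[0] + box[1]) / 2.0 * dw
    y = (box[2] + box[3]) / 2.0 * dh
    w = (box[1] - box[0]) * dw
    h = (box[3] - box[2]) * dh
    return (x, y, w, h)

# 'car' is class 0 in the classes list used above
print(0, *convert((512, 512), (277.0, 304.0, 498.0, 511.0)))
# -> 0 0.5673828125 0.9853515625 0.052734375 0.025390625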
2) Modify the relevant YOLOv3 files
* Modify cfg/voc.data as follows:
classes= 4                                                # number of classes in your dataset
train  = /home/zhangzhi/darknet/VOCdevkit/2007_train.txt  # path to the train list file
valid  = /home/zhangzhi/darknet/VOCdevkit/2007_test.txt   # path to the test list file
names  = data/voc.names
backup = backup
* Modify data/voc.names to list the classes of your own dataset, one per line:
car
van
truck
bus
* Download the weights pre-trained on ImageNet:
wget https://pjreddie.com/media/files/darknet53.conv.74
* Modify cfg/yolov3-voc.cfg.
Find the sections of the file that look like the following (the [convolutional] layer right before each [yolo] layer) and modify them; there are 3 such places:
[convolutional]
size=1
stride=1
pad=1
filters=27
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=4
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1
Set filters to num/3*(classes+1+4), i.e. 3*(classes+1+4); with 4 classes this gives 3*(4+1+4)=27 (see https://github.com/pjreddie/darknet/issues/582). The classes value in each [yolo] section must also be changed to match your dataset.
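A throwaway check of the filters arithmetic (plain Python, not part of any darknet file):

num = 9         # anchors listed in the cfg
classes = 4     # car, van, truck, bus
filters = num // 3 * (classes + 1 + 4)  # boxes per scale * (class scores + objectness + 4 box coords)
print(filters)  # -> 27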
3) Train and test
./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74
./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc_final.weights data/dog.jpg
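Darknet periodically writes checkpoints into the backup directory configured in voc.data, so an interrupted run can normally be resumed from the latest checkpoint; the file name below assumes darknet's default cfg-name.backup naming:
./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc.backup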