Introduction
Data labelling is an essential step in machine learning, because the quality of the data we feed into a model largely determines how well it performs. Image annotation is the process of labelling the images of a dataset so that a machine learning model can learn from them. The labels mark the features we need the model to recognize: each object of interest is outlined and tagged with a class. This makes different types of objects recognizable to computer vision systems.
Annotation is usually carried out manually. The classes are defined in advance, and the annotator marks where each one appears in the images. A computer vision model is then trained on these annotations, after which it can predict the same predefined classes on new images that have not been annotated.
Why Is Annotation Important?
Computer vision models learn from annotated datasets, and good annotations help them learn faster and predict more accurately. Annotated data therefore underpins applications such as self-driving cars, number-plate detection, tumor detection and many others.
Annotated datasets give our models high-quality information. This enables a model to learn well and to predict well on new, unannotated data, and it makes object detection far easier to set up. We therefore rely heavily on these datasets when building AI-based models for automation.
Image Annotation for Object Detection
Image annotation refers to attaching labels (predetermined classes such as human, dog, etc.) to an image. This is done so that objects can be recognized, counted, or segmented along their boundaries. Annotations can take the following forms:
- Bounding boxes
- Semantic segmentation
- 3D Cuboids
- Polygons
- Lines & Splines
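To make the relationship between two of these forms concrete, here is a minimal sketch that derives a bounding box from a polygon annotation. The function name polygon_to_bbox and the sample points are illustrative assumptions and do not come from any particular annotation tool.

```python
def polygon_to_bbox(points):
    """Return [x, y, width, height] of the tightest axis-aligned box around a polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# Hypothetical triangle outlined by an annotator (pixel coordinates).
triangle = [(10, 20), (60, 25), (30, 80)]
print(polygon_to_bbox(triangle))  # [10, 20, 50, 60]
```

A polygon keeps more shape information than a box, so the conversion only works in this direction: a box cannot be turned back into the original outline.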
Image Annotation Formats
Computer vision frameworks expect annotated data in their own formats. Some popular annotation formats are described below:
COCO
Microsoft COCO (Common Objects in Context) is a widely used dataset. It has 2.5 million labeled instances for 80 object categories. COCO defines five annotation types:
- object detection
- keypoint detection
- stuff segmentation
- panoptic segmentation
- image captioning
The annotations are stored in JSON. The format for object detection is as follows:
annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}
categories[{
    "id": int,
    "name": str,
    "supercategory": str,
}]
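As a concrete illustration of this layout, the sketch below builds a tiny COCO-style dictionary and looks up the boxes of one image. The data values and the helper name boxes_for_image are made up for the example; only the key names follow the COCO object detection format.

```python
import json

# Hypothetical single-image dataset; every value here is invented for illustration.
coco = {
    "images": [{"id": 1, "file_name": "01.png", "width": 224, "height": 224}],
    "categories": [{"id": 1, "name": "dog", "supercategory": "animal"}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "segmentation": [],
        "area": 100 * 16,
        "bbox": [90, 54, 100, 16],  # [x, y, width, height]
        "iscrowd": 0,
    }],
}


def boxes_for_image(dataset, image_id):
    """Return (category name, bbox) pairs for one image id."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    return [(names[a["category_id"]], a["bbox"])
            for a in dataset["annotations"] if a["image_id"] == image_id]


# Round-trip through JSON text to show that the structure serializes cleanly.
restored = json.loads(json.dumps(coco))
print(boxes_for_image(restored, 1))  # [('dog', [90, 54, 100, 16])]
```

Note that annotations refer to images and categories by id, not by list position, so lookups should go through the id fields.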
YOLO
YOLO (You Only Look Once) is a very fast and accurate object detection algorithm. In its annotation format, a .txt file with the same name is created for each image file, in the same directory, and holds the annotations for that image. Each line describes one object: the class index followed by the center coordinates, width and height of its box. In the standard Darknet convention, these four values are normalized by the image width and height, so they lie between 0 and 1.
<object-class> <x> <y> <width> <height>
Each object is annotated on a new line. For two objects, the .txt file would look like this (illustrative values):
0 0.45 0.32 0.23 0.14
1 0.54 0.49 0.86 0.78
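In the standard Darknet YOLO convention, the coordinates on each line are the box center and size divided by the image width and height. The sketch below shows that arithmetic in both directions; the function names and the sample box are assumptions chosen for illustration.

```python
def pixel_box_to_yolo(class_id, x_min, y_min, box_w, box_h, img_w, img_h):
    """Format one YOLO line: class, then normalized center x, center y, width, height."""
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        class_id, x_center, y_center, box_w / img_w, box_h / img_h)


def yolo_to_pixel_box(line, img_w, img_h):
    """Invert the conversion: return (class_id, x_min, y_min, box_w, box_h) in pixels."""
    class_id, xc, yc, w, h = line.split()
    box_w, box_h = float(w) * img_w, float(h) * img_h
    x_min = float(xc) * img_w - box_w / 2
    y_min = float(yc) * img_h - box_h / 2
    return int(class_id), x_min, y_min, box_w, box_h


# Hypothetical 100x20 pixel box at (50, 40) in a 200x100 image.
line = pixel_box_to_yolo(0, 50, 40, 100, 20, 200, 100)
print(line)  # 0 0.500000 0.500000 0.500000 0.200000
```

Because the values are relative, the same label file stays valid if the image is resized, which is one reason the format normalizes them.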
Pascal VOC
Pascal VOC provides standardized image datasets for object detection. The annotations for each image are stored in an XML file. Given below is an example of a Pascal VOC annotation file for object detection:
<annotation>
<folder>Train</folder>
<filename>01.png</filename>
<path>/path/Train/01.png</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>224</width>
<height>224</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>36</name>
<pose>Frontal</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<occluded>0</occluded>
<bndbox>
<xmin>90</xmin>
<xmax>190</xmax>
<ymin>54</ymin>
<ymax>70</ymax>
</bndbox>
</object>
</annotation>
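A file like this can be read with Python's standard xml.etree.ElementTree module. The sketch below parses a shortened copy of the example above from a string; the helper name read_voc_boxes is an assumption, and a real script would call ET.parse on a file path instead.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotation above, kept inline so the example is self-contained.
VOC_XML = """<annotation>
  <filename>01.png</filename>
  <size><width>224</width><height>224</height><depth>3</depth></size>
  <object>
    <name>36</name>
    <bndbox><xmin>90</xmin><ymin>54</ymin><xmax>190</xmax><ymax>70</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (file name, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.find("name").text,
                      int(bb.find("xmin").text), int(bb.find("ymin").text),
                      int(bb.find("xmax").text), int(bb.find("ymax").text)))
    return root.find("filename").text, boxes


print(read_voc_boxes(VOC_XML))  # ('01.png', [('36', 90, 54, 190, 70)])
```

Looking elements up by tag name, rather than by position, keeps the reader working even when a tool writes the child elements in a different order.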
TFRecord
A TFRecord (TensorFlow Record) file stores data as a sequence of binary records. TensorFlow provides two message types for specifying the structure of the data: tf.train.Example and tf.train.SequenceExample. Each sample has to be stored in one of these structures, then serialized and written to disk with tf.io.TFRecordWriter (tf.python_io.TFRecordWriter in TensorFlow 1.x).
The process of reading a TFRecord is as follows:
- Use tf.data.TFRecordDataset to read the TFRecord file (tf.TFRecordReader in TensorFlow 1.x).
- Define the features expected in each record by using tf.io.FixedLenFeature and tf.io.VarLenFeature.
- Parse one tf.train.Example (one record) at a time using tf.io.parse_single_example.
Annotation Converters (COCO to CSV, YOLO to COCO, etc.)
We often need to convert annotated data from one format to another, so that the same dataset can be used with different tools and frameworks. With annotation converter functions, conversions such as COCO to CSV or YOLO to COCO become straightforward.
In the rest of the article, we will create functions for these format conversions. You can use them directly on your own dataset.
Tabular data is easy to inspect, so we begin with conversions from the different formats (COCO, YOLO, etc.) to CSV. Each CSV row describes one box: class label, x and y of the top-left corner, box width, box height, file name, image width and image height. This gives a good overview of the classes and bounding boxes in a dataset.
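The converters in this article pass boxes through a headerless CSV with eight columns: label, box x, box y, box width, box height, file name, image width and image height. The sketch below reads such rows into named tuples; the Row type, its field names and the sample text are assumptions introduced for illustration.

```python
import csv
import io
from collections import namedtuple

# Field names are illustrative; the CSV itself has no header row.
Row = namedtuple("Row", "label x y box_w box_h filename img_w img_h")

# Hypothetical file contents with two boxes on the same image.
SAMPLE = "Dog,90,54,100,16,01.png,224,224\nMan,10,20,30,40,01.png,224,224\n"


def read_rows(text):
    """Parse the eight-column annotation CSV into a list of Row tuples."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        label, x, y, w, h, filename, img_w, img_h = rec
        rows.append(Row(label, float(x), float(y), float(w), float(h),
                        filename, int(img_w), int(img_h)))
    return rows


rows = read_rows(SAMPLE)
# The right edge of a box is x plus the box width.
print(rows[0].label, rows[0].x + rows[0].box_w)  # Dog 190.0
```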
COCO to CSV format
import json

def coco_to_csv(filename):
    # e.g. COCO2017/annotations/instances_val2017.json
    with open(filename, 'r') as f:
        s = json.load(f)
    out_file = filename[:-5] + '.csv'
    # Look up images and class names by their COCO ids; ids are not list positions
    images = {im['id']: im for im in s['images']}
    classes = {cl['id']: cl['name'] for cl in s['categories']}
    with open(out_file, 'w') as out:
        # Row layout: label, x, y, box width, box height, file name, image width, image height
        for ann in s['annotations']:
            im = images[ann['image_id']]
            # A COCO bbox is already [x, y, width, height]
            x, y, w, h = ann['bbox']
            label = classes[ann['category_id']]
            out.write('{},{},{},{},{},{},{},{}\n'.format(label, x, y, w, h, im['file_name'], im['width'], im['height']))
YOLO to CSV format
import os
import glob
import pandas as pd
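def yolo_to_csv(yolo_dir, destination_dir):
    # Class names, one per line, in the order of their YOLO class ids
    with open(os.path.join(yolo_dir, 'classes.names'), 'rt') as f:
        classes = [l.strip() for l in f if l.strip()]
    # Every image is assumed to have this size; adjust to your dataset
    width = 1024
    height = 1024
    rows = []
    for item in sorted(glob.glob(os.path.join(yolo_dir, '*.txt'))):
        image_name = os.path.splitext(os.path.basename(item))[0] + '.png'
        with open(item, 'rt') as fd:
            for line in fd:
                parts = line.split()
                if len(parts) != 5:
                    continue  # skip blank or malformed lines
                # YOLO stores the normalized box center and size;
                # convert to the pixel top-left corner plus box width and height
                xc, yc, w, h = (float(v) for v in parts[1:])
                box_w, box_h = w * width, h * height
                x = xc * width - box_w / 2
                y = yc * height - box_h / 2
                rows.append([classes[int(parts[0])], x, y, box_w, box_h, image_name, width, height])
    df = pd.DataFrame(rows)
    # No header row, so the file has the same layout as the other converters' CSV output
    df.to_csv(os.path.join(destination_dir, 'saved.csv'), index=False, header=False)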
Pascal VOC to CSV format
import glob
import os
import pandas as pd
import xml.etree.ElementTree as ET

def xml_to_csv(path, destination_dir):
    # path: folder with the VOC .xml files; destination_dir: folder for saved.csv
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for member in root.findall('object'):
            bbx = member.find('bndbox')
            xmin = int(bbx.find('xmin').text)
            ymin = int(bbx.find('ymin').text)
            # The CSV stores box width and height rather than xmax and ymax
            box_w = int(bbx.find('xmax').text) - xmin
            box_h = int(bbx.find('ymax').text) - ymin
            xml_list.append((
                member.find('name').text,
                xmin, ymin, box_w, box_h,
                root.find('filename').text,
                int(size.find('width').text),
                int(size.find('height').text),
            ))
    xml_df = pd.DataFrame(xml_list)
    xml_df.to_csv(os.path.join(destination_dir, 'saved.csv'), index=None, header=False)
TFRecord to CSV format
import tensorflow as tf
from PIL import Image
record_files = ['newtrain.record']

def read_tfrecord(serialized_example):
    feature_description = {
        'image/height': tf.io.FixedLenFeature((), tf.int64),
        'image/width': tf.io.FixedLenFeature((), tf.int64),
        'image/encoded': tf.io.FixedLenFeature((), tf.string),
        'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
        'image/object/class/text': tf.io.VarLenFeature(tf.string),
        'image/filename': tf.io.FixedLenFeature((), tf.string)
    }
    parsed_features = tf.io.parse_single_example(serialized_example, feature_description)
    parsed_features['image/encoded'] = tf.io.decode_jpeg(
        parsed_features['image/encoded'], channels=3)
    return parsed_features

data = tf.data.TFRecordDataset(record_files)
parsed_dataset = data.map(read_tfrecord).batch(1)

coord = []
filenames = []
labels = []
dim = []
for sample in parsed_dataset.take(10000):
    width = int(sample['image/width'].numpy()[0])
    height = int(sample['image/height'].numpy()[0])
    filename = sample['image/filename'].numpy()[0].decode('utf-8')
    xmin = sample['image/object/bbox/xmin'].values.numpy()
    ymin = sample['image/object/bbox/ymin'].values.numpy()
    xmax = sample['image/object/bbox/xmax'].values.numpy()
    ymax = sample['image/object/bbox/ymax'].values.numpy()
    # Box coordinates are stored normalized to [0, 1]; scale them back to pixels
    for i in range(len(xmin)):
        coord.append([round(float(xmin[i]) * width), round(float(ymin[i]) * height),
                      round(float(xmax[i]) * width), round(float(ymax[i]) * height)])
        filenames.append(filename)
        dim.append([width, height])
    for name in sample['image/object/class/text'].values.numpy():
        labels.append(name.decode('utf-8'))
    # Save the decoded image as a quick visual check (overwritten on every sample)
    img = Image.fromarray(sample['image/encoded'].numpy()[0])
    img.save('new.jpg')

out_file = 'file.csv'
with open(out_file, 'w') as out:
    for i in range(len(coord)):
        x1 = coord[i][0]
        y1 = coord[i][1]
        # Store box width and height, as in the other CSV converters
        w = coord[i][2] - x1
        h = coord[i][3] - y1
        out.write('{},{},{},{},{},{},{},{}\n'.format(labels[i], x1, y1, w, h, filenames[i], dim[i][0], dim[i][1]))
Now, we will look at functions for conversions from CSV to the COCO, YOLO, Pascal VOC and TFRecord formats.
CSV to COCO format
import json
import pandas as pd
def csv_to_coco(file_dir, destination_dir):
    path = file_dir
    save_json_path = destination_dir + '/traincoco.json'
    # Note: the 'xmax' and 'ymax' columns hold the box width and height,
    # which is how the to-CSV converters in this article write them
    clmns = ['class', 'xmin', 'ymin', 'xmax', 'ymax', 'filename', 'width', 'height']
    data = pd.read_csv(path, names=clmns, header=None)
    images = []
    categories = []
    annotations = []
    # Integer ids for files and classes, numbered from 0 in sorted order
    data['fileid'] = data['filename'].astype('category').cat.codes
    data['categoryid'] = pd.Categorical(data['class'], ordered=True).codes
    data['annid'] = data.index

    def image(row):
        return {"height": int(row.height), "width": int(row.width),
                "id": int(row.fileid), "file_name": row.filename}

    def category(row):
        # 'class' is a Python keyword, so the column is read by position (0 is the index)
        return {"supercategory": 'None', "id": int(row.categoryid), "name": row[1]}

    def annotation(row):
        return {
            "segmentation": [],
            "iscrowd": 0,
            "area": float(row.xmax * row.ymax),
            "image_id": int(row.fileid),
            # COCO bbox is [x, y, width, height]
            "bbox": [float(row.xmin), float(row.ymin), float(row.xmax), float(row.ymax)],
            "category_id": int(row.categoryid),
            "id": int(row.annid),
        }

    for row in data.itertuples():
        annotations.append(annotation(row))
    imagedf = data.drop_duplicates(subset=['fileid']).sort_values(by='fileid')
    for row in imagedf.itertuples():
        images.append(image(row))
    catdf = data.drop_duplicates(subset=['categoryid']).sort_values(by='categoryid')
    for row in catdf.itertuples():
        categories.append(category(row))
    data_coco = {"images": images, "categories": categories, "annotations": annotations}
    with open(save_json_path, "w") as f:
        json.dump(data_coco, f, indent=4)
CSV to YOLO format
import csv
import os
from collections import defaultdict

def csv_to_yolo(csv_file, destination_folder):
    data_dir = os.path.join(destination_folder, 'data')
    os.makedirs(data_dir, exist_ok=True)
    # Read the CSV once and group the rows by image file name
    rows_by_file = defaultdict(list)
    with open(csv_file, newline='') as f:
        for l in csv.reader(f):
            if l:
                rows_by_file[l[5]].append(l)
    # Class ids follow the sorted order of the class names
    classes_names = sorted({l[0] for rows in rows_by_file.values() for l in rows})
    classes = {k: v for v, k in enumerate(classes_names)}
    with open(os.path.join(data_dir, 'classes.names'), 'w') as f:
        for name in classes_names:
            f.write(name + '\n')
    for name, rows in rows_by_file.items():
        txt_path = os.path.join(data_dir, os.path.splitext(name)[0] + '.txt')
        with open(txt_path, 'w') as out:
            for l in rows:
                # CSV row: label, x, y, box width, box height, file name, image width, image height
                x, y, w, h, img_w, img_h = (float(l[i]) for i in (1, 2, 3, 4, 6, 7))
                # YOLO expects the box center and size, normalized by the image size
                out.write('{} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
                    classes[l[0]], (x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h))
CSV to Pascal VOC format
from collections import defaultdict
import os
import csv
from xml.etree.ElementTree import Element, SubElement, ElementTree
def csv_to_voc_pascal(file_dir, save_root2):
    # file_dir: path to the CSV file; save_root2: folder that will receive result_xmls/
    save_root2 = os.path.join(save_root2, 'result_xmls')
    os.makedirs(save_root2, exist_ok=True)

    def write_xml(folder, filename, bbox_list):
        root = Element('annotation')
        SubElement(root, 'folder').text = folder
        SubElement(root, 'filename').text = filename
        SubElement(root, 'path').text = './images/' + filename
        source = SubElement(root, 'source')
        SubElement(source, 'database').text = 'Unknown'
        # Image size is taken from the first entry for this file
        e_width, e_height = bbox_list[0][6], bbox_list[0][7]
        size = SubElement(root, 'size')
        SubElement(size, 'width').text = e_width
        SubElement(size, 'height').text = e_height
        SubElement(size, 'depth').text = '3'
        SubElement(root, 'segmented').text = '0'
        for entry in bbox_list:
            e_class_name, e_xmin, e_ymin, e_xmax, e_ymax = entry[:5]
            obj = SubElement(root, 'object')
            SubElement(obj, 'name').text = e_class_name
            SubElement(obj, 'pose').text = 'Unspecified'
            SubElement(obj, 'truncated').text = '0'
            SubElement(obj, 'difficult').text = '0'
            bbox = SubElement(obj, 'bndbox')
            SubElement(bbox, 'xmin').text = e_xmin
            SubElement(bbox, 'ymin').text = e_ymin
            SubElement(bbox, 'xmax').text = e_xmax
            SubElement(bbox, 'ymax').text = e_ymax
        tree = ElementTree(root)
        xml_filename = os.path.join(folder, os.path.splitext(filename)[0] + '.xml')
        tree.write(xml_filename)

    entries_by_filename = defaultdict(list)
    # The CSV has no header row: every line is an annotation
    with open(file_dir, 'r', encoding='utf-8') as f_input_csv:
        for row in csv.reader(f_input_csv):
            if not row:
                continue
            # Columns 3 and 4 hold the box width and height; VOC needs xmax and ymax
            row[3] = str(int(row[1]) + int(row[3]))
            row[4] = str(int(row[2]) + int(row[4]))
            entries_by_filename[row[5]].append(row)
    for filename, entries in entries_by_filename.items():
        write_xml(save_root2, filename, entries)
CSV to TFRecord format
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import
import os
import io
import pandas as pd
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict
flags = tf.compat.v1.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('image_dir', '', 'Path to images')
FLAGS = flags.FLAGS

# TO-DO: replace this mapping with the classes of your own dataset
def class_text_to_int(row_label):
    label_map = {'Man': 1, 'Dog': 2, 'Monitor': 3, 'Machine': 4, 'Girl': 5}
    # Unknown labels map to None
    return label_map.get(row_label)

def split(df, group):
    # Group the rows so that each image file gets one entry with all its boxes
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

def create_tf_example(group, path):
    with tf.io.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []
    for index, row in group.object.iterrows():
        # The CSV 'xmax' and 'ymax' columns hold the box width and height,
        # so add them to the corner before normalizing by the image size
        xmins.append(row['xmin'] / width)
        xmaxs.append((row['xmax'] + row['xmin']) / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append((row['ymax'] + row['ymin']) / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

def main(_):
    # Flags are only available after app.run() has parsed them, so read the CSV here
    columns = ['class', 'xmin', 'ymin', 'xmax', 'ymax', 'filename', 'width', 'height']
    examples = pd.read_csv(FLAGS.csv_input, names=columns)
    writer = tf.io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(FLAGS.image_dir)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))

if __name__ == '__main__':
    tf.compat.v1.app.run()
To achieve any conversion, we can go through CSV: first convert the source format to CSV, then convert the CSV to the desired format. For instance, to convert COCO to Pascal VOC, we first convert COCO to CSV and then convert the CSV to Pascal VOC.
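The two-step idea can be sketched entirely in memory, without intermediate files. The example below is a simplified illustration, not the file-based functions from this article: the names coco_to_rows and rows_to_voc_xml and the sample data are assumptions, and the VOC output contains only the file name, size and boxes.

```python
import xml.etree.ElementTree as ET


def coco_to_rows(coco):
    """Step 1: COCO dict -> rows of (label, x, y, w, h, file name, image width, image height)."""
    images = {im["id"]: im for im in coco["images"]}
    names = {c["id"]: c["name"] for c in coco["categories"]}
    rows = []
    for ann in coco["annotations"]:
        im = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        rows.append((names[ann["category_id"]], x, y, w, h,
                     im["file_name"], im["width"], im["height"]))
    return rows


def rows_to_voc_xml(rows):
    """Step 2: rows -> one VOC-style <annotation> string per file name."""
    roots = {}
    for label, x, y, w, h, filename, img_w, img_h in rows:
        if filename not in roots:
            root = ET.Element("annotation")
            ET.SubElement(root, "filename").text = filename
            size = ET.SubElement(root, "size")
            ET.SubElement(size, "width").text = str(img_w)
            ET.SubElement(size, "height").text = str(img_h)
            roots[filename] = root
        obj = ET.SubElement(roots[filename], "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        # VOC stores two corners, so the far corner is the near corner plus the box size.
        for tag, value in (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h)):
            ET.SubElement(box, tag).text = str(value)
    return {name: ET.tostring(root, encoding="unicode") for name, root in roots.items()}


# Hypothetical one-image dataset used only to exercise the two steps.
coco = {
    "images": [{"id": 7, "file_name": "01.png", "width": 224, "height": 224}],
    "categories": [{"id": 3, "name": "dog", "supercategory": "animal"}],
    "annotations": [{"id": 1, "image_id": 7, "category_id": 3, "bbox": [90, 54, 100, 16]}],
}
xml_by_file = rows_to_voc_xml(coco_to_rows(coco))
print(xml_by_file["01.png"])
```

Routing every conversion through one shared row layout means that each new format needs only two converters (to and from the rows) instead of one per pair of formats.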
In the last part of the article, some direct conversion functions are provided:
XML to JSON format
import os
import json
import xml.etree.ElementTree as ET
import glob
START_BOUNDING_BOX_ID = 1
# Optional fixed mapping of category name to id; when None, it is built from the XML files
PRE_DEFINE_CATEGORIES = None

def get(root, name):
    return root.findall(name)

def get_and_check(root, name, length):
    found = root.findall(name)
    if len(found) == 0:
        raise ValueError("Can not find %s in %s." % (name, root.tag))
    if length > 0 and len(found) != length:
        raise ValueError(
            "The size of %s is supposed to be %d, but is %d."
            % (name, length, len(found))
        )
    if length == 1:
        found = found[0]
    return found

def get_filename(filename):
    # File name without directory or extension
    filename = filename.replace("\\", "/")
    filename = os.path.splitext(os.path.basename(filename))[0]
    return str(filename)

def get_categories(xml_files):
    """Generate category name to id mapping from a list of xml files.
    Arguments:
        xml_files {list} -- A list of xml file paths.
    Returns:
        dict -- category name to id mapping.
    """
    classes_names = []
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall("object"):
            # Look the name up by tag; it is not guaranteed to be the first child
            classes_names.append(member.find("name").text)
    classes_names = sorted(set(classes_names))
    return {name: i for i, name in enumerate(classes_names)}

def convert(xml_files, json_file):
    json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []}
    if PRE_DEFINE_CATEGORIES is not None:
        # Copy so that newly found categories do not modify the module-level mapping
        categories = dict(PRE_DEFINE_CATEGORIES)
    else:
        categories = get_categories(xml_files)
    bnd_id = START_BOUNDING_BOX_ID
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        path = get(root, "path")
        if len(path) == 1:
            filename = os.path.basename(path[0].text)
        elif len(path) == 0:
            filename = get_and_check(root, "filename", 1).text
        else:
            raise ValueError("%d paths found in %s" % (len(path), xml_file))
        # The image id is the file name without its extension (kept as a string here)
        image_id = get_filename(filename)
        size = get_and_check(root, "size", 1)
        width = int(get_and_check(size, "width", 1).text)
        height = int(get_and_check(size, "height", 1).text)
        image = {
            "file_name": filename,
            "height": height,
            "width": width,
            "id": image_id,
        }
        json_dict["images"].append(image)
        for obj in get(root, "object"):
            category = get_and_check(obj, "name", 1).text
            if category not in categories:
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bndbox = get_and_check(obj, "bndbox", 1)
            # Shift the top-left corner by one pixel (1-based to 0-based coordinates)
            xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1
            ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1
            xmax = int(get_and_check(bndbox, "xmax", 1).text)
            ymax = int(get_and_check(bndbox, "ymax", 1).text)
            assert xmax > xmin
            assert ymax > ymin
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {
                "area": o_width * o_height,
                "iscrowd": 0,
                "image_id": image_id,
                "bbox": [xmin, ymin, o_width, o_height],
                "category_id": category_id,
                "id": bnd_id,
                "ignore": 0,
                "segmentation": [],
            }
            json_dict["annotations"].append(ann)
            bnd_id = bnd_id + 1
    for cate, cid in categories.items():
        cat = {"supercategory": "none", "id": cid, "name": cate}
        json_dict["categories"].append(cat)
    # Create the output folder only when the path actually contains one
    out_dir = os.path.dirname(json_file)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    with open(json_file, "w") as json_fp:
        json_fp.write(json.dumps(json_dict))
JSON to YOLO format
import json
import os

# Class names in the order of their YOLO class ids; replace with your own classes
classes = ["Man", "Monitor", "Dog"]

def convert(size, box):
    # size is (image width, image height); box is a COCO bbox [x, y, width, height]
    # Return the YOLO form: box center and size, normalized by the image size
    img_w, img_h = size
    x = (box[0] + box[2] / 2) / img_w
    y = (box[1] + box[3] / 2) / img_h
    w = box[2] / img_w
    h = box[3] / img_h
    return (x, y, w, h)

def convert_annotation(json_dir, destination_dir):
    with open(json_dir, 'r') as f:
        data = json.load(f)
    # Map category ids to names once instead of searching for every annotation
    names = {c['id']: c['name'] for c in data['categories']}
    for item in data['images']:
        image_id = item['id']
        file_name = item['file_name']
        width = item['width']
        height = item['height']
        value = [a for a in data['annotations'] if a['image_id'] == image_id]
        out_path = os.path.join(destination_dir, os.path.splitext(file_name)[0] + '.txt')
        with open(out_path, 'w') as outfile:
            for item2 in value:
                name = names[item2['category_id']]
                class_id = classes.index(name)
                bb = convert((width, height), item2['bbox'])
                outfile.write(str(class_id) + " " + " ".join([str(a) for a in bb]) + '\n')