Multiclass Classification on Highly Imbalanced Dataset

Introduction

Multiclass classification is a classification problem where more than two classes are present. It is a fundamental machine learning task that aims to classify each instance into one of a predefined set of classes. For instance, classifying a set of images of animals as dogs, cats, or rabbits. Each sample is assigned to exactly one label, i.e., an image can be classified as either a dog or a rabbit, but not both at the same time.

When working on a classification problem, there are instances when one class label has a much lower number of observations than the others. Such a dataset is known as an imbalanced dataset. The problem is common and can lead to biased predictions from the model.

This article covers handling imbalanced datasets and provides a step-by-step approach to performing multiclass classification using machine learning algorithms.

Dataset

The dataset used here is numeric. It consists of 11k+ rows and 10 columns: columns 1-9 are the features and column 10 is the target. There are 20 classes in total, labelled as digits from 0-19.

The head() method returns the top n rows (5 by default) of a DataFrame.

import pandas as pd

df = pd.read_csv("./training.csv")
df.head()

The dataset is highly imbalanced. A count plot of the target can be visualized as follows:

import seaborn as sns

sns.countplot(x='target', data=df)

Removal of outliers

Boxplot

In this part, we are going to learn about outliers. Outliers in a dataset are values that lie far away from the majority of points. Boxplots show the median, minimum, maximum, first quartile, and third quartile. Using boxplots, we can visually check for outliers present in the dataset.

The first quartile (Q1) is the median of the lower half of the set. This means that about 25% of the values are less than Q1.

The third quartile (Q3) is the median of the upper half of the set. This means that about 75% of the values are less than Q3.

IQR

IQR (Inter-Quartile Range) is the difference between Q3 and Q1. An interval is calculated using the following equation:

[ Q1 − 1.5 × IQR, Q3 + 1.5 × IQR ]

Therefore, if an observation lies outside this interval, it is considered an outlier.

import numpy as np

# finding outliers for feature3
# first quartile
q1 = np.quantile(df.feature3, 0.25)

# third quartile
q3 = np.quantile(df.feature3, 0.75)

med = np.median(df.feature3)

# inter-quartile range
iqr = q3 - q1

# upper and lower whiskers
upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)
outliers = df.feature3[(df.feature3 <= lower_bound) | (df.feature3 >= upper_bound)]

Feature Selection using Correlation Heatmap

Correlation is a statistical score that tells how close two variables are to having a linear relationship with each other. Two highly correlated features carry largely the same information and have a very similar effect on the dependent variable. Therefore, we prefer dropping one of the two features.

We can generate a correlation matrix using:

corr = df.corr()

And the correlation heatmap using:

sns.heatmap(corr)

The correlation heatmap generated is shown below.

feature1 and feature3 have high correlation with other features, and dropping them gave us better results.

X = df.drop('target', axis=1)
y = df['target']
X = X.drop(['feature1', 'feature3'], axis=1)

Handling Imbalanced Dataset

SMOTE (Synthetic Minority Oversampling Technique)

SMOTE oversamples the examples in the minority class. The process involved in oversampling is as follows (a minimal sketch is given after the list):

  1. Select a random sample from the minority class.
  2. Find its k nearest neighbors by Euclidean distance and pick one of them.
  3. Multiply the difference between the sample and the chosen neighbor by a random number between 0 and 1, and add the result to the sample to create a synthetic example.
  4. Repeat the procedure until the expected proportion of the minority class is met.
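Here is a minimal NumPy sketch of steps 1-3. It is illustrative only, not imblearn's actual implementation; the array minority_X and the function name are assumptions:

import numpy as np

def smote_sample(minority_X, k=5, rng=np.random.default_rng(0)):
    # 1. select a random sample from the minority class
    x = minority_X[rng.integers(len(minority_X))]
    # 2. find its k nearest neighbors by Euclidean distance (skipping the point itself)
    dists = np.linalg.norm(minority_X - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]
    nb = minority_X[rng.choice(neighbors)]
    # 3. scale the difference by a random factor in [0, 1] and add it to the sample
    return x + rng.random() * (nb - x)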

SMOTE-Tomek Links

This is a modified version of SMOTE that combines the abilities of both SMOTE and Tomek Links: SMOTE generates synthetic data for the minority class, while Tomek Links removes the majority-class samples that form Tomek links (pairs of nearest neighbors from opposite classes).

from imblearn.combine import SMOTETomek
from imblearn.under_sampling import TomekLinks

# define SMOTE-Tomek Links resampling
resample = SMOTETomek(tomek=TomekLinks(sampling_strategy='majority'))
X, y = resample.fit_resample(X, y)

We can now visualize the count plot again to confirm that each class in the target has an equal number of samples.
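For instance (assuming seaborn is already imported and X, y are the resampled data):

sns.countplot(x=y)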

Splitting Data into Train and Test Data

The dataset can be split into a training set and a validation set. We will use the training set for training our model; the validation set will be used to check whether the model performs well on new, unseen data.

from sklearn.model_selection import train_test_split

train_x, val_x, train_y, val_y = train_test_split(X, y, test_size=0.2)

Multiclass Classification using Random Forest Classifier

Random forest consists of a large number of single decision trees that work as an ensemble. Each individual tree in the random forest outputs a class prediction. Each class gets some votes and the class with the most votes becomes the model’s prediction.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rfc = RandomForestClassifier(n_estimators=200)
rfc.fit(train_x, train_y)

rfc_predict = rfc.predict(val_x)
print('Accuracy score:', accuracy_score(val_y, rfc_predict))

The Random Forest classifier worked very well and gave an accuracy score of 88.62% on the validation set. However, the accuracy score alone is not enough. In the next and final part, we will look at various evaluation metrics and calculate them using built-in scikit-learn functions.

Evaluation Metrics

  • Accuracy: the proportion of the total number of predictions that were correct.
  • Precision: the proportion of predicted positive cases that were actually positive.
  • Sensitivity or Recall: the proportion of actual positive cases that were correctly identified.
  • F1 Score: the harmonic mean of precision and recall:

F1 Score = 2 * (precision * recall) / (precision + recall)
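These metrics can also be computed individually with scikit-learn; for multiclass problems, an averaging strategy (here 'macro') has to be chosen for precision, recall, and F1:

from sklearn.metrics import precision_score, recall_score, f1_score

print('Precision:', precision_score(val_y, rfc_predict, average='macro'))
print('Recall:', recall_score(val_y, rfc_predict, average='macro'))
print('F1 score:', f1_score(val_y, rfc_predict, average='macro'))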

from sklearn.metrics import classification_report

print(classification_report(val_y,rfc_predict))

Annotation Converters for Object Detection

Introduction

Data labelling is an important task in Machine Learning. The quality of the data we feed the model determines how well it performs. Image annotation is the process of labelling the images of a dataset for a machine learning model. It is used to label the features we need our model to recognize. In image annotation, the object is annotated and tagged with special techniques. This makes different types of objects easily perceptible to AI-enabled machines.

Annotation work is usually carried out manually. While annotating, classes are predefined and features for the images are provided. The computer vision model is trained on these annotations and then predicts the predetermined features on new images that are not annotated.

Why is Annotation Important?

Computer vision models can learn a lot from annotated datasets. They can learn to predict accurately and relatively quickly. Therefore, annotation has applications in tasks like self-driving cars, number-plate detection, tumor detection, and many other remarkable applications.

Annotated datasets provide our models with quality information, enabling them to learn well and predict well on new, unannotated data. With annotated images, object detection can be performed easily. Thus, we rely heavily on these datasets to build AI-based models for automation.

Image Annotation for Object Detection

Image annotation refers to attaching labels (predetermined classes: human, dog, etc.) to an image. This is done to recognize, count, or segment object boundaries in images. The annotations can take the following forms:

  1. Bounding boxes
  2. Semantic segmentation
  3. 3D Cuboids
  4. Polygons
  5. Lines & Splines

Image Annotation Formats

Computer vision problems require annotated data in their own defined formats. Some popular annotation formats are given below:

COCO

The Microsoft COCO dataset is a widely used dataset with 2.5 million labeled instances across 80 object categories. COCO has five annotation types:

  • object detection
  • keypoint detection
  • stuff segmentation
  • panoptic segmentation
  • image captioning

The annotations are stored in JSON form. The format for object detection is as follows:

annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x, y, width, height],
    "iscrowd": 0 or 1,
}

categories[{
    "id": int,
    "name": str,
    "supercategory": str,
}]

YOLO

YOLO (You Only Look Once) is a very fast and accurate object detection algorithm. In this format, a .txt file with the same name is generated for each image file in the same directory. Each .txt file contains the annotations for the corresponding image: the object class, the object's coordinates, and its height and width. (In the standard Darknet YOLO format these values are normalized by the image dimensions; the examples in this article keep pixel values.)

<object-class> <x> <y> <width> <height>

Each object is annotated on a new line. For two objects, this is how they would be written in the .txt file:

0 67 33 23 14
1 54 19 86 78

Pascal VOC

Pascal VOC provides standardized image datasets for object detection. The annotations are stored in XML files. Given below is an example Pascal VOC annotation file for object detection:

<annotation> 
  <folder>Train</folder> 
  <filename>01.png</filename>      
  <path>/path/Train/01.png</path> 
  <source>  
    <database>Unknown</database> 
  </source>
  <size>  
    <width>224</width>  
    <height>224</height>  
    <depth>3</depth>   
  </size> 
  <segmented>0</segmented> 
  <object>  
    <name>36</name>  
    <pose>Frontal</pose>  
    <truncated>0</truncated>  
    <difficult>0</difficult>  
    <occluded>0</occluded>  
    <bndbox>   
      <xmin>90</xmin>   
      <xmax>190</xmax>   
      <ymin>54</ymin>   
      <ymax>70</ymax>  
    </bndbox> 
  </object>
</annotation>

TFRecord

A TFRecord (TensorFlow Record) file stores data as a sequence of binary strings. TensorFlow provides two components for specifying the structure of the data: tf.train.Example and tf.train.SequenceExample. Each sample of the data has to be stored in one of these structures and then serialized using tf.io.TFRecordWriter to write it to disk.
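As a minimal sketch (the feature names and file name here are arbitrary), one sample can be wrapped in a tf.train.Example and written to disk as follows:

import tensorflow as tf

# wrap one sample in a tf.train.Example; the feature names are arbitrary
example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    'name': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'dog'])),
}))

# serialize the example and write it to disk
with tf.io.TFRecordWriter('sample.record') as writer:
    writer.write(example.SerializeToString())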

The process of reading a TFRecord (with the TensorFlow 2 API used in the code below) is as follows:

  1. Create a tf.data.TFRecordDataset to read the TFRecord file.
  2. Define the features expected in the TFRecord using tf.io.FixedLenFeature and tf.io.VarLenFeature.
  3. Parse one tf.train.Example (one sample) at a time using tf.io.parse_single_example.

Annotation Converters (COCO to CSV, YOLO to COCO, etc.)

We often need to convert annotated data from one format to another to make use of the dataset in a more versatile manner. With the annotation converter functions below, we can easily achieve conversions like COCO to CSV format, YOLO to COCO format, etc.

In the rest of the article, we will create different functions to enable format conversions. So, you can directly use these functions to perform format conversions on your own dataset.

Features are best represented in the form of rows and columns. So, we begin with conversions from different formats (COCO, YOLO, etc.) to CSV format. This way, we get a good view of the features, classes, bounding boxes, etc.

COCO to CSV format

import json

def coco_to_csv(filename):
    # e.g. COCO2017/annotations/instances_val2017.json
    s = json.load(open(filename, 'r'))
    out_file = filename[:-5] + '.csv'
    out = open(out_file, 'w')
    # output columns: class, x, y, box width, box height, filename, image width, image height

    # map image and category ids to their metadata (ids need not be 0..N-1)
    file_names = {im['id']: im['file_name'] for im in s['images']}
    dims = {im['id']: (im['height'], im['width']) for im in s['images']}
    class_names = {cl['id']: cl['name'] for cl in s['categories']}

    for ann in s['annotations']:
        image_id = ann['image_id']
        # COCO stores bbox as [x, y, width, height]
        x1, y1, w, h = ann['bbox']
        label = ann['category_id']
        out.write('{},{},{},{},{},{},{},{}\n'.format(
            class_names[label], x1, y1, w, h,
            file_names[image_id], dims[image_id][1], dims[image_id][0]))
    out.close()
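A hypothetical call (the path is an example; the CSV is written next to the JSON):

coco_to_csv('./annotations/instances_val2017.json')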

YOLO to CSV format

import os
import glob
import pandas as pd

def yolo_to_csv(yolo_dir, destination_dir):
    os.chdir(yolo_dir)
    myFiles = glob.glob('*.txt')

    # read the class names, one per line
    classes = []
    with open(yolo_dir + '/classes.names', 'rt') as f:
        for l in f.readlines():
            classes.append(l.strip())

    # YOLO .txt files do not store image dimensions, so they are assumed here
    width = 1024
    height = 1024
    final_df = []
    for item in myFiles:
        with open(item, 'rt') as fd:
            for line in fd.readlines():
                parts = line.split()
                if len(parts) < 5:
                    continue  # skip malformed lines
                row = [classes[int(parts[0])],
                       parts[1], parts[2], parts[3], parts[4],
                       item[:-4] + ".png",  # assumes .png images with matching names
                       width, height]
                final_df.append(row)
    df = pd.DataFrame(final_df)
    df.to_csv(destination_dir + "/saved.csv", index=False, header=False)
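A hypothetical call, assuming yolo_dir contains the .txt annotations and a classes.names file:

yolo_to_csv('./yolo_labels', './output')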

Pascal VOC to CSV format

import glob
import pandas as pd
import xml.etree.ElementTree as ET

def xml_to_csv(path, destination_dir):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            bbx = member.find('bndbox')
            xmin = int(bbx.find('xmin').text)
            ymin = int(bbx.find('ymin').text)
            # convert the corner coordinates to box width and height
            w = int(bbx.find('xmax').text) - xmin
            h = int(bbx.find('ymax').text) - ymin
            label = member.find('name').text
            value = (
                     label,
                     xmin,
                     ymin,
                     w,
                     h,
                     root.find('filename').text,
                     int(root.find('size')[0].text),  # image width
                     int(root.find('size')[1].text)   # image height
                     )
            xml_list.append(value)

    xml_df = pd.DataFrame(xml_list)
    xml_df.to_csv(destination_dir + '/saved.csv', index=None, header=False)
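A hypothetical call, with the XML files in ./voc_annotations:

xml_to_csv('./voc_annotations', './output')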

TFRecord to CSV format

import tensorflow as tf
from PIL import Image

filenames = []
filenames.append('newtrain.record')  # path of the TFRecord file to convert

def read_tfrecord(serialized_example):
    feature_description = {
            'image/height': tf.io.FixedLenFeature((), tf.int64),
            'image/width': tf.io.FixedLenFeature((), tf.int64),
            'image/encoded': tf.io.FixedLenFeature((), tf.string),
            'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
            'image/object/class/text': tf.io.VarLenFeature(tf.string),
            'image/filename': tf.io.FixedLenFeature((),tf.string)
    }
    parsed_features = tf.io.parse_single_example(serialized_example, feature_description)
    parsed_features['image/encoded'] = tf.io.decode_jpeg(
            parsed_features['image/encoded'], channels=3)

    return parsed_features

data = tf.data.TFRecordDataset(filenames)
parsed_dataset = data.shuffle(128).map(read_tfrecord).batch(1)
print(parsed_dataset)
coord = []
filenames = []
labels = []
dim = []

for sample in parsed_dataset.take(10000):
    numpyed = sample['image/encoded'].numpy()
    alist = numpyed[0, :, :, :]
    img_w = sample['image/width'].numpy()[0]
    img_h = sample['image/height'].numpy()[0]
    for i in range(len(sample['image/object/bbox/xmin'].values.numpy())):
        # box coordinates are stored normalized, so scale them back to pixels
        coord.append([round(sample['image/object/bbox/xmin'].values.numpy()[i] * img_w),
                      round(sample['image/object/bbox/ymin'].values.numpy()[i] * img_h),
                      round(sample['image/object/bbox/xmax'].values.numpy()[i] * img_w),
                      round(sample['image/object/bbox/ymax'].values.numpy()[i] * img_h)])
        filenames.append(sample['image/filename'].numpy()[0].decode('utf-8'))
        dim.append([img_w, img_h])
    for i in sample['image/object/class/text'].values.numpy():
        labels.append(i.decode('utf-8'))
    img = Image.fromarray(alist, 'RGB')
    img.save('new.jpg')  # saves (and overwrites) the decoded image as a sanity check
out_file = 'file.csv'
out = open(out_file, 'w')

for i in range(len(coord)):
    x1 = coord[i][0]
    y1 = coord[i][1]
    # convert corner coordinates to box width and height for the CSV
    w = coord[i][2] - x1
    h = coord[i][3] - y1
    out.write('{},{},{},{},{},{},{},{}\n'.format(labels[i], x1, y1, w, h, filenames[i], dim[i][0], dim[i][1]))
out.close()

Now, we will look at functions for conversions from CSV to COCO, YOLO, VOC PASCAL and TFRecord formats.

CSV to COCO format

import json
import pandas as pd
def csv_to_coco(file_dir,destination_dir):
    save_json_path = destination_dir+'/traincoco.json'
    # in this pipeline the 'xmax'/'ymax' columns hold the box width and height
    clmns = ['class','xmin','ymin','xmax','ymax','filename','width','height']
    data = pd.read_csv(file_dir, names = clmns, header=None)
    images = []
    categories = []
    annotations = []
    data['fileid'] = data['filename'].astype('category').cat.codes
    data['categoryid']= pd.Categorical(data['class'],ordered= True).codes
    data['categoryid'] = data['categoryid']+1
    data['annid'] = data.index
    
    def image(row):
        image = {}
        image["height"] = row.height
        image["width"] = row.width
        image["id"] = row.fileid
        image["file_name"] = row.filename
        return image
    
    def category(row):
        category = {}
        category["supercategory"] = 'None'
        category["id"] = row.categoryid-1

        category["name"] = row[1]
        return category
    
    def annotation(row):
        annotation = {}
        # the 'xmax'/'ymax' columns hold the box width and height in this pipeline
        area = row.xmax * row.ymax
        annotation["segmentation"] = []
        annotation["iscrowd"] = 0
        annotation["area"] = area
        annotation["image_id"] = row.fileid
        # COCO bbox format is [x, y, width, height]
        annotation["bbox"] = [row.xmin, row.ymin, row.xmax, row.ymax]
        annotation["category_id"] = row.categoryid-1
        annotation["id"] = row.annid
        return annotation
    
    for row in data.itertuples():
        annotations.append(annotation(row))
    imagedf = data.drop_duplicates(subset=['fileid']).sort_values(by='fileid')
    for row in imagedf.itertuples():
        images.append(image(row))
    catdf = data.drop_duplicates(subset=['categoryid']).sort_values(by='categoryid')
    for row in catdf.itertuples():
        categories.append(category(row))

    data_coco = {}
    data_coco["images"] = images
    data_coco["categories"] = categories
    data_coco["annotations"] = annotations
    json.dump(data_coco, open(save_json_path, "w"), indent=4)

CSV to YOLO format

import csv
import numpy as np
import os

def csv_to_yolo(csv_file, destination_folder):
    if not os.path.exists(destination_folder + '/data'):
        os.makedirs(destination_folder + '/data')

    # collect file names and class names from the CSV
    classes_names = []
    file_names = []
    for l in csv.reader(open(csv_file)):
        file_names.append(l[5])
        classes_names.append(l[0])
    classes_names = np.unique(classes_names)
    classes = {k: v for v, k in enumerate(classes_names)}

    # write the class list, one name per line
    with open(destination_folder + "/data/" + 'classes.names', 'a') as f:
        for i in classes_names:
            f.write(str(i) + '\n')

    # one .txt file per image, one object per line: class x y w h
    for name in np.unique(file_names):
        with open(destination_folder + '/data/' + str(name[:-4]) + ".txt", 'a') as file:
            for l in csv.reader(open(csv_file)):
                if l[5] == name:
                    file.write(str(classes[l[0]]) + ' ' + ' '.join(l[1:5]) + '\n')

CSV to Pascal VOC format

from collections import defaultdict
import os
import csv
from xml.etree.ElementTree import  Element, SubElement, ElementTree
 
def csv_to_voc_pascal(file_dir,save_root2):
    save_root2 = save_root2 + "/result_xmls"
    if not os.path.exists(save_root2):
        os.mkdir(save_root2)

    def write_xml(folder, filename, bbox_list):
        root = Element('annotation')
        SubElement(root, 'folder').text = folder
        SubElement(root, 'filename').text = filename
        SubElement(root, 'path').text = './images/' + filename
        source = SubElement(root, 'source')
        SubElement(source, 'database').text = 'Unknown'

        # Details from first entry
        e_class_name, e_xmin, e_ymin, e_xmax, e_ymax, e_filename, e_width, e_height = bbox_list[0]
        size = SubElement(root, 'size')
        SubElement(size, 'width').text = e_width
        SubElement(size, 'height').text = e_height
        SubElement(size, 'depth').text = '3'
        SubElement(root, 'segmented').text = '0'

        for entry in bbox_list:
            e_class_name, e_xmin, e_ymin, e_xmax, e_ymax, e_filename, e_width, e_height = entry
            obj = SubElement(root, 'object')
            SubElement(obj, 'name').text = e_class_name
            SubElement(obj, 'pose').text = 'Unspecified'
            SubElement(obj, 'truncated').text = '0'
            SubElement(obj, 'difficult').text = '0'

            bbox = SubElement(obj, 'bndbox')
            SubElement(bbox, 'xmin').text = e_xmin
            SubElement(bbox, 'ymin').text = e_ymin
            SubElement(bbox, 'xmax').text = e_xmax
            SubElement(bbox, 'ymax').text = e_ymax

        tree = ElementTree(root)
        xml_filename = os.path.join('.', folder, os.path.splitext(filename)[0] + '.xml')
        tree.write(xml_filename)
    entries_by_filename = defaultdict(list)

    with open(file_dir, 'r', encoding='utf-8') as f_input_csv:
        csv_input = csv.reader(f_input_csv)
        # the CSV has no header row; each row is:
        # class, x, y, w, h, filename, image width, image height
        for row in csv_input:
            # convert box width/height back to xmax/ymax corner coordinates
            row[3] = str(int(row[1]) + int(row[3]))
            row[4] = str(int(row[2]) + int(row[4]))
            entries_by_filename[row[5]].append(row)
    for filename, entries in entries_by_filename.items():
        write_xml(save_root2, filename, entries)

CSV to TFRecord format

from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple

flags = tf.compat.v1.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('image_dir', '', 'Path to images')
FLAGS = flags.FLAGS

columns = ['class','xmin','ymin','xmax','ymax','filename','width','height']
data = pd.read_csv(FLAGS.csv_input,names = columns)
df_reorder = data[['filename','width','height','class','xmin','ymin','xmax','ymax']] # rearrange column here
df_reorder.to_csv('newcsv.csv', index=False)

# TO-DO: adjust this label map for your own dataset
def class_text_to_int(row_label):
    if row_label == 'Man':
        return 1
    elif row_label == 'Dog':
        return 2
    elif row_label == 'Monitor':
        return 3
    elif row_label == 'Machine':
        return 4
    elif row_label == 'Girl':
        return 5
    else:
        return None

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

def create_tf_example(group, path):
    with tf.io.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # the CSV's 'xmax'/'ymax' columns hold box width/height, so the corners are x+w and y+h
        xmins.append(row['xmin'] / width)
        xmaxs.append((row['xmax'] + row['xmin']) / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append((row['ymax'] + row['ymin']) / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

def main(_):
    writer = tf.io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(FLAGS.image_dir)
    examples = pd.read_csv('newcsv.csv')
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))

if __name__ == '__main__':
    tf.compat.v1.app.run()
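Since this converter is a command-line script, it can be run with the flags defined above (the script name and paths here are hypothetical):

python csv_to_tfrecord.py --csv_input=./output/saved.csv --output_path=./output/train.record --image_dir=./images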

To achieve any conversion, we can first convert the annotations to CSV format and then convert the CSV to the desired format. For instance, to convert COCO to Pascal VOC, we first convert COCO to CSV, and then convert the CSV to Pascal VOC format, as sketched below.
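A hypothetical chain of the converters defined above:

# COCO -> CSV (writes ./annotations/instances_val2017.csv next to the JSON)
coco_to_csv('./annotations/instances_val2017.json')
# CSV -> Pascal VOC (writes XMLs to ./output/result_xmls)
csv_to_voc_pascal('./annotations/instances_val2017.csv', './output')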

In the last part of the article, some direct conversion functions are provided:

XML to JSON format

import os
import json
import xml.etree.ElementTree as ET
import glob

START_BOUNDING_BOX_ID = 1
PRE_DEFINE_CATEGORIES = None

def get(root, name):
    vars = root.findall(name)
    return vars

def get_and_check(root, name, length):
    vars = root.findall(name)
    if len(vars) == 0:
        raise ValueError("Can not find %s in %s." % (name, root.tag))
    if length > 0 and len(vars) != length:
        raise ValueError(
            "The size of %s is supposed to be %d, but is %d."
            % (name, length, len(vars))
        )
    if length == 1:
        vars = vars[0]
    return vars

def get_filename(filename):
    filename = filename.replace("\\", "/")
    filename = os.path.splitext(os.path.basename(filename))[0]
    return str(filename)
    
def get_categories(xml_files):
    """Generate category name to id mapping from a list of xml files.

    Arguments:
        xml_files {list} -- A list of xml file paths.
    Returns:
        dict -- category name to id mapping.
    """
    classes_names = []
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall("object"):
            classes_names.append(member[0].text)
    classes_names = list(set(classes_names))
    classes_names.sort()
    return {name: i for i, name in enumerate(classes_names)}

def convert(xml_files, json_file):
    json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []}
    if PRE_DEFINE_CATEGORIES is not None:
        categories = PRE_DEFINE_CATEGORIES
    else:
        categories = get_categories(xml_files)
    bnd_id = START_BOUNDING_BOX_ID
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        path = get(root, "path")
        if len(path) == 1:
            filename = os.path.basename(path[0].text)
        elif len(path) == 0:
            filename = get_and_check(root, "filename", 1).text
        else:
            raise ValueError("%d paths found in %s" % (len(path), xml_file))
        ## the image id is derived from the filename (without extension)
        image_id = get_filename(filename)
        size = get_and_check(root, "size", 1)
        width = int(get_and_check(size, "width", 1).text)
        height = int(get_and_check(size, "height", 1).text)
        image = {
            "file_name": filename,
            "height": height,
            "width": width,
            "id": image_id,
        }
        json_dict["images"].append(image)

        for obj in get(root, "object"):
            category = get_and_check(obj, "name", 1).text
            if category not in categories:
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bndbox = get_and_check(obj, "bndbox", 1)
            xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1
            ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1
            xmax = int(get_and_check(bndbox, "xmax", 1).text)
            ymax = int(get_and_check(bndbox, "ymax", 1).text)
            assert xmax > xmin
            assert ymax > ymin
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {
                "area": o_width * o_height,
                "iscrowd": 0,
                "image_id": image_id,
                "bbox": [xmin, ymin, o_width, o_height],
                "category_id": category_id,
                "id": bnd_id,
                "ignore": 0,
                "segmentation": [],
            }
            json_dict["annotations"].append(ann)
            bnd_id = bnd_id + 1

    for cate, cid in categories.items():
        cat = {"supercategory": "none", "id": cid, "name": cate}
        json_dict["categories"].append(cat)

    os.makedirs(os.path.dirname(json_file), exist_ok=True)
    json_fp = open(json_file, "w")
    json_str = json.dumps(json_dict)
    json_fp.write(json_str)
    json_fp.close()
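A hypothetical call, collecting the XML files with glob:

xml_files = glob.glob('./voc_annotations/*.xml')
convert(xml_files, './output/annotations.json')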

JSON to YOLO format

import json
classes = ["Man","Monitor","Dog"]

def convert(size,box):
    x = box[0]
    y = box[1]
    w = box[2]
    h = box[3]
    return (x,y,w,h)

def convert_annotation(json_dir,destination_dir):
    with open(json_dir,'r') as f:
        data = json.load(f)
    for item in data['images']:
        image_id = item['id']
        file_name = item['file_name']
        width = item['width']
        height = item['height']
        # all annotations belonging to this image
        value = filter(lambda item1: item1['image_id'] == image_id, data['annotations'])
        with open(destination_dir + "/%s.txt" % (file_name[:-4]), 'a+') as outfile:
            for item2 in value:
                category_id = item2['category_id']
                value1 = list(filter(lambda item3: item3['id'] == category_id, data['categories']))
                name = value1[0]['name']
                class_id = classes.index(name)
                box = item2['bbox']
                bb = convert((width, height), box)
                outfile.write(str(class_id) + " " + " ".join([str(a) for a in bb]) + '\n')
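A hypothetical call, using the COCO-style JSON produced by csv_to_coco above (the destination folder must already exist, and the classes list must match the dataset's categories):

convert_annotation('./output/traincoco.json', './yolo_labels')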