Autonomous Driving

Problem Statement: Can you predict vehicle angle in different settings?

Link: https://www.kaggle.com/c/pku-autonomous-driving/overview

The training set consists of over 4000 street images taken by a camera attached to the top of a car. Each image may contain any number of other cars, including none. For each car in an image, we are given the following pose information:

model_type, yaw, pitch, roll, x, y, z
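As a rough illustration of how these seven fields might be unpacked, the sketch below assumes that all cars of one image are stored as a single space-separated string in the field order listed above. The helper name `parse_pose_string` and the sample values are made up for this example and are not taken from the dataset.

```python
# Field order as listed above; the space-separated layout is an assumption.
POSE_FIELDS = ["model_type", "yaw", "pitch", "roll", "x", "y", "z"]

def parse_pose_string(pose_string, fields=POSE_FIELDS):
    """Split a flat pose string into one dict per car (hypothetical helper)."""
    tokens = pose_string.split()
    n = len(fields)
    if len(tokens) % n != 0:
        raise ValueError(f"expected a multiple of {n} values, got {len(tokens)}")
    cars = []
    for start in range(0, len(tokens), n):
        chunk = tokens[start:start + n]
        car = {name: float(value) for name, value in zip(fields, chunk)}
        # model_type is a categorical id, so store it as an integer
        car["model_type"] = int(float(chunk[0]))
        cars.append(car)
    return cars

# Made-up example string describing two cars
example = "16 0.25 -3.1 -3.0 7.9 3.7 11.4 56 0.18 0.49 -3.1 9.6 4.7 21.0"
cars = parse_pose_string(example)
```

Each element of `cars` is then a dict keyed by field name, which is convenient for building a per-car DataFrame later.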

We are also provided with the camera intrinsics to convert camera coordinates to image coordinates. Some cars are too far away to matter, so masks are provided (for both test and train data) to exclude these insignificant cars. Additionally, we're provided with 3D models of each car type (which we may not need to use!).
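The conversion from camera coordinates to image coordinates can be sketched with a standard pinhole projection. This is a minimal sketch, assuming the usual convention where z points away from the camera and is positive for visible points; the function name `camera_to_image` is hypothetical, and the matrix values are the ones this notebook uses when loading the data.

```python
import numpy as np

# Intrinsic matrix as used in this notebook (from camera.zip)
camera_matrix = np.array([[2304.5479, 0, 1686.2379],
                          [0, 2305.8757, 1354.9849],
                          [0, 0, 1]], dtype=np.float32)

def camera_to_image(points_xyz, K=camera_matrix):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates.

    Assumes a pinhole model with z > 0 for every point (hypothetical helper).
    """
    points_xyz = np.asarray(points_xyz, dtype=np.float64).reshape(-1, 3)
    # Multiply each point by K to get homogeneous image coordinates
    homogeneous = points_xyz @ K.astype(np.float64).T
    # Divide by depth to obtain pixel coordinates (u, v)
    return homogeneous[:, :2] / homogeneous[:, 2:3]

# A point on the optical axis should land on the principal point (cx, cy)
center = camera_to_image([[0.0, 0.0, 10.0]])
# A point 1 unit to the right at depth 10 shifts u by fx / 10
shifted = camera_to_image([[1.0, 0.0, 10.0]])
```

Under these assumptions, `center` lands at the principal point of the matrix, and `shifted` moves only along the horizontal image axis.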

Our target is to predict the following pose information for each car in each of the test images (note that we don't need to predict model_type):

yaw, pitch, roll, x, y, z, confidence in prediction
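To make the target format concrete, here is a minimal sketch that flattens per-car predictions into one string per image, using the field order listed above. The space-separated layout and the helper name `format_prediction_string` are assumptions for illustration; check them against sample_submission.csv before relying on them.

```python
# Output field order as listed above; note that model_type is not included.
OUTPUT_FIELDS = ["yaw", "pitch", "roll", "x", "y", "z", "confidence"]

def format_prediction_string(cars, fields=OUTPUT_FIELDS):
    """Join per-car prediction dicts into one flat string (hypothetical helper)."""
    parts = []
    for car in cars:
        parts.extend(str(car[name]) for name in fields)
    return " ".join(parts)

# Made-up prediction for a single car
prediction = format_prediction_string([
    {"yaw": 0.1, "pitch": 0.2, "roll": 0.3,
     "x": 1.0, "y": 2.0, "z": 3.0, "confidence": 0.9},
])
```

An image with no predicted cars yields an empty string under this sketch.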

Library imports

import numpy as np 
import pandas as pd 
import cv2
from tqdm import tqdm
import matplotlib.pyplot as plt
import seaborn as sns
from functools import reduce
import os
from scipy.optimize import minimize
import plotly.express as px
import matplotlib.image as mpimg

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader
from torchvision import models
from torchvision import transforms, utils

PATH = '../../auto_driving/'
os.listdir(PATH)

['test_images',
 '.DS_Store',
 'car_models_json',
 'camera',
 'car_models',
 'train.csv',
 'train_images',
 'test_masks',
 'pku-autonomous-driving.zip',
 'train_masks',
 'jupyter',
 'sample_submission.csv']

Load Data

train.csv contains, for each training image, the pose information of every car present in that image.

We also load the camera intrinsic parameters so that we can use them later for coordinate conversions.

train = pd.read_csv(PATH + 'train.csv')
test = pd.read_csv(PATH + 'sample_submission.csv')

# From camera.zip
camera_matrix = np.array([[2304.5479, 0,  1686.2379],
                          [0, 2305.8757, 1354.9849],
                          [0, 0, 1]], dtype=np.float32)
train.head()