
Tagup Data Science Exercise

ExampleCo, Inc is gathering several types of data for its fleet of very expensive machines. These very expensive machines have three operating modes: normal, faulty and failed. The machines run all the time, and usually they are in normal mode. However, in the event that the machine enters faulty mode, the company would like to be aware of this as soon as possible. This way they can take preventative action to avoid entering failed mode and hopefully save themselves lots of money.

They collect four kinds of timeseries data for each machine in their fleet of very expensive machines. When a machine is operating in normal mode the data behaves in a fairly predictable way, but with a moderate amount of noise. Before a machine fails it will ramp into faulty mode, during which the data appears visibly quite different. Finally, when a machine fails it enters a third, and distinctly different, failed mode where all signals are very close to 0.
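The failed mode described above (all signals very close to 0) lends itself to a simple rule. The following is a minimal sketch, not the exercise's intended solution: the function name `first_failure_index`, the tolerance `tol`, the `window` length, the column names, and the synthetic demo data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def first_failure_index(signals: pd.DataFrame, tol: float = 0.5, window: int = 20):
    """Return the first index label from which every signal stays within
    `tol` of zero for `window` consecutive rows, or None if that never happens."""
    near_zero = (signals.abs() < tol).all(axis=1)
    # The rolling sum equals `window` only when the whole window is near zero.
    run = near_zero.astype(int).rolling(window).sum()
    full = np.flatnonzero(run.to_numpy() == window)
    if full.size == 0:
        return None
    # Rolling windows are right-aligned, so step back to the window's first row.
    return signals.index[full[0] - window + 1]


# Synthetic stand-in for one machine: four phase-shifted sinusoids with mild
# noise, followed by a flat stretch near zero (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(300)
phases = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
values = 10 * np.sin(t[:, None] / 10 + phases) + rng.normal(0, 0.1, (300, 4))
values[200:] = rng.normal(0, 0.01, (100, 4))
demo = pd.DataFrame(values, columns=["s0", "s1", "s2", "s3"])

print(first_failure_index(demo))
```

On this synthetic frame the function returns 200, the row where the flat stretch begins. Requiring a full window of near-zero rows, rather than a single row, keeps an isolated zero crossing from being reported as a failure. Real data would need `tol` and `window` tuned to the actual signal scale.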

You can download the data here: exampleco_data

Your main objective: to develop an automated method to pinpoint the times of fault and failure in this machine. Keep in mind that you will be sharing these results with the executives at ExampleCo, so to the best of your ability, try to explain what you are doing, what you've shown, and why you think your predictions are good.

A few notes to help:

  1. A good place to start is by addressing the noise due to communication
    errors.
  2. Feel free to use any libraries you like. Your final results should be
    presented in this Python notebook.
  3. There are no constraints on the techniques you bring to bear; we are curious
    to see how you think and what sort of resources you have in your toolbox.
  4. Be sure to clearly articulate what you did, why you did it, and how the
    results should be interpreted. In particular you should be aware of the
    limitations of whatever approach or approaches you take.
  5. Don't feel compelled to use all the data if you're not sure how. Feel free
    to focus on data from a single unit if that makes it easier to get started.
  6. Don't hesitate to reach out to datasciencejobs@tagup.io with any questions!
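Note 1 suggests starting with the noise due to communication errors. One possible way to do that, sketched here under assumptions, is a Hampel-style filter that replaces points lying far from a rolling median. The name `remove_spikes`, the `window` and `n_sigmas` defaults, and the synthetic series are hypothetical; the exercise does not prescribe this technique, and this version approximates the classic Hampel filter by taking a rolling median of deviations from the rolling median.

```python
import numpy as np
import pandas as pd


def remove_spikes(series: pd.Series, window: int = 11, n_sigmas: float = 5.0) -> pd.Series:
    """Replace points far from the rolling median with that median."""
    med = series.rolling(window, center=True, min_periods=1).median()
    abs_dev = (series - med).abs()
    # Rolling median of absolute deviations: a robust local spread estimate.
    mad = abs_dev.rolling(window, center=True, min_periods=1).median()
    # 1.4826 rescales the MAD to a standard deviation under Gaussian noise.
    is_spike = abs_dev > n_sigmas * 1.4826 * mad
    return series.where(~is_spike, med)


# Synthetic signal with three injected spikes (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(500)
raw = pd.Series(10 * np.sin(t / 20) + rng.normal(0, 1, t.size))
raw.iloc[[50, 200, 350]] = [300.0, -250.0, 400.0]

cleaned = remove_spikes(raw)
print(raw.abs().max(), cleaned.abs().max())
```

The median is used instead of the mean because a single extreme reading barely moves it, so the spike does not distort its own detection threshold. A limitation to keep in mind: a genuine abrupt change in the signal can also be flagged, so the thresholds should be checked against the onset of faulty mode.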
# To help you get started...
from IPython.display import display
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 
import numpy as np


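# Load the readings for machine 9; the first CSV column becomes the index
data = pd.read_csv('../input/challenge/exampleco_data/machine_9.csv', index_col=0)

# Plot every signal against its sample position for a first look
plt.plot(range(len(data)), data)
plt.show()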
[Figure: all signals from machine 9 plotted against sample index]
# Import required packages
import copy

import matplotlib.dates as mdates
import seaborn as sns
from matplotlib.dates import DateFormatter, YearLocator, MonthLocator
from pandas.plotting import register_matplotlib_converters
from scipy import stats

# Let matplotlib handle pandas datetime types on plot axes
register_matplotlib_converters()

Exploring the data

Let's focus on machine 9 for now; I picked it at random. We start by looking at the distribution of its signals.
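Alongside plots, a numeric summary can show whether a distribution is dominated by a few extreme readings. The sketch below is an assumption-laden illustration rather than part of the original analysis: `tail_summary`, the column names `clean` and `spiky`, and the synthetic values are hypothetical. It compares each column's interquartile range with its full range.

```python
import numpy as np
import pandas as pd


def tail_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Compare each column's central spread (IQR) with its full range."""
    q = df.quantile([0.25, 0.5, 0.75])
    summary = pd.DataFrame({
        "median": q.loc[0.5],
        "iqr": q.loc[0.75] - q.loc[0.25],
        "min": df.min(),
        "max": df.max(),
    })
    # A range many times larger than the IQR hints at a few extreme outliers.
    summary["range_over_iqr"] = (summary["max"] - summary["min"]) / summary["iqr"]
    return summary


# Two synthetic columns: one purely Gaussian, one with three injected spikes.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "clean": rng.normal(0, 1, 1000),
    "spiky": rng.normal(0, 1, 1000),
})
demo.loc[[10, 500, 900], "spiky"] = 300.0

summary = tail_summary(demo)
print(summary)
```

With the real file, the same call would be `tail_summary(data)`. A very large `range_over_iqr` for a signal would suggest that histograms of the raw values will be squashed by outliers and that cleaning should come before any further modelling.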