Tagup Data Science Exercise
ExampleCo, Inc is gathering several types of data for its fleet of very expensive machines. These very expensive machines have three operating modes: normal, faulty and failed. The machines run all the time, and usually they are in normal mode. However, in the event that the machine enters faulty mode, the company would like to be aware of this as soon as possible. This way they can take preventative action to avoid entering failed mode and hopefully save themselves lots of money.
They collect four kinds of timeseries data for each machine in their fleet of very expensive machines. When a machine is operating in normal mode the data behaves in a fairly predictable way, but with a moderate amount of noise. Before a machine fails it will ramp into faulty mode, during which the data appears visibly quite different. Finally, when a machine fails it enters a third, and distinctly different, failed mode where all signals are very close to 0.
You can download the data here: exampleco_data
Your main objective: to develop an automated method to pinpoint the times of fault and failure in this machine. Keep in mind that you will be sharing these results with the executives at ExampleCo, so to the best of your ability, try to explain what you are doing, what you've shown, and why you think your predictions are good.
A few notes to help:
- A good place to start is by addressing the noise due to communication errors.
- Feel free to use any libraries you like. Your final results should be presented in this Python notebook.
- There are no constraints on the techniques you bring to bear, we are curious to see how you think and what sort of resources you have in your toolbox.
- Be sure to clearly articulate what you did, why you did it, and how the results should be interpreted. In particular you should be aware of the limitations of whatever approach or approaches you take.
- Don't feel compelled to use all the data if you're not sure how. Feel free to focus on data from a single unit if that makes it easier to get started.
- Don't hesitate to reach out to datasciencejobs@tagup.io with any questions!
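The first note suggests starting with the noise due to communication errors. The sketch below shows one possible way to do that, assuming the errors appear as isolated spikes far outside the normal signal range (an assumption, not something the exercise states). The helper `remove_spikes`, its `window` and `n_mads` parameters, and the synthetic single-column frame are all hypothetical and only illustrate the idea of a rolling-median filter with a robust threshold.

```python
import numpy as np
import pandas as pd


def remove_spikes(df, window=11, n_mads=5.0):
    """Replace isolated outliers with the local rolling median.

    Hypothetical helper: `window` and `n_mads` are illustrative defaults,
    not values tuned on the ExampleCo data.
    """
    # Local baseline that a single spike inside the window cannot move much
    med = df.rolling(window, center=True, min_periods=1).median()
    resid = (df - med).abs()
    # Robust per-column scale: median absolute deviation of the residuals,
    # scaled by 1.4826 so it is comparable to a standard deviation
    threshold = n_mads * 1.4826 * resid.median()
    is_spike = resid.gt(threshold, axis="columns")
    # Keep ordinary points, swap flagged points for the local median
    return df.mask(is_spike, med)


# Synthetic stand-in for one signal: a slow sine wave plus moderate noise
rng = np.random.default_rng(0)
t = np.arange(500)
noisy = pd.DataFrame({"0": np.sin(t / 20) + rng.normal(0, 0.1, t.size)})
# Inject three large spikes to mimic communication errors (assumed shape)
noisy.iloc[[50, 200, 350], 0] = [300.0, -250.0, 280.0]

filtered = remove_spikes(noisy)
```

A median-based filter is used here rather than a rolling mean because a mean would be dragged toward each spike. The threshold should be checked against the real data before relying on it, since a filter that is too aggressive could also smooth away the onset of faulty mode.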
# To help you get started...
from IPython.display import display
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
data = pd.read_csv('../input/challenge/exampleco_data/machine_9.csv',index_col=0)
plt.plot(range(len(data)), data)
plt.show()
# Import required packages
from matplotlib.dates import DateFormatter,YearLocator,MonthLocator
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import copy
import seaborn as sns
from scipy import stats
import matplotlib.dates as mdates
Exploring the data
Let's focus on machine 9 for now; I picked it at random. We start by looking at the distribution of its signals.
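As a rough numerical companion to a distribution plot, the sketch below tabulates per-signal percentiles. It assumes the frame holds one numeric column per signal. The helper `summarize_signals`, the `range_ratio` column, and the synthetic `demo` frame are hypothetical stand-ins, so the printed numbers say nothing about machine 9 itself.

```python
import numpy as np
import pandas as pd


def summarize_signals(df):
    """Per-signal distribution summary (hypothetical helper)."""
    summary = df.describe(percentiles=[0.01, 0.25, 0.5, 0.75, 0.99]).T
    # Full range divided by the central 98% range: values far above 1
    # mean a few extreme points dominate the range of that signal
    summary["range_ratio"] = (summary["max"] - summary["min"]) / (
        summary["99%"] - summary["1%"]
    )
    return summary


# Synthetic stand-in with four signals; column names are an assumption
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(0, 1, size=(3000, 4)), columns=["0", "1", "2", "3"])
demo.iloc[::500, 0] = 250.0  # a handful of injected spikes in one signal

summary = summarize_signals(demo)
print(summary[["min", "1%", "50%", "99%", "max", "range_ratio"]])
```

To apply the same idea to the real data, pass the `data` frame loaded earlier instead of `demo`. A large `range_ratio` for a signal would suggest that its histogram is dominated by a few outliers and that it is worth replotting after filtering.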