Task 1
You are provided with a “Historical_Data.csv” file from a company named ABC, which sells products online.
The dataset (historical data) contains daily sales records from different countries.
Use Historical_Data.csv to build different time-series models for each Article ID. This file is located in res/Historical_Data.csv.
To perform the above exercise, carry out the following tasks:
1. Print the number of days on which more than 3 units were sold in total across all countries.
Hint:
01/08/2017 | IN | 2
01/08/2017 | FR | 3
Total sale for that day will be (2 + 3) = 5 units
2. Print the total units sold in country 'FR' during the month of August.
3. Print the total units sold in country 'AT'.
Hint: Pre-process the date column to a datetime type.
Final Output Sample:
result = [15, 90, 10]
#NOTE: Here the answers to the questions are in the following order:
#15 ---> Answer 1
#90 ---> Answer 2
#10 ---> Answer 3
result=pd.DataFrame(result)
#writing output to output.csv
result.to_csv('output/output.csv', header=False, index=False)
Output Format:
Perform the above operations and write your output to a file named output.csv, which should be present at the location output/output.csv
output.csv should contain the answer to each question on consecutive rows.
***Note: Write code only in the solution() function and do not pass any additional arguments to the function. For the predefined stub, refer to stub.py.***
Note: This question will be evaluated based on the number of test cases that your code passes.
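For reference, below is a minimal sketch of one way Task 1 could be approached. It assumes the column names Date, Country_Code and Sold_Units (inferred from the solution code later in this post) and that dates are written day-first, since the hint treats 01/08/2017 as an August date; adjust these assumptions if the actual CSV differs.
import pandas as pd

def solution():
    # column names below are assumptions based on the rest of this post
    df = pd.read_csv('res/Historical_Data.csv')
    # dayfirst=True is an assumption: the hint treats 01/08/2017 as an August date
    df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
    # 1. number of days on which total units sold (across all countries) exceeded 3
    daily_totals = df.groupby('Date')['Sold_Units'].sum()
    ans1 = int((daily_totals > 3).sum())
    # 2. units sold by country 'FR' in the month of August
    fr_august = df[(df['Country_Code'] == 'FR') & (df['Date'].dt.month == 8)]
    ans2 = int(fr_august['Sold_Units'].sum())
    # 3. total units sold in country 'AT'
    ans3 = int(df.loc[df['Country_Code'] == 'AT', 'Sold_Units'].sum())
    result = pd.DataFrame([ans1, ans2, ans3])
    result.to_csv('output/output.csv', header=False, index=False)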
Question 2
You are provided with a “Historical_Data.csv” file from a company named ABC, which sells products online.
The historical data contains daily sales records from different countries.
Use Historical_Data.csv to build different time-series models for each Article ID.
This CSV file is located in res/Historical_Data.csv.
You will observe that on some dates no sales were made. For such dates, add 0 as ‘Sold_Units’ and ‘Article_ID’.
Example: If country ‘FR’ made a sale on 2017-03-02 and its next sale was on 2017-03-04, then for 2017-03-03 fill in 0 for both ‘Sold_Units’ and ‘Article_ID’ for country ‘FR’.
Hint: Group by Country_Code
a. Once the data pre-processing is done, print the starting date of sales for ‘FR’.
Example: If the first sale of country ‘FR’ was on 2018-02-04, then the output (in the same format) should be:
Output: 2018-02-04
b. Print the number of non-selling days for country ‘AT’.
Example: If the total number of non-selling days for ‘AT’ is 150, then the output should be:
Output: 150
Hint: Group by Country_Code
Final Output Sample
result = ['2018-02-04', 150]
#NOTE: Here the answers to the questions are in the following order:
#2018-02-04 ---> Answer 1
#150 ---> Answer 2
result=pd.DataFrame(result)
#writing output to output.csv
result.to_csv('output/output.csv', header=False, index=False)
Output Format:
Perform the above operations and write your output to a file named output.csv, which should be present at the location output/output.csv
output.csv should contain the answer to each question on consecutive rows.
***Note: Write code only in the solution() function and do not pass any arguments to the function. For the predefined stub, refer to stub.py.***
Note: This question will be evaluated based on the number of test cases that your code passes.
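For reference, below is a minimal sketch of one way Question 2 could be approached, under the same column-name and day-first assumptions as the Task 1 sketch above. It additionally assumes at most one row per country per date; if the data has several Article_IDs per day, aggregate before reindexing.
import pandas as pd

def solution():
    df = pd.read_csv('res/Historical_Data.csv')
    df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
    frames = []
    # group by Country_Code and reindex each group onto a complete daily range,
    # filling the missing dates with 0 for Sold_Units and Article_ID
    for country, grp in df.groupby('Country_Code'):
        grp = grp.set_index('Date').sort_index()
        full_range = pd.date_range(grp.index.min(), grp.index.max(), freq='D')
        grp = grp.reindex(full_range)  # assumes one row per date within a country
        grp['Country_Code'] = country
        grp[['Sold_Units', 'Article_ID']] = grp[['Sold_Units', 'Article_ID']].fillna(0)
        frames.append(grp)
    data = pd.concat(frames)
    answers = []
    # a. starting date of sales for 'FR'
    fr = data[data['Country_Code'] == 'FR']
    answers.append(str(fr.index.min().date()))
    # b. non-selling days for 'AT' (days with 0 Sold_Units after the fill)
    at = data[data['Country_Code'] == 'AT']
    answers.append(int((at['Sold_Units'] == 0).sum()))
    result = pd.DataFrame(answers)
    result.to_csv('output/output.csv', header=False, index=False)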
Question 3
You are provided with a “Historical_Data.csv” file from a company named ABC, which sells products online.
The historical data contains daily sales records from different countries.
Use Historical_Data.csv to build different time-series models for each Country Code.
This file is located in res/Historical_Data.csv.
***NOTE: Use Auto-ARIMA for prediction.***
For the above scenario,
Perform backtesting for each time series on the last 10 values and report the country-wise Mean Absolute Error over those last 10 dates (rounded to 3 decimal places).
Hint: Group by Country_Code only.
Final Output Sample
result = [1.23,0.222,2.23,0.11]
#NOTE: Here the answers to the questions are in the following order:
#1.23 ---> Answer 1
#0.222---> Answer 2
#2.23---> Answer 3
#0.11---> Answer 4
result=pd.DataFrame(result)
#writing output to output.csv
result.to_csv('output/output.csv', header=False, index=False)
Steps to be followed:
1. Load the data from “Historical_Data.csv”, taking the date column as the index, sort it by the date column, and save it in a dataframe (df).
2. From the above dataframe (df), extract the “Sold_Units” data country-wise and store it in another dataframe (say “df2”) using a for loop. Each iteration extracts the sold units for one country, irrespective of “Article_ID”.
3. On the extracted “df2”, perform a split: take the last 10 rows as the ‘test’ dataframe and the rest as the ‘train’ dataframe. (This is known as backtesting.)
4. Fit an ARIMA model on the above “train” dataframe, calculate predictions for the “test” dataframe, and store them in a separate dataframe (say “Final_df”). In this “Final_df” we store the “Sold_Units” and “Predicted_Units” for each country.
Syntax of Auto-Arima:
from pmdarima.arima import auto_arima
model=auto_arima(train,trace=True,suppress_warnings=True,error_action='ignore')
model.fit(train)
5. Finally, calculate the Mean Absolute Error using the “Sold_Units” and “Predicted_Units” for the respective countries.
Hint: You can use the “sklearn.metrics” package.
NOTE: All the above tasks should be done in a loop wherein each iteration handles a particular country, irrespective of “Article_ID”.
Output Format:
Perform the above operations and write your output to a file named output.csv, which should be present at the location output/output.csv
output.csv should contain the answer to each question on consecutive rows.
***Note: The forecasting model might take some time to generate output on the test dataset; please be patient in the interim.***
***Note: Write code only in the solution() function and do not pass any arguments to the function. For the predefined stub, refer to stub.py. This question will be manually evaluated and a score will be allotted accordingly.***
Note: This question will be evaluated based on the number of test cases that your code passes.
Solution Q3
import pandas as pd

# importing the dataset into a pandas dataframe with the Date column as the index
data = pd.read_csv('res/Historical_Data.csv', parse_dates=[0], index_col=['Date'])
data = data.sort_index()  # sort the rows chronologically by the Date index
data.head()
Output: (first five rows of the dataframe)
from pmdarima.arima import auto_arima
from sklearn.metrics import mean_absolute_error
result=[]
for i in data.Country_Code.unique():
    # creating a dataframe for each country
    df = data[data['Country_Code'] == i]
    df = df.drop(['Article_ID', 'Country_Code'], axis=1)
    # splitting the dataframe into train and test
    train = df.iloc[:len(df)-10]  # all values except the last 10
    test = df.iloc[len(df)-10:]   # the last 10 values
    # fitting the auto_arima model on the train set
    model = auto_arima(train, trace=True, suppress_warnings=True, error_action='ignore')
    model.fit(train)
    # predicting the test set using the fitted model
    pred = model.predict(n_periods=10)
    # creating a new dataframe called Final_df with columns Sold_Units (from the test set) and Predicted_Units
    Final_df = test.copy()
    Final_df['Predicted_Units'] = list(pred)  # list() avoids index-alignment issues
    # calculating the mean absolute error and storing it in the result list
    mae = round(mean_absolute_error(Final_df['Sold_Units'], Final_df['Predicted_Units']), 3)
    result.append(mae)
Output:
(abbreviated auto_arima stepwise-search trace; one search per country)
Performing stepwise search to minimize aic ... Best model: ARIMA(1,0,1)(0,0,0)[0] intercept. Total fit time: 2.790 seconds
Performing stepwise search to minimize aic ... Best model: ARIMA(1,1,3)(0,0,0)[0] intercept. Total fit time: 12.108 seconds
Performing stepwise search to minimize aic ... Best model: ARIMA(2,1,2)(0,0,0)[0] intercept. Total fit time: 12.013 seconds
Performing stepwise search to minimize aic ... Best model: ARIMA(0,1,2)(0,0,0)[0]. Total fit time: 4.451 seconds
result
Output:
[1.32, 0.506, 1.657, 0.417]
Save the output into a CSV file:
result=pd.DataFrame(result)
#writing output to output.csv
result.to_csv('output/output.csv', header=False, index=False)
If you need help with any Machine Learning programming assignment, Machine Learning project, or Machine Learning homework, or need the complete solution to the above problem, then we are ready to help you.
Send your request to realcode4you@gmail.com and get instant help at an affordable price.
We always focus on delivering unique, plagiarism-free, well-structured code, written by our highly educated professionals within your given time frame.
If you are looking for help with other programming languages like C, C++, Java, Python, PHP, Asp.Net, NodeJs, ReactJs, etc., or with different types of databases like MySQL, MongoDB, SQL Server, Oracle, etc., then also contact us.