Dataset - Supermarket
(size: 1000 < x < 10000)
Software Tools: Tableau
Requirements:
You are required to analyse a large data set of your choice, which has been agreed with your module tutor.
Your project may use any combination of data analysis techniques, data-mining algorithms and software that has been covered in the module. You may also apply them to any aspect(s) of the dataset for knowledge discovery.
❖ Data Analysis and Visualisation
⮚ Initial analysis of the data using visualisation techniques within Tableau (use diagrams/graphs to highlight important patterns/findings).
⮚ Discussion and interpretation of results.
⮚ Discussion of overall trends and patterns observed.
❖ Selection of Data Mining Algorithm
⮚ Select one data mining algorithm suitable for further analysis of your data.
⮚ Clearly justify your choice, with reference to the visualisation analysis carried out.
❖ Data Pre-processing
⮚ Identify your input and class variables, if relevant (i.e. which variable you are going to use as your class variable).
Data Set Features:
Invoice ID
Branch
City
Customer type
Gender
Product line
Unit price
Quantity
Tax 5%
Total
Date
Time
Payment
cogs
gross margin percentage
gross income
Rating
⮚ Identify and resolve any anomalies in the data (e.g. missing values, outliers).
Data Anomalies identification / rectification
Missing value (Blank)
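Value not in the allowable range (e.g. the allowable values are “Male” and “Female”, but the recorded value is “M”)
Outlier (usually a value more than 1.5 x IQR below the 1st quartile or above the 3rd quartile, where IQR = 3rd Quartile - 1st Quartile)
Upper outlier limit = 3rd Quartile + 1.5 * IQR
Lower outlier limit = 1st Quartile - 1.5 * IQR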
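The first two anomaly checks in the list above (blank values and values outside the allowable range) can be sketched in Python as follows. This is a minimal illustration, not part of the assignment brief: the alias table, the choice of required columns and the function names are all assumptions made for the example.

```python
# Hypothetical alias table: maps non-standard codes to the allowed values.
GENDER_ALIASES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}
ALLOWED_GENDERS = {"Male", "Female"}


def is_blank(value):
    """True for None or a string that is empty after stripping whitespace."""
    return value is None or str(value).strip() == ""


def find_anomalies(rows, required=("Gender", "Rating")):
    """Return (row_index, column, problem) tuples for blank or disallowed values.

    `rows` is a list of dicts keyed by column name; `required` is an
    illustrative choice of columns that must not be blank.
    """
    problems = []
    for i, row in enumerate(rows):
        for col in required:
            if is_blank(row.get(col)):
                problems.append((i, col, "missing"))
        gender = row.get("Gender")
        if not is_blank(gender) and gender not in ALLOWED_GENDERS:
            problems.append((i, "Gender", "not in allowable range"))
    return problems


def normalise_gender(value):
    """Map a known alias such as "M" to its allowed value; leave others unchanged."""
    if is_blank(value):
        return value
    return GENDER_ALIASES.get(str(value).strip().lower(), value)


# Made-up rows for demonstration only.
rows = [
    {"Gender": "Male", "Rating": "9.1"},
    {"Gender": "M", "Rating": "7.4"},
    {"Gender": "", "Rating": ""},
]
print(find_anomalies(rows))
print(normalise_gender("M"))
```

Blank values still need a decision (remove the row or impute a value); the sketch only reports them.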
⮚ Carry out any appropriate pre-processing/transformations to the data set.
Outlier Detection
YouTube video: https://www.youtube.com/watch?v=9aDHbRb4Bf8
Example here:
1st Quartile = Quartile (A2:A11,1) = 11
3rd Quartile = Quartile (A2:A11,3) = 14.25
Lower limit = 11 - 1.5 * (14.25 - 11) = 6.125
Upper limit = 14.25 + 1.5 * (14.25 - 11) = 19.125
So the outliers are 3 and 99, because 3 is below 6.125 and 99 is above 19.125.
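The same 1.5 x IQR calculation can be sketched in Python. The ten values below are invented for illustration (the original cells A2:A11 are not shown); they were chosen so that the limits come out as 6.125 and 19.125, matching the worked example. The sketch assumes the spreadsheet Quartile function uses the inclusive interpolation method, which is what `statistics.quantiles(..., method="inclusive")` computes.

```python
import statistics

# Hypothetical sample, NOT the original spreadsheet data.
SAMPLE = [12, 3, 11, 15, 10, 99, 11, 16, 12, 11]


def iqr_fences(values, k=1.5):
    """Return (q1, q3, lower_limit, upper_limit) using the k * IQR rule."""
    # method="inclusive" interpolates between data points, treating the
    # sample as including its own minimum and maximum.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying strictly outside the lower/upper limits."""
    _q1, _q3, lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


print(iqr_fences(SAMPLE))     # (11.0, 14.25, 6.125, 19.125)
print(find_outliers(SAMPLE))  # [3, 99]
```

Whether a flagged value should be removed, capped or kept depends on the column; a large Total may be a genuine bulk purchase rather than an error.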
❖ Data Mining
⮚ Use the chosen data mining algorithm for further analysis of your pre-processed data set.
⮚ Clearly discuss the implementation of the data mining algorithm.
⮚ Discuss and interpret the results.
❖ Data Ethics
⮚ A discussion of data ethical issues related to the analysis and use of business data.
240 words
What are the concerns?
● Data privacy?
♦ What is data privacy?
♦ Which data in your case are of concern?
♦ Ethical concerns of data privacy
⮚ Discrimination
⮚ Retention
⮚ Transparency
♦ Legal concerns
⮚ Compliance with GDPR requirements
⮚ Seven Principles of GDPR
⮚ Professional concerns (standards / practices to follow)
▪ Digital Analytics Association (https://www.digitalanalyticsassociation.org/): provides training and certification services to the public to enhance personal data security.
▪ Data Science Code of Professional Conduct (http://www.datascienceassn.org/code-of-conduct.html)
▪ Privacy & data protection by design
● The functional requirements of systems should allow users to customise:
♦ Privacy settings
♦ Levels of acceptable recording, monitoring and tracking
♦ Levels of security
● FOSS and open algorithms
▪ Data ethics framework
* Seven principles
Contact us for help with any Tableau-related assignment, project or homework:
realcode4you@gmail.com