Aims
The aim of this assignment is to introduce a practical application of Big Data and Cloud Computing using a realistic big data problem. Students will implement a solution using an industry-leading cloud computing provider together with the distributed processing environment Apache Spark. This will involve selecting Machine Learning algorithms and methods appropriate to the problem.
Learning Outcomes Assessed:
Knowledge & Understanding
LO 1. Apply big data analytic algorithms, including those for visualization, and cloud computing techniques to multi-terabyte datasets.
LO 2. Critically assess data analytic and machine learning algorithms to identify those that satisfy the requirements of a given big data problem.
Intellectual / Professional skills & abilities:
LO 3. Critically evaluate and select appropriate big data analytic algorithms to solve a given problem, considering the processing time available and other aspects of the problem.
LO 4. Design and develop advanced big data applications that integrate with third-party cloud computing services.
Personal Values Attributes (Global / Cultural awareness, Ethics, Curiosity) (PVA):
LO 5. Critically assess the relationship between knowledge and the ethical and social interpretation of primary research using big data.
Definitions
Portfolio Assignment: A collection of pieces of work
Individual Work: Work carried out by one person only
Group Work: Work carried out collaboratively, with members seeking to improve each other’s contributions
Peer Review: Critical analysis and subsequent grading of a social equal’s work
Semi-Formative: Training tasks assigned course credit to reward and ensure engagement.
Big Data Product: Weapons and Drugs (Individual Work 70%)
In the television documentary “Ross Kemp and the Armed Police”, broadcast by ITV on 6th September 2018, multiple claims were made about violent crime in the UK.
These claims were:
1. Violent Crime is increasing
2. There are more firearms incidents per head in Birmingham than anywhere else in the UK
3. Crimes involving firearms are closely associated with drugs offences
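Each claim maps onto a simple quantity that can be computed once the data has been aggregated. The sketch below, in plain Python for readability, shows one possible approach: a least-squares trend slope for claim 1, a rate per 100,000 residents for claim 2, and Pearson's correlation coefficient for claim 3. The function names are hypothetical, and population figures are not among the datasets listed below, so they would have to be sourced separately (an assumption). In a Spark notebook these functions would typically be applied to small aggregated results rather than to the raw rows.

```python
import math
from statistics import mean
from typing import Sequence


def trend_slope(years: Sequence[float], counts: Sequence[float]) -> float:
    """Claim 1: least-squares slope of yearly counts (positive means increasing)."""
    x_bar, y_bar = mean(years), mean(counts)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, counts))
    return sxy / sxx


def rate_per_100k(incidents: int, population: int) -> float:
    """Claim 2: incidents per 100,000 residents, so areas of different size compare fairly."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents * 100_000 / population


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Claim 3: Pearson correlation between two per-area counts (linear association only)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A positive slope or a high r does not settle a claim on its own: the slope says nothing about statistical significance, and a correlation between firearms and drugs offences does not show that one causes the other.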
In this assignment you will investigate these claims using real, publicly available datasets, which will be placed in Amazon S3 for you. These include, but are not limited to:
Street Level Crime Data published by the UK Home Office: This dataset contains 19 million rows, each giving a crime type together with its location as a latitude and longitude.
Land Registry Price Paid Data: This gives the postcode of a property, the property type from an enumeration of D (Detached), S (Semi-Detached), T (Terraced) and F (Flats/Maisonettes), and the price paid.
Postcode Data: This dataset is based on material provided by the Ordnance Survey. It gives a latitude and longitude for every postcode, which makes it possible to link the postcodes in the Land Registry Price Paid dataset to the latitude/longitude locations in the crime dataset.
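The postcode dataset acts as the bridge between the other two. A minimal sketch of the linkage, in plain Python with hypothetical field names (`postcode`, `price`, `lat`, `lon`), is shown below: an inner join attaches coordinates to each sale, and a great-circle (haversine) distance finds the postcode nearest to a crime location. The brute-force nearest search compares against every postcode, which would be far too slow on the full data; in Spark the equivalent would more likely be a DataFrame join on a normalised postcode key, plus some spatial bucketing (for example, rounding coordinates to a grid) before any distance comparison.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def normalise_postcode(postcode: str) -> str:
    """Upper-case and drop spaces so 'b1 1aa' and 'B1 1AA' match."""
    return postcode.replace(" ", "").upper()


def attach_coordinates(sales: List[dict], coords: Dict[str, Tuple[float, float]]) -> List[dict]:
    """Inner join: keep only sales whose postcode has known coordinates."""
    lookup = {normalise_postcode(pc): latlon for pc, latlon in coords.items()}
    joined = []
    for sale in sales:
        latlon = lookup.get(normalise_postcode(sale["postcode"]))
        if latlon is not None:
            joined.append({**sale, "lat": latlon[0], "lon": latlon[1]})
    return joined


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_postcode(lat: float, lon: float, coords: Dict[str, Tuple[float, float]]) -> str:
    """Brute-force nearest postcode to a crime location (illustration only; O(n) per crime)."""
    return min(coords, key=lambda pc: haversine_km(lat, lon, *coords[pc]))
```

An inner join silently drops sales whose postcode is missing from the lookup, so it is worth counting how many rows are lost at this step.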
Specifics
Process the data prepared for you using Apache Spark.
Filter the dataset so that only crimes relevant to the claims are retained.
Using appropriate visualization methods, statistics, and machine learning, determine whether the claims made by Ross Kemp were true, false, or could not be determined.
Explain the reasoning behind your code so that it is clear what each block is intended to achieve, and why.
Report critically on the advantages, disadvantages, and limitations of the methods used.
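Filtering is likely to be the first Spark step. The plain-Python sketch below keeps only the categories relevant to the claims and counts them per area, which is the shape of input that the per-head rates and correlations need. The category labels are an assumption based on the crime types used on police.uk, and the field names `crime_type` and `area` are hypothetical; both should be checked against the prepared data. `Possession of weapons` does not appear to be specific to firearms, which is itself a limitation worth reporting. In PySpark the same logic would typically use `DataFrame.filter` with `Column.isin`, followed by `groupBy(...).count()`.

```python
from collections import Counter
from typing import Iterable, List

# Assumed category labels (police.uk style); verify against the actual dataset.
RELEVANT_TYPES = frozenset({"Violence and sexual offences", "Possession of weapons", "Drugs"})


def filter_relevant(rows: Iterable[dict], relevant=RELEVANT_TYPES) -> List[dict]:
    """Keep only rows whose crime type is one of the relevant categories."""
    return [row for row in rows if row.get("crime_type") in relevant]


def counts_by_area(rows: Iterable[dict], crime_type: str) -> Counter:
    """Count rows of a single crime type per area (hypothetical 'area' field)."""
    return Counter(row["area"] for row in rows if row["crime_type"] == crime_type)
```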
Your submission will be a Jupyter Notebook containing both code (typically Python) and explanatory text (Markdown), limited to 2500 words (plus references). References from the scientific literature must be used, and your discussion must be in your own words. DO NOT CUT AND PASTE FROM THE INTERNET.