Requirements
Domain: Social Media
Project: Analysis of Instagram posts
The dataset, sourced from Kaggle, compiles details about Instagram posts published between 04/05/2012, 2:36 PM (UTC) and 04/27/2020, 3:34 PM (UTC) by a set of over 1.04 billion Instagram users. The project aims to glean insights into the factors that affect engagement with posts on Instagram. For each post, the dataset captures the number of comments received, the number of likes received, and the timestamp. For each user, it captures the number of followers, the number of Instagram handles followed, the gender, the total number of posts published, and more.
The project aims to understand the dataset and explore the following:
1. Which Instagram ID has the highest number of followers?
2. Which Instagram ID has received the highest number of comments?
3. Which Instagram ID has the highest number of likes overall?
4. Which Instagram ID has posted the greatest number of posts?
5. What is the relationship between gender and number of followers? Do males have a greater number of followers on average?
6. Do males get a greater number of likes for their posts on average?
7. Do females get a greater number of comments on average? That is, are females better at conversations in Instagram interactions?
8. Which topic category is the most popular?
9. Do Instagrammers with a greater number of posts also have more followers?
10. Which format do Instagrammers “like” the most? GraphImage, GraphSidecar or GraphVideo?
11. Is there an hour of the day when Instagram posts receive high engagement?
Analysis of the Instagram dataset can help a brand looking to build an Instagram presence identify the best time of day to post, the post format that performs best, the type of posts that could help increase engagement, and more. The analysis would also help establish whether gender correlates with Instagram presence.
Solution
First, import the required libraries.
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_palette("dark")
# importing only the first 20,000 rows into a pandas DataFrame because the file is large
ig_df=pd.read_csv('ig_all.csv',nrows=20000)
ig_df.head()
Output:
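Reading only the first 20,000 rows keeps memory use low, but it restricts every result below to that slice of the file. One possible alternative, sketched here under assumptions, is to stream the whole file in chunks with the `chunksize` argument of `pd.read_csv` and combine per-chunk aggregates. The in-memory CSV and its values are hypothetical stand-ins for `ig_all.csv`; only the column names `user_id` and `num_like` come from the dataset.

```python
import io
import pandas as pd

# Hypothetical miniature CSV standing in for 'ig_all.csv'.
csv_text = "user_id,num_like\n1,10\n1,5\n2,7\n2,3\n3,1\n"

# Read two rows at a time and keep only a small per-chunk aggregate.
partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    partials.append(chunk.groupby("user_id")["num_like"].sum())

# Combine the per-chunk sums into one total per user.
total_likes = pd.concat(partials).groupby(level=0).sum()
print(total_likes)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file path and the chunk size by something much larger, such as 100,000 rows.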
ig_df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 _id 20000 non-null int64
1 content 19650 non-null object
2 display_url 20000 non-null object
3 num_comment 20000 non-null float64
4 num_like 20000 non-null float64
5 post_type 20000 non-null object
6 shortcode 20000 non-null object
7 taken_at_timestamp 20000 non-null float64
8 topic 12948 non-null object
9 user_id 20000 non-null float64
10 video_view_count 2283 non-null float64
11 num_follower 20000 non-null float64
12 num_following 20000 non-null float64
13 num_post 20000 non-null float64
14 gender 20000 non-null object
dtypes: float64(8), int64(1), object(6)
memory usage: 2.3+ MB
# filling missing values in video_view_count (the only numerical column with nulls) with zeroes
ig_df['video_view_count']= ig_df['video_view_count'].fillna(0)
ig_df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 _id 20000 non-null int64
1 content 19650 non-null object
2 display_url 20000 non-null object
3 num_comment 20000 non-null float64
4 num_like 20000 non-null float64
5 post_type 20000 non-null object
6 shortcode 20000 non-null object
7 taken_at_timestamp 20000 non-null float64
8 topic 12948 non-null object
9 user_id 20000 non-null float64
10 video_view_count 20000 non-null float64
11 num_follower 20000 non-null float64
12 num_following 20000 non-null float64
13 num_post 20000 non-null float64
14 gender 20000 non-null object
dtypes: float64(8), int64(1), object(6)
memory usage: 2.3+ MB
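As a complement to `info()`, the missing values can be counted per column before and after the fill. The sketch below uses a small made-up frame, so the values are hypothetical and only the column names match the dataset. It also shows that `content` and `topic`, which are text columns, are left untouched by the numeric fill.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame mimicking the columns that contain nulls.
demo = pd.DataFrame({
    "post_type": ["GraphImage", "GraphVideo", "GraphSidecar"],
    "video_view_count": [np.nan, 120.0, np.nan],
    "topic": [None, "food", "travel"],
})

missing_before = demo.isna().sum()  # null count per column

# Same fill as above: a missing view count is treated as zero views.
demo["video_view_count"] = demo["video_view_count"].fillna(0)

missing_after = demo.isna().sum()
print(missing_before, missing_after, sep="\n\n")
```

One design caveat: a zero here may mean "not a video" rather than "a video with no views", so averages of `video_view_count` are probably best computed on video posts only.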
ig_df.head()
Output:
Question 1 Solution
# grouping the dataframe by _id and then summing over the num_follower for each id.
max_follower=ig_df.groupby('_id')['num_follower'].agg(['sum'])
max_follower.head()
Output:
# ids with maximum followers
id_max_follower=max_follower[max_follower['sum']==max_follower['sum'].max()].index.values
print('max number of followers=',max_follower['sum'].max(),'\n')
print('Ids with the max followers is',id_max_follower)
Output:
max number of followers= 26280948.0
Ids with the max followers is [2053657551225881312 2053811981908728430 2054544082723514844
2055157062351844210 2055509581288695214 2055793451858807002
2057960418644557485 2059109082154734461 2059995089230153281
2060134066545331768 2060704945805256020 2060710363134736834
2274745455451446849 2276719918892009433 2282843945801026492
2283190016439933203 2287037339771303750 2287756184320231336
2289293448963859570 2290479338171950095 2292144044351738133
2293453619617278217 2294276992521979098 2295542011855270358]
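The result above lists many IDs with the same follower count. A likely explanation, stated here as an assumption, is that `_id` identifies a post rather than an account, so every post by the same user repeats that user's `num_follower`. If that is the case, grouping by `user_id` and taking the maximum (not the sum) per user gives one answer per account. A minimal sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows: user 1 has two posts, so num_follower repeats.
demo = pd.DataFrame({
    "_id": [101, 102, 103, 104],
    "user_id": [1, 1, 2, 3],
    "num_follower": [500.0, 500.0, 900.0, 300.0],
})

# One follower count per user; max avoids double counting repeated rows.
followers_per_user = demo.groupby("user_id")["num_follower"].max()

top_user = followers_per_user.idxmax()   # user_id with the most followers
top_count = followers_per_user.max()
print("user_id with the max followers:", top_user, "followers:", top_count)
```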
Question 2 Solution
# grouping the dataframe by _id and then summing over the num_comment for each id.
max_comments=ig_df.groupby('_id')['num_comment'].agg(['sum'])
max_comments.head()
Output:
# ids with maximum comments
id_max_comment=max_comments[max_comments['sum']==max_comments['sum'].max()].index.values
print('max number of comments=',max_comments['sum'].max(),'\n')
print('Ids with the max comments is',id_max_comment)
Output:
max number of comments= 96623.0
Ids with the max comments is [2199148278017001460]
Question 3 Solution
# grouping the dataframe by _id and then summing over the num_like for each id.
max_likes=ig_df.groupby('_id')['num_like'].agg(['sum'])
max_likes.head()
Output:
# ids with maximum likes
id_max_like=max_likes[max_likes['sum']==max_likes['sum'].max()].index.values
print('max number of likes=',max_likes['sum'].max(),'\n')
print('Ids with the max likes is',id_max_like)
Output:
max number of likes= 335208.0
Ids with the max likes is [2283190016439933203]
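Because Question 3 asks for likes "overall", another reading is to total the likes across all posts of each account. The sketch below assumes `user_id` identifies the account and uses made-up values. `nlargest` returns the top entries directly, without the boolean-mask comparison used above.

```python
import pandas as pd

# Hypothetical posts: users 1 and 3 each have two posts.
demo = pd.DataFrame({
    "_id": [101, 102, 103, 104, 105],
    "user_id": [1, 1, 2, 3, 3],
    "num_like": [10.0, 20.0, 25.0, 5.0, 8.0],
})

# Total likes per account, summed over all of that account's posts.
likes_per_user = demo.groupby("user_id")["num_like"].sum()

# The two accounts with the most likes overall, in descending order.
top_two = likes_per_user.nlargest(2)
print(top_two)
```

The same pattern applies to `num_comment` for Question 2.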
Question 4 Solution
# grouping the dataframe by _id and then summing over the num_post for each id.
max_posts=ig_df.groupby('_id')['num_post'].agg(['sum'])
max_posts.head()
Output:
# ids with maximum posts
id_max_post=max_posts[max_posts['sum']==max_posts['sum'].max()].index.values
print('max number of posts=',max_posts['sum'].max(),'\n')
print('Ids with the max posts is',id_max_post)
Output:
max number of posts= 17816.0
Ids with the max posts is [2023489855060727450 2023671756153268292 2024270956981181438
2024311403116147980 2024850437651330901 2024865985592337534
2025870536474678106 2025911065145175037 2025963768353305945
2026994137861035377 2027066606592533756 2027521861734282390
2027983787740160545 2028002072733126618 2028192584597270997
2028519915136361379 2028990819444205941 2029061363728407151
2029207970096707065 2029479002128848974 2029609024504333563
2030327958853607288 2031020603489694743 2031154650962703718
2031405266297202794 2031717130558319719 2031835376628352077
2031842729285311621 2032447283370419721 2032717147154732449
2033202974343538364 2033317668777611929 2033479262585097801
2033809791750749527 2033986828155105764 2034027119310306594
2034053221546306089 2034600096888366149 2034665040803724121
2034670475312915993 2035422352908737783 2036077423501690128
2036405851689900953 2036813955573230113 2036878208997054542
2036900320168254012 2036955508535064590 2037646181873735056
...
...
Question 5 Solution
ig_df.groupby('gender')['num_follower'].agg(['mean']).plot.bar(figsize=(13,6))
plt.show()
Output:
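The bar chart compares the mean of `num_follower` over all rows. If each row is a post (an assumption), users who post often are counted many times, which can skew the average. A hedged sketch of a per-user comparison on made-up data follows; the gender labels `male` and `female` are hypothetical, since the actual values in the column are not shown here.

```python
import pandas as pd

# Hypothetical data: user 1 posts three times, the others once each.
demo = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 4],
    "gender": ["male", "male", "male", "female", "female", "male"],
    "num_follower": [900.0, 900.0, 900.0, 400.0, 600.0, 100.0],
})

# Mean over rows, as in the plot above: frequent posters dominate.
per_post_mean = demo.groupby("gender")["num_follower"].mean()

# Keep one row per user, then summarise followers by gender.
users = demo.drop_duplicates(subset="user_id")
per_user = users.groupby("gender")["num_follower"].agg(["mean", "median", "count"])

print(per_post_mean)
print(per_user)
```

Reporting the median and the group size next to the mean helps judge whether a difference is driven by a few very large accounts.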