Domain: Social Media
Project: Analysis of Instagram posts
The dataset captures details such as the number of comments and likes each post received, the timestamp of each post, the number of followers each user has, the number of Instagram accounts each user follows, each user's gender, the total number of posts each user has published, and more.
The project aims to understand the dataset and explore the following:
Which Instagram ID has the highest number of followers?
Which Instagram ID has received the highest number of comments?
Which Instagram ID has the highest number of likes overall?
Which Instagram ID has posted the greatest number of posts?
Relationship between gender and number of followers: do males have a greater number of followers on average?
Do males get a greater number of likes for their posts on average?
Do females get a greater number of comments on average? That is, are females better at holding conversations in Instagram interactions?
Which topic category is the most popular?
Do Instagrammers with a greater number of posts also have more followers?
Which format do Instagrammers “like” the most? GraphImage, GraphSidecar or GraphVideo?
Is there an hour of the day when Instagram posts receive high engagement?
Analysis of the Instagram dataset can help a brand that wants to build an Instagram presence identify the best time of day to post, the formats that perform best, the types of posts that drive engagement, and more. The analysis can also show whether gender is correlated with Instagram presence.
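Questions 5 to 7 reduce to comparing per-gender averages. A minimal sketch on a tiny made-up DataFrame, assuming a column named `gender` (a hypothetical name; the actual column in ig_all.csv may be called something else) next to the `num_follower` and `num_comment` columns that the code below uses:

```python
import pandas as pd

# Toy data standing in for ig_df; "gender" is a hypothetical column name.
posts = pd.DataFrame({
    "gender": ["male", "female", "male", "female", "female"],
    "num_follower": [1000, 3000, 2000, 5000, 1000],
    "num_comment": [10, 40, 20, 60, 20],
})

# Mean followers and comments per gender (one row per gender).
by_gender = posts.groupby("gender")[["num_follower", "num_comment"]].mean()
print(by_gender)

# Answer "do males have more followers on average?" as a boolean.
males_lead = by_gender.loc["male", "num_follower"] > by_gender.loc["female", "num_follower"]
print("males have more followers on average:", males_lead)
```

Because each row is a post, an account with many posts is counted many times in these means; removing duplicate accounts first (if the dataset has an account identifier) would give a per-user average instead.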
Code Implementation
Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_palette("dark")
Read Dataset
# importing the first 20,000 rows into a pandas dataframe
ig_df=pd.read_csv('ig_all.csv',nrows=20000)
ig_df.head()
Only the first 20,000 rows of the dataset are loaded because the full file is very large. Remove the nrows argument to work with the full dataset.
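If the full file does not fit in memory, pandas can also read it in pieces with the `chunksize` argument of `read_csv` and aggregate as it goes. A minimal sketch, using a small in-memory CSV in place of ig_all.csv and the `_id`/`num_comment` columns from the questions below; the chunk size of 2 is only for illustration:

```python
import io

import pandas as pd

# Small in-memory CSV standing in for ig_all.csv.
csv_text = """_id,num_comment
1,5
2,3
1,4
3,10
2,1
"""

partial_sums = []
# With chunksize, read_csv yields successive DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Aggregate each chunk on its own so the whole file is never held in memory.
    partial_sums.append(chunk.groupby("_id")["num_comment"].sum())

# Combine the per-chunk partial sums into per-id totals.
totals = pd.concat(partial_sums).groupby(level=0).sum()
print(totals)
```

Sums combine correctly across chunks; an aggregation such as a mean would instead need per-chunk sums and counts combined at the end.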
ig_df.info()
output:
...
...
# filling missing values in the numerical column video_view_count with zeroes
ig_df['video_view_count'] = ig_df['video_view_count'].fillna(0)
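The code above fills only `video_view_count`. To see which columns have gaps and, if zero is a sensible default for every numeric column (an assumption to check against the data), fill them all at once, one option is `isna().sum()` followed by `select_dtypes`. A minimal sketch on a toy DataFrame (the `caption` column is made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for ig_df, with gaps in numeric and text columns.
df = pd.DataFrame({
    "video_view_count": [np.nan, 120.0, np.nan],
    "num_comment": [3.0, np.nan, 7.0],
    "caption": ["a", None, "c"],
})

# Count missing values per column to see where the gaps are.
print(df.isna().sum())

# Fill only the numeric columns with 0; the text column is left untouched.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(0)
print(df)
```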
Question 1
# grouping the dataframe by _id and summing num_follower for each id
max_follower = ig_df.groupby('_id')['num_follower'].agg(['sum'])
max_follower.head()
output:
# ids with maximum followers
id_max_follower=max_follower[max_follower['sum']==max_follower['sum'].max()].index.values
print('max number of followers=',max_follower['sum'].max(),'\n')
print('Ids with the max followers is',id_max_follower)
output:
max number of followers= 26280948.0
Ids with the max followers is [2053657551225881312 2053811981908728430 2054544082723514844
2055157062351844210 2055509581288695214 2055793451858807002
2057960418644557485 2059109082154734461 2059995089230153281
2060134066545331768 2060704945805256020 2060710363134736834
2274745455451446849 2276719918892009433 2282843945801026492
2283190016439933203 2287037339771303750 2287756184320231336
2289293448963859570 2290479338171950095 2292144044351738133
2293453619617278217 2294276992521979098 2295542011855270358]
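Because `num_follower` describes an account rather than an individual post, it is presumably repeated on every row for that account. If an `_id` appears on several rows, summing multiplies the follower count by the number of rows, while `max` (or `first`) counts it once. That 24 different IDs tie on exactly the same total also suggests `_id` may identify posts rather than accounts, with all 24 posts coming from one account; this is an inference, not something the dataset description confirms. A minimal sketch on toy data:

```python
import pandas as pd

# Toy data: id 10 appears on two rows, each repeating the same follower count.
rows = pd.DataFrame({
    "_id": [10, 10, 20, 30],
    "num_follower": [500, 500, 800, 300],
})

per_id = rows.groupby("_id")["num_follower"]

# Summing double-counts the repeated value for id 10 (1000 instead of 500),
# which would wrongly rank it above id 20.
print(per_id.sum())

# Taking the max counts each id's follower value once.
followers = per_id.max()
print(followers)

# Rank ids by follower count; keep="all" retains ties at the cut-off.
print(followers.nlargest(2, keep="all"))
```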
Question 2
# grouping the dataframe by _id and summing num_comment for each id
max_comments = ig_df.groupby('_id')['num_comment'].agg(['sum'])
max_comments.head()
output:
# ids with maximum comments
id_max_comment=max_comments[max_comments['sum']==max_comments['sum'].max()].index.values
print('max number of comments=',max_comments['sum'].max(),'\n')
print('Ids with the max comments is',id_max_comment)
output:
max number of comments= 96623.0
Ids with the max comments is [2199148278017001460]
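Questions 1 and 2 (and, presumably, questions 3 and 4 on likes and posts) follow the same pattern: group by `_id`, aggregate one column, and return every ID tied for the maximum. A small helper with the hypothetical name `ids_with_max` makes this reusable. The likes and post-count column names are not shown in this section, so the sketch runs on toy data:

```python
import pandas as pd


def ids_with_max(df: pd.DataFrame, col: str, key: str = "_id", how: str = "sum"):
    """Aggregate col per key and return (max value, array of keys tied for it)."""
    totals = df.groupby(key)[col].agg(how)
    best = totals.max()
    # Boolean mask keeps every key that ties for the maximum.
    return best, totals[totals == best].index.values


# Toy data standing in for ig_df.
toy = pd.DataFrame({
    "_id": [1, 2, 2, 3],
    "num_comment": [7, 3, 4, 5],
})

best, ids = ids_with_max(toy, "num_comment")
print("max number of comments=", best)
print("Ids with the max comments:", ids)
```

The `how` argument accepts any aggregation name understood by `agg`, such as `"max"`, for columns where summing is not appropriate.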