Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers import Dense, Flatten
from sklearn.preprocessing import MinMaxScaler
Read Data
Download the dataset (brca_metabric_clinical_data.tsv) and read it as a tab-separated file.
d = pd.read_csv('/brca_metabric_clinical_data.tsv', sep='\t')
d.dropna(inplace=True)
d.head()
Output:
d["Patient's Vital Status"].unique()
Output:
array(['Living', 'Died of Disease', 'Died of Other Causes'], dtype=object)
# Encode the target column as integer class labels
y = d["Patient's Vital Status"]
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
le.classes_
Output:
array(['Died of Disease', 'Died of Other Causes', 'Living'], dtype=object)
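LabelEncoder assigns integer codes in sorted order of the class names, which is why le.classes_ above is alphabetical rather than in order of appearance. A minimal sketch of that mapping, using a short hand-written list of labels instead of the real column:

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written sample labels (illustrative, not read from the dataset file)
labels = ['Living', 'Died of Disease', 'Died of Other Causes', 'Living']

le = LabelEncoder()
codes = le.fit_transform(labels)

# classes_ is sorted, so the code of a label is its position in classes_
mapping = {str(name): int(code) for code, name in enumerate(le.classes_)}
print(mapping)         # {'Died of Disease': 0, 'Died of Other Causes': 1, 'Living': 2}
print(codes.tolist())  # [2, 0, 1, 2]

# inverse_transform turns integer codes back into the original strings
decoded = le.inverse_transform([0, 2]).tolist()
print(decoded)         # ['Died of Disease', 'Living']
```

Keeping the fitted encoder around makes it possible to translate model predictions back into readable class names later.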
d.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1092 entries, 1 to 1743
Data columns (total 38 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Study ID 1092 non-null object
1 Patient ID 1092 non-null object
2 Sample ID 1092 non-null object
3 Age at Diagnosis 1092 non-null float64
4 Type of Breast Surgery 1092 non-null object
5 Cancer Type 1092 non-null object
6 Cancer Type Detailed 1092 non-null object
7 Cellularity 1092 non-null object
8 Chemotherapy 1092 non-null object
9 Pam50 + Claudin-low subtype 1092 non-null object
10 Cohort 1092 non-null float64
11 ER status measured by IHC 1092 non-null object
12 ER Status 1092 non-null object
13 Neoplasm Histologic Grade 1092 non-null float64
14 HER2 status measured by SNP6 1092 non-null object
15 HER2 Status 1092 non-null object
16 Tumor Other Histologic Subtype 1092 non-null object
17 Hormone Therapy 1092 non-null object
18 Inferred Menopausal State 1092 non-null object
19 Integrative Cluster 1092 non-null object
20 Primary Tumor Laterality 1092 non-null object
21 Lymph nodes examined positive 1092 non-null float64
22 Mutation Count 1092 non-null float64
23 Nottingham prognostic index 1092 non-null float64
24 Oncotree Code 1092 non-null object
25 Overall Survival (Months) 1092 non-null float64
26 Overall Survival Status 1092 non-null object
27 PR Status 1092 non-null object
28 Radio Therapy 1092 non-null object
29 Relapse Free Status (Months) 1092 non-null float64
30 Relapse Free Status 1092 non-null object
31 Number of Samples Per Patient 1092 non-null int64
32 Sample Type 1092 non-null object
33 Sex 1092 non-null object
34 3-Gene classifier subtype 1092 non-null object
35 Tumor Size 1092 non-null float64
36 Tumor Stage 1092 non-null float64
37 Patient's Vital Status 1092 non-null object
dtypes: float64(10), int64(1), object(27)
memory usage: 332.7+ KB
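The summary above lists 1092 rows whose index labels run from 1 to 1743, which suggests that dropna removed a large share of the original rows. A small sketch, on a made-up frame rather than the METABRIC file, of counting missing values per column before dropping anything:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the clinical table (all values are invented)
toy = pd.DataFrame({
    'Age at Diagnosis': [75.6, 43.2, np.nan, 47.9],
    'Tumor Stage': [2.0, np.nan, np.nan, 2.0],
    'Chemotherapy': ['NO', 'YES', 'NO', 'YES'],
})

# Missing values per column, largest first
missing = toy.isna().sum().sort_values(ascending=False)
print(missing)

# Rows kept by a blanket dropna versus the total
kept = toy.dropna()
print(len(kept), 'of', len(toy), 'rows kept')
```

If one sparse column turns out to account for most of the dropped rows, dropping that column first may keep more patients in the table.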
prepDF = d.select_dtypes(exclude=[object])
objDF = d.select_dtypes(include=[object]).copy()  # copy so the columns can be overwritten safely
objDF.head()
Output:
Use Label Encoder to Convert String Data into Integers
nle = preprocessing.LabelEncoder()
for i in objDF.columns:
    objDF[i] = nle.fit_transform(objDF[i])
features = pd.concat([objDF, prepDF], axis=1)
features
Output:
Check dataset columns
features.columns
Output:
Index(['Study ID', 'Patient ID', 'Sample ID', 'Type of Breast Surgery',
'Cancer Type', 'Cancer Type Detailed', 'Cellularity', 'Chemotherapy',
'Pam50 + Claudin-low subtype', 'ER status measured by IHC', 'ER Status',
'HER2 status measured by SNP6', 'HER2 Status',
'Tumor Other Histologic Subtype', 'Hormone Therapy',
'Inferred Menopausal State', 'Integrative Cluster',
'Primary Tumor Laterality', 'Oncotree Code', 'Overall Survival Status',
'PR Status', 'Radio Therapy', 'Relapse Free Status', 'Sample Type',
'Sex', '3-Gene classifier subtype', "Patient's Vital Status",
'Age at Diagnosis', 'Cohort', 'Neoplasm Histologic Grade',
'Lymph nodes examined positive', 'Mutation Count',
'Nottingham prognostic index', 'Overall Survival (Months)',
'Relapse Free Status (Months)', 'Number of Samples Per Patient',
'Tumor Size', 'Tumor Stage'],
dtype='object')
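The column list above still contains "Patient's Vital Status", the column that was encoded into y, and 'Overall Survival Status' likely carries closely related information. Leaving such columns in the inputs lets the network read the answer directly. The sketch below, on an invented frame, separates the target before encoding and keeps one encoder per column so each mapping can be inverted later. The `leaky` list is an assumption about which columns to exclude, not something the original post specifies.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

target = "Patient's Vital Status"
# Assumed list of columns that reveal the outcome (a judgment call)
leaky = ['Overall Survival Status']

# Invented rows that mimic a few of the real columns
toy = pd.DataFrame({
    'Chemotherapy': ['NO', 'YES', 'NO'],
    'Tumor Size': [22.0, 10.0, 15.0],
    'Overall Survival Status': ['LIVING', 'DECEASED', 'DECEASED'],
    target: ['Living', 'Died of Disease', 'Died of Other Causes'],
})

# Encode the target on its own, then remove it from the inputs
y = LabelEncoder().fit_transform(toy[target])
X = toy.drop(columns=[target] + leaky)

# One encoder per non-numeric column, stored by column name
encoders = {}
for col in X.select_dtypes(exclude='number').columns:
    encoders[col] = LabelEncoder()
    X[col] = encoders[col].fit_transform(X[col])

print(list(X.columns))  # ['Chemotherapy', 'Tumor Size']
```

With a separate encoder per column, `encoders['Chemotherapy'].inverse_transform(...)` recovers the original strings, which a single reused encoder cannot do for any column but the last.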
EDA: Distribution of Each Numeric Column
for i in prepDF.columns:
    plt.hist(prepDF[i])
    plt.title(i)
    plt.xlabel('value')
    plt.ylabel('frequency')
    plt.show()
Output:
...
Building Model
# Scale every column to the [0, 1] range
scaler = MinMaxScaler()
features = scaler.fit_transform(features)
# Build the model
deepModel = Sequential()
deepModel.add(Dense(110, activation='relu', input_dim=features.shape[1]))
deepModel.add(Dense(70, activation='relu'))
deepModel.add(Dense(30, activation='relu'))
deepModel.add(Flatten())  # no effect here: the input is already 2-D
deepModel.add(Dense(1, activation='sigmoid'))
# Single sigmoid output trained with MSE against the encoded labels 0, 1, 2
deepModel.compile(optimizer='sgd',
                  loss='mse',
                  metrics=['accuracy'])
# Hold out the last 20% of rows for validation
his = deepModel.fit(features, y, validation_split=0.2, epochs=20, batch_size=10)
Output:
Epoch 1/20
88/88 [==============================] - 0s 3ms/step - loss: 0.9908 - accuracy: 0.2257 - val_loss: 0.7325 - val_accuracy: 0.2557
Epoch 2/20
88/88 [==============================] - 0s 1ms/step - loss: 0.7648 - accuracy: 0.2085 - val_loss: 0.6631 - val_accuracy: 0.2557
Epoch 3/20
88/88 [==============================] - 0s 1ms/step - loss: 0.6853 - accuracy: 0.2085 - val_loss: 0.5789 - val_accuracy: 0.2557
Epoch 4/20
88/88 [==============================] - 0s 1ms/step - loss: 0.6054 - accuracy: 0.3517 - val_loss: 0.5048 - val_accuracy: 0.5205
Epoch 5/20
88/88 [==============================] - 0s 1ms/step - loss: 0.5458 - accuracy: 0.5120 - val_loss: 0.4595 - val_accuracy: 0.5708
Epoch 6/20
88/88 [==============================] - 0s 1ms/step - loss: 0.5123 - accuracy: 0.5212 - val_loss: 0.4355 - val_accuracy: 0.5799
Epoch 7/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4950 - accuracy: 0.5235 - val_loss: 0.4243 - val_accuracy: 0.5799
Epoch 8/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4859 - accuracy: 0.5258 - val_loss: 0.4180 - val_accuracy: 0.5845
Epoch 9/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4804 - accuracy: 0.5258 - val_loss: 0.4145 - val_accuracy: 0.5845
Epoch 10/20
88/88 [==============================] - 0s 2ms/step - loss: 0.4773 - accuracy: 0.5281 - val_loss: 0.4106 - val_accuracy: 0.5845
Epoch 11/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4750 - accuracy: 0.5269 - val_loss: 0.4083 - val_accuracy: 0.5845
Epoch 12/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4731 - accuracy: 0.5281 - val_loss: 0.4076 - val_accuracy: 0.5845
Epoch 13/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4717 - accuracy: 0.5304 - val_loss: 0.4053 - val_accuracy: 0.5845
Epoch 14/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4706 - accuracy: 0.5304 - val_loss: 0.4049 - val_accuracy: 0.5845
Epoch 15/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4697 - accuracy: 0.5304 - val_loss: 0.4031 - val_accuracy: 0.5845
Epoch 16/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4689 - accuracy: 0.5326 - val_loss: 0.4023 - val_accuracy: 0.5845
Epoch 17/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4683 - accuracy: 0.5338 - val_loss: 0.4010 - val_accuracy: 0.5890
Epoch 18/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4675 - accuracy: 0.5326 - val_loss: 0.3998 - val_accuracy: 0.5982
Epoch 19/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4671 - accuracy: 0.5349 - val_loss: 0.3992 - val_accuracy: 0.5982
Epoch 20/20
88/88 [==============================] - 0s 1ms/step - loss: 0.4665 - accuracy: 0.5349 - val_loss: 0.3994 - val_accuracy: 0.5890
Check Summary of the Deep Model
deepModel.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 110) 4290
_________________________________________________________________
dense_1 (Dense) (None, 70) 7770
_________________________________________________________________
dense_2 (Dense) (None, 30) 2130
_________________________________________________________________
flatten (Flatten) (None, 30) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 31
=================================================================
Total params: 14,221
Trainable params: 14,221
Non-trainable params: 0
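The parameter counts in the summary follow from inputs × units + units (one bias per unit) for each Dense layer. The first layer's 4,290 parameters imply 38 input columns, which matches the 38 columns of features, target column included. A quick check of that arithmetic:

```python
# Dense layer parameters = inputs * units + units (weights plus biases)
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

layer_sizes = [38, 110, 70, 30, 1]   # 38 input columns, then each Dense width
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [4290, 7770, 2130, 31]
print(sum(counts))  # 14221
```

The Flatten layer contributes nothing to the total because it has no weights.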
Accuracy and Loss in Visual Form
plt.plot(his.history['accuracy'])
plt.plot(his.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
Output:
plt.plot(his.history['loss'])
plt.plot(his.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
Output: