
Machine Learning Homework Help | An introduction to Machine Learning with Scikit-Learn

realcode4you


The open source Python ecosystem provides a standalone, versatile and powerful scientific working environment, including: NumPy, SciPy, IPython, Matplotlib, Pandas, and many others...


Scikit-Learn builds upon NumPy and SciPy and complements this scientific environment with machine learning algorithms.


By design, Scikit-Learn is non-intrusive, easy to use and easy to combine with other libraries.

Core algorithms are implemented in low-level languages.


Algorithms


Supervised learning:

  • Linear models (Ridge, Lasso, Elastic Net, ...)

  • Support Vector Machines

  • Tree-based methods (Random Forests, Bagging, GBRT, ...)

  • Nearest neighbors

  • Neural networks (basics)

  • Gaussian Processes

  • Feature selection

Unsupervised learning:

  • Clustering (KMeans, Ward, ...)

  • Matrix decomposition (PCA, ICA, ...)

  • Density estimation

  • Outlier detection

Model selection and evaluation:

  • Cross-validation

  • Grid-search

  • Lots of metrics

... and many more! (See our Reference)
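
To make the model selection entries above concrete, here is a minimal sketch of cross-validation and grid search. The synthetic dataset, the choice of `KNeighborsClassifier`, and the candidate values for `n_neighbors` are assumptions made only for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, chosen only for illustration
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# 5-fold cross-validation: one accuracy score per fold
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.shape)

# Grid search: evaluate each candidate value with cross-validation
param_grid = {"n_neighbors": [1, 5, 15]}  # illustrative candidates
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

After fitting, `search` behaves like the best model it found, so `search.predict(...)` can be called directly.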


Applications

  • Classifying signal from background events;

  • Diagnosing disease from symptoms;

  • Recognising cats in pictures;

  • Identifying body parts with Kinect cameras;


Data

  • Input data = NumPy arrays or SciPy sparse matrices;

  • Algorithms are expressed using high-level operations defined on matrices or vectors (similar to MATLAB);

    • Leverage efficient low-level implementations;

    • Keep code short and readable.
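
As a rough illustration of the first point, the sketch below feeds the same data to an estimator once as a NumPy array and once as a SciPy sparse matrix. The data here is dense and only stored in sparse format, and the variable names are assumptions for the example.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, chosen only for illustration
X_dense, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Same values, stored in compressed sparse row format
X_sparse = sparse.csr_matrix(X_dense)

# The calls are identical for both input types
pred_dense = KNeighborsClassifier(n_neighbors=5).fit(X_dense, y).predict(X_dense)
pred_sparse = KNeighborsClassifier(n_neighbors=5).fit(X_sparse, y).predict(X_sparse)

print(pred_dense.shape, pred_sparse.shape)
```

Sparse storage pays off when most entries are zero, as in bag-of-words text features. Not every estimator accepts sparse input, so it is worth checking the documentation of the one in use.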


Example:

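# Generate data
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1000, centers=20, random_state=123)
labels = ["b", "r"]
# Map the 20 blob ids to two labels: ids below 10 become "r", the rest "b"
y = np.take(labels, (y < 10).astype(int))
print(X)
print(y[:5])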

[[-6.453 -8.764]
 [ 0.29   0.147]
 [-5.184 -1.253]
 ...
 [-0.231 -1.608]
 [-0.603  6.873]
 [ 2.284  4.874]]
['r' 'r' 'b' 'r' 'b']



# X is a 2 dimensional array, with 1000 rows and 2 columns
print(X.shape)
 
# y is a vector of 1000 elements
print(y.shape)

(1000, 2)
(1000,)


# Rows and columns can be accessed with lists, slices or masks
print(X[[1, 2, 3]])     # rows 1, 2 and 3
print(X[:5])            # 5 first rows
print(X[500:510, 0])    # values from row 500 to row 509 at column 0
print(X[y == "b"][:5])  # 5 first rows for which y is "b"

[[ 0.29   0.147]
 [-5.184 -1.253]
 [-4.714  3.674]]
[[-6.453 -8.764]
 [ 0.29   0.147]
 [-5.184 -1.253]
 [-4.714  3.674]
 [ 4.516 -2.881]]
[-4.438 -2.46   4.331 -7.921  1.57   0.565  4.996  4.758 -1.604  1.101]
[[-5.184 -1.253]
 [ 4.516 -2.881]
 [ 1.708  2.624]
 [-0.526  8.96 ]
 [-1.076  9.787]]


# Plot
import matplotlib.pyplot as plt

plt.figure()
for label in labels:
    mask = (y == label)
    plt.scatter(X[mask, 0], X[mask, 1], c=label)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()
















Loading external data

  • NumPy provides some simple tools for loading data from files (CSV, binary, etc);

  • For structured data, Pandas provides more advanced tools (CSV, JSON, Excel, HDF5, SQL, etc).
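
The two loading routes above could look like the following sketch. The CSV content is an in-memory toy example, and the column names `x1`, `x2` and `label` are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Toy CSV held in memory; a file path would work the same way
csv_text = "x1,x2,label\n0.5,1.2,b\n-1.0,3.4,r\n2.2,-0.7,b\n"

# NumPy: read only the numeric columns, skipping the header row
X = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=(0, 1))
print(X.shape)

# Pandas: handles mixed column types and named columns
df = pd.read_csv(io.StringIO(csv_text))
X_df = df[["x1", "x2"]].to_numpy()
y = df["label"].to_numpy()
print(y)
```

Either way, the result is a plain array that can be passed to `fit`.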


A simple and unified API

All learning algorithms in scikit-learn share a uniform and limited API consisting of complementary interfaces:

  • an estimator interface for building and fitting models;

  • a predictor interface for making predictions;

  • a transformer interface for converting data.

Goal: enforce a simple and consistent API to make it trivial to swap or plug algorithms.
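
A minimal sketch of what this uniformity buys in practice: a transformer is fitted and applied, then three different classifiers are trained and queried with exactly the same calls. The dataset and the particular models are assumptions picked for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, chosen only for illustration
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Transformer interface: fit learns the statistics, transform applies them
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Estimator and predictor interfaces: the same calls for every model
models = [
    KNeighborsClassifier(n_neighbors=5),
    LogisticRegression(),
    DecisionTreeClassifier(random_state=0),
]
predictions = {}
for model in models:
    name = type(model).__name__
    model.fit(X_scaled, y)
    predictions[name] = model.predict(X_scaled[:5])
    print(name, predictions[name])
```

Swapping an algorithm means changing one line in the `models` list; the rest of the code stays as it is.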


Estimators


# Schematic of the estimator interface
class Estimator(object):
    def fit(self, X, y=None):
        """Fits estimator to data."""
        # set state of ``self``
        return self


# Import the nearest neighbor class
from sklearn.neighbors import KNeighborsClassifier  # Change this to try 
                                                    # something else

# Set hyper-parameters, for controlling algorithm
clf = KNeighborsClassifier(n_neighbors=5)

# Learn a model from training data
clf.fit(X, y)

Output:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=5, p=2, weights='uniform')


# Estimator state is stored in instance attributes
clf._tree

Output:

<sklearn.neighbors.kd_tree.KDTree at 0x558b1dee6148>
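
To show how the `fit` contract plays out, here is a sketch of a hand-written estimator. `MajorityClassifier` is a hypothetical toy class, not part of scikit-learn; it only illustrates storing learned state on `self` and returning `self`.

```python
import numpy as np
from sklearn.base import BaseEstimator


class MajorityClassifier(BaseEstimator):
    """Hypothetical toy estimator: always predicts the most frequent label."""

    def fit(self, X, y):
        # Learned state is stored in attributes ending with an underscore
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self  # returning self allows call chaining

    def predict(self, X):
        # Ignore the features and repeat the majority label
        return np.full(len(X), self.majority_)


X_toy = np.zeros((5, 2))
y_toy = np.array(["b", "r", "r", "r", "b"])

model = MajorityClassifier().fit(X_toy, y_toy)
print(model.majority_)
print(model.predict(X_toy[:2]))
```

Because `fit` returns `self`, construction and training can be chained on one line, as in the last lines of the sketch.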



Predictors


# Make predictions  
print(clf.predict(X[:5])) 

Output:

['r' 'r' 'r' 'b' 'b']


# Compute (approximate) class probabilities
print(clf.predict_proba(X[:5]))

Output:

[[0.  1. ]
 [0.  1. ]
 [0.2 0.8]
 [0.6 0.4]
 [0.8 0.2]]
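
The link between `predict` and `predict_proba` can be sketched as follows. The data is a fresh synthetic set, assumed only for this example, so the numbers will differ from the output above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data with string labels, for illustration only
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
y = np.where(y == 0, "b", "r")

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
proba = clf.predict_proba(X[:5])

# Columns of proba follow the order of clf.classes_
print(clf.classes_)

# Each row sums to 1; with 5 equally weighted neighbors,
# every entry is a multiple of 0.2
print(proba.sum(axis=1))

# Taking the most probable class per row
from_proba = clf.classes_[np.argmax(proba, axis=1)]
print(from_proba)
print(clf.predict(X[:5]))
```

With two classes and an odd number of neighbors there are no ties, so the two printed label arrays agree.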



Contact us at realcode4you@gmail.com for affordable machine learning project help and instant support from our machine learning experts.