Reference: https://cuijiahua.com/blog/2017/12/ml_13_regtree_1.html
1. Drawbacks of the ID3 algorithm
Recall
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Columns 2 and 3 are the features; column 4 is the label
    data = pd.read_csv("./datasets/Social_Network_Ads.csv")
    X = data.iloc[:, [2, 3]].values
    y = data.iloc[:, 4].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # Fit the scaler on the training set only, then reuse its statistics on the test set
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    classifier = SVC(kernel='linear', random_state=0)
    classifier.fit(X_train, y_train)
    # Output: SVC(kernel='linear', random_state=0)
    y_pred = classifier.
Newton interpolation
Divided differences
Definition: Let \(f(x)\) take the values \(f_i, i=0,1,\dots,n\) at the distinct nodes \(x_i\). We call \(f[x_i,x_j]=\frac
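As a rough illustration of how divided differences feed into Newton interpolation, the sketch below builds the coefficients and evaluates the Newton form. It assumes the standard textbook recurrence, in which a k-th order divided difference is the difference of two (k-1)-th order ones divided by the distance between the outermost nodes; the post's own notation may differ. The function names `divided_differences` and `newton_eval` are hypothetical.

```python
def divided_differences(xs, fs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] for distinct nodes xs."""
    n = len(xs)
    coef = [float(v) for v in fs]
    for order in range(1, n):
        # Walk backwards so coef[i - 1] still holds the previous order's value
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton form at x using Horner-style nesting."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


if __name__ == "__main__":
    xs = [0, 1, 2]
    fs = [1, 3, 7]  # samples of x**2 + x + 1
    coef = divided_differences(xs, fs)
    print(coef)
    print(newton_eval(xs, coef, 3))
```

Updating the array in place keeps memory at O(n), at the cost of discarding the intermediate columns of the divided difference table.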
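Search in Rotated Sorted Array
Problem: https://leetcode-cn.com/problems/search-in-rotated-sorted-array/
Approach: This is clearly a binary search, but the array is no longer fully sorted, only partially sorted, so each step needs an extra check.
Code:

    class Solution(object):
        def search(self, nums, target):
            left, right = 0, len(nums) - 1
            while left <= right:
                mid = left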
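For reference, a complete version of the partially sorted binary search described above might look like the sketch below. It is not the post's own solution, whose code is cut off in this excerpt. The function name `search_rotated` is hypothetical, and the sketch assumes the array holds distinct values, as in the linked problem.

```python
def search_rotated(nums, target):
    """Return the index of target in a rotated ascending array, or -1."""
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        if nums[left] <= nums[mid]:
            # The left half [left, mid] is in ascending order
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
        else:
            # Otherwise the right half [mid, right] is in ascending order
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1
    return -1


if __name__ == "__main__":
    print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))
    print(search_rotated([4, 5, 6, 7, 0, 1, 2], 3))
```

At every step at least one half of the current range is sorted, so a simple range comparison on that half decides which side to keep.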
Import packages

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

Load the data

    data = pd.read_csv("./datasets/studentscores.csv")
    data.head()

       Hours  Scores
    0    2.5      21
    1    5.1      47
    2    3.2      27
    3    8.5      75
    4    3.5      30

Data preparation

    X = data.iloc[:,:1].values
    Y = data.iloc[:,1].values
    from sklearn.model_selection import train_test_split
    X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=1/4,random_state=0)

Train the model

    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor =
    import pandas as pd
    import numpy as np

Load the data

    data = pd.read_csv("./datasets/50_Startups.csv")
    data.head()

       R&D Spend  Administration  Marketing Spend       State     Profit
    0  165349.20       136897.80        471784.10    New York  192261.83
    1  162597.70       151377.59        443898.53  California  191792.06
    2  153441.51       101145.55        407934.54     Florida  191050.39
    3  144372.41       118671.85        383199.62    New York  182901.99
    4  142107.34        91391.77        366168.42     Florida  166187.94

Split into X and Y

    X = data.iloc[:,:-1].values
    Y = data.iloc[:,-1].values

Encoding

    from sklearn.preprocessing import
    import pandas as pd
    import numpy as np

Read the file

    df = pd.read_csv(
        'https://labfile.oss.aliyuncs.com/courses/1283/telecom_churn.csv')
    df.head()

Output columns: State, Account length, Area code, International plan, Voice mail plan, Number vmail messages, Total day minutes, Total day calls, Total day charge, Total eve minutes, Total eve calls, Total eve charge, Total night minutes, Total night calls, Total night charge, Total intl minutes, Total intl calls, Total intl charge, Customer service calls, Churn
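An autoencoder is a neural network model that can be applied to many tasks, including dimensionality reduction, generation, and feature extraction. An autoencoder consists broadly of two parts. The first part is the encoder, which takes the original input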
Learning curves can be used to diagnose bias and variance problems

    from sklearn.model_selection import train_test_split,learning_curve
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.datasets import load_digits
    import matplotlib.pyplot as plt

    digits = load_digits()
    X = digits.data
    Y = digits.target

    # The scores are negative mean squared errors, so every value is <= 0
    train_sizes,train_loss,test_loss = learning_curve(SVC(gamma=0.001),X,Y,cv=10,
                                                      scoring='neg_mean_squared_error',
                                                      train_sizes=[0.1,0.25,0.5,0.75,1])

    train_sizes
    array([ 161,  404,  808, 1212, 1617])

    train_loss
    array([[-0.        , -0.09937888, -0.09937888, -0.09937888, -0.09937888,
            -0.09937888, -0.09937888, -0.09937888, -0.09937888, -0.09937888],
           [-0.        , -0.03960396,