
vllbc

Search in Rotated Sorted Array

Problem:

https://leetcode-cn.com/problems/search-in-rotated-sorted-array/

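Approach:

This is clearly a binary search, but the array is no longer fully sorted, only partially sorted, so each step needs an extra check to decide which half is sorted.

Code: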

class Solution(object):
    def search(self, nums, target):
        left, right = 0, len(nums) - 1
        while left <= right:
            mid = left + (right - left) // 2
            if nums[mid] == target:
                return mid
            if nums[mid] < nums[right]:  # right half nums[mid..right] is ascending
                if nums[mid] < target <= nums[right]:
                    left = mid + 1
                else:
                    right = mid - 1
            else:  # otherwise the left half nums[left..mid] is ascending
                if nums[left] <= target < nums[mid]:
                    right = mid - 1
                else:
                    left = mid + 1
        return -1
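
As a sanity check on the approach above, here is a minimal sketch that compares the binary search against a plain linear scan on every rotation of a small array. The names `search_rotated` and `linear_search` and the sample array are made up for illustration, and the sketch assumes distinct values, as in the LeetCode problem.

```python
def search_rotated(nums, target):
    """Return the index of target in a rotated ascending list of distinct values, or -1."""
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = left + (right - left) // 2
        if nums[mid] == target:
            return mid
        if nums[mid] < nums[right]:
            # nums[mid..right] is ascending
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1
        else:
            # nums[left..mid] is ascending
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
    return -1


def linear_search(nums, target):
    """Reference answer: O(n) scan."""
    return nums.index(target) if target in nums else -1


base = [0, 1, 2, 4, 5, 6, 7]
for k in range(len(base)):
    rotated = base[k:] + base[:k]
    # Try targets that are present as well as absent
    for target in range(-1, 9):
        assert search_rotated(rotated, target) == linear_search(rotated, target)

print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))  # 4
```

At least one of the two halves around `mid` is always sorted, so a single comparison against `nums[right]` is enough to pick the half whose range can be tested directly.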

Simple Linear Regression

Import packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Load the data

data = pd.read_csv("./datasets/studentscores.csv")
data.head()
   Hours  Scores
0    2.5      21
1    5.1      47
2    3.2      27
3    8.5      75
4    3.5      30

Data preprocessing

X = data.iloc[:,:1].values
Y = data.iloc[:,1].values
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=1/4,random_state=0)

Train the model

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor = regressor.fit(X_train,Y_train)

Predict

Y_pred = regressor.predict(X_test)
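
For intuition about what the fitted model computes, here is a minimal sketch of ordinary least squares for a single feature, written with NumPy only. It uses just the five rows printed by `data.head()` above, not the full CSV, so the numbers will differ from the trained `regressor`. The helper `predict` is made up for illustration.

```python
import numpy as np

# The five rows shown by data.head(); the full dataset is not reproduced here
hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5])
scores = np.array([21, 47, 27, 75, 30], dtype=float)

# Closed-form least squares for y = slope * x + intercept
x_mean, y_mean = hours.mean(), scores.mean()
slope = np.sum((hours - x_mean) * (scores - y_mean)) / np.sum((hours - x_mean) ** 2)
intercept = y_mean - slope * x_mean


def predict(x):
    """Evaluate the fitted line at x (scalar or array-like)."""
    return slope * np.asarray(x, dtype=float) + intercept


# np.polyfit with degree 1 solves the same least-squares problem
ref_slope, ref_intercept = np.polyfit(hours, scores, 1)

print(slope, intercept)
print(predict([4.0]))
```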

Plot

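# Training set: data points and the fitted line
plt.scatter(X_train, Y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.show()
# Test set: data points and the fitted line
plt.scatter(X_test, Y_test, color='red')
plt.plot(X_test, regressor.predict(X_test), color='blue')
plt.show()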

A More Complex Linear Regression

import pandas as pd
import numpy as np

Read the data

data = pd.read_csv("./datasets/50_Startups.csv")
data.head()
   R&D Spend  Administration  Marketing Spend       State     Profit
0  165349.20       136897.80        471784.10    New York  192261.83
1  162597.70       151377.59        443898.53  California  191792.06
2  153441.51       101145.55        407934.54     Florida  191050.39
3  144372.41       118671.85        383199.62    New York  182901.99
4  142107.34        91391.77        366168.42     Florida  166187.94

Split into X and Y

X = data.iloc[:,:-1].values
Y = data.iloc[:,-1].values

Encoding

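from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# One-hot encode only the categorical State column (index 3); numeric columns pass through unchanged
ct = ColumnTransformer([("state", OneHotEncoder(), [3])],
                       remainder="passthrough", sparse_threshold=0)
X = ct.fit_transform(X).astype(float)
# The dummy columns come first; drop one of them to avoid the dummy variable trap
X = X[:,1:]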
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,Y_train)
LinearRegression()
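
The encoding step drops one dummy column after one-hot encoding. A minimal sketch of why, using only the State values from the five rows shown by `data.head()`: `pd.get_dummies` is swapped in here for scikit-learn's encoder purely for brevity, and is not what the code above uses.

```python
import pandas as pd

# State values from the five rows shown by data.head(); not the full dataset
states = pd.Series(["New York", "California", "Florida", "New York", "Florida"],
                   name="State")

# Full one-hot encoding: one column per category, each row sums to 1
full = pd.get_dummies(states, dtype=int)

# Dropping the first category removes that redundancy: the dropped
# category is implied when all remaining dummy columns are 0
reduced = pd.get_dummies(states, drop_first=True, dtype=int)

print(full)
print(reduced)
```

Because every row of the full encoding sums to 1, the dummy columns are collinear with the intercept term of the regression. Dropping one of them keeps the design matrix full rank.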

Learning Curves

Learning curves help diagnose bias and variance problems.

from sklearn.model_selection import train_test_split,learning_curve
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
digits = load_digits()
X = digits.data
Y = digits.target
train_sizes,train_loss,test_loss = learning_curve(SVC(gamma=0.001),X,Y,cv=10,
                                                 scoring='neg_mean_squared_error',
                                                train_sizes=[0.1,0.25,0.5,0.75,1])
train_sizes
array([ 161,  404,  808, 1212, 1617])
train_loss
array([[-0.        , -0.09937888, -0.09937888, -0.09937888, -0.09937888,
        -0.09937888, -0.09937888, -0.09937888, -0.09937888, -0.09937888],
       [-0.        , -0.03960396, -0.03960396, -0.03960396, -0.03960396,
        -0.03960396, -0.03960396, -0.03960396, -0.03960396, -0.03960396],
       [-0.        , -0.01980198, -0.01980198, -0.06435644, -0.01980198,
        -0.01980198, -0.01980198, -0.01980198, -0.01980198, -0.01980198],
       [-0.        , -0.01650165, -0.01320132, -0.01320132, -0.01320132,
        -0.01320132, -0.01320132, -0.01320132, -0.01320132, -0.01320132],
       [-0.02226345, -0.03215832, -0.00989487, -0.03215832, -0.03215832,
        -0.03215832, -0.03215832, -0.03215832, -0.03215832, -0.00989487]])
test_loss
array([[-1.26666667e+00, -1.43333333e+00, -3.96666667e+00,
        -9.73888889e+00, -6.95000000e+00, -5.24444444e+00,
        -3.02777778e+00, -5.25139665e+00, -3.48044693e+00,
        -4.85474860e+00],
       [-1.81111111e+00, -1.13333333e+00, -1.35555556e+00,
        -3.06666667e+00, -2.08333333e+00, -2.85000000e+00,
        -8.38888889e-01, -1.94413408e+00, -5.41899441e-01,
        -1.35195531e+00],
       [-1.71111111e+00, -3.61111111e-01, -5.11111111e-01,
        -9.61111111e-01, -6.16666667e-01, -5.88888889e-01,
        -1.22222222e-01, -9.16201117e-01, -7.76536313e-01,
        -1.14525140e+00],
       [-1.22222222e+00, -3.61111111e-01, -4.44444444e-01,
        -7.00000000e-01, -5.55555556e-01, -2.66666667e-01,
        -8.88888889e-02, -1.11731844e-02, -9.21787709e-01,
        -8.43575419e-01],
       [-9.33333333e-01, -0.00000000e+00, -2.66666667e-01,
        -2.83333333e-01, -2.77777778e-01, -3.61111111e-01,
        -8.88888889e-02, -5.58659218e-03, -9.21787709e-01,
        -4.18994413e-01]])
train_mean = -np.mean(train_loss,axis=1)
test_mean = -np.mean(test_loss,axis=1)
train_mean
array([0.08944099, 0.03564356, 0.02227723, 0.01221122, 0.02671614])
plt.plot(train_sizes,train_mean,label="Training")
plt.plot(train_sizes,test_mean,label="Cross-validation")
plt.legend()
plt.show()


learn_four

Supplementary pandas Notes

Recommended site: http://joyfulpandas.datawhale.club/Content/Preface.html

pandas core operations handbook: https://mp.weixin.qq.com/s/l1V5e726XixI0W3EDHx0Nw

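pd.join and pd.merge

merge can be seen as a superset of join. merge combines two DataFrames by matching on columns or on the index; by default it matches on the shared columns and keeps only the intersection of keys (an inner join), whereas join is a shorthand for the index-based case. pandas' merge provides SQL-like in-memory join operations, and the official documentation notes that it performs better than the equivalent data operations in other open-source languages (for example R). If you are familiar with SQL, merge is easy to understand.

Parameters of merge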
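
A minimal sketch of the relationship between the two, using the `on`, `how`, `left_index` and `right_index` parameters of `pd.merge`. The two toy frames and their column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frames for illustration
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge: matches on the shared column "key"; how="inner" is the default
inner = pd.merge(left, right, on="key")

# how="outer" keeps keys from both sides and fills the gaps with NaN
outer = pd.merge(left, right, on="key", how="outer")

# join: aligns on the index; how="left" is its default
joined = left.set_index("key").join(right.set_index("key"))

# The same result expressed as an index-on-index merge
same = pd.merge(left.set_index("key"), right.set_index("key"),
                left_index=True, right_index=True, how="left")

print(inner)
print(outer)
print(joined)
```

Note the different defaults: `merge` does an inner join on columns, while `join` does a left join on the index.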

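Linear Discriminant Analysis

Linear Discriminant Analysis (LDA)

Linear discriminant analysis, i.e. LDA (not to be confused with the LDA used in topic models), is now often used for dimensionality reduction. As its name suggests, though, it is also a classification algorithm, and a hard classifier at that: its output is a concrete class label rather than a probability.

## Main idea

1. Small within-class variance
2. Large between-class variance

## Derivation

We take the two-class case as an example, i.e. there are only two classes.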
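
The two goals can be checked numerically in the two-class case. This is a minimal sketch that assumes the standard Fisher criterion (squared distance between the projected class means divided by the projected within-class scatter), which the excerpt above does not spell out. The toy data and the names `S_w`, `fisher_ratio` and `w_means` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class toy data in 2D, with more spread along the second axis
X1 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 3.0], size=(200, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=[1.0, 3.0], size=(200, 2))

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the two per-class scatter matrices
S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)


def fisher_ratio(w):
    """Squared gap between projected means over projected within-class scatter."""
    between = (w @ (mu1 - mu2)) ** 2
    within = w @ S_w @ w
    return between / within


# Direction that maximizes the ratio: w proportional to S_w^{-1} (mu1 - mu2)
w = np.linalg.solve(S_w, mu1 - mu2)
w /= np.linalg.norm(w)

# Naive alternative for comparison: the line joining the two class means
w_means = (mu1 - mu2) / np.linalg.norm(mu1 - mu2)

print(fisher_ratio(w), fisher_ratio(w_means))
```

Projecting onto `w` trades some distance between the class means for a larger reduction in within-class spread, which is why it never scores worse than the direction joining the means.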

bayes

Conditional probability

\(P(B|A) = \frac{P(AB)}{P(A)}\)

Multiplication rule

If \(P(A) > 0\), then \(P(AB) = P(A)P(B|A)\). If \(P(A_1 \dots A_{n-1}) > 0\), then

\[ \begin{aligned} P(A_1A_2\dots A_n) &= P(A_1A_2\dots A_{n-1})P(A_n | A_1A_2\dots A_{n-1}) \\ &= P(A_1)P(A_2|A_1)P(A_3|A_1A_2)\dots P(A_n|A_1A_2\dots A_{n-1}) \end{aligned} \]

The first step applies the multiplication rule; applying it again to the first factor, and so on, yields the final result.
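
A small worked example of the multiplication rule, checked against brute-force enumeration. The urn setup (3 red and 2 blue balls, three draws without replacement) is made up for illustration; \(A_i\) is the event that the i-th draw is red.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical setup: an urn with 3 red and 2 blue balls, drawn without replacement
balls = ["R", "R", "R", "B", "B"]

# Multiplication rule: P(A1 A2 A3) = P(A1) P(A2|A1) P(A3|A1 A2)
p_chain = Fraction(3, 5) * Fraction(2, 4) * Fraction(1, 3)

# Brute force: enumerate every ordered draw of three distinct balls
draws = list(permutations(range(len(balls)), 3))
all_red = [d for d in draws if all(balls[i] == "R" for i in d)]
p_enum = Fraction(len(all_red), len(draws))

# Conditional probability from the definition P(B|A) = P(AB) / P(A)
first_red = [d for d in draws if balls[d[0]] == "R"]
first_two_red = [d for d in first_red if balls[d[1]] == "R"]
p_second_given_first = (Fraction(len(first_two_red), len(draws))
                        / Fraction(len(first_red), len(draws)))

print(p_chain, p_enum, p_second_given_first)  # 1/10 1/10 1/2
```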

rot90

A positive k rotates counterclockwise; a negative k rotates clockwise.

import numpy as np
mat = np.array([[1,3,5],
                [2,4,6],
                [7,8,9]
                ])
print(mat, "# original")
mat90 = np.rot90(mat, 1)
print(mat90, "# rotate 90 <left> anti-clockwise")
mat90 = np.rot90(mat, -1)
print(mat90, "# rotate 90 <right> clockwise")
mat180 = np.rot90(mat, 2)
print(mat180, "# rotate 180 <left> anti-clockwise")
mat270 = np.rot90(mat, 3)
print(mat270, "# rotate 270 <left> anti-clockwise")

The code was copied from elsewhere and was originally written for Python 2; being able to follow it is enough.
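
To make the sign convention concrete, here is a minimal sketch on the same matrix that checks the rotated results directly. The variable names `ccw` and `cw` are made up for illustration.

```python
import numpy as np

mat = np.array([[1, 3, 5],
                [2, 4, 6],
                [7, 8, 9]])

# k=1: one counterclockwise quarter turn; the last column becomes the first row
ccw = np.rot90(mat, 1)

# k=-1: one clockwise quarter turn, the same as three counterclockwise turns
cw = np.rot90(mat, -1)

print(ccw)
print(cw)
```

Here `ccw` is `[[5, 6, 9], [3, 4, 8], [1, 2, 7]]` and `cw` is `[[7, 2, 1], [8, 4, 3], [9, 6, 5]]`; four quarter turns in either direction give back the original matrix.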