袋熊的树洞

Steady daily effort is never wasted.

Problem Description

Write a program to find the node at which the intersection of two singly linked lists begins.

For example, the following two linked lists:

begin to intersect at node c1.

Example 1:

Input: intersectVal = 8, listA = [4,1,8,4,5], listB = [5,0,1,8,4,5], skipA = 2, skipB = 3
Output: Reference of the node with value = 8
Input Explanation: The intersected node's value is 8 (note that this must not be 0 if the two lists intersect). From the head of A, it reads as [4,1,8,4,5]. From the head of B, it reads as [5,0,1,8,4,5]. There are 2 nodes before the intersected node in A; There are 3 nodes before the intersected node in B.

Example 2:

Input: intersectVal = 2, listA = [0,9,1,2,4], listB = [3,2,4], skipA = 3, skipB = 1
Output: Reference of the node with value = 2
Input Explanation: The intersected node's value is 2 (note that this must not be 0 if the two lists intersect). From the head of A, it reads as [0,9,1,2,4]. From the head of B, it reads as [3,2,4]. There are 3 nodes before the intersected node in A; there is 1 node before the intersected node in B.

Example 3:

Input: intersectVal = 0, listA = [2,6,4], listB = [1,5], skipA = 3, skipB = 2
Output: null
Input Explanation: From the head of A, it reads as [2,6,4]. From the head of B, it reads as [1,5]. Since the two lists do not intersect, intersectVal must be 0, while skipA and skipB can be arbitrary values.
Explanation: The two lists do not intersect, so return null.

Notes:

  • If the two linked lists have no intersection at all, return null.
  • The linked lists must retain their original structure after the function returns.
  • You may assume there are no cycles anywhere in the entire linked structure.
  • Your code should preferably run in O(n) time and use only O(1) memory.

Related Topics: Linked List

Original problem: 160. Intersection of Two Linked Lists

Chinese translation: 160. 相交链表

Solution

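Assume the two lists share an intersecting node, in the following setup:

Segment AD represents list 1, and segment CB followed by segment BD represents list 2. List 1 is longer than list 2, and the two lists meet at node B. List 1 has length |AD| = |AB| + |BD| = p + n, and list 2 has length |CB| + |BD| = m + n. (Note: the length of a segment is the number of moves needed to traverse from its first node to its last node.)

Now traverse list 1 and list 2 at the same time. Because list 2 is shorter, its traversal finishes first; at that moment the traversal of list 1 has reached node E, so |AE| = m + n. Continue traversing list 1 until it ends; the length from node E to node D is |ED| = q.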

From this setup we obtain the following equation:

    |AB| + |BD| = |AE| + |ED|
==> p + n = m + n + q
==> p - m = q

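The equation gives |AB| - |CB| = q: the distance from head A of list 1 to the intersection node B exceeds the distance from head C of list 2 to B by q. Since q is measured during the traversal, it is a known quantity, and the node reached after q moves from A is exactly as far from B as C is. This yields an approach to the problem: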

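Use two pointers p1 and p2 to traverse list 1 and list 2 respectively. First advance p1 by q nodes along list 1,
then move p1 and p2 forward together, one node at a time, until p1 == p2. At that point p1 is the intersection node of the two lists.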
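
The two-pointer steps above can be sketched in Python. This is a minimal illustration rather than the reference solution: the ListNode class, the length helper, and the name get_intersection_node are assumptions made for the example, and q is obtained here by measuring both list lengths first.

```python
class ListNode:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def length(head):
    """Count the nodes reachable from head."""
    count = 0
    while head is not None:
        count += 1
        head = head.next
    return count


def get_intersection_node(head_a, head_b):
    """Return the first node shared by two acyclic lists, or None."""
    len_a, len_b = length(head_a), length(head_b)
    p1, p2 = head_a, head_b
    # Advance the pointer on the longer list by the length difference q.
    # range() of a negative number is empty, so only one loop runs.
    for _ in range(len_a - len_b):
        p1 = p1.next
    for _ in range(len_b - len_a):
        p2 = p2.next
    # Both pointers are now equally far from the tail; step in lockstep.
    while p1 is not p2:
        p1 = p1.next
        p2 = p2.next
    return p1  # None when the lists never meet


# Example 1 from the problem: A = [4,1,8,4,5], B = [5,0,1,8,4,5]
shared = ListNode(8, ListNode(4, ListNode(5)))
head_a = ListNode(4, ListNode(1, shared))
head_b = ListNode(5, ListNode(0, ListNode(1, shared)))
print(get_intersection_node(head_a, head_b).val)  # 8
```

Nodes are compared by identity (is), not by value, because two different nodes may hold the same value.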
Reference solution code
class Solution {
public:
    ListNode *getIntersectionNode(ListNode *headA, ListNode *headB) {
        if (NULL == headA || NULL == headB)
            return NULL;
        if (headA == headB)
            return headA;

        ListNode *prevA, *prevB;
        ListNode *currA, *currB;
        ListNode *posA, *posB;

        prevA = prevB = NULL;
        posA = currA = headA;
        posB = currB = headB;
        // walk both lists at once; once the shorter list is exhausted,
        // advance the start position of the longer list on every step
        while ((currA != NULL) || (currB != NULL)) {
            if (currA != NULL) {
                prevA = currA;
                currA = currA->next;
            } else {
                posB = posB->next;
            }
            if (currB != NULL) {
                prevB = currB;
                currB = currB->next;
            } else {
                posA = posA->next;
            }
        }
        // have intersection (both lists end at the same tail node)
        if (prevA == prevB) {
            while (posA != posB) {
                posA = posA->next;
                posB = posB->next;
            }
            return posA;
        }

        return NULL;
    }
};

Problem Description

Given a linked list, determine if it has a cycle in it.

To represent a cycle in the given linked list, we use an integer pos which represents the position (0-indexed) in the linked list where tail connects to. If pos is -1, then there is no cycle in the linked list.

Example 1:

Input: head = [3,2,0,-4], pos = 1
Output: true
Explanation: There is a cycle in the linked list, where tail connects to the second node.

Example 2:

Input: head = [1,2], pos = 0
Output: true
Explanation: There is a cycle in the linked list, where tail connects to the first node.

Example 3:

Input: head = [1], pos = -1
Output: false
Explanation: There is no cycle in the linked list.

Follow up:

Can you solve it using O(1) (i.e. constant) memory?

Related Topics: Linked List, Two Pointers

Original problem: 141. Linked List Cycle

Chinese translation: 141. 环形链表

Solution

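This problem can be solved with two pointers. Set up a fast pointer fast and a slow pointer slow, both starting from the head node head. Each time slow advances one node, fast advances two. If the list has a cycle, the difference in speed means the two pointers eventually meet at the same node, i.e. fast == slow. Otherwise fast reaches the end of the list first and the two pointers never meet.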
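
The fast and slow pointer idea can be sketched in a few lines of Python. The ListNode class, the build_list helper, and the name has_cycle are assumptions made for this illustration; build_list mirrors the pos convention from the problem statement.

```python
class ListNode:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, val):
        self.val = val
        self.next = None


def has_cycle(head):
    """Return True if the list starting at head contains a cycle."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next        # one step
        fast = fast.next.next   # two steps
        if slow is fast:
            return True
    return False                # fast ran off the end of the list


def build_list(values, pos):
    """Build a list; the tail links to index pos (0-based), or nowhere if pos is -1."""
    nodes = [ListNode(v) for v in values]
    for current, following in zip(nodes, nodes[1:]):
        current.next = following
    if nodes and pos >= 0:
        nodes[-1].next = nodes[pos]
    return nodes[0] if nodes else None


print(has_cycle(build_list([3, 2, 0, -4], 1)))  # True
print(has_cycle(build_list([1], -1)))           # False
```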

Reference solution code
#include <iostream>
#include "List.h"
using namespace std;

/**
 * Definition for singly-linked list.
 * struct ListNode {
 *     int val;
 *     ListNode *next;
 *     ListNode(int x) : val(x), next(NULL) {}
 * };
 */
class Solution {
public:
    bool hasCycle(ListNode *head) {
        if (head == NULL)
            return false;

        ListNode *fast, *slow;
        fast = slow = head;
        // slow moves one node per step, fast moves two
        while (fast != NULL && fast->next != NULL) {
            slow = slow->next;
            fast = fast->next->next;
            if (slow == fast)
                return true;
        }

        return false;
    }
};

int main()
{
    // build 1 -> 2 -> 3 -> 4 -> 2 (the tail links back to node 2)
    ListNode *node1 = create_list_node(1);
    ListNode *node2 = create_list_node(2);
    ListNode *node3 = create_list_node(3);
    ListNode *node4 = create_list_node(4);
    connect_list_nodes(node1, node2);
    connect_list_nodes(node2, node3);
    connect_list_nodes(node3, node4);
    connect_list_nodes(node4, node2);

    Solution solu;
    cout << solu.hasCycle(node1) << endl;

    return 0;
}

Problem Description

Determine if a 9x9 Sudoku board is valid. Only the filled cells need to be validated according to the following rules:

  1. Each row must contain the digits 1-9 without repetition.
  2. Each column must contain the digits 1-9 without repetition.
  3. Each of the 9 3x3 sub-boxes of the grid must contain the digits 1-9 without repetition.

A partially filled sudoku which is valid.

The Sudoku board could be partially filled, where empty cells are filled with the character '.'.

Example 1:

Input:
[
["5","3",".",".","7",".",".",".","."],
["6",".",".","1","9","5",".",".","."],
[".","9","8",".",".",".",".","6","."],
["8",".",".",".","6",".",".",".","3"],
["4",".",".","8",".","3",".",".","1"],
["7",".",".",".","2",".",".",".","6"],
[".","6",".",".",".",".","2","8","."],
[".",".",".","4","1","9",".",".","5"],
[".",".",".",".","8",".",".","7","9"]
]
Output: true

Example 2:

Input:
[
["8","3",".",".","7",".",".",".","."],
["6",".",".","1","9","5",".",".","."],
[".","9","8",".",".",".",".","6","."],
["8",".",".",".","6",".",".",".","3"],
["4",".",".","8",".","3",".",".","1"],
["7",".",".",".","2",".",".",".","6"],
[".","6",".",".",".",".","2","8","."],
[".",".",".","4","1","9",".",".","5"],
[".",".",".",".","8",".",".","7","9"]
]
Output: false
Explanation: Same as Example 1, except with the 5 in the top left corner being
modified to 8. Since there are two 8's in the top left 3x3 sub-box, it is invalid.

Note:

  • A Sudoku board (partially filled) could be valid but is not necessarily solvable.
  • Only the filled cells need to be validated according to the mentioned rules.
  • The given board contains only digits 1-9 and the character '.'.
  • The given board size is always 9x9.

Related Topics: Hash Table

Original problem: 36. Valid Sudoku

Chinese translation: 36. 有效的数独

Solution

Approach 1

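According to the problem statement, a valid Sudoku satisfies three conditions:

  1. No row contains a repeated digit
  2. No column contains a repeated digit
  3. No 3x3 block contains a repeated digit

To decide whether a row, column, or block contains a repeated digit, we can use hash tables for fast lookup. First create one hash table for every row, every column, and every block. Then visit all the digits. For each digit, find the hash tables for its row, column, and block, and check whether the digit is already present in any of them. If it is, the Sudoku is invalid; otherwise store the digit in those hash tables and continue.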
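
A minimal Python sketch of the hash-table check described above, using one set per row, column, and block. The name is_valid_sudoku and the encoding of the board as nine strings are assumptions made for this illustration, not part of the reference solution.

```python
def is_valid_sudoku(board):
    """Return True if no filled cell repeats within its row, column, or 3x3 block."""
    rows = [set() for _ in range(9)]
    cols = [set() for _ in range(9)]
    blocks = [set() for _ in range(9)]
    for i in range(9):
        for j in range(9):
            ch = board[i][j]
            if ch == ".":
                continue  # empty cells are not validated
            block = (i // 3) * 3 + j // 3  # block index 0..8, row-major
            if ch in rows[i] or ch in cols[j] or ch in blocks[block]:
                return False
            rows[i].add(ch)
            cols[j].add(ch)
            blocks[block].add(ch)
    return True


# Board from Example 1 of the problem statement.
board = [
    "53..7....",
    "6..195...",
    ".98....6.",
    "8...6...3",
    "4..8.3..1",
    "7...2...6",
    ".6....28.",
    "...419..5",
    "....8..79",
]
print(is_valid_sudoku(board))                        # True
print(is_valid_sudoku(["83..7...."] + board[1:]))    # False (Example 2)
```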

Reference solution code 1
#include <vector>
#include <unordered_set>
#include <iostream>
using namespace std;


class Solution {
public:
    bool isValidSudoku(vector<vector<char>>& board) {
        // one hash set per row, column, and 3x3 block
        vector<unordered_set<char>> row_sets(9);
        vector<unordered_set<char>> column_sets(9);
        vector<unordered_set<char>> block_sets(9);

        char ch;
        int block_id;
        for (auto i=0; i<9; i++) {
            for (auto j=0; j<9; j++) {
                ch = board[i][j];

                if (ch == '.')
                    continue;

                if (row_sets[i].find(ch) == row_sets[i].end())
                    row_sets[i].insert(ch);
                else
                    return false;

                if (column_sets[j].find(ch) == column_sets[j].end())
                    column_sets[j].insert(ch);
                else
                    return false;

                // blocks are numbered 0..8 in row-major order
                block_id = (i / 3) * 3 + (j / 3);
                if (block_sets[block_id].find(ch) == block_sets[block_id].end())
                    block_sets[block_id].insert(ch);
                else
                    return false;
            }
        }

        return true;
    }
};

int main()
{
    vector<vector<char>> board = {
        {'5', '3', '.', '.', '7', '.', '.', '.', '.'},
        {'6', '.', '.', '1', '9', '5', '.', '.', '.'},
        {'.', '9', '8', '.', '.', '.', '.', '6', '.'},
        {'8', '.', '.', '.', '6', '.', '.', '.', '3'},
        {'4', '.', '.', '8', '.', '3', '.', '.', '1'},
        {'7', '.', '.', '.', '2', '.', '.', '.', '6'},
        {'.', '6', '.', '.', '.', '.', '2', '8', '.'},
        {'.', '.', '.', '4', '1', '9', '.', '.', '5'},
        {'.', '.', '.', '.', '8', '.', '.', '7', '9'}
    };

    for (auto i=0; i<board.size(); i++) {
        for (auto j=0; j<board[i].size(); j++) {
            cout << board[i][j] << " ";
        }
        cout << endl;
    }

    Solution solu;
    cout << "Is valid: " << solu.isValidSudoku(board) << endl;

    return 0;
}

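Approach 2

The idea is the same as Approach 1, except that the hash table for each row, column, and block is replaced by a single integer, and a digit counts as seen when the corresponding bit of that integer is 1. The main operations are bitwise AND & and bitwise OR |. When we visit a digit x whose row has integer y, we test for a repeat with:

y & (1 << x)

If the expression is nonzero, bit x of y is 1, so the digit has appeared before; otherwise the expression is 0. If the digit has not appeared yet, update y to:

y = y | (1 << x)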
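
The bit test and bit update above can be tried out with a short Python sketch. The name is_valid_sudoku_bits and the board encoding as nine strings are assumptions made for this illustration; the three masks are OR-ed together so that a single AND detects a repeat in the row, the column, or the block.

```python
def is_valid_sudoku_bits(board):
    """Sudoku validity check using one integer bitmask per row, column, and block."""
    rows = [0] * 9
    cols = [0] * 9
    blocks = [0] * 9
    for i in range(9):
        for j in range(9):
            ch = board[i][j]
            if ch == ".":
                continue
            bit = 1 << int(ch)             # 1 << x for digit x
            block = (i // 3) * 3 + j // 3
            # Nonzero means bit x is already set in at least one of the masks.
            if (rows[i] | cols[j] | blocks[block]) & bit:
                return False
            rows[i] |= bit                 # y = y | (1 << x)
            cols[j] |= bit
            blocks[block] |= bit
    return True


# Board from Example 1 of the problem statement.
board = [
    "53..7....",
    "6..195...",
    ".98....6.",
    "8...6...3",
    "4..8.3..1",
    "7...2...6",
    ".6....28.",
    "...419..5",
    "....8..79",
]
print(is_valid_sudoku_bits(board))                      # True
print(is_valid_sudoku_bits(["83..7...."] + board[1:]))  # False (Example 2)
```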
Reference solution code 2
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;


class Solution {
public:
    bool isValidSudoku(vector<vector<char>>& board) {
        // one bitmask per row, column, and block, all initialized to 0;
        // std::vector is used because variable-length arrays are not standard C++
        vector<int> row_status(board.size(), 0);
        vector<int> col_status(board.size(), 0);
        vector<int> cell_status(board.size(), 0);
        int digit, cell, block_size, num_blocks;

        block_size = int(sqrt(board.size()));
        num_blocks = board.size() / block_size;
        for (int i=0; i<board.size(); i++) {
            for (int j=0; j<board[i].size(); j++) {
                if (board[i][j] == '.')
                    continue;

                // bit d of the mask marks digit d as already seen
                digit = 1 << (board[i][j] - '0');
                cell = (i / block_size) * num_blocks + (j / block_size);
                if ((row_status[i] & digit) != 0)
                    return false;
                if ((col_status[j] & digit) != 0)
                    return false;
                if ((cell_status[cell] & digit) != 0)
                    return false;
                row_status[i] |= digit;
                col_status[j] |= digit;
                cell_status[cell] |= digit;
            }
        }

        return true;
    }
};

int main()
{
    vector<vector<char>> board = {
        {'5', '3', '.', '.', '7', '.', '.', '.', '.'},
        {'6', '.', '.', '1', '9', '5', '.', '.', '.'},
        {'.', '9', '8', '.', '.', '.', '.', '6', '.'},
        {'8', '.', '.', '.', '6', '.', '.', '.', '3'},
        {'4', '.', '.', '8', '.', '3', '.', '.', '1'},
        {'7', '.', '.', '.', '2', '.', '.', '.', '6'},
        {'.', '6', '.', '.', '.', '.', '2', '8', '.'},
        {'.', '.', '.', '4', '1', '9', '.', '.', '5'},
        {'.', '.', '.', '.', '8', '.', '.', '7', '9'}
    };

    for (auto i=0; i<board.size(); i++) {
        for (auto j=0; j<board[i].size(); j++) {
            cout << board[i][j] << " ";
        }
        cout << endl;
    }

    Solution solu;
    cout << "Is valid: " << solu.isValidSudoku(board) << endl;

    return 0;
}

Problem Description

Implement int sqrt(int x).

Compute and return the square root of $x$, where $x$ is guaranteed to be a non-negative integer.

Since the return type is an integer, the decimal digits are truncated and only the integer part of the result is returned.

Example 1:

Input: 4
Output: 2

Example 2:

Input: 8
Output: 2
Explanation: The square root of 8 is 2.82842..., and since
the decimal part is truncated, 2 is returned.

Related Topics: Math, Binary Search

Original problem: 69. Sqrt(x)

Chinese translation: 69. x 的平方根

Solution

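The square root of $x$ lies in the interval $[0, x]$, and a candidate in this interval must have a square no greater than $x$. Since the output must be an integer, the problem becomes finding the largest integer $s$ in $[0, x]$ that satisfies $s^2 \le x$.

A natural approach is to scan upward from $0$ until some value's square exceeds $x$; the previous value is then the square root. This linear scan takes about $\sqrt{x}$ steps, however, and usually exceeds the time limit. Because the search runs over a discrete, ordered interval, binary search can find the value quickly.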
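
A minimal Python sketch of this binary search over the half-open interval [low, high). The name my_sqrt is an assumption made for the illustration. Python integers do not overflow, so the sketch needs no wider type for mid * mid.

```python
def my_sqrt(x):
    """Return the largest integer s with s * s <= x, for a non-negative integer x."""
    low, high = 0, x + 1          # search in [low, high)
    while low < high:
        mid = (low + high) // 2
        if mid * mid <= x:
            low = mid + 1         # mid is feasible, look for something larger
        else:
            high = mid            # mid is too large
    # low is now the first value whose square exceeds x.
    return low - 1


print(my_sqrt(4))           # 2
print(my_sqrt(8))           # 2
print(my_sqrt(2147483647))  # 46340
```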

Reference solution code
/*
 * Use binary search to find the square root of x
 */

#include <iostream>
using namespace std;


class Solution {
public:
    int mySqrt(int x) {
        long long low, high, mid, square;

        // search in [low, high), so high = x + 1
        // use casting to avoid numerical overflow
        low = 0, high = (long long)x + 1;
        while (low < high) {
            mid = low + (high - low) / 2;
            square = mid * mid;
            if (square <= x)
                low = mid + 1;
            else
                high = mid;
        }
        // low is the first value whose square exceeds x
        return low - 1;
    }
};

int main()
{
    int x = 2147483647;
    Solution solu;

    cout << "Sqrt(" << x << ") = " << solu.mySqrt(x) << endl;
    return 0;
}

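This follows an answer on Zhihu to the question "二分查找有几种写法?它们的区别是什么?" (How many ways are there to write binary search, and how do they differ?). Because we want the upper bound of $s^2 \le x$ (equivalently $s \le \sqrt{x}$), the code uses the upper_bound(value) - 1 form.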

Problem Description

Implement pow(x, n), which calculates $x$ raised to the power $n$ ($x^n$).

Example 1:

Input: 2.00000, 10
Output: 1024.00000

Example 2:

Input: 2.10000, 3
Output: 9.26100

Example 3:

Input: 2.00000, -2
Output: 0.25000
Explanation: 2^(-2) = (1/2)^2 = 1/4 = 0.25

Note:

  • $-100.0 \lt x \lt 100.0$
  • $n$ is a 32-bit signed integer, within the range $[-2^{31}, 2^{31}-1]$

Related Topics: Math, Binary Search

Original problem: 50. Pow(x, n)

Chinese translation: 50. Pow(x, n)

Solution

Approach 1

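In this problem $n$ is an integer, so $x^n$ can be split into the product of two copies of $x^{n/2}$, where $n/2$ is integer division truncated toward zero, as in C++:

$$
x^n = \begin{cases}
x^{n/2} \cdot x^{n/2} & n \text{ is even} \\
x^{n/2} \cdot x^{n/2} \cdot x & n \text{ is odd}, \ n > 0
\end{cases}
$$

The problem can therefore be solved recursively. Note that when $n$ is negative, the code must not convert $n$ to a positive number and $x$ to $1/x$, because the test cases include $n = -2^{31}$, and negating that value overflows a 32-bit integer. The fix is to keep $n$ negative; when $n$ is negative and odd, use

$$
x^{n} = x^{n/2} \cdot x^{n/2} \cdot \frac{1}{x}
$$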
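
The recurrence can be checked with a small Python sketch. The name my_pow is an assumption made for the illustration. Python integers never overflow, so the sketch only demonstrates the recurrence, and because Python's // rounds toward negative infinity, the truncation toward zero used by C++ is reproduced explicitly.

```python
def my_pow(x, n):
    """Compute x ** n recursively without ever negating n."""
    if n == 0:
        return 1.0
    # Truncate n / 2 toward zero, like C++ integer division.
    half_n = n // 2 if n >= 0 else -((-n) // 2)
    half = my_pow(x, half_n)
    if n % 2 == 0:
        return half * half
    # n is odd: one extra factor of x (n > 0) or 1 / x (n < 0).
    return half * half * (x if n > 0 else 1 / x)


print(my_pow(2.0, 10))  # 1024.0
print(my_pow(2.0, -2))  # 0.25
```

The recursion depth is about $\log_2 |n|$, so even $n = -2^{31}$ needs only 32 nested calls.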

Reference solution code 1
#include <iostream>
using namespace std;

class Solution {
public:
    double myPow(double x, int n) {
        // do not transfer n to -n if n < 0
        // because of numerical overflow (n = -2^31)
        if (n == 0)
            return 1.0;
        double half = myPow(x, n/2);
        if (n % 2 == 0) {
            return half * half;
        } else {
            // odd n: one extra factor of x, or of 1/x when n is negative
            if (n < 0)
                x = 1 / x;
            return half * half * x;
        }
    }
};

int main()
{
    double x;
    int n;
    Solution solu;

    x = 1.00000;
    n = -2147483648; // n = -2^31
    cout << "Pow(" << x << ", " << n << ") = "
         << solu.myPow(x, n) << endl;
    return 0;
}

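Approach 2

Approach 1 is recursive; this section presents a non-recursive solution. Looking at the binary representation of the integer $n$, if bit $k$ is 1, then $x^n$ can be written as $x^n = x^{2^k} \cdot x^{n-2^k}$, and the remaining factor $x^{n-2^k}$ can be decomposed in the same way over its nonzero bits. For example, $n=5$ is 101 in binary, so its nonzero bits give:

$$
x^5 = x^{2^2} \cdot x^{2^0}
$$

This decomposition yields an iterative solution: initialize the result as ans = 1 and test the bits of $n$ from bit 0 upward. If bit $k$ of $n$ is nonzero, multiply ans by $x^{2^k}$:

$$
\text{ans} = \text{ans} \cdot x^{2^k}
$$

After all bits of $n$ have been visited, ans is the result.

How do we obtain $x^{2^k}$ quickly? Shift $n$ right repeatedly, and for every shift of one bit update $x$ with

x *= x

After $k$ right shifts, $x$ holds the original $x$ raised to the power $2^k$. A bitwise AND then tests whether the lowest bit of the shifted $n$ is nonzero; if it is, the iteration above says ans must be multiplied by the current $x$.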
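
The bit-by-bit iteration can be sketched in Python as follows. The name my_pow_iter is an assumption made for the illustration. Negating n is safe here only because Python integers have arbitrary precision; in C++ a wider type is needed before negating.

```python
def my_pow_iter(x, n):
    """Compute x ** n iteratively by scanning the bits of |n| from lowest to highest."""
    if n < 0:
        x, n = 1 / x, -n   # safe in Python; would overflow a 32-bit int for n = -2**31
    ans = 1.0
    while n > 0:
        if n & 1:          # lowest remaining bit is set
            ans *= x       # x currently holds (original x) ** (2 ** k)
        x *= x             # square for the next bit
        n >>= 1
    return ans


print(my_pow_iter(2.0, 5))   # 32.0
print(my_pow_iter(2.0, -2))  # 0.25
```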

Reference solution code 2
#include <iostream>
using namespace std;


class Solution {
public:
    double myPow(double x, int n) {
        if (n == 0)
            return 1.0;

        long long num = n;
        double ans;

        if (n < 0) {
            // use long long type to avoid numerical overflow
            num = -(long long)n;
            x = 1 / x;
        }

        ans = 1.0;
        while (num > 0) {
            // multiply in x^(2^k) whenever bit k of num is set
            if ((num & 1) != 0)
                ans *= x;
            x *= x;
            num >>= 1;
        }

        return ans;
    }
};

int main()
{
    double x;
    int n;
    Solution solu;

    x = 1.00000;
    n = -2147483648; // n = -2^31
    cout << "Pow(" << x << ", " << n << ") = "
         << solu.myPow(x, n) << endl;
    return 0;
}

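1. Installation environment

OS Version: OS X 10.11.6

2. Getting the install script

Instead of MacTeX, this guide uses the Unix Install Script for an online installation. MacTeX was not an option because the system is fairly old: MacTeX-2019 requires Mac OS 10.12 or later.

Download the install-tl-unx.tar.gz archive and extract it. The install script inside is install-tl.

3. Starting the installation

To start the installation, simply run install-tl. Here the script is run directly from the terminal.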

./install-tl-20200127/install-tl

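4. Installation settings

4.1. Configuring the mirror

The first step is to choose a mirror. Which mirror to use depends on your network environment; here the Tsinghua University mirror is selected.

After a mirror is selected, the installer shows the new mirror address.

4.2. Detailed settings

Next come the installation settings. The basic settings cover only the installation path and the default paper size. To install TeX Live without root, set the installation path to a directory under your home directory. Here the path is set to: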

/Users/luowanqian/local/texlive/2019

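For more detailed settings, click the Advanced button.

When the settings are done, click Install to start the installation.

4.3. Waiting for the installation to finish

Once started, the installer downloads and installs the packages automatically. This step takes a while, so be patient.

When the installation finishes, the installer shows a completion screen.

Note: the completion screen lists the environment variables that need to be set, roughly as follows.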

Welcome to TeX Live!

See /Users/luowanqian/local/texlive/2019/index.html for links to documentation.
The TeX Live web site (https://tug.org/texlive/) contains any updates and corrections. TeX Live is a joint project of the TeX user groups around the world; please consider supporting it by joining the group best for you. The list of groups is available on the web at https://tug.org/usergroups.html.

Add /Users/luowanqian/local/texlive/2019/texmf-dist/doc/man to MANPATH.
Add /Users/luowanqian/local/texlive/2019/texmf-dist/doc/info to INFOPATH.
Most importantly, add /Users/luowanqian/local/texlive/2019/bin/x86_64-darwinlegacy
to your PATH for current and future sessions.

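5. Setting environment variables

The last step is to set the environment variables. As the message above shows, there are three to set: MANPATH, INFOPATH, and PATH.

Following that message, set them in the .bashrc file to complete the installation.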

export MANPATH="$MANPATH:/Users/luowanqian/local/texlive/2019/texmf-dist/doc/man"
export INFOPATH="$INFOPATH:/Users/luowanqian/local/texlive/2019/texmf-dist/doc/info"
export PATH="$PATH:/Users/luowanqian/local/texlive/2019/bin/x86_64-darwinlegacy"
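
After opening a new shell, one way to confirm that the PATH setting took effect is a small Python check. This is only a sketch: the helper name missing_path_entries is hypothetical, and the directory is the installation path used in this guide.

```python
import os


def missing_path_entries(required_dirs, path_value):
    """Return the directories from required_dirs that are absent from path_value."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return [d for d in required_dirs if d not in entries]


texlive_bin = "/Users/luowanqian/local/texlive/2019/bin/x86_64-darwinlegacy"
# An empty list means the TeX Live bin directory is already on PATH.
print(missing_path_entries([texlive_bin], os.environ.get("PATH", "")))
```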

1. Import numpy as np and see the version

Difficulty Level: L1

Question: Import numpy as np and print the version number.

Solution
>>> import numpy as np
>>> print(np.__version__)
1.15.4

2. How to create a 1D array?

Difficulty Level: L1

Question: Create a 1D array of numbers from 0 to 9.

Expected output:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Solution
>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

3. How to create a boolean array?

Difficulty Level: L1

Question: Create a 3×3 numpy array of all True’s.

Solution 1
>>> np.full((3, 3), True)
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
Solution 2
>>> np.ones((3, 3), dtype=bool)
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])

4. How to extract items that satisfy a given condition from 1D array?

Difficulty Level: L1

Question: Extract all odd numbers from arr.

Input:
>>> arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:
array([1, 3, 5, 7, 9])
Solution
>>> arr[arr % 2 == 1]
array([1, 3, 5, 7, 9])

5. How to replace items that satisfy a condition with another value in numpy array?

Difficulty Level: L1

Question: Replace all odd numbers in arr with -1.

Input:
>>> arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
Solution
>>> arr[arr % 2 == 1] = -1
>>> arr
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])

6. How to replace items that satisfy a condition without affecting the original array?

Difficulty Level: L2

Question: Replace all odd numbers in arr with -1 without changing arr.

Input:
>>> arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:
>>> out
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Solution 1
>>> out = np.copy(arr)
>>> out[out % 2 == 1] = -1
>>> out
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Solution 2
>>> out = np.where(arr % 2 == 1, -1, arr)
>>> out
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

7. How to reshape an array?

Difficulty Level: L1

Question: Convert a 1D array to a 2D array with 2 rows.

Input:
>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
Solution
>>> arr.reshape((2, -1))
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])

8. How to stack two arrays vertically?

Difficulty Level: L2

Question: Stack arrays a and b vertically.

Input:
>>> a = np.arange(10).reshape(2, -1)
>>> b = np.repeat(1, 10).reshape(2, -1)
>>> a
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> b
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])

Expected output:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
Solution 1
>>> np.concatenate((a, b), axis=0)
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
Solution 2
>>> np.vstack((a, b))
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
Solution 3
>>> np.r_[a, b]
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])

9. How to stack two arrays horizontally?

Difficulty Level: L2

Question: Stack the arrays a and b horizontally.

Input:
>>> a = np.arange(10).reshape(2, -1)
>>> b = np.repeat(1, 10).reshape(2, -1)
>>> a
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> b
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])

Expected output:
array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
[5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])
Solution 1
>>> np.concatenate((a, b), axis=1)
array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
[5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])
Solution 2
>>> np.hstack((a, b))
array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
[5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])
Solution 3
>>> np.c_[a, b]
array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
[5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

10. How to generate custom sequences in numpy without hardcoding?

Difficulty Level: L2

Question: Create the following pattern without hardcoding. Use only numpy functions and the below input array a.

Input:
>>> a = np.array([1, 2, 3])

Expected output:
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
Solution 1
>>> np.concatenate((np.repeat(a, 3), np.tile(a, 3)))
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
Solution 2
>>> np.r_[np.repeat(a, 3), np.tile(a, 3)]
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

11. How to get the common items between two python numpy arrays?

Difficulty Level: L2

Question: Get the common items between a and b.

Input:
>>> a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
>>> b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

Expected output:
array([2, 4])
Solution
>>> np.intersect1d(a, b)
array([2, 4])

12. How to remove from one array those items that exist in another?

Difficulty Level: L2

Question: From array a remove all items present in array b.

Input:
>>> a = np.array([1, 2, 3, 4, 5])
>>> b = np.array([5, 6, 7, 8, 9])

Expected output:
array([1, 2, 3, 4])
Solution
>>> np.setdiff1d(a, b)
array([1, 2, 3, 4])

13. How to get the positions where elements of two arrays match?

Difficulty Level: L2

Question: Get the positions where elements of a and b match.

Input:
>>> a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
>>> b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

Expected output:
(array([1, 3, 5, 7]), )
Solution
>>> np.where(a == b)
(array([1, 3, 5, 7]),)

14. How to extract all numbers between a given range from a numpy array?

Difficulty Level: L2

Question: Get all items between 5 and 10 from a.

Input:
>>> a = np.array([2, 6, 1, 9, 10, 3, 27])

Expected output:
array([6, 9, 10])
Solution 1
>>> a[(a >= 5) & (a <= 10)]
array([ 6, 9, 10])
Solution 2
>>> index = np.where((a >= 5) & (a <= 10))
>>> a[index]
array([ 6, 9, 10])
Solution 3
>>> index = np.where(np.logical_and(a>=5, a<=10))
>>> a[index]
array([ 6, 9, 10])

15. How to make a python function that handles scalars to work on numpy arrays?

Difficulty Level: L2

Question: Convert the function maxx that works on two scalars, to work on two arrays.

Input:
>>> def maxx(x, y):
...     """Get the maximum of two items"""
...     if x >= y:
...         return x
...     else:
...         return y
...
>>> maxx(1, 5)
5

Expected output:
>>> a = np.array([5, 7, 9, 8, 6, 4, 5])
>>> b = np.array([6, 3, 4, 8, 9, 7, 1])
>>> pair_max(a, b)
array([6., 7., 9., 8., 9., 7., 5.])
Solution
>>> pair_max = np.vectorize(maxx, otypes=[float])
>>> a = np.array([5, 7, 9, 8, 6, 4, 5])
>>> b = np.array([6, 3, 4, 8, 9, 7, 1])
>>> pair_max(a, b)
array([6., 7., 9., 8., 9., 7., 5.])

16. How to swap two columns in a 2d numpy array?

Difficulty Level: L2

Question: Swap columns 1 and 2 in the array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Solution 1
>>> arr[:, [1, 0, 2]]
array([[1, 0, 2],
[4, 3, 5],
[7, 6, 8]])
Solution 2
# Swap in-place
>>> tmp = arr[:, 0].copy()
>>> arr[:, 0] = arr[:, 1]
>>> arr[:, 1] = tmp
>>> arr
array([[1, 0, 2],
[4, 3, 5],
[7, 6, 8]])

17. How to swap two rows in a 2d numpy array?

Difficulty Level: L2

Question: Swap rows 1 and 2 in the array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Solution 1
>>> arr[[1, 0, 2], :]
array([[3, 4, 5],
[0, 1, 2],
[6, 7, 8]])
Solution 2
# Swap in-place
>>> tmp = arr[0, :].copy()
>>> arr[0, :] = arr[1, :]
>>> arr[1, :] = tmp
>>> arr
array([[3, 4, 5],
[0, 1, 2],
[6, 7, 8]])

18. How to reverse the rows of a 2D array?

Difficulty Level: L2

Question: Reverse the rows of a 2D array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Solution
>>> arr[::-1]
array([[6, 7, 8],
[3, 4, 5],
[0, 1, 2]])

19. How to reverse the columns of a 2D array?

Difficulty Level: L2

Question: Reverse the columns of a 2D array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Solution
>>> arr[:, ::-1]
array([[2, 1, 0],
[5, 4, 3],
[8, 7, 6]])

20. How to create a 2D array containing random floats between 5 and 10?

Difficulty Level: L2

Question: Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.

Solution 1
>>> np.random.seed(100)
>>> np.random.uniform(5, 10, size=(5, 3))
array([[7.71702471, 6.39184693, 7.12258795],
[9.22388066, 5.02359428, 5.6078456 ],
[8.35374542, 9.12926378, 5.68353295],
[7.87546665, 9.45660977, 6.04601061],
[5.9266411 , 5.54188445, 6.09848746]])
Solution 2
>>> np.random.seed(100)
>>> arr = (10 - 5) * np.random.rand(5, 3) + 5
>>> arr
array([[7.71702471, 6.39184693, 7.12258795],
[9.22388066, 5.02359428, 5.6078456 ],
[8.35374542, 9.12926378, 5.68353295],
[7.87546665, 9.45660977, 6.04601061],
[5.9266411 , 5.54188445, 6.09848746]])
Solution 3
# Maybe different from other solutions
>>> rand_arr = np.random.randint(low=5, high=10, size=(5, 3)) + np.random.random((5, 3))
>>> rand_arr
array([[6.41920093, 9.40003816, 7.78940871],
[7.973373 , 6.51303275, 6.04690216],
[5.26486281, 8.24187676, 9.69046437],
[8.34740798, 7.26776599, 8.26254059],
[8.46680771, 9.86023614, 6.52209887]])

21. How to print only 3 decimal places in python numpy array?

Difficulty Level: L1

Question: Print or show only 3 decimal places of the numpy array rand_arr.

Input:
rand_arr = np.random.random((5, 3))
Solution
>>> np.set_printoptions(precision=3)
>>> rand_arr
array([[0.152, 0.272, 0.846],
[0.927, 0.521, 0.665],
[0.465, 0.67 , 0.136],
[0.829, 0.175, 0.343],
[0.281, 0.177, 0.596]])

22. How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?

Difficulty Level: L1

Question: Pretty print rand_arr by suppressing the scientific notation (like 1e10).

Input:
# Create the random array
>>> np.random.seed(100)
>>> rand_arr = np.random.random([3, 3]) / 1e3
>>> rand_arr
array([[5.43404942e-04, 2.78369385e-04, 4.24517591e-04],
[8.44776132e-04, 4.71885619e-06, 1.21569121e-04],
[6.70749085e-04, 8.25852755e-04, 1.36706590e-04]])

Expected output:
array([[0.000543, 0.000278, 0.000425],
[0.000845, 0.000005, 0.000122],
[0.000671, 0.000826, 0.000137]])
Solution
# precision is optional
>>> np.set_printoptions(suppress=True, precision=6)
>>> rand_arr
array([[0.000543, 0.000278, 0.000425],
[0.000845, 0.000005, 0.000122],
[0.000671, 0.000826, 0.000137]])

23. How to limit the number of items printed in output of numpy array?

Difficulty Level: L1

Question: Limit the number of items printed in python numpy array a to a maximum of 6 elements.

Input:
>>> a = np.arange(15)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

Expected output:
array([ 0,  1,  2, ..., 12, 13, 14])
Solution
>>> np.set_printoptions(threshold=6)
>>> a
array([ 0, 1, 2, ..., 12, 13, 14])

24. How to print the full numpy array without truncating

Difficulty Level: L1

Question: Print the full numpy array a without truncating.

Input:
>>> np.set_printoptions(threshold=6)
>>> a = np.arange(15)
>>> a
array([ 0, 1, 2, ..., 12, 13, 14])

Expected output:
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
Solution 1
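# threshold=np.nan is rejected by newer NumPy releases; sys.maxsize disables truncation
>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)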
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
Solution 2
>>> np.set_printoptions(threshold=1000)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

25. How to import a dataset with numbers and texts keeping the text intact in python numpy?

Difficulty Level: L2

Question: Import the iris dataset keeping the text intact.

The dataset file iris.data can be downloaded from the Iris Data Set page.

Solution
>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> iris[:3]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)

Since we want to retain the species, a text field, I have set the dtype to object. Had I set dtype=None, a 1d array of tuples would have been returned.

26. How to extract a particular column from 1D array of tuples?

Difficulty Level: L2

Question: Extract the text column species from the 1D iris_1d.

Input:
>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_1d = np.genfromtxt(url, delimiter=",", dtype=None)
Solution 1
>>> species = np.array([row[4] for row in iris_1d])
>>> species[:7]
array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
b'Iris-setosa', b'Iris-setosa', b'Iris-setosa'], dtype='|S18')
Solution 2
>>> vfunc = np.vectorize(lambda x: x[4])
>>> species = vfunc(iris_1d)
>>> species[:7]
array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
b'Iris-setosa', b'Iris-setosa', b'Iris-setosa'], dtype='|S15')

27. How to convert a 1D array of tuples to a 2D NumPy array?

Difficulty Level: L2

Question: Convert the 1D iris_1d to a 2D array iris_2d by omitting the species text field.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_1d = np.genfromtxt(url, delimiter=",", dtype=None)
Solution
>>> iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
>>> iris_2d[:3]
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2]])

28. How to compute the mean, median, and standard deviation of a NumPy array?

Difficulty Level: L1

Question: Find the mean, median, and standard deviation of iris's sepal length (1st column).

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_1d = np.genfromtxt(url, delimiter=",", dtype=None)
Solution
>>> sepal_length = np.array([row[0] for row in iris_1d])
>>> mean, median, std = np.mean(sepal_length), np.median(sepal_length), np.std(sepal_length)
>>> mean, median, std
(5.843333333333334, 5.8, 0.8253012917851409)

29. How to normalize an array so that the values range exactly between 0 and 1?

Difficulty Level: L2

Question: Create a normalized form of iris's sepal length whose values range exactly between 0 and 1, so that the minimum has value 0 and the maximum has value 1.

Input:

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
sepal_length = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0])
Solution
>>> max_value = np.max(sepal_length)
>>> min_value = np.min(sepal_length)
>>> sepal_length_nm = (sepal_length - min_value) / (max_value - min_value)
>>> sepal_length_nm[:3]
array([0.22222222, 0.16666667, 0.11111111])

30. How to compute the softmax score?

Difficulty Level: L3

Question: Compute the softmax score of sepal length.

Input:

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
sepal_length = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0])
Solution

According to the softmax formula:

$$
S(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
$$

>>> sepal_length_exp = np.exp(sepal_length)
>>> exp_sum = np.sum(sepal_length_exp)
>>> sepal_length_sm = sepal_length_exp / exp_sum
>>> sepal_length_sm[:5]
array([0.00221959, 0.00181724, 0.00148783, 0.00134625, 0.00200836])

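For numerical stability, subtract the maximum before exponentiating. The common factor $e^{-x_{max}}$ cancels between numerator and denominator, so the result is unchanged:

$$
S(x_i) = \frac{e^{(x_i - x_{max})}}{\sum_j e^{(x_j - x_{max})}}
$$

where $x_{max} = \max_j x_j$.
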
>>> sepal_length_exp = np.exp(sepal_length - np.max(sepal_length))
>>> exp_sum = np.sum(sepal_length_exp)
>>> sepal_length_sm = sepal_length_exp / exp_sum
>>> sepal_length_sm[:5]
array([0.00221959, 0.00181724, 0.00148783, 0.00134625, 0.00200836])
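
To illustrate why the shifted form matters, here is a small self-contained sketch. The function name softmax and the sample values are assumptions chosen for illustration. Both inputs differ only by a constant offset, so the scores should agree, and the shifted exponentials stay finite even for the large inputs.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax of a 1D array."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)      # largest exponent becomes 0, so exp cannot overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

small = np.array([5.1, 4.9, 4.7])
large = small + 1000.0           # np.exp(1005.1) on its own would overflow to inf

print(softmax(small))
print(softmax(large))            # same scores, because softmax is shift-invariant
```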

31. How to find the percentile scores of a NumPy array?

Difficulty Level: L1

Question: Find the 5th and 95th percentiles of iris's sepal length (1st column).

Input:

url =  "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
sepallength = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0])
Solution
>>> np.percentile(sepallength, [5, 95])
array([4.6 , 7.255])

32. How to insert values at random positions in an array?

Difficulty Level: L2

Question: Insert np.nan values at 20 random positions in the iris_2d dataset.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=object)
Solution 1
>>> rand_row = np.random.randint(iris_2d.shape[0], size=20)
>>> rand_col = np.random.randint(iris_2d.shape[1], size=20)
>>> iris_2d[rand_row, rand_col] = np.nan
>>> iris_2d[:10]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
[b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
[b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
[b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
[b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
[b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa'],
[b'4.4', b'2.9', nan, b'0.2', b'Iris-setosa'],
[b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)
Solution 2
>>> i, j = np.where(iris_2d)
>>> iris_2d[np.random.choice(i, 20), np.random.choice(j, 20)] = np.nan
>>> iris_2d[:10]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
[b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
[b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
[b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
[b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
[b'5.0', b'3.4', b'1.5', b'0.2', nan],
[b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)
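
Both solutions draw row and column indices with replacement, so the same cell can be picked twice and fewer than 20 values may end up as nan. A minimal sketch that guarantees exactly 20 distinct positions follows. It uses a made-up float array named data as a stand-in for the iris values, so that it runs without downloading anything.

```python
import numpy as np

np.random.seed(100)
data = np.arange(600, dtype=float).reshape(150, 4)   # stand-in for the numeric iris columns

# Pick 20 distinct flat positions, then convert them to (row, col) pairs
flat_idx = np.random.choice(data.size, size=20, replace=False)
rows, cols = np.unravel_index(flat_idx, data.shape)
data[rows, cols] = np.nan

print(np.isnan(data).sum())
```

Sampling flat positions with replace=False is what rules out duplicates; np.unravel_index only translates them back into 2D coordinates.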

33. How to find the position of missing values in a NumPy array?

Difficulty Level: L2

Question: Find the number and position of missing values in iris_2d's sepal length (1st column).

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
>>> iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
Solution 1
# number of nan
>>> np.isnan(iris_2d[:, 0]).sum()
5

# index of nan
>>> np.where(np.isnan(iris_2d[:, 0]))
(array([ 12, 13, 47, 53, 143]),)
Solution 2
>>> nan_bools = np.isnan(iris_2d[:, 0])

# number of nan
>>> num_nans = np.sum(nan_bools)
>>> num_nans
5

# index of nan
>>> index = np.arange(len(nan_bools))
>>> nan_index = index[nan_bools]
>>> nan_index
array([ 12, 13, 47, 53, 143])

34. How to filter a NumPy array based on two or more conditions?

Difficulty Level: L3

Question: Filter the rows of iris_2d that have petal length (3rd column) > 1.5 and sepal length (1st column) < 5.0.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
Solution
>>> condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0)
>>> iris_2d[condition]
array([[4.8, 3.4, 1.6, 0.2],
[4.8, 3.4, 1.9, 0.2],
[4.7, 3.2, 1.6, 0.2],
[4.8, 3.1, 1.6, 0.2],
[4.9, 2.4, 3.3, 1. ],
[4.9, 2.5, 4.5, 1.7]])

35. How to drop rows that contain a missing value from a NumPy array?

Difficulty Level: L3

Question: Select the rows of iris_2d that do not have any nan value.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
>>> iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
Solution 1
>>> iris_2d[np.sum(np.isnan(iris_2d), axis=1) == 0][:5]
array([[5.1, 3.5, 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4]])
Solution 2
>>> no_nan_in_row = np.array([not np.any(np.isnan(row)) for row in iris_2d])
>>> iris_2d[no_nan_in_row][:5]
array([[5.1, 3.5, 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4]])
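
The same row filter can also be written without a Python loop by reducing the nan mask along axis 1. The sketch below uses a small made-up array x rather than the iris data so that it is self-contained.

```python
import numpy as np

x = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, np.nan, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2],
              [np.nan, 3.1, 1.5, np.nan]])

# True for rows in which at least one entry is nan
row_has_nan = np.isnan(x).any(axis=1)

# Keep only the complete rows
clean = x[~row_has_nan]
print(clean)
```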

36. How to find the correlation between two columns of a NumPy array?

Difficulty Level: L2

Question: Find the correlation between sepal length (1st column) and petal length (3rd column) in iris_2d.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
Solution 1
>>> np.corrcoef(iris_2d[:, 0], iris_2d[:, 2])[0, 1]
0.8717541573048718
Solution 2
>>> from scipy.stats import pearsonr
>>> corr, p_value = pearsonr(iris_2d[:, 0], iris_2d[:, 2])
>>> corr
0.8717541573048712

37. How to find out whether a given array has any null values?

Difficulty Level: L2

Question: Find out if iris_2d has any missing values.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
Solution 1
>>> np.sum(np.isnan(iris_2d)) > 0
False
Solution 2
>>> np.isnan(iris_2d).any()
False

38. How to replace all missing values with 0 in a NumPy array?

Difficulty Level: L2

Question: Replace all occurrences of nan with 0 in a NumPy array.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
>>> iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
Solution
>>> iris_2d[np.isnan(iris_2d)] = 0

39. How to find the count of unique values in a NumPy array?

Difficulty Level: L2

Question: Find the unique values and the count of each unique value in iris's species column.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")
Solution
>>> unique, counts = np.unique(iris[:, 4], return_counts=True)
>>> unique
array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
dtype=object)
>>> counts
array([50, 50, 50])

40. How to convert a numeric array to a categorical (text) array?

Difficulty Level: L2

Question: Bin the petal length (3rd column) of iris to form a text array, such that if petal length is:

< 3 --> 'small'
>= 3 and < 5 --> 'medium'
>= 5 --> 'large'

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")
Solution 1
# Bin petallength 
>>> petal_length_bin = np.digitize(iris[:, 2].astype(float), [0, 3, 5, 10])

# Map it to respective category
>>> label_map = {1: "small", 2: "medium", 3: "large", 4: np.nan}
>>> petal_length_cat = [label_map[x] for x in petal_length_bin]

# View
>>> petal_length_cat[:4]
['small', 'small', 'small', 'small']
Solution 2
>>> petal_length = iris[:, 2].astype(float)
>>> petal_length_cat = np.full(len(petal_length), None,dtype=object)

>>> petal_length_cat[petal_length < 3] = "small"
>>> petal_length_cat[(petal_length >= 3) & (petal_length < 5)] = "medium"
>>> petal_length_cat[petal_length >= 5] = "large"

>>> petal_length_cat[:4]
array(['small', 'small', 'small', 'small'], dtype=object)
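
The binning rules can also be expressed as a list of conditions passed to np.select. This is only a sketch: the sample values are made up, and the "unknown" default label is an assumption added so that the default has the same string type as the labels.

```python
import numpy as np

petal_length = np.array([1.4, 3.0, 4.9, 5.0, 6.1])   # made-up sample values

conditions = [
    petal_length < 3,
    (petal_length >= 3) & (petal_length < 5),
    petal_length >= 5,
]
labels = ["small", "medium", "large"]

# np.select picks, per element, the label of the first condition that is True
petal_length_cat = np.select(conditions, labels, default="unknown")
print(petal_length_cat)
```

Writing the boundaries as explicit conditions makes it easy to see which bin the edge values 3 and 5 fall into.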

41. How to create a new column from existing columns of a NumPy array?

Difficulty Level: L2

Question: Create a new column for volume in iris_2d, where volume is (pi x petal_length x sepal_length^2) / 3.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")
Solution 1
>>> volume = (np.pi * iris_2d[:, 2].astype(float) * (iris_2d[:, 0].astype(float))**2) / 3
>>> out = np.c_[iris_2d, volume]
>>> out[:4]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa',
38.13265162927291],
[b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa',
35.200498485922445],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
[b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa',
33.238050274980004]], dtype=object)
Solution 2
# Compute volume
>>> sepal_length = iris_2d[:, 0].astype('float')
>>> petal_length = iris_2d[:, 2].astype('float')
>>> volume = (np.pi * petal_length * (sepal_length**2))/3

# Introduce new dimension to match iris_2d's
>>> volume = volume[:, np.newaxis]

# Add the new column
>>> out = np.hstack([iris_2d, volume])

# View
>>> out[:4]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa',
38.13265162927291],
[b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa',
35.200498485922445],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
[b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa',
33.238050274980004]], dtype=object)

42. How to do probabilistic sampling in NumPy?

Difficulty Level: L3

Question: Randomly sample iris's species such that setosa is drawn twice as often as each of versicolor and virginica.

Input:

# Import iris keeping the text column intact
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
iris = np.genfromtxt(url, delimiter=",", dtype=object)
Solution
# Get the species column
>>> species = iris[:, 4]

# Probabilistic sampling
>>> np.random.seed(100)
>>> probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, 0.750, num=50), np.linspace(0.751, 1.0, num=50)]
>>> index = np.searchsorted(probs, np.random.random(150))
>>> species_out = species[index]
>>> np.unique(species_out, return_counts=True)
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
dtype=object), array([77, 37, 36]))
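
A more direct way to state the sampling weights is the p argument of np.random.choice. The sketch below is a simplification and not the approach used in the solution: it samples from the three class names rather than from the 150-row species column, and the sample size of 10000 is an arbitrary choice made so that the proportions are easy to see.

```python
import numpy as np

np.random.seed(100)
species_names = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# setosa gets probability 0.5, the other two classes 0.25 each
sample = np.random.choice(species_names, size=10000, p=[0.5, 0.25, 0.25])

values, counts = np.unique(sample, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```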

43. How to get the second largest value of an array when grouped by another array?

Difficulty Level: L2

Question: What is the value of the second longest petal length of species setosa?

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")
Solution
>>> iris_setosa = iris[iris[:, 4] == b"Iris-setosa", :]
>>> petal_len_setosa = iris_setosa[:, 2].astype(float)
>>> second_large = np.sort(np.unique(petal_len_setosa))[-2]
>>> second_large
1.7

44. How to sort a 2D array by a column?

Difficulty Level: L2

Question: Sort the iris dataset based on the sepal length column.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")
Solution
>>> index = np.argsort(iris[:, 0])
>>> iris_sort = iris[index]
>>> iris_sort[:10]
array([[b'4.3', b'3.0', b'1.1', b'0.1', b'Iris-setosa'],
[b'4.4', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
[b'4.4', b'3.0', b'1.3', b'0.2', b'Iris-setosa'],
[b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.5', b'2.3', b'1.3', b'0.3', b'Iris-setosa'],
[b'4.6', b'3.6', b'1.0', b'0.2', b'Iris-setosa'],
[b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
[b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
[b'4.6', b'3.2', b'1.4', b'0.2', b'Iris-setosa'],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)

45. How to find the most frequent value in a NumPy array?

Difficulty Level: L1

Question: Find the most frequent value of petal length (3rd column) in the iris dataset.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype= object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")
Solution
>>> uniques, counts = np.unique(iris[:, 2], return_counts=True)
>>> uniques[np.argmax(counts)]
b'1.5'

46. How to find the position of the first occurrence of a value greater than a given value?

Difficulty Level: L2

Question: Find the position of the first occurrence of a value greater than 1.0 in the petal width (4th column) of the iris dataset.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
Solution 1
>>> np.argwhere(iris[:, 3].astype(float) > 1.0)[0][0]
50
>>> np.where(iris[:, 3].astype(float) > 1.0)[0][0]
50
Solution 2
>>> index = np.arange(len(iris))
>>> index = index[iris[:, 3].astype(float) > 1.0]
>>> index[0]
50

47. How to replace all values greater than a given value with a given cutoff?

Difficulty Level: L2

Question: From the array a, replace all values greater than 30 with 30 and all values less than 10 with 10.

Input:

>>> np.random.seed(100)
>>> a = np.random.uniform(1, 50, 20)
Solution 1
# Cutoff in-place
>>> a[a > 30] = 30
>>> a[a < 10] = 10
>>> a[:5]
array([27.62684215, 14.64009987, 21.80136195, 30. , 10. ])
Solution 2
>>> a_cutoff = np.clip(a, a_min=10, a_max=30)
>>> a_cutoff[:5]
array([27.62684215, 14.64009987, 21.80136195, 30. , 10. ])
Solution 3
>>> a_cutoff = np.where(a < 10, 10, np.where(a > 30, 30, a))
>>> a_cutoff[:5]
array([27.62684215, 14.64009987, 21.80136195, 30. , 10. ])

48. How to get the positions of the top n values in a NumPy array?

Difficulty Level: L2

Question: Get the positions of the top 5 maximum values in a given array a.

Input:

>>> np.random.seed(100)
>>> a = np.random.uniform(1, 50, 20)
>>> a
array([27.62684215, 14.64009987, 21.80136195, 42.39403048, 1.23122395,
6.95688692, 33.86670515, 41.466785 , 7.69862289, 29.17957314,
44.67477576, 11.25090398, 10.08108276, 6.31046763, 11.76517714,
48.95256545, 40.77247431, 9.42510962, 40.99501269, 14.42961361])
Solution 1
>>> index = np.argsort(a)[::-1]
>>> index[:5]
array([15, 10, 3, 7, 18])
Solution 2
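# argpartition only guarantees that the first 5 indices belong to the 5 largest values;
# sort those 5 by value to get them in descending order
>>> top5 = np.argpartition(-a, 5)[:5]
>>> top5[np.argsort(-a[top5])]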
array([15, 10, 3, 7, 18])

49. How to compute the row-wise counts of all possible values in an array?

Difficulty Level: L4

Question: Compute the counts of unique values row-wise.

Input:

>>> np.random.seed(100)
>>> arr = np.random.randint(1, 11, size=(6, 10))
>>> arr
array([[ 9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
[ 3, 3, 2, 1, 9, 5, 1, 10, 7, 3],
[ 5, 2, 6, 4, 5, 5, 4, 8, 2, 2],
[ 8, 8, 1, 3, 10, 10, 4, 3, 6, 9],
[ 2, 1, 8, 7, 3, 1, 9, 3, 6, 2],
[ 9, 2, 6, 5, 3, 9, 4, 6, 1, 10]])

Expected output:

[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
[2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
[0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
[1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
[2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]

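The output has 10 columns, one for each number from 1 to 10. Each value is the count of that number in the corresponding row. For example, cell (0, 2) holds the value 2, which means that the number 3 occurs exactly twice in the first row.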

Solution 1
# Assume each number is in [1, 10]
>>> results = []
>>> for row in arr:
...     uniques, counts = np.unique(row, return_counts=True)
...     zeros = np.zeros(10, dtype=int)
...     zeros[uniques-1] = counts
...     results.append(zeros.tolist())
...
>>> np.array(results)
array([[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
[2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
[0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
[1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
[2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 2, 0, 0, 2, 1]])
Solution 2
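# More general
>>> def counts_of_all_values_rowwise(arr2d):
...     # Unique values and their counts, computed row by row
...     num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]
...     # For every value present anywhere in arr2d, look up its count in each row
...     return [[int(b[a == i][0]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array]
...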
>>> np.array(counts_of_all_values_rowwise(arr))
array([[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
[2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
[0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
[1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
[2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 2, 0, 0, 2, 1]])
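
For small value ranges, the row-wise counts can also be obtained with broadcasting instead of a loop. The sketch below uses only the first two rows of the input and assumes, as Solution 1 does, that every value lies in [1, 10].

```python
import numpy as np

arr = np.array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
                [3, 3, 2, 1, 9, 5, 1, 10, 7, 3]])   # first two rows of the input

values = np.arange(1, 11)                            # the candidate values 1..10

# Compare every element with every candidate value: shape (rows, cols, 10),
# then sum over the column axis to count the matches per row
counts = (arr[:, :, np.newaxis] == values).sum(axis=1)
print(counts)
```

The intermediate boolean array has rows x cols x 10 entries, so this trades memory for brevity and is best suited to small inputs.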

50. How to convert an array of arrays into a flat 1D array?

Difficulty Level: L2

Question: Convert array_of_arrays into a flat linear 1D array.

Input:

>>> arr1 = np.arange(3)
>>> arr2 = np.arange(3, 7)
>>> arr3 = np.arange(7, 10)
>>> array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)
>>> array_of_arrays
array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])],
dtype=object)

Expected output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Solution 1
>>> arr_1d = np.concatenate([arr for arr in array_of_arrays])
>>> arr_1d
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Solution 2
>>> arr_1d = np.array([a for arr in array_of_arrays for a in arr])
>>> arr_1d
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

51. How to generate one-hot encodings for an array in NumPy?

Difficulty Level: L4

Question: Compute the one-hot encodings (dummy binary variables for each unique value in the array).

Input:

>>> np.random.seed(101)
>>> arr = np.random.randint(1, 4, size=6)
>>> arr
array([2, 3, 2, 2, 2, 1])

Expected output:

array([[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.]])
Solution 1
>>> arr_shift = arr - 1
>>> one_hot = np.eye(3)[arr_shift]
>>> one_hot
array([[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.]])
Solution 2
>>> def one_hot_encodings(arr):
...     # Assumes the values in arr are the consecutive integers 1..n
...     uniqs = np.unique(arr)
...     out = np.zeros((arr.shape[0], uniqs.shape[0]))
...     for i, k in enumerate(arr):
...         out[i, k-1] = 1
...     return out
...
>>> one_hot_encodings(arr)
array([[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.]])
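
Both solutions rely on the values being the consecutive integers 1 to 3. A sketch that drops this assumption uses the inverse indices returned by np.unique, which map each element to the position of its value among the sorted unique values.

```python
import numpy as np

arr = np.array([2, 3, 2, 2, 2, 1])

# inverse[i] is the position of arr[i] within the sorted unique values
uniques, inverse = np.unique(arr, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for the k-th unique value
one_hot = np.eye(len(uniques))[inverse]
print(one_hot)
```

Because the columns follow the order of uniques, the same code also works for labels such as 10, 30, 70 or for strings.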

52. How to create row numbers grouped by a categorical variable?

Difficulty Level: L3

Question: Create row numbers grouped by a categorical variable. Use the following sample from iris species as input.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> species = np.genfromtxt(url, delimiter=",", dtype=str, usecols=4)
>>> np.random.seed(100)
>>> species_small = np.sort(np.random.choice(species, size=20))
>>> species_small
array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
'Iris-virginica'], dtype='<U15')

Expected output:

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]
Solution 1
>>> groups = []
>>> for val in np.unique(species_small):
...     groups.append(np.arange(len(species_small[species_small == val])))
...
>>> np.concatenate(groups).tolist()
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]
Solution 2
>>> [i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]

53. How to create group ids based on a given categorical variable?

Difficulty Level: L4

Question: Create group ids based on a given categorical variable. Use the following sample from iris species as input.

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> species = np.genfromtxt(url, delimiter=",", dtype=str, usecols=4)
>>> np.random.seed(100)
>>> species_small = np.sort(np.random.choice(species, size=20))
>>> species_small
array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
'Iris-virginica'], dtype='<U15')

Expected output:

[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
Solution
>>> output = np.full(len(species_small), 0)
>>> uniques = np.unique(species_small)
>>> for val in uniques:
...     group_id = np.where(uniques == val)[0][0]
...     output[species_small == val] = group_id
...
>>> output.tolist()
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
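
The loop can be replaced by a single call, since np.unique with return_inverse=True already returns the index of each element's value among the sorted unique values. The sketch below uses a short made-up sample that is deliberately unsorted, to show that the ids do not depend on the input order.

```python
import numpy as np

species_small = np.array(["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                          "Iris-virginica", "Iris-versicolor"])   # made-up sample

# group_ids[i] is the index of species_small[i] among the sorted unique names
uniques, group_ids = np.unique(species_small, return_inverse=True)
print(group_ids.tolist())
```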

54. How to rank items in an array using NumPy?

Difficulty Level: L2

Question: Create the ranks for the given numeric array a.

Input:

>>> np.random.seed(10)
>>> a = np.random.randint(20, size=10)
>>> a
array([ 9, 4, 15, 0, 17, 16, 17, 8, 9, 0])

Expected output:

array([4, 2, 6, 0, 8, 7, 9, 3, 5, 1])
Solution
>>> np.argsort(np.argsort(a))
array([4, 2, 6, 0, 8, 7, 9, 3, 5, 1])
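
The double argsort works because the second argsort inverts the permutation produced by the first. The sketch below spells out that inversion explicitly. The choice of kind="stable" is an assumption made here so that tied values are ranked in input order.

```python
import numpy as np

a = np.array([9, 4, 15, 0, 17, 16, 17, 8, 9, 0])

# order[k] is the index of the k-th smallest element (stable: ties keep input order)
order = np.argsort(a, kind="stable")

# Invert the permutation: the element at index order[k] has rank k
ranks = np.empty_like(order)
ranks[order] = np.arange(len(a))
print(ranks)
```

Building the inverse with one assignment avoids the second sort, which matters only for large arrays.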

55. How to rank items in a multidimensional array using NumPy?

Difficulty Level: L3

Question: Create a rank array of the same shape as a given numeric array a.

Input:

>>> np.random.seed(10)
>>> a = np.random.randint(20, size=[2, 5])
>>> a
array([[ 9, 4, 15, 0, 17],
[16, 17, 8, 9, 0]])

Expected output:

array([[4, 2, 6, 0, 8],
[7, 9, 3, 5, 1]])
Solution 1
>>> a_flat = a.flatten()
>>> sort_idx = np.argsort(np.argsort(a_flat))
>>> sort_idx.reshape((2, -1))
array([[4, 2, 6, 0, 8],
[7, 9, 3, 5, 1]])
Solution 2
>>> a.ravel().argsort().argsort().reshape(a.shape)
array([[4, 2, 6, 0, 8],
[7, 9, 3, 5, 1]])

56. How to find the maximum value in each row of a 2D NumPy array?

Difficulty Level: L2

Question: Compute the maximum for each row in the given array.

Input:

>>> np.random.seed(100)
>>> a = np.random.randint(1, 10, [5, 3])
>>> a
array([[9, 9, 4],
[8, 8, 1],
[5, 3, 6],
[3, 3, 3],
[2, 1, 9]])
Solution 1
>>> np.amax(a, axis=1)
array([9, 8, 6, 3, 9])
Solution 2
>>> np.apply_along_axis(np.max, arr=a, axis=1)
array([9, 8, 6, 3, 9])

57. How to compute the min-by-max for each row of a 2D NumPy array?

Difficulty Level: L3

Question: Compute the min-by-max (the minimum divided by the maximum) for each row of the given 2D NumPy array.

Input:

>>> np.random.seed(100)
>>> a = np.random.randint(1, 10, [5, 3])
>>> a
array([[9, 9, 4],
[8, 8, 1],
[5, 3, 6],
[3, 3, 3],
[2, 1, 9]])
Solution
>>> np.apply_along_axis(lambda x: np.min(x)/np.max(x), axis=1, arr=a)
array([0.44444444, 0.125 , 0.5 , 1. , 0.11111111])

58. How to find the duplicate records in a NumPy array?

Difficulty Level: L3

Question: Find the duplicate entries (2nd occurrence onwards) in the given NumPy array and mark them as True. First-time occurrences should be False.

Input:

>>> np.random.seed(100)
>>> a = np.random.randint(0, 5, 10)
>>> a
array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])

Expected output:

array([False,  True, False,  True, False, False,  True,  True,  True,
True])
Solution
>>> out = np.full(a.shape[0], True)
>>> unique_positions = np.unique(a, return_index=True)[1]
>>> out[unique_positions] = False
>>> out
array([False, True, False, True, False, False, True, True, True,
True])

59. How to find the grouped mean in NumPy?

Difficulty Level: L3

Question: Find the mean of a numeric column (here sepal width, the 2nd column) grouped by a categorical column (species) in a 2D NumPy array.

Input:

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
iris = np.genfromtxt(url, delimiter=",", dtype=object)
names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Expected output:

[[b'Iris-setosa', 3.418],
[b'Iris-versicolor', 2.770],
[b'Iris-virginica', 2.974]]
Solution
>>> uniques = np.unique(iris[:, 4])
>>> output = []
>>> for v in uniques:
...     group = iris[iris[:, 4] == v]
...     output.append([v, np.mean(group[:, 1].astype(float))])
...
>>> output
[[b'Iris-setosa', 3.418], [b'Iris-versicolor', 2.7700000000000005], [b'Iris-virginica', 2.974]]
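
A loop-free variant of the grouped mean can be sketched with np.bincount: the weights argument gives the per-group sums, and the plain counts give the group sizes. The arrays species and width below are made-up stand-ins for the species and sepal width columns.

```python
import numpy as np

species = np.array(["a", "b", "a", "c", "b", "a"])      # made-up categories
width = np.array([3.0, 1.0, 4.0, 5.0, 4.0, 2.0])        # made-up numeric column

# group_ids[i] is the index of species[i] among the sorted unique categories
uniques, group_ids = np.unique(species, return_inverse=True)

# Sum of the numeric column per group, divided by the group sizes
sums = np.bincount(group_ids, weights=width)
sizes = np.bincount(group_ids)
means = sums / sizes

print(list(zip(uniques.tolist(), means.tolist())))
```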

60. How to convert a PIL image to a NumPy array?

Difficulty Level: L3

Question: Import the image from the following URL and convert it to a NumPy array.

Input:

>>> url = "https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg"
Solution
>>> import requests
>>> from io import BytesIO
>>> from PIL import Image
>>> response = requests.get(url)
>>> img = Image.open(BytesIO(response.content))
>>> img_arr = np.asarray(img)
>>> img_arr[:5, :5]
array([[[ 9, 72, 125],
[ 9, 72, 125],
[ 9, 72, 125],
[ 10, 73, 126],
[ 10, 73, 126]],

[[ 9, 72, 125],
[ 9, 72, 125],
[ 10, 73, 126],
[ 10, 73, 126],
[ 10, 73, 126]],

[[ 9, 72, 125],
[ 10, 73, 126],
[ 10, 73, 126],
[ 10, 73, 126],
[ 11, 74, 127]],

[[ 10, 73, 126],
[ 10, 73, 126],
[ 10, 73, 126],
[ 11, 74, 127],
[ 11, 74, 127]],

[[ 10, 73, 126],
[ 10, 73, 126],
[ 11, 74, 127],
[ 11, 74, 127],
[ 11, 74, 127]]], dtype=uint8)

61. How to drop all missing values from a NumPy array?

Difficulty Level: L2

Question: Drop all nan values from a 1D NumPy array.

Input:

>>> arr = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan])
>>> arr
array([ 1., 2., 3., nan, 5., 6., 7., nan])

Expected output:

array([1., 2., 3., 5., 6., 7.])
Solution
>>> arr[~np.isnan(arr)]
array([1., 2., 3., 5., 6., 7.])

62. How to compute the Euclidean distance between two arrays?

Difficulty Level: L1

Question: Compute the Euclidean distance between two arrays a and b.

Input:

>>> a = np.array([1, 2, 3, 4, 5])
>>> b = np.array([4, 5, 6, 7, 8])
>>> a
array([1, 2, 3, 4, 5])
>>> b
array([4, 5, 6, 7, 8])
Solution 1
>>> np.sqrt(np.sum((a-b)**2))
6.708203932499369
Solution 2
>>> np.linalg.norm(a-b)
6.708203932499369
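
As a cross-check that does not depend on NumPy, the same distance can be computed with the standard library. This is a minimal sketch that assumes Python 3.8 or later (where math.dist is available); the variable names manual and builtin are only illustrative.

```python
import math

a = [1, 2, 3, 4, 5]
b = [4, 5, 6, 7, 8]

# Explicit formula: square root of the sum of squared differences
manual = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# math.dist computes the Euclidean distance between two points
builtin = math.dist(a, b)

print(manual, builtin)
```

Both values should agree with the NumPy results above up to floating-point rounding.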

63. How to find all the local maxima (peaks) in a 1D array?

Difficulty Level: L4

Question: Find all the peaks in a 1D NumPy array a. Peaks are points surrounded by smaller values on both sides.

Input:

>>> a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
>>> a
array([1, 3, 7, 1, 2, 6, 0, 1])

Expected output:

array([2, 5])

Here 2 and 5 are the indices of the local maxima 7 and 6.

Solution
>>> double_diff = np.diff(np.sign(np.diff(a)))
>>> peak_locations = np.where(double_diff == -2)[0] + 1
>>> peak_locations
array([2, 5])
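
The one-liner is easier to follow when each intermediate array is written out. The sketch below repeats the same computation step by step; the names first_diff, direction and turns are introduced here only for illustration.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

# Step 1: differences between neighbours
first_diff = np.diff(a)            # [ 2,  4, -6,  1,  4, -6,  1]
# Step 2: keep only the direction (+1 rising, -1 falling)
direction = np.sign(first_diff)    # [ 1,  1, -1,  1,  1, -1,  1]
# Step 3: a change from +1 to -1 shows up as -2
turns = np.diff(direction)         # [ 0, -2,  2,  0, -2,  2]
# Step 4: shift by one, because turns[i] describes element i + 1 of a
peaks = np.where(turns == -2)[0] + 1

print(peaks.tolist())
```

One caveat of this approach: a flat-topped peak such as [1, 3, 3, 1] produces two steps of -1 instead of a single -2, so it is not reported.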

64. How to subtract a 1D array from a 2D array, so that each item of the 1D array is subtracted from the respective row?

Difficulty Level: L2

Question: Subtract the 1D array b_1d from the 2D array a_2d, such that each item of b_1d is subtracted from the respective row of a_2d.

Input:

>>> a_2d = np.array([[3, 3, 3],[4, 4, 4],[5, 5, 5]])
>>> b_1d = np.array([1, 2, 3])
>>> a_2d
array([[3, 3, 3],
[4, 4, 4],
[5, 5, 5]])
>>> b_1d
array([1, 2, 3])

Expected output:

array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
Solution
>>> a_2d - b_1d[:, np.newaxis]
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
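
The role of np.newaxis becomes clearer when the shapes are compared. The following sketch contrasts the row-wise subtraction used in the solution with what happens when the new axis is left out; the names b_col, row_wise and col_wise are illustrative.

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

# Shape (3,) becomes (3, 1): one value per row
b_col = b_1d[:, np.newaxis]
row_wise = a_2d - b_col        # b_1d[i] is subtracted from row i

# Without the new axis, shape (3,) lines up with the columns instead
col_wise = a_2d - b_1d         # b_1d[j] is subtracted from column j

print(b_col.shape)
print(row_wise.tolist())
print(col_wise.tolist())
```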

65. How to find the index of the n'th repetition of an item in an array?

Difficulty Level: L2

Question: Find the index of the 5th repetition of the number 1 in x.

Input:

>>> x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
Solution 1
>>> n = 5
>>> [i for i, v in enumerate(x) if v == 1][n-1]
8
Solution 2
>>> n = 5
>>> index = np.arange(len(x))
>>> index[x == 1][n-1]
8
Solution 3
>>> n = 5
>>> np.where(x == 1)[0][n-1]
8

66. How to convert NumPy's datetime64 object to datetime's datetime object?

Difficulty Level: L2

Question: Convert NumPy's datetime64 object to datetime's datetime object.

# Input: a numpy datetime64 object
>>> dt64 = np.datetime64("2018-02-25 22:10:10")
Solution 1
>>> dt64.tolist()
datetime.datetime(2018, 2, 25, 22, 10, 10)
Solution 2
>>> from datetime import datetime
>>> dt64.astype(datetime)
datetime.datetime(2018, 2, 25, 22, 10, 10)

67. How to compute the moving average of a NumPy array?

Difficulty Level: L3

Question: Compute the moving average with a window size of 3 for the given 1D array.

Input:

>>> np.random.seed(100)
>>> Z = np.random.randint(10, size=10)
>>> Z
array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
Solution 1

Source: How to calculate moving average using NumPy?

>>> def moving_average(a, n=3):
...     ret = np.cumsum(a, dtype=float)
...     ret[n:] = ret[n:] - ret[:-n]
...     return ret[n-1:] / n
...
>>> moving_average(Z, n=3).round(2)
array([6.33, 6. , 5.67, 4.67, 3.67, 2. , 3.67, 3. ])
Solution 2
>>> np.convolve(Z, np.ones(3)/3, mode="valid").round(2)
array([6.33, 6. , 5.67, 4.67, 3.67, 2. , 3.67, 3. ])
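
The cumulative-sum trick in Solution 1 is compact, so here is a sketch that spells out why it works and compares it with the direct definition. The array Z is hard-coded to the values shown above, and the names c, window_sums and direct are illustrative.

```python
import numpy as np

Z = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
n = 3

# Running totals: c[i] = Z[0] + ... + Z[i]
c = np.cumsum(Z, dtype=float)

# The sum of the window ending at index i is c[i] - c[i - n];
# the very first window (i = n - 1) is simply c[n - 1]
window_sums = np.concatenate(([c[n - 1]], c[n:] - c[:-n]))
averages = window_sums / n

# Direct definition, for comparison
direct = np.array([Z[i:i + n].mean() for i in range(len(Z) - n + 1)])

print(np.allclose(averages, direct))
```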

68. How to create a NumPy array sequence given only the starting point, length and step?

Difficulty Level: L2

Question: Create a NumPy array of length 10 that starts from 5 and has a step of 3 between consecutive numbers.

Solution 1
>>> def seq(start, length, step):
...     end = start + (step*length)
...     return np.arange(start, end, step)
...
>>> seq(5, 10, 3)
array([ 5, 8, 11, 14, 17, 20, 23, 26, 29, 32])
Solution 2
>>> np.arange(5, 5+3*10, 3)
array([ 5, 8, 11, 14, 17, 20, 23, 26, 29, 32])
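
An alternative that avoids computing the end point is to scale and shift np.arange(length). This is only a sketch; the helper name seq_by_length is made up for this example.

```python
import numpy as np


def seq_by_length(start, length, step):
    # Build 0, 1, ..., length - 1, then scale and shift it, so the
    # result always has exactly `length` elements
    return start + step * np.arange(length)


print(seq_by_length(5, 10, 3).tolist())
```

This form may be preferable for non-integer steps, where the number of elements returned by np.arange(start, end, step) can be affected by floating-point rounding.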

69. How to fill in missing dates in an irregular series of NumPy dates?

Difficulty Level: L3

Question: Given an array containing a non-continuous sequence of dates, make it a continuous sequence by filling in the missing dates.

Input:

>>> dates = np.arange(np.datetime64("2018-02-01"), np.datetime64("2018-02-25"), 2)
>>> dates
array(['2018-02-01', '2018-02-03', '2018-02-05', '2018-02-07',
'2018-02-09', '2018-02-11', '2018-02-13', '2018-02-15',
'2018-02-17', '2018-02-19', '2018-02-21', '2018-02-23'],
dtype='datetime64[D]')
Solution 1
>>> out = []
>>> for date, d in zip(dates, np.diff(dates)):
...     out.append(np.arange(date, (date+d)))
...
>>> filled_in = np.array(out).reshape(-1)
>>> output = np.hstack([filled_in, dates[-1]])
>>> output
array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
'2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
'2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
'2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
'2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
'2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')
Solution 2
>>> filled_in = np.array([np.arange(date, (date+d)) for date, d in zip(dates, np.diff(dates))]).reshape(-1)
>>> output = np.hstack([filled_in, dates[-1]])
>>> output
array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
'2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
'2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
'2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
'2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
'2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')
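
Both solutions reshape a list of per-gap arrays, which appears to rely on every gap having the same length (2 days in this input). For dates with truly irregular gaps, a single daily range from the first to the last date may be simpler. The sketch below uses made-up sample dates and assumes the goal is one entry per calendar day.

```python
import numpy as np

# Irregular gaps: 1 day, 3 days, 2 days (hypothetical sample data)
dates = np.array(["2018-02-01", "2018-02-02", "2018-02-05", "2018-02-07"],
                 dtype="datetime64[D]")

# One daily range from the first date to the last date, inclusive
one_day = np.timedelta64(1, "D")
filled = np.arange(dates.min(), dates.max() + one_day, one_day)

print(filled)
```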

70. How to create strides from a given 1D array?

Difficulty Level: L4

Question: From the given 1D array arr, generate a 2D matrix using strides, with a window length of 4 and a stride of 2, like [[0,1,2,3], [2,3,4,5], [4,5,6,7]..]

Input:

>>> arr = np.arange(15)
>>> arr
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

Expected output:

array([[ 0,  1,  2,  3],
[ 2, 3, 4, 5],
[ 4, 5, 6, 7],
[ 6, 7, 8, 9],
[ 8, 9, 10, 11],
[10, 11, 12, 13]])
Solution
>>> def gen_strides(a, stride_len=5, window_len=5):
...     n_strides = ((a.size - window_len) // stride_len) + 1
...     return np.array([a[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)])
...
>>> gen_strides(np.arange(15), stride_len=2, window_len=4)
array([[ 0, 1, 2, 3],
[ 2, 3, 4, 5],
[ 4, 5, 6, 7],
[ 6, 7, 8, 9],
[ 8, 9, 10, 11],
[10, 11, 12, 13]])
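
The solution above builds the windows with a list comprehension. As an alternative sketch, NumPy's own sliding_window_view helper can produce the same result; this assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(15)

# All windows of length 4 with step 1: shape (12, 4), built as a view
windows = sliding_window_view(arr, window_shape=4)

# Keep every second window to get a stride of 2
strided = windows[::2]

print(strided)
```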

References

  1. 101 NumPy Exercises for Data Analysis (Python)
  2. 70 道 NumPy 测试题

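Implementing an iterable container takes two steps:

  1. Make the container inherit from collections.abc.Iterable and implement the __iter__() method
  2. Implement an iterator class that inherits from collections.abc.Iterator and implements the __next__() method

The implementation is as follows: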

from collections.abc import Iterator, Iterable
from typing import Any


class ConcreteIterator(Iterator):
    """Iterator that walks a ConcreteCollection by index."""

    def __init__(self, collection):
        self._collection = collection
        self._position = 0

    def __next__(self):
        try:
            value = self._collection._container[self._position]
            self._position = self._position + 1
        except IndexError:
            # No more items: signal the end of iteration
            raise StopIteration()

        return value


class ConcreteCollection(Iterable):
    def __init__(self):
        self._container = []

    def __iter__(self):
        # Return a fresh iterator so every loop starts from the beginning
        return ConcreteIterator(self)

    def add_item(self, item: Any):
        self._container.append(item)


if __name__ == "__main__":
    collection = ConcreteCollection()
    collection.add_item('Hello')
    collection.add_item('World,')
    collection.add_item('Python.')

    for item in collection:
        print('{} '.format(item), end='')
    print('')
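
For comparison, Python also allows __iter__() to be written as a generator, which removes the need for a separate iterator class. The sketch below is a variant of the container above; the class name GeneratorCollection is made up for this example.

```python
from collections.abc import Iterable
from typing import Any


class GeneratorCollection(Iterable):
    """Hypothetical variant of the container without a separate iterator class."""

    def __init__(self):
        self._container = []

    def __iter__(self):
        # A generator function returns an iterator object automatically;
        # StopIteration is raised for us when the loop finishes
        for item in self._container:
            yield item

    def add_item(self, item: Any):
        self._container.append(item)


collection = GeneratorCollection()
collection.add_item('Hello')
collection.add_item('World,')
collection.add_item('Python.')

words = [item for item in collection]
print(' '.join(words))
```

The explicit two-class version is still useful when the iterator needs extra state or methods of its own.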

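Introduction

ConfigSpace is a Python package for managing the parameter space of algorithms, used mainly for parameter selection tasks. Several AutoML libraries, such as SMAC3, BOHB and auto-sklearn, use it. The project homepage is: https://github.com/automl/ConfigSpace

Note: the code for this article is available in a Gist

Initialization

Using the ConfigSpace package usually starts with creating a configuration space instance
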
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

cs = CS.ConfigurationSpace()

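This configuration space instance, cs, holds the settings of all parameters

Integer and float parameters

From this section on, we look at how to configure the parameter space of an algorithm. The example algorithm is the SVM classifier, as implemented by sklearn.svm.SVC. The documentation of the SVC class describes two parameters:

  1. C is the penalty parameter, a float with $C \ge 0$
  2. max_iter is the maximum number of iterations, an integer

Suppose we want to restrict C to the range $[0, 1]$ and max_iter to the range $[10, 100]$. The ranges can be set with UniformFloatHyperparameter and UniformIntegerHyperparameter
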
param_c = CSH.UniformFloatHyperparameter(name='C', lower=0, upper=1)
param_max_iter = CSH.UniformIntegerHyperparameter(name='max_iter', lower=10, upper=100)

After defining the parameter ranges, add them to the configuration space instance cs

cs.add_hyperparameter(param_c)
cs.add_hyperparameter(param_max_iter)

You can now call the sample_configuration method of cs to sample a random set of parameters

cs.sample_configuration()

The output looks similar to the following

Configuration:
  C, Value: 0.7114185317566737
  max_iter, Value: 84

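Categorical parameters and relationships between parameters

According to the documentation of the sklearn.svm.SVC class, the kernel type of the algorithm is controlled by the parameter kernel

  • kernel specifies the kernel type; the main values are 'linear', 'poly', 'rbf' and 'sigmoid'

CategoricalHyperparameter can be used to represent the parameter kernel
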
param_kernel = CSH.CategoricalHyperparameter(name='kernel', choices=['linear', 'poly', 'rbf', 'sigmoid'])

cs.add_hyperparameter(param_kernel)

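Each kernel also has its own parameters (set through the corresponding parameters of the SVC class), namely

  • Linear kernel $K(x, y)=x^Ty$, no parameters
  • Poly kernel $K(x, y)=(\gamma x^Ty + r)^d$, where $\gamma$ corresponds to gamma, $r$ corresponds to coef0, and $d$ corresponds to degree
  • RBF kernel $K(x, y)=\exp(-\gamma \Vert x - y\Vert^2)$, where $\gamma$ corresponds to gamma
  • Sigmoid kernel $K(x, y)=\tanh(\gamma x^T y + r)$, where $\gamma$ corresponds to gamma and $r$ corresponds to coef0

First, create the parameter spaces for degree, coef0 and gamma
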
param_degree = CSH.UniformIntegerHyperparameter(name='degree', lower=2, upper=4)
param_coef0 = CSH.UniformFloatHyperparameter(name='coef0', lower=0, upper=1)
param_gamma = CSH.UniformFloatHyperparameter(name='gamma', lower=1e-5, upper=1e2)

cs.add_hyperparameters([param_degree, param_coef0, param_gamma])

As described above, different kernels use different parameters; in other words, the kernel parameters are tied to the kernel type parameter

  • The parameter degree is tied to the Poly kernel
  • The parameter coef0 is tied to the Poly and Sigmoid kernels
  • The parameter gamma is tied to the Poly, RBF and Sigmoid kernels

These relationships between parameters can be expressed with EqualsCondition and OrConjunction, i.e.

cond1 = CS.EqualsCondition(param_degree, param_kernel, 'poly')
cond2 = CS.OrConjunction(CS.EqualsCondition(param_coef0, param_kernel, 'poly'),
                         CS.EqualsCondition(param_coef0, param_kernel, 'sigmoid'))
cond3 = CS.OrConjunction(CS.EqualsCondition(param_gamma, param_kernel, 'rbf'),
                         CS.EqualsCondition(param_gamma, param_kernel, 'poly'),
                         CS.EqualsCondition(param_gamma, param_kernel, 'sigmoid'))

cs.add_conditions([cond1, cond2, cond3])

Here,

CS.EqualsCondition(param_degree, param_kernel, 'poly')

means that the parameter degree is given a value only when the parameter kernel equals 'poly'. When there are several conditions, combine them with OrConjunction, which ORs them together

cond2 = CS.OrConjunction(CS.EqualsCondition(param_coef0, param_kernel, 'poly'),
                         CS.EqualsCondition(param_coef0, param_kernel, 'sigmoid'))

This means that the parameter coef0 is given a value when kernel equals 'poly' or when kernel equals 'sigmoid'.
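
To make the effect of these conditions concrete, the sketch below lists which kernel parameters are active for each kernel value. It is plain Python that mirrors the three conditions described in this section; it does not call the ConfigSpace API, and the names ACTIVE_WHEN and active_parameters are made up for illustration.

```python
# Which kernel values activate each conditional parameter
# (mirrors cond1, cond2 and cond3; plain Python, not ConfigSpace)
ACTIVE_WHEN = {
    'degree': {'poly'},
    'coef0': {'poly', 'sigmoid'},
    'gamma': {'rbf', 'poly', 'sigmoid'},
}


def active_parameters(kernel):
    """Return the sorted names of the kernel parameters active for `kernel`."""
    return sorted(name for name, kernels in ACTIVE_WHEN.items() if kernel in kernels)


for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    print(kernel, active_parameters(kernel))
```

A configuration sampled with kernel 'linear' would therefore be expected to contain none of degree, coef0 or gamma.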

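Forbidding combinations of parameter values

Earlier we defined the parameter space for some parameters of the sklearn.svm.SVC class. Suppose the Linear kernel is selected, i.e. the parameter kernel takes the value 'linear'; the SVM then becomes a linear SVM. If the linear SVM inside the SVC class were implemented by sklearn.svm.LinearSVC, the parameters of the LinearSVC class could be used to further control how the algorithm runs.
Note: this is only a hypothetical scenario in which the SVC class has all the parameters of the LinearSVC class; in reality, the SVC class does not have all of them.

Some of the parameters of the LinearSVC class are

  • penalty sets the type of the regularization term; it is a string with the value 'l1' or 'l2'
  • loss sets the type of the loss function; it is a string with the value 'hinge' or 'squared_hinge'
  • dual sets whether the algorithm solves the dual problem; it is a boolean, which can be replaced by a string in practice

First, define the parameter space for these three parameters
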
param_penalty = CSH.CategoricalHyperparameter(name='penalty', choices=['l1', 'l2'], default_value='l2')
param_loss = CSH.CategoricalHyperparameter(name='loss', choices=['hinge', 'squared_hinge'], default_value='squared_hinge')
param_dual = CSH.CategoricalHyperparameter(name='dual', choices=['True','False'], default_value='False')

cs.add_hyperparameters([param_penalty, param_loss, param_dual])

These three parameters are set only when the kernel type is Linear, so they need to be tied to the kernel parameter

cond1 = CS.EqualsCondition(param_penalty, param_kernel, 'linear')
cond2 = CS.EqualsCondition(param_loss, param_kernel, 'linear')
cond3 = CS.EqualsCondition(param_dual, param_kernel, 'linear')

cs.add_conditions([cond1, cond2, cond3])

Here we forbid the following parameter combinations

  • penalty is 'l1' and loss is 'hinge'
  • dual is 'False', penalty is 'l2' and loss is 'hinge'
  • dual is 'False' and penalty is 'l1'

To forbid a parameter value, use ForbiddenEqualsClause; when a forbidden combination involves several parameters, join the clauses with ForbiddenAndConjunction, which combines them with AND

# Forbid penalty='l1' together with loss='hinge'
penalty_loss = CS.ForbiddenAndConjunction(
    CS.ForbiddenEqualsClause(param_penalty, 'l1'),
    CS.ForbiddenEqualsClause(param_loss, 'hinge')
)
# Forbid dual='False' together with penalty='l2' and loss='hinge'
dual_penalty_loss = CS.ForbiddenAndConjunction(
    CS.ForbiddenEqualsClause(param_dual, 'False'),
    CS.ForbiddenEqualsClause(param_penalty, 'l2'),
    CS.ForbiddenEqualsClause(param_loss, 'hinge')
)

# Forbid dual='False' together with penalty='l1'
penalty_dual = CS.ForbiddenAndConjunction(
    CS.ForbiddenEqualsClause(param_dual, 'False'),
    CS.ForbiddenEqualsClause(param_penalty, 'l1')
)

cs.add_forbidden_clause(penalty_loss)
cs.add_forbidden_clause(dual_penalty_loss)
cs.add_forbidden_clause(penalty_dual)
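
To see which combinations remain after the three forbidden combinations listed in this section, they can be enumerated by hand. The sketch below is plain Python and does not use the ConfigSpace API; FORBIDDEN and is_forbidden are hypothetical stand-ins for the forbidden clauses.

```python
from itertools import product

# Forbidden combinations, as listed in the bullet points of this section
# (plain Python stand-ins for the ForbiddenAndConjunction clauses)
FORBIDDEN = [
    {'penalty': 'l1', 'loss': 'hinge'},
    {'dual': 'False', 'penalty': 'l2', 'loss': 'hinge'},
    {'dual': 'False', 'penalty': 'l1'},
]


def is_forbidden(config):
    # A clause matches only if ALL of its key/value pairs match (AND);
    # the configuration is rejected if ANY clause matches
    return any(all(config[k] == v for k, v in clause.items()) for clause in FORBIDDEN)


allowed = []
for penalty, loss, dual in product(['l1', 'l2'], ['hinge', 'squared_hinge'], ['True', 'False']):
    config = {'penalty': penalty, 'loss': loss, 'dual': dual}
    if not is_forbidden(config):
        allowed.append((penalty, loss, dual))

print(allowed)
```

Under these rules, four of the eight possible combinations remain available for sampling.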

Installing the desktop environment and the VNC server

First, update the package list

$ sudo apt-get update

Install the Xfce desktop environment

$ sudo apt-get install xfce4 xfce4-goodies

Install the VNC server

$ sudo apt-get install tightvncserver

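Setting the VNC connection password and generating the configuration files

First, run the vncserver command to set the VNC connection password and generate the VNC configuration files
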
$ vncserver

The command asks you to set a connection password and displays the following

You will require a password to access your desktops.

Password:
Verify:

Once the password is set, the command generates the VNC configuration files and starts a VNC instance

New 'X' desktop is your_hostname:1

Creating default startup script /home/your_username/.vnc/xstartup
Starting applications specified in /home/your_username/.vnc/xstartup
Log file is /home/your_username/.vnc/your_hostname:1.log

The configuration files are located in the following directory

/home/your_username/.vnc/

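The first run of the vncserver command automatically starts a VNC instance on display :1, which corresponds to port 5901 (5901 = 5900 + 1; display :2 would use port 5902, and so on). Because VNC still has to be configured, stop this instance first
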
$ vncserver -kill :1

When the instance has been stopped, the following message is displayed

Killing Xtightvnc process ID 30095
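
The display-to-port rule mentioned above (port = 5900 + display number) can be written down as a small helper, which is handy when several instances are running. This is only an illustrative sketch; the function names are made up and are not part of any VNC tool.

```python
VNC_BASE_PORT = 5900  # display :N listens on TCP port 5900 + N


def display_to_port(display):
    """Convert a display string such as ':1' to its TCP port."""
    number = int(display.lstrip(':'))
    return VNC_BASE_PORT + number


def port_to_display(port):
    """Convert a TCP port such as 5901 back to its display string."""
    return ':{}'.format(port - VNC_BASE_PORT)


print(display_to_port(':1'))
print(display_to_port(':2'))
print(port_to_display(5901))
```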

Configuring VNC

The file to configure is xstartup, which is located in $HOME/.vnc, i.e.

/home/your_username/.vnc/

First, back up the original configuration file

$ mv ~/.vnc/xstartup ~/.vnc/xstartup.bak

Then create a new configuration file

$ touch ~/.vnc/xstartup

Edit the file and add the following content

#!/bin/sh

xrdb $HOME/.Xresources
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
startxfce4 &

Many online tutorials skip unsetting these two environment variables, i.e. they leave out

unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS

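In my own setup, when the variables were not unset, the screen was completely gray after connecting over VNC; once they were unset, the display was normal. To make sure the VNC configuration file takes effect, give it execute permission
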
$ chmod +x ~/.vnc/xstartup

Connecting over VNC

Start a VNC instance by running

$ vncserver

On success, it displays

New 'X' desktop is your_hostname:1

Starting applications specified in /home/your_username/.vnc/xstartup
Log file is /home/your_username/.vnc/your_hostname:1.log

Check which ports are listening; port 5901 should now be open

$ ss -ltn

You can now connect to this machine with a VNC client; the connection address has the form ip:port

References

[1] How to Install and Configure VNC on Ubuntu 18.04

[2] VNC server on Ubuntu 18.04 Bionic Beaver Linux

[3] 2018-06-07【Ubuntu 18.04 搭建VNC服务器】