LeetCode 367 - Valid Perfect Square

Posted on 2020-05-10 Edited on 2025-03-02

Problem Description

Given a positive integer num, write a function which returns True if num is a perfect square else False.

Note: Do not use any built-in library function such as sqrt.

Example 1:

Input: 16
Output: true

Example 2:

Input: 14
Output: false

Related Topics: Math, Binary Search

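Original problem: 367. Valid Perfect Square

Chinese version: 367. 有效的完全平方数

Solution

Put another way, the task is to find the integer square root x of the positive integer num and check whether x squared equals num. The integer square root can be found with binary search, as in LeetCode 69. Sqrt(x) (write-up: LeetCode 69 - Sqrt(x)). Once binary search yields the candidate x, num is a perfect square exactly when x * x == num.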
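As a rough illustration of the idea above, here is a minimal Python sketch. Python is used only for brevity, and the name is_perfect_square is illustrative rather than part of the original post. It searches the half-open range [0, num + 1) for the smallest integer whose square is at least num, then checks whether that square equals num.

```python
def is_perfect_square(num: int) -> bool:
    """Return True if num is a perfect square, without calling sqrt."""
    low, high = 0, num + 1          # search the half-open range [low, high)
    while low < high:
        mid = low + (high - low) // 2
        if mid * mid < num:
            low = mid + 1           # mid is too small, discard [low, mid]
        else:
            high = mid              # mid may be the answer, keep it
    # low is now the smallest integer whose square is >= num
    return low * low == num


print(is_perfect_square(16), is_perfect_square(14))
```

Python integers do not overflow, so the sketch needs no widening cast; the C++ version has to widen to long long before computing num + 1 and mid * mid.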

Reference Code

#include <iostream>
using namespace std;


class Solution {
public:
    bool isPerfectSquare(int num) {
        long long low, mid, high;

        low = 0;
        high = (long long)num + 1;  // widen before adding so INT_MAX + 1 cannot overflow
        while (low < high) {
            mid = low + (high - low) / 2;
            if (mid * mid < num)
                low = mid + 1;
            else
                high = mid;
        }

        // low is the smallest value whose square is >= num,
        // so num is a perfect square exactly when low * low == num
        return low * low == num;
    }
};


int main()
{
    int num;
    Solution solu;

    // max of int
    // num = 2147483647;
    num = 14;
    cout << num << " is perfect square: "
         << solu.isPerfectSquare(num) << endl;
    return 0;
}

LeetCode 19 - Remove Nth Node From End of List

Posted on 2020-05-02 Edited on 2025-03-02

Problem Description

Given a linked list, remove the n-th node from the end of list and return its head.

Example:

Given linked list: 1->2->3->4->5, and n = 2.

After removing the second node from the end, the linked list becomes 1->2->3->5.

Note:

Given n will always be valid.

Follow up:

Could you do this in one pass?

Related Topics: Linked List, Two Pointers

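Original problem: 19. Remove Nth Node From End of List

Chinese version: 19. 删除链表的倒数第N个节点

Solution

This problem can be solved with two pointers, p1 and p2. First advance p1 by n nodes, then start p2 at the head and move it in lockstep with p1. When p1 runs off the end of the list, p2 points at the n-th node from the end, which is the node to delete. To unlink that node, the code also tracks the node just before p2. (In the code below, pos2 is the leading pointer and pos1 the trailing one.)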
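The two-pointer idea can be sketched in Python as follows. This is a minimal sketch, not the original solution: it swaps in a sentinel (dummy) node instead of a separate prev pointer, and the names ListNode, remove_nth_from_end, from_list, and to_list are assumptions made for illustration.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def remove_nth_from_end(head, n):
    """Remove the n-th node from the end in one pass (n is assumed valid)."""
    dummy = ListNode(0, head)      # sentinel: deleting the head needs no special case
    lead = trail = dummy
    for _ in range(n):             # move lead n nodes ahead of trail
        lead = lead.next
    while lead.next is not None:   # advance both until lead is the last node
        lead = lead.next
        trail = trail.next
    trail.next = trail.next.next   # trail sits just before the node to delete
    return dummy.next


def from_list(values):
    """Hypothetical helper: build a linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


print(to_list(remove_nth_from_end(from_list([1, 2, 3, 4, 5]), 2)))
```

Because trail starts on the sentinel, removing the first node of the list is handled by the same assignment as every other case.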

Reference Code

class Solution {
public:
    ListNode* removeNthFromEnd(ListNode* head, int n) {
        if (head == NULL)
            return NULL;

        ListNode *pos1, *pos2, *prev;
        int step;

        prev = NULL;
        pos1 = pos2 = head;
        step = 0;
        while (pos2 != NULL) {
            pos2 = pos2->next;
            step += 1;
            if (step > n) {
                prev = pos1;
                pos1 = pos1->next;
            }
        }
        if (prev == NULL)
            head = head->next;
        else
            prev->next = pos1->next;

        return head;
    }
};

LeetCode 160 - Intersection of Two Linked Lists

Posted on 2020-05-02 Edited on 2025-03-02

Problem Description

Write a program to find the node at which the intersection of two singly linked lists begins.

For example, the following two linked lists begin to intersect at node c1.

Example 1:

Input: intersectVal = 8, listA = [4,1,8,4,5], listB = [5,0,1,8,4,5], skipA = 2, skipB = 3
Output: Reference of the node with value = 8
Input Explanation: The intersected node's value is 8 (note that this must not be 0 if the two lists intersect). From the head of A, it reads as [4,1,8,4,5]. From the head of B, it reads as [5,0,1,8,4,5]. There are 2 nodes before the intersected node in A; There are 3 nodes before the intersected node in B.

Example 2:

Input: intersectVal = 2, listA = [0,9,1,2,4], listB = [3,2,4], skipA = 3, skipB = 1
Output: Reference of the node with value = 2
Input Explanation: The intersected node's value is 2 (note that this must not be 0 if the two lists intersect). From the head of A, it reads as [0,9,1,2,4]. From the head of B, it reads as [3,2,4]. There are 3 nodes before the intersected node in A; There are 1 node before the intersected node in B.

Example 3:

Input: intersectVal = 0, listA = [2,6,4], listB = [1,5], skipA = 3, skipB = 2
Output: null
Input Explanation: From the head of A, it reads as [2,6,4]. From the head of B, it reads as [1,5]. Since the two lists do not intersect, intersectVal must be 0, while skipA and skipB can be arbitrary values.
Explanation: The two lists do not intersect, so return null.

Notes:

If the two linked lists have no intersection at all, return null.
The linked lists must retain their original structure after the function returns.
You may assume there are no cycles anywhere in the entire linked structure.
Your code should preferably run in O(n) time and use only O(1) memory.

Related Topics: Linked List

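Original problem: 160. Intersection of Two Linked Lists

Chinese version: 160. 相交链表

Solution

Assume the two lists do intersect, and label the nodes as follows.

Segment AD represents list 1, and segment CB followed by segment BD represents list 2. List 1 is longer than list 2, and the two lists meet at node B. List 1 has length |AD| = |AB| + |BD| = p + n, and list 2 has length |CB| + |BD| = m + n. (Note: the length of a segment is the number of moves needed to walk from its first node to its last node.)

Now traverse both lists at the same time. List 2 is shorter, so its traversal finishes first; at that moment the pointer on list 1 has reached some node E, so |AE| = m + n. Continue traversing list 1 until it ends; the distance from E to D is |ED| = q.

This gives the following equality:

     |AB| + |BD| = |AE| + |ED|
==>  p + n = m + n + q
==>  p - m = q

So |AB| - |CB| = q: the distance from head A of list 1 to the intersection B exceeds the distance from head C of list 2 to B by q. Since q is known once the first traversal finishes, the node reached after q moves from A is exactly as far from B as C is. This yields the following approach:

Use two pointers, p1 on list 1 and p2 on list 2. First advance p1 by q nodes along list 1,
then advance p1 and p2 together, one node at a time, until p1 == p2; that node is the intersection.

If the two lists do not intersect, their tail nodes differ, which is how the reference code detects that case.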
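The length-difference argument can be sketched in Python like this. It is a simplified variant under stated assumptions: it measures each list in its own pass rather than traversing both simultaneously, and the names ListNode, length_and_tail, and get_intersection_node are illustrative.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None


def length_and_tail(head):
    """Return (number of nodes, last node) of a non-empty list."""
    count, node = 1, head
    while node.next is not None:
        node = node.next
        count += 1
    return count, node


def get_intersection_node(head_a, head_b):
    if head_a is None or head_b is None:
        return None
    len_a, tail_a = length_and_tail(head_a)
    len_b, tail_b = length_and_tail(head_b)
    if tail_a is not tail_b:          # different tails: the lists never meet
        return None
    # Skip the length difference q on whichever list is longer;
    # range() with a negative count simply runs zero times.
    for _ in range(len_a - len_b):
        head_a = head_a.next
    for _ in range(len_b - len_a):
        head_b = head_b.next
    # Both pointers are now equally far from the intersection node
    while head_a is not head_b:
        head_a = head_a.next
        head_b = head_b.next
    return head_a
```

Nodes are compared by identity (is), not by value, since two distinct nodes may hold the same value.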

Reference Code

class Solution {
public:
    ListNode *getIntersectionNode(ListNode *headA, ListNode *headB) {
        if (NULL == headA || NULL == headB)
            return NULL;
        if (headA == headB)
            return headA;

        ListNode *prevA, *prevB;
        ListNode *currA, *currB;
        ListNode *posA, *posB;

        prevA = prevB = NULL;
        posA = currA = headA;
        posB = currB = headB;
        while ((currA != NULL) || (currB != NULL)) {
            if (currA != NULL) {
                prevA = currA;
                currA = currA->next;
            } else {
                posB = posB->next;
            }
            if (currB != NULL) {
                prevB = currB;
                currB = currB->next;
            } else {
                posA = posA->next;
            }
        }
        // have intersection
        if (prevA == prevB) {
            while (posA != posB) {
                posA = posA->next;
                posB = posB->next;
            }
            return posA;
        }

        return NULL;
    }
};

LeetCode 141 - Linked List Cycle

Posted on 2020-04-27 Edited on 2025-03-02

Problem Description

Given a linked list, determine if it has a cycle in it.

To represent a cycle in the given linked list, we use an integer pos which represents the position (0-indexed) in the linked list where tail connects to. If pos is -1, then there is no cycle in the linked list.

Example 1:

Input: head = [3,2,0,-4], pos = 1
Output: true
Explanation: There is a cycle in the linked list, where tail connects to the second node.

Example 2:

Input: head = [1,2], pos = 0
Output: true
Explanation: There is a cycle in the linked list, where tail connects to the first node.

Example 3:

Input: head = [1], pos = -1
Output: false
Explanation: There is no cycle in the linked list.

Follow up:

Can you solve it using O(1) (i.e. constant) memory?

Related Topics: Linked List, Two Pointers

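Original problem: 141. Linked List Cycle

Chinese version: 141. 环形链表

Solution

This problem can be solved with two pointers: a fast pointer fast and a slow pointer slow, both starting at the head node. Each time slow advances one node, fast advances two. If the list has a cycle, the speed difference means the two pointers eventually land on the same node, i.e. fast == slow. Otherwise fast reaches the end of the list first and the two pointers never meet.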
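A minimal Python sketch of the fast/slow pointer check follows. The ListNode class and the name has_cycle are assumptions made for illustration; the logic mirrors the description above.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None


def has_cycle(head):
    """Return True if the list starting at head contains a cycle."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next           # one step
        fast = fast.next.next      # two steps
        if slow is fast:           # the pointers met inside a cycle
            return True
    return False                   # fast fell off the end: no cycle


# Build 1 -> 2 -> 3 -> 4 -> back to 2
nodes = [ListNode(v) for v in (1, 2, 3, 4)]
for left, right in zip(nodes, nodes[1:]):
    left.next = right
nodes[-1].next = nodes[1]
print(has_cycle(nodes[0]))
```

Only two extra references are kept, so the check uses O(1) memory, which answers the follow-up question.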

Reference Code

#include <iostream>
#include "List.h"
using namespace std;

/**
 * Definition for singly-linked list.
 * struct ListNode {
 *     int val;
 *     ListNode *next;
 *     ListNode(int x) : val(x), next(NULL) {}
 * };
 */
class Solution {
public:
    bool hasCycle(ListNode *head) {
        if (head == NULL)
            return false;

        ListNode *fast, *slow;
        fast = slow = head;
        while (fast != NULL && fast->next != NULL) {
            slow = slow->next;
            fast = fast->next->next;
            if (slow == fast)
                return true;
        }

        return false;
    }
};

int main()
{
    ListNode *node1 = create_list_node(1);
    ListNode *node2 = create_list_node(2);
    ListNode *node3 = create_list_node(3);
    ListNode *node4 = create_list_node(4);
    connect_list_nodes(node1, node2);
    connect_list_nodes(node2, node3);
    connect_list_nodes(node3, node4);
    connect_list_nodes(node4, node2);

    Solution solu;
    cout << solu.hasCycle(node1) << endl;

    return 0;
}

LeetCode 36 - Valid Sudoku

Posted on 2020-04-24 Edited on 2025-03-02

Problem Description

Determine if a 9x9 Sudoku board is valid. Only the filled cells need to be validated according to the following rules:

Each row must contain the digits 1-9 without repetition.
Each column must contain the digits 1-9 without repetition.
Each of the 9 3x3 sub-boxes of the grid must contain the digits 1-9 without repetition.

A partially filled sudoku which is valid.

The Sudoku board could be partially filled, where empty cells are filled with the character '.'.

Example 1:

Input:
[
  ["5","3",".",".","7",".",".",".","."],
  ["6",".",".","1","9","5",".",".","."],
  [".","9","8",".",".",".",".","6","."],
  ["8",".",".",".","6",".",".",".","3"],
  ["4",".",".","8",".","3",".",".","1"],
  ["7",".",".",".","2",".",".",".","6"],
  [".","6",".",".",".",".","2","8","."],
  [".",".",".","4","1","9",".",".","5"],
  [".",".",".",".","8",".",".","7","9"]
]
Output: true

Example 2:

Input:
[
  ["8","3",".",".","7",".",".",".","."],
  ["6",".",".","1","9","5",".",".","."],
  [".","9","8",".",".",".",".","6","."],
  ["8",".",".",".","6",".",".",".","3"],
  ["4",".",".","8",".","3",".",".","1"],
  ["7",".",".",".","2",".",".",".","6"],
  [".","6",".",".",".",".","2","8","."],
  [".",".",".","4","1","9",".",".","5"],
  [".",".",".",".","8",".",".","7","9"]
]
Output: false
Explanation: Same as Example 1, except with the 5 in the top left corner being 
    modified to 8. Since there are two 8's in the top left 3x3 sub-box, it is invalid.

Note:

A Sudoku board (partially filled) could be valid but is not necessarily solvable.
Only the filled cells need to be validated according to the mentioned rules.
The given board contain only digits 1-9 and the character '.'.
The given board size is always 9x9.

Related Topics: Hash Table

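Original problem: 36. Valid Sudoku

Chinese version: 36. 有效的数独

Solution

Approach 1

According to the problem statement, a valid Sudoku board satisfies three conditions:

No digit is repeated within a row
No digit is repeated within a column
No digit is repeated within a 3x3 block

To check quickly whether a row, column, or block already contains a digit, we can use hash tables. Create one hash set for each row, each column, and each block, then scan every filled cell. For each digit, look it up in the sets for its row, column, and block: if it is already present in any of them, the board is invalid; otherwise insert it into all three sets and continue.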
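The hash-set check can be sketched in Python as follows. This is an illustrative sketch: the name is_valid_sudoku is borrowed from the problem, and the board is assumed to be nine strings of nine characters each rather than a vector of vectors.

```python
def is_valid_sudoku(board):
    """Validate the filled cells of a 9x9 board using one set per row, column, block."""
    rows = [set() for _ in range(9)]
    cols = [set() for _ in range(9)]
    blocks = [set() for _ in range(9)]
    for i in range(9):
        for j in range(9):
            ch = board[i][j]
            if ch == ".":
                continue                      # empty cell, nothing to validate
            b = (i // 3) * 3 + j // 3         # index of the 3x3 block
            if ch in rows[i] or ch in cols[j] or ch in blocks[b]:
                return False                  # digit already seen: invalid
            rows[i].add(ch)
            cols[j].add(ch)
            blocks[b].add(ch)
    return True


valid_board = [
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
]
print(is_valid_sudoku(valid_board))
```

The block index (i // 3) * 3 + j // 3 numbers the nine blocks row by row from 0 to 8.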

Reference Code 1

#include <vector>
#include <unordered_set>
#include <iostream>
using namespace std;


class Solution {
public:
    bool isValidSudoku(vector<vector<char>>& board) {
        vector<unordered_set<char>> row_sets(9);
        vector<unordered_set<char>> column_sets(9);
        vector<unordered_set<char>> block_sets(9);

        char ch;
        int block_id;
        for (auto i=0; i<9; i++) {
            for (auto j=0; j<9; j++) {
                ch = board[i][j];

                if (ch == '.')
                    continue;

                if (row_sets[i].find(ch) == row_sets[i].end())
                    row_sets[i].insert(ch);
                else
                    return false;

                if (column_sets[j].find(ch) == column_sets[j].end())
                    column_sets[j].insert(ch);
                else
                    return false;

                block_id = (i / 3) * 3 + (j / 3);  // integer division selects the 3x3 block
                if (block_sets[block_id].find(ch) == block_sets[block_id].end())
                    block_sets[block_id].insert(ch);
                else
                    return false;
            }
        }

        return true;
    }
};

int main()
{
    vector<vector<char>> board = {
            {'5', '3', '.', '.', '7', '.', '.', '.', '.'},
            {'6', '.', '.', '1', '9', '5', '.', '.', '.'}, 
            {'.', '9', '8', '.', '.', '.', '.', '6', '.'}, 
            {'8', '.', '.', '.', '6', '.', '.', '.', '3'}, 
            {'4', '.', '.', '8', '.', '3', '.', '.', '1'}, 
            {'7', '.', '.', '.', '2', '.', '.', '.', '6'}, 
            {'.', '6', '.', '.', '.', '.', '2', '8', '.'}, 
            {'.', '.', '.', '4', '1', '9', '.', '.', '5'}, 
            {'.', '.', '.', '.', '8', '.', '.', '7', '9'}
    };

    for (auto i=0; i<board.size(); i++) {
        for (auto j=0; j<board[i].size(); j++) {
            cout << board[i][j] << " ";
        }
        cout << endl;
    }

    Solution solu;
    cout << "Is valid: " << solu.isValidSudoku(board) << endl;

    return 0;
}

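Approach 2

The idea is the same as in Approach 1, except that the hash table for each row, column, and block is replaced by a single integer whose bits record which digits have been seen. The check relies on bitwise AND (&) and bitwise OR (|). Suppose the scan reaches digit x and the integer for its row is y. Whether x is a repeat is decided by:

y & (1 << x)

If this expression is nonzero, bit x of y is 1, so the digit has appeared before; otherwise the expression is 0. If the digit is not a repeat, update y to:

y = y | (1 << x)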
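The bitmask variant can be sketched in Python as follows. The name is_valid_sudoku_bits and the string-based board layout are assumptions for illustration; the test and update steps are the two expressions shown above.

```python
def is_valid_sudoku_bits(board):
    """Validate a 9x9 board using one integer bitmask per row, column, block."""
    rows = [0] * 9
    cols = [0] * 9
    blocks = [0] * 9
    for i in range(9):
        for j in range(9):
            ch = board[i][j]
            if ch == ".":
                continue
            bit = 1 << int(ch)                # bit x represents digit x
            b = (i // 3) * 3 + j // 3
            # y & (1 << x): nonzero means the digit was seen before
            if (rows[i] | cols[j] | blocks[b]) & bit:
                return False
            # y = y | (1 << x): record the digit
            rows[i] |= bit
            cols[j] |= bit
            blocks[b] |= bit
    return True


sample_board = [
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
]
print(is_valid_sudoku_bits(sample_board))
```

OR-ing the three masks before the AND tests the row, column, and block in a single expression; it is equivalent to testing each mask separately.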

Reference Code 2

#include <iostream>
#include <cmath>
#include <vector>
using namespace std;


class Solution {
public:
    bool isValidSudoku(vector<vector<char>>& board) {
        // one bitmask per row, column, and block, zero-initialized
        // (std::vector replaces variable-length arrays, which are not standard C++)
        vector<int> row_status(board.size(), 0);
        vector<int> col_status(board.size(), 0);
        vector<int> cell_status(board.size(), 0);
        int digit, cell, block_size, num_blocks;

        block_size = int(sqrt(board.size()));
        num_blocks = board.size() / block_size;
        for (int i=0; i<board.size(); i++) {
            for (int j=0; j<board[i].size(); j++) {
                if (board[i][j] == '.')
                    continue;

                digit = 1 << (board[i][j] - '0');
                cell = (i / block_size) * num_blocks + (j / block_size);
                if ((row_status[i] & digit) != 0)
                    return false;
                if ((col_status[j] & digit) != 0)
                    return false;
                if ((cell_status[cell] & digit) != 0)
                    return false;
                row_status[i] |= digit;
                col_status[j] |= digit;
                cell_status[cell] |= digit;
            }
        }

        return true;
    }
};

int main()
{
    vector<vector<char>> board = {
            {'5', '3', '.', '.', '7', '.', '.', '.', '.'},
            {'6', '.', '.', '1', '9', '5', '.', '.', '.'},
            {'.', '9', '8', '.', '.', '.', '.', '6', '.'},
            {'8', '.', '.', '.', '6', '.', '.', '.', '3'},
            {'4', '.', '.', '8', '.', '3', '.', '.', '1'},
            {'7', '.', '.', '.', '2', '.', '.', '.', '6'},
            {'.', '6', '.', '.', '.', '.', '2', '8', '.'},
            {'.', '.', '.', '4', '1', '9', '.', '.', '5'},
            {'.', '.', '.', '.', '8', '.', '.', '7', '9'}
    };

    for (auto i=0; i<board.size(); i++) {
        for (auto j=0; j<board[i].size(); j++) {
            cout << board[i][j] << " ";
        }
        cout << endl;
    }

    Solution solu;
    cout << "Is valid: " << solu.isValidSudoku(board) << endl;

    return 0;
}

LeetCode 69 - Sqrt(x)

Posted on 2020-04-23 Edited on 2025-03-02

Problem Description

Implement int sqrt(int x).

Compute and return the square root of $x$, where $x$ is guaranteed to be a non-negative integer.

Since the return type is an integer, the decimal digits are truncated and only the integer part of the result is returned.

Example 1:

Input: 4
Output: 2

Example 2:

Input: 8
Output: 2
Explanation: The square root of 8 is 2.82842..., and since 
             the decimal part is truncated, 2 is returned.

Related Topics: Math, Binary Search

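Original problem: 69. Sqrt(x)

Chinese version: 69. x 的平方根

Solution

The square root of $x$ lies in the interval $[0, x]$, and a value in that interval can be the answer only if its square is at most $x$. Because the output must be an integer, the problem becomes: find the largest integer $s$ in $[0, x]$ such that $s^2 \le x$.

A natural first idea is to scan upward from $0$ until some value's square exceeds $x$; the previous value is then the square root. This linear scan is slow, though, and may exceed the time limit. Because we are searching a discrete, ordered range, binary search can find the value much faster.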
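As a hedged illustration, the binary search described above might look like this in Python. The name my_sqrt is taken from the problem statement; the rest is a sketch rather than the original code. It finds the first value whose square exceeds x and returns the value just before it.

```python
def my_sqrt(x: int) -> int:
    """Return the integer part of the square root of a non-negative x."""
    low, high = 0, x + 1            # search the half-open range [low, high)
    while low < high:
        mid = low + (high - low) // 2
        if mid * mid <= x:
            low = mid + 1           # mid still satisfies s^2 <= x, look higher
        else:
            high = mid              # mid is too large
    # low is the first value with low^2 > x, so the answer is low - 1
    return low - 1


print(my_sqrt(8))
```

The loop halves the range each time, so it needs about log2(x) iterations instead of the roughly sqrt(x) iterations of a linear scan.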

Reference Code

/*
 * Use binary search to find the square root of x
 */

#include <iostream>
using namespace std;


class Solution {
public:
    int mySqrt(int x) {
        long long low, high, mid, square;

        // search in [low, high), so high = x + 1
        // use casting to avoid numerical overflow
        low = 0, high = (long long)x + 1;
        while (low < high) {
            mid = low + (high - low) / 2;
            square = mid * mid;
            if (square <= x)
                low = mid + 1;
            else
                high = mid;
        }
        return low - 1;
    }
};

int main()
{
    int x = 2147483647;
    Solution solu;

    cout << "Sqrt(" << x << ") = " << solu.mySqrt(x) << endl;
    return 0;
}

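This follows a Zhihu answer to the question 二分查找有几种写法？它们的区别是什么？ (roughly: how many ways are there to write binary search, and how do they differ?). Because we want the largest $s$ with $s^2 \le x$ (equivalently $s \le \sqrt{x}$), the code uses the upper_bound(value) - 1 pattern.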

LeetCode 50 - Pow(x, n)

Posted on 2020-04-22 Edited on 2025-03-02

Problem Description

Implement pow(x, n), which calculates $x$ raised to the power $n$ ($x^n$).

Example 1:

Input: 2.00000, 10
Output: 1024.00000

Example 2:

Input: 2.10000, 3
Output: 9.26100

Example 3:

Input: 2.00000, -2
Output: 0.25000
Explanation: 2^(-2) = (1/2)^2 = 1/4 = 0.25

Note:

$-100.0 \lt x \lt 100.0$
$n$ is a 32-bit signed integer, within the range $[-2^{31}, 2^{31}-1]$

Related Topics: Math, Binary Search

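Original problem: 50. Pow(x, n)

Chinese version: 50. Pow(x, n)

Solution

Approach 1

Since $n$ is an integer, $x^n$ can be split into the product of two copies of $x^{n/2}$ (where $n/2$ is integer division), that is:

$$
x^n = \begin{cases}
x^{n/2} \cdot x^{n/2} & n \text{ is even} \\
x^{n/2} \cdot x^{n/2} \cdot x & n \text{ is odd}
\end{cases}
$$

So the problem can be solved recursively. One caveat: when $n$ is negative, the code must not simply convert $n$ to $-n$ and $x$ to $1/x$, because the test cases include $n = -2^{31}$, and negating that value overflows a 32-bit integer. Instead, keep $n$ negative. Integer division in C++ truncates toward zero, so when $n$ is negative and odd:

$$
x^{n} = x^{n/2} \cdot x^{n/2} \cdot \frac{1}{x}
$$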
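The recursion can be sketched in Python as shown below. One assumption to note: Python's // operator floors rather than truncates, so the sketch emulates C++ truncation toward zero explicitly. The name my_pow comes from the problem; everything else is illustrative.

```python
def my_pow(x: float, n: int) -> float:
    """Compute x**n recursively by halving the exponent."""
    if n == 0:
        return 1.0
    # Emulate C++ integer division (truncation toward zero) for negative n
    half_n = n // 2 if n >= 0 else -((-n) // 2)
    half = my_pow(x, half_n)
    if n % 2 == 0:
        return half * half
    # n is odd: one extra factor of x (n > 0) or 1/x (n < 0)
    return half * half * (x if n > 0 else 1.0 / x)


print(my_pow(2.0, 10), my_pow(2.0, -2))
```

The exponent is halved on every call, so even $n = -2^{31}$ needs only about 32 levels of recursion.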

Reference Code 1

#include <iostream>
using namespace std;

class Solution {
public:
    double myPow(double x, int n) {
        // do not transfer n to -n if n < 0
        // because of numerical overflow (n = -2^31)
        if (n == 0)
            return 1.0;
        double half = myPow(x, n/2);
        if (n % 2 == 0) {
            return half * half;
        } else {
            if (n < 0)
                x = 1 / x;
            return half * half * x;
        }
    }
};

int main()
{
    double x;
    int n;
    Solution solu;

    x = 1.00000;
    n = -2147483648;    // n = -2^31
    cout << "Pow(" << x << ", " << n << ") = "
         << solu.myPow(x, n) << endl;
    return 0;
}

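Approach 2

Approach 1 is recursive; this approach is iterative. Look at the integer $n$ in binary: if bit $k$ is 1, then $x^n$ can be written as $x^n = x^{2^k} \cdot x^{n-2^k}$, and the remaining factor $x^{n-2^k}$ can be split the same way on its own nonzero bits. For example, $n=5$ is 101 in binary, so its nonzero bits give the decomposition:

$$
x^5 = x^{2^2} \cdot x^{2^0}
$$

This leads to an iterative algorithm: start with ans = 1 and test the bits of $n$ from bit 0 upward. Whenever bit $k$ of $n$ is nonzero, multiply ans by $x^{2^k}$:

$$
\text{ans} = \text{ans} \cdot x^{2^k}
$$

After all bits of $n$ have been processed, ans is the result.

How do we obtain $x^{2^k}$ quickly? Shift $n$ right repeatedly, and for every one-bit shift update $x$ as follows:

x *= x

After $k$ shifts, $x$ holds the original $x$ raised to the power $2^k$. At that point a bitwise AND tests the lowest bit of the shifted $n$; if it is nonzero, then by the iteration above ans must be multiplied by the current $x$.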
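Here is a minimal Python sketch of this bit-by-bit iteration. The name my_pow_iter is illustrative. Python integers are unbounded, so negating n is safe here, whereas the C++ version must widen n to long long first.

```python
def my_pow_iter(x: float, n: int) -> float:
    """Compute x**n iteratively by scanning the bits of the exponent."""
    if n < 0:
        x, n = 1.0 / x, -n   # safe in Python; C++ needs a wider type for n = -2^31
    ans = 1.0
    while n > 0:
        if n & 1:            # lowest bit set: this power of x is part of the product
            ans *= x
        x *= x               # x now holds the next power: x^(2^(k+1))
        n >>= 1              # move on to the next bit
    return ans


print(my_pow_iter(2.0, 10), my_pow_iter(2.0, -2))
```

For $n = 5$ (binary 101) the loop multiplies ans by $x^{2^0}$ and $x^{2^2}$ and skips $x^{2^1}$, matching the decomposition above.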

Reference Code 2

#include <iostream>
using namespace std;


class Solution {
public:
    double myPow(double x, int n) {
        if (n == 0)
            return 1.0;

        long long num = n;
        double ans;

        if (n < 0) {
            // use long long type to avoid numerical overflow
            num = -(long long)n;
            x = 1 / x;
        }

        ans = 1.0;
        while (num > 0) {
            if ((num & 1) != 0)
                ans *= x;
            x *= x;
            num >>= 1;
        }

        return ans;
    }
};

int main()
{
    double x;
    int n;
    Solution solu;

    x = 1.00000;
    n = -2147483648;    // n = -2^31
    cout << "Pow(" << x << ", " << n << ") = "
         << solu.myPow(x, n) << endl;
    return 0;
}

TeX Live 2019 Installation

Posted on 2020-01-28 Edited on 2025-03-02

1. Installation Environment

OS Version: OS X 10.11.6

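2. Getting the Install Script

I did not use the MacTeX installer here; instead I did a network install with the Unix install script. MacTeX is not an option because this machine's OS is fairly old: MacTeX-2019 requires Mac OS 10.12 or later.

Download the install script archive install-tl-unx.tar.gz and extract it to get the installer files.

The install script is install-tl.

3. Starting the Installation

To start the installation, just run install-tl. Here I run the script directly from the terminal.

./install-tl-20200127/install-tl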

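4. Installation Settings

4.1. Choosing a Mirror

The first step is to choose a mirror. Which one to use depends on your network environment; I chose the Tsinghua University mirror.

Once a mirror is selected, the installer shows the new mirror address.

4.2. Detailed Settings

Next come the installation settings. The basic settings cover only the installation path and the default paper size. To install TeX Live without root privileges, set the installation path to a directory under your home directory. Here the installation path is:

/Users/luowanqian/local/texlive/2019

For more detailed settings, click the Advanced button.

When the settings are done, click Install to start the installation.

4.3. Waiting for the Installation to Finish

After it starts, the installer downloads and installs the packages automatically. This takes quite a while, so be patient.

When the installation finishes, the installer shows a completion screen.

Note: the completion screen also tells you which environment variables to set, roughly as follows.

Welcome to TeX Live!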

See /Users/luowanqian/local/texlive/2019/index.html for links to documentation.
The TeX Live web site (https://tug.org/texlive/) contains any updates and corrections. TeX Live is a joint project of the TeX user groups around the world; please consider supporting it by joining the group best for you. The list of groups is available on the web at https://tug.org/usergroups.html.

Add /Users/luowanqian/local/texlive/2019/texmf-dist/doc/man to MANPATH.
Add /Users/luowanqian/local/texlive/2019/texmf-dist/doc/info to INFOPATH.
Most importantly, add /Users/luowanqian/local/texlive/2019/bin/x86_64-darwinlegacy
to your PATH for current and future sessions.

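5. Setting Environment Variables

The last step is to set the environment variables. As the message above shows, there are three of them: MANPATH, INFOPATH, and PATH.

Set them in .bashrc as instructed to complete the installation.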

export MANPATH="$MANPATH:/Users/luowanqian/local/texlive/2019/texmf-dist/doc/man"
export INFOPATH="$INFOPATH:/Users/luowanqian/local/texlive/2019/texmf-dist/doc/info"
export PATH="$PATH:/Users/luowanqian/local/texlive/2019/bin/x86_64-darwinlegacy"

Reference

1: TeX Live - Quick install

2: Unix Install of TeXLive 2019

101 NumPy Exercises

Posted on 2020-01-28 Edited on 2025-03-02

1. Import numpy as np and see the version

Difficulty Level: L1

Question: Import numpy as np and print the version number.

Solution

>>> import numpy as np
>>> print(np.__version__)
1.15.4

2. How to create a 1D array?

Difficulty Level: L1

Question: Create a 1D array of numbers from 0 to 9.

Expected output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution

>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

3. How to create a boolean array?

Difficulty Level: L1

Question: Create a 3×3 numpy array of all True’s.

Solution 1

>>> np.full((3, 3), True)
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

Solution 2

>>> np.ones((3, 3), dtype=bool)
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

4. How to extract items that satisfy a given condition from a 1D array?

Difficulty Level: L1

Question: Extract all odd numbers from arr.

Input:

>>> arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:

array([1, 3, 5, 7, 9])

Solution

>>> arr[arr % 2 == 1]
array([1, 3, 5, 7, 9])

5. How to replace items that satisfy a condition with another value in a numpy array?

Difficulty Level: L1

Question: Replace all odd numbers in arr with -1.

Input:

>>> arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

Solution

>>> arr[arr % 2 == 1] = -1
>>> arr
array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

6. How to replace items that satisfy a condition without affecting the original array?

Difficulty Level: L2

Question: Replace all odd numbers in arr with -1 without changing arr.

Input:

>>> arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:

>>> out
array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution 1

>>> out = np.copy(arr)
>>> out[out % 2 == 1] = -1
>>> out
array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution 2

>>> out = np.where(arr % 2 == 1, -1, arr)
>>> out
array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

7. How to reshape an array?

Difficulty Level: L1

Question: Convert a 1D array to a 2D array with 2 rows.

Input:

>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output:

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Solution

>>> arr.reshape((2, -1))
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

8. How to stack two arrays vertically?

Difficulty Level: L2

Question: Stack arrays a and b vertically.

Input:

>>> a = np.arange(10).reshape(2, -1)
>>> b = np.repeat(1, 10).reshape(2, -1)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> b
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

Expected output:

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

Solution 1

>>> np.concatenate((a, b), axis=0)
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

Solution 2

>>> np.vstack((a, b))
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

Solution 3

>>> np.r_[a, b]
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

9. How to stack two arrays horizontally?

Difficulty Level: L2

Question: Stack the arrays a and b horizontally.

Input:

>>> a = np.arange(10).reshape(2, -1)
>>> b = np.repeat(1, 10).reshape(2, -1)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> b
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

Expected output:

array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

Solution 1

>>> np.concatenate((a, b), axis=1)
array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

Solution 2

>>> np.hstack((a, b))
array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

Solution 3

>>> np.c_[a, b]
array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

10. How to generate custom sequences in numpy without hardcoding?

Difficulty Level: L2

Question: Create the following pattern without hardcoding. Use only numpy functions and the below input array a.

Input:

>>> a = np.array([1, 2, 3])

Expected output:

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

Solution 1

>>> np.concatenate((np.repeat(a, 3), np.tile(a, 3)))
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

Solution 2

>>> np.r_[np.repeat(a, 3), np.tile(a, 3)]
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

11. How to get the common items between two python numpy arrays?

Difficulty Level: L2

Question: Get the common items between a and b.

Input:

>>> a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
>>> b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

Expected output:

array([2, 4])

Solution

>>> np.intersect1d(a, b)
array([2, 4])

12. How to remove from one array those items that exist in another?

Difficulty Level: L2

Question: From array a remove all items present in array b.

Input:

>>> a = np.array([1, 2, 3, 4, 5])
>>> b = np.array([5, 6, 7, 8, 9])

Expected output:

array([1, 2, 3, 4])

Solution

>>> np.setdiff1d(a, b)
array([1, 2, 3, 4])

13. How to get the positions where elements of two arrays match?

Difficulty Level: L2

Question: Get the positions where elements of a and b match.

Input:

>>> a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
>>> b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

Expected output:

(array([1, 3, 5, 7]),)

Solution

>>> np.where(a == b)
(array([1, 3, 5, 7]),)

14. How to extract all numbers between a given range from a numpy array?

Difficulty Level: L2

Question: Get all items between 5 and 10 (inclusive) from a.

Input:

>>> a = np.array([2, 6, 1, 9, 10, 3, 27])

Expected output:

array([ 6,  9, 10])

Solution 1

>>> a[(a >= 5) & (a <= 10)]
array([ 6,  9, 10])

Solution 2

>>> index = np.where((a >= 5) & (a <= 10))
>>> a[index]
array([ 6,  9, 10])

Solution 3

>>> index = np.where(np.logical_and(a>=5, a<=10))
>>> a[index]
array([ 6,  9, 10])

15. How to make a python function that handles scalars work on numpy arrays?

Difficulty Level: L2

Question: Convert the function maxx, which works on two scalars, to work on two arrays.

Input:

>>> def maxx(x, y):
...     """Get the maximum of two items"""
...     if x >= y:
...        return x
...     else:
...        return y
...
>>> maxx(1, 5)
5

Expected output:

>>> a = np.array([5, 7, 9, 8, 6, 4, 5])
>>> b = np.array([6, 3, 4, 8, 9, 7, 1])
>>> pair_max(a, b)
array([6., 7., 9., 8., 9., 7., 5.])

Solution

>>> pair_max = np.vectorize(maxx, otypes=[float])
>>> a = np.array([5, 7, 9, 8, 6, 4, 5])
>>> b = np.array([6, 3, 4, 8, 9, 7, 1])
>>> pair_max(a, b)
array([6., 7., 9., 8., 9., 7., 5.])

16. How to swap two columns in a 2d numpy array?

Difficulty Level: L2

Question: Swap columns 1 and 2 (the first and second columns, i.e. indices 0 and 1) in the array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution 1

>>> arr[:, [1, 0, 2]]
array([[1, 0, 2],
       [4, 3, 5],
       [7, 6, 8]])

Solution 2

# Swap in-place
>>> tmp = arr[:, 0].copy()
>>> arr[:, 0] = arr[:, 1]
>>> arr[:, 1] = tmp
>>> arr
array([[1, 0, 2],
       [4, 3, 5],
       [7, 6, 8]])

17. How to swap two rows in a 2d numpy array?

Difficulty Level: L2

Question: Swap rows 1 and 2 (the first and second rows, i.e. indices 0 and 1) in the array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution 1

>>> arr[[1, 0, 2], :]
array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

Solution 2

# Swap in-place
>>> tmp = arr[0, :].copy()
>>> arr[0, :] = arr[1, :]
>>> arr[1, :] = tmp
>>> arr
array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

18. How to reverse the rows of a 2D array?

Difficulty Level: L2

Question: Reverse the rows of a 2D array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution

>>> arr[::-1]
array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

19. How to reverse the columns of a 2D array?

Difficulty Level: L2

Question: Reverse the columns of a 2D array arr.

>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Solution

>>> arr[:, ::-1]
array([[2, 1, 0],
       [5, 4, 3],
       [8, 7, 6]])

20. How to create a 2D array containing random floats between 5 and 10?

Difficulty Level: L2

Question: Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.

Solution 1

>>> np.random.seed(100)
>>> np.random.uniform(5, 10, size=(5, 3))
array([[7.71702471, 6.39184693, 7.12258795],
       [9.22388066, 5.02359428, 5.6078456 ],
       [8.35374542, 9.12926378, 5.68353295],
       [7.87546665, 9.45660977, 6.04601061],
       [5.9266411 , 5.54188445, 6.09848746]])

Solution 2

>>> np.random.seed(100)
>>> arr = (10 - 5) * np.random.rand(5, 3) + 5
>>> arr
array([[7.71702471, 6.39184693, 7.12258795],
       [9.22388066, 5.02359428, 5.6078456 ],
       [8.35374542, 9.12926378, 5.68353295],
       [7.87546665, 9.45660977, 6.04601061],
       [5.9266411 , 5.54188445, 6.09848746]])

Solution 3

# Maybe different from other solutions
>>> rand_arr = np.random.randint(low=5, high=10, size=(5, 3)) + np.random.random((5, 3))
>>> rand_arr
array([[6.41920093, 9.40003816, 7.78940871],
       [7.973373  , 6.51303275, 6.04690216],
       [5.26486281, 8.24187676, 9.69046437],
       [8.34740798, 7.26776599, 8.26254059],
       [8.46680771, 9.86023614, 6.52209887]])

21. 如何在 Python NumPy 数组中仅输出小数点后三位的数字？

English Version

Title: How to print only 3 decimal places in python numpy array?

Difficulty Level: L1

Question: Print or show only 3 decimal places of the numpy array rand_arr.

难度：L1

问题：输出或显示 NumPy 数组 rand_arr 中小数点后三位的数字。

Input:

>>> rand_arr = np.random.random((5, 3))

Solution

>>> np.set_printoptions(precision=3)
>>> rand_arr
array([[0.152, 0.272, 0.846],
       [0.927, 0.521, 0.665],
       [0.465, 0.67 , 0.136],
       [0.829, 0.175, 0.343],
       [0.281, 0.177, 0.596]])

22. 如何通过禁用科学计数法（如 1e10）打印 NumPy 数组？

English Version

Title: How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?

Difficulty Level: L1

Question: Pretty print rand_arr by suppressing the scientific notation (like 1e10).

难度：L1

问题：通过禁用科学计数法（如 1e10）打印 NumPy 数组 rand_arr。

Input:

# Create the random array
>>> np.random.seed(100)
>>> rand_arr = np.random.random([3, 3]) / 1e3
>>> rand_arr
array([[5.43404942e-04, 2.78369385e-04, 4.24517591e-04],
       [8.44776132e-04, 4.71885619e-06, 1.21569121e-04],
       [6.70749085e-04, 8.25852755e-04, 1.36706590e-04]])

Expected output:

array([[0.000543, 0.000278, 0.000425],
       [0.000845, 0.000005, 0.000122],
       [0.000671, 0.000826, 0.000137]])

Solution

# precision is optional
>>> np.set_printoptions(suppress=True, precision=6)
>>> rand_arr
array([[0.000543, 0.000278, 0.000425],
       [0.000845, 0.000005, 0.000122],
       [0.000671, 0.000826, 0.000137]])

23. 如何限制 NumPy 数组输出中项的数目？

English Version

Title: How to limit the number of items printed in output of numpy array?

Difficulty Level: L1

Question: Limit the number of items printed in python numpy array a to a maximum of 6 elements.

难度：L1

问题：将 Python NumPy 数组 a 输出的项的数目限制在最多 6 个元素。

Input:

>>> a = np.arange(15)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Expected output:

array([ 0,  1,  2, ..., 12, 13, 14])

Solution

>>> np.set_printoptions(threshold=6)
>>> a
array([ 0,  1,  2, ..., 12, 13, 14])

24. 如何在不截断数组的前提下打印出完整的 NumPy 数组？

English Version

Title: How to print the full numpy array without truncating

Difficulty Level: L1

Question: Print the full numpy array a without truncating.

难度：L1

问题：在不截断数组的前提下打印出完整的 NumPy 数组 a。

Input:

>>> np.set_printoptions(threshold=6)
>>> a = np.arange(15)
>>> a
array([ 0,  1,  2, ..., 12, 13, 14])

Expected output:

>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Solution 1

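# threshold=np.nan raises a ValueError in recent NumPy releases;
# sys.maxsize disables truncation instead
>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])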

Solution 2

>>> np.set_printoptions(threshold=1000)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
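
`np.set_printoptions` changes the setting globally. When the change should only apply to a few statements, the `np.printoptions` context manager (available since NumPy 1.15) may be more convenient. A small sketch, with `short_repr` and `full_repr` as illustrative names:

```python
import sys
import numpy as np

a = np.arange(15)

# Temporarily lower the threshold: arrays with more than 6 items are summarized.
with np.printoptions(threshold=6):
    short_repr = repr(a)

# Temporarily disable summarization altogether.
with np.printoptions(threshold=sys.maxsize):
    full_repr = repr(a)

print(short_repr)
print(full_repr)
```

The previous print options are restored automatically when each `with` block ends.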

25. 如何向 Python NumPy 导入包含数字和文本的数据集，同时保持文本不变？

English Version

Title: How to import a dataset with numbers and texts keeping the text intact in python numpy?

Difficulty Level: L2

Question: Import the iris dataset keeping the text intact.

难度：L2

问题：导入 iris 数据集，保持文本不变。

Download the dataset iris.data from the Iris Data Set web page.

Solution

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> iris[:3]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)

Since we want to retain the species, a text field, I have set the dtype to object. Had I set dtype=None, a 1d array of tuples would have been returned.

26. 如何从 1 维元组数组中提取特定的列？

English Version

Title: How to extract a particular column from 1D array of tuples?

Difficulty Level: L2

Question: Extract the text column species from the 1D iris_1d.

难度：L2

问题：从导入的 1 维 iris_1d 中提取文本列 species。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_1d = np.genfromtxt(url, delimiter=",", dtype=None)

Solution 1

>>> species = np.array([row[4] for row in iris_1d])
>>> species[:7]
array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
       b'Iris-setosa', b'Iris-setosa', b'Iris-setosa'], dtype='|S18')

Solution 2

>>> vfunc = np.vectorize(lambda x: x[4])
>>> species = vfunc(iris_1d)
>>> species[:7]
array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
       b'Iris-setosa', b'Iris-setosa', b'Iris-setosa'], dtype='|S15')

27. 如何将 1 维元组数组转换成 2 维 NumPy 数组？

English Version

Title: How to convert a 1d array of tuples to a 2d numpy array?

Difficulty Level: L2

Question: Convert the 1D iris_1d to 2D array iris_2d by omitting the species text field.

难度：L2

问题：忽略 species 文本字段，将 1 维 iris_1d 转换成 2 维数组 iris_2d。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_1d = np.genfromtxt(url, delimiter=",", dtype=None)

Solution

>>> iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
>>> iris_2d[:3]
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2]])

28. 如何计算 NumPy 数组的平均值、中位数和标准差？

English Version

Title: How to compute the mean, median, standard deviation of a numpy array?

Difficulty Level: L1

Question: Find the mean, median, standard deviation of iris’s sepal length (1st column).

难度：L1

问题：找出 iris sepal length（第一列）的平均值、中位数和标准差。

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_1d = np.genfromtxt(url, delimiter=",", dtype=None)

Solution

>>> sepal_length = np.array([row[0] for row in iris_1d])
>>> mean, median, std = np.mean(sepal_length), np.median(sepal_length), np.std(sepal_length)
>>> mean, median, std
(5.843333333333334, 5.8, 0.8253012917851409)

29. 如何归一化数组，使值的范围在 0 和 1 之间？

English Version

Title: How to normalize an array so the values range exactly between 0 and 1?

Difficulty Level: L2

Question: Create a normalized form of iris’s sepal length whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.

难度：L2

问题：创建 iris sepal length 的归一化格式，使其值在 0 到 1 之间。

Input:

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
sepal_length = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0])

Solution

>>> max_value = np.max(sepal_length)
>>> min_value = np.min(sepal_length)
>>> sepal_length_nm = (sepal_length - min_value) / (max_value - min_value)
>>> sepal_length_nm[:3]
array([0.22222222, 0.16666667, 0.11111111])

30. 如何计算 softmax 分数？

English Version

Title: How to compute the softmax score?

Difficulty Level: L3

Question: Compute the softmax score of sepal length.

难度：L3

问题：计算 sepal length 的 softmax 分数。

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
sepal_length = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0])

Solution

According to the formula:

$$
S(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
$$

>>> sepal_length_exp = np.exp(sepal_length)
>>> exp_sum = np.sum(sepal_length_exp)
>>> sepal_length_sm = sepal_length_exp / exp_sum
>>> sepal_length_sm[:5]
array([0.00221959, 0.00181724, 0.00148783, 0.00134625, 0.00200836])

For numerical stability, the formula changes to:

$$
S(x_i) = \frac{e^{(x_i - x_{max})}}{\sum_j e^{(x_j - x_{max})}}
$$

where $x_{max} = \max_j x_j$.

>>> sepal_length_exp = np.exp(sepal_length - np.max(sepal_length))
>>> exp_sum = np.sum(sepal_length_exp)
>>> sepal_length_sm = sepal_length_exp / exp_sum
>>> sepal_length_sm[:5]
array([0.00221959, 0.00181724, 0.00148783, 0.00134625, 0.00200836])
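
To see why the shifted form matters, the sketch below compares the two formulas on made-up inputs around 1000, where `np.exp` overflows. The function name `softmax` and the test values are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax of a 1D array-like."""
    x = np.asarray(x, dtype=float)
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps every exponent at or below 0, so np.exp cannot overflow.
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

big = np.array([1000.0, 1001.0, 1002.0])

# Naive formula: exp(1000) overflows to inf, and inf / inf is nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.sum(np.exp(big))

stable = softmax(big)
print(naive)
print(stable)
```

For the sepal lengths in this exercise the values are small, so both versions agree, which is why the two outputs above are identical.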

31. 如何找到 NumPy 数组的百分数？

English Version

Title: How to find the percentile scores of a numpy array?

Difficulty Level: L1

Question: Find the 5th and 95th percentile of iris’s sepal length.

难度：L1

问题：找出 iris sepal length（第一列）的第 5 个和第 95 个百分数。

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
sepallength = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0])

Solution

>>> np.percentile(sepallength, [5, 95])
array([4.6  , 7.255])

32. 如何在数组的随机位置插入值？

English Version

Title: How to insert values at random positions in an array?

Difficulty Level: L2

Question: Insert np.nan values at 20 random positions in iris_2d dataset.

难度：L2

问题：在 iris_2d 数据集中的 20 个随机位置插入 np.nan 值。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=object)

Solution 1

>>> rand_row = np.random.randint(iris_2d.shape[0], size=20)
>>> rand_col = np.random.randint(iris_2d.shape[1], size=20)
>>> iris_2d[rand_row, rand_col] = np.nan
>>> iris_2d[:10]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', nan, b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)

Solution 2

>>> i, j = np.where(iris_2d)
>>> iris_2d[np.random.choice(i, 20), np.random.choice(j, 20)] = np.nan
>>> iris_2d[:10]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'5.0', b'3.4', b'1.5', b'0.2', nan],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)
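
Both solutions above draw row and column indices independently, so the same cell can be picked more than once and fewer than 20 values may end up as nan. If exactly 20 distinct cells are needed, one option is to sample flat positions without replacement. The sketch below uses a synthetic float array `data` (an assumption standing in for the numeric columns of iris_2d) so that it runs without downloading anything:

```python
import numpy as np

rng = np.random.default_rng(100)

# Stand-in for the numeric part of iris_2d: 150 rows x 4 columns of floats.
data = rng.uniform(0.1, 8.0, size=(150, 4))

# Draw 20 distinct flat positions, then convert them to (row, col) pairs.
flat_idx = rng.choice(data.size, size=20, replace=False)
rows, cols = np.unravel_index(flat_idx, data.shape)
data[rows, cols] = np.nan

print(np.isnan(data).sum())
```

Because the flat positions are unique, the number of nan entries is always exactly 20.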

33. 如何在 NumPy 数组中找出缺失值的位置？

English Version

Title: How to find the position of missing values in numpy array?

Difficulty Level: L2

Question: Find the number and position of missing values in iris_2d's sepal length (1st column).

难度：L2

问题：在 iris_2d 的 sepal length（第一列）中找出缺失值的数目和位置。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
>>> iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

Solution 1

# number of nan
>>> np.isnan(iris_2d[:, 0]).sum()
5

# index of nan
>>> np.where(np.isnan(iris_2d[:, 0]))
(array([ 12,  13,  47,  53, 143]),)

Solution 2

>>> nan_bools = np.isnan(iris_2d[:, 0])

# number of nan
>>> num_nans = np.sum(nan_bools)
>>> num_nans
5

# index of nan
>>> index = np.arange(len(nan_bools))
>>> nan_index = index[nan_bools]
>>> nan_index
array([ 12,  13,  47,  53, 143])

34. 如何基于两个或以上条件过滤 NumPy 数组？

English Version

Title: How to filter a numpy array based on two or more conditions?

Difficulty Level: L3

Question: Filter the rows of iris_2d that have petal length (3rd column) > 1.5 and sepal length (1st column) < 5.0.

难度：L3

问题：过滤 iris_2d 中满足 petal length（第三列）> 1.5 和 sepal length（第一列）< 5.0 的行。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])

Solution

>>> condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0)
>>> iris_2d[condition]
array([[4.8, 3.4, 1.6, 0.2],
       [4.8, 3.4, 1.9, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [4.9, 2.4, 3.3, 1. ],
       [4.9, 2.5, 4.5, 1.7]])

35. 如何在 NumPy 数组中删除包含缺失值的行？

English Version

Title: How to drop rows that contain a missing value from a numpy array?

Difficulty Level: L3

Question: Select the rows of iris_2d that do not have any nan value.

难度：L3

问题：选择 iris_2d 中不包含 nan 值的行。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
>>> iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

Solution 1

>>> iris_2d[np.sum(np.isnan(iris_2d), axis=1) == 0][:5]
array([[5.1, 3.5, 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4]])

Solution 2

# True for rows that contain no nan
>>> no_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d])
>>> iris_2d[no_nan_in_row][:5]
array([[5.1, 3.5, 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4]])
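
The row-by-row list comprehension in Solution 2 can also be expressed as a single vectorized reduction along axis 1. A minimal sketch on a small synthetic array (the values are made up and only stand in for iris_2d):

```python
import numpy as np

# Small synthetic array (an assumption, standing in for iris_2d).
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, np.nan, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [np.nan, 3.1, 1.5, np.nan],
])

# True for rows that contain at least one nan.
has_nan = np.isnan(data).any(axis=1)

# Keep only the complete rows.
complete_rows = data[~has_nan]
print(complete_rows)
```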

36. 如何找出 NumPy 数组中两列之间的关联性？

English Version

Title: How to find the correlation between two columns of a numpy array?

Difficulty Level: L2

Question: Find the correlation between sepal length (1st column) and petal length (3rd column) in iris_2d.

难度：L2

问题：找出 iris_2d 中 sepal length（第一列）和 petal length（第三列）之间的关联性。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])

Solution 1

>>> np.corrcoef(iris_2d[:, 0], iris_2d[:, 2])[0, 1]
0.8717541573048718

Solution 2

>>> from scipy.stats import pearsonr
>>> corr, p_value = pearsonr(iris_2d[:, 0], iris_2d[:, 2])
>>> corr
0.8717541573048712

37. 如何确定给定数组是否有空值？

English Version

Title: How to find if a given array has any null values?

Difficulty Level: L2

Question: Find out if iris_2d has any missing values.

难度：L2

问题：确定 iris_2d 是否有缺失值。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])

Solution 1

>>> np.sum(np.isnan(iris_2d)) > 0
False

Solution 2

>>> np.isnan(iris_2d).any()
False

38. 如何在 NumPy 数组中将所有缺失值替换成0？

English Version

Title: How to replace all missing values with 0 in a numpy array?

Difficulty Level: L2

Question: Replace all occurrences of nan with 0 in the numpy array.

难度：L2

问题：在 NumPy 数组中将所有 nan 替换成 0。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=float, usecols=[0, 1, 2, 3])
>>> iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

Solution

>>> iris_2d[np.isnan(iris_2d)] = 0

39. 如何在 NumPy 数组中找出唯一值的数量？

English Version

Title: How to find the count of unique values in a numpy array?

Difficulty Level: L2

Question: Find the unique values and the count of unique values in iris’s species.

难度：L2

问题：在 iris 的 species 列中找出唯一值及其数量。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Solution

>>> unique, counts = np.unique(iris[:, 4], return_counts=True)
>>> unique
array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
      dtype=object)
>>> counts
array([50, 50, 50])

40. 如何将一个数值转换为一个类别（文本）数组？

English Version

Title: How to convert a numeric to a categorical (text) array?

Difficulty Level: L2

Question: Bin the petal length (3rd) column of iris_2d to form a text array, such that if petal length is:

Less than 3 --> 'small'
3-5 --> 'medium'
>=5 --> 'large'

难度：L2

问题：将 iris_2d 的 petal length（第三列）转换以构建一个文本数组，按如下规则进行转换：

Less than 3 --> 'small'
3-5 --> 'medium'
>=5 --> 'large'

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Solution 1

# Bin petallength 
>>> petal_length_bin = np.digitize(iris[:, 2].astype(float), [0, 3, 5, 10])

# Map it to respective category
>>> label_map = {1: "small", 2: "medium", 3: "large", 4: np.nan}
>>> petal_length_cat = [label_map[x] for x in petal_length_bin]

# View
>>> petal_length_cat[:4]
['small', 'small', 'small', 'small']

Solution 2

>>> petal_length = iris[:, 2].astype(float)
>>> petal_length_cat = np.full(len(petal_length), None, dtype=object)

>>> petal_length_cat[petal_length < 3] = "small"
>>> petal_length_cat[(petal_length >= 3) & (petal_length < 5)] = "medium"
>>> petal_length_cat[petal_length >= 5] = "large"

>>> petal_length_cat[:4]
array(['small', 'small', 'small', 'small'], dtype=object)
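
The dictionary lookup in Solution 1 can be replaced by indexing a label array directly with the bin numbers. The sketch below is one way to do that; the sample values and the name `labels` are assumptions chosen to hit every bin edge:

```python
import numpy as np

# Example petal lengths (made-up values chosen to hit every bin edge).
petal_length = np.array([1.4, 3.0, 4.9, 5.0, 6.1])

labels = np.array(["small", "medium", "large"])

# With bin edges [3, 5], np.digitize returns 0 for x < 3,
# 1 for 3 <= x < 5 and 2 for x >= 5.
bin_idx = np.digitize(petal_length, [3, 5])
petal_length_cat = labels[bin_idx]
print(petal_length_cat)
```

Using only the two inner edges avoids the need for an extra catch-all entry for values outside the outermost bins.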

41. 如何基于 NumPy 数组现有列创建一个新的列？

English Version

Title: How to create a new column from existing columns of a numpy array?

Difficulty Level: L2

Question: Create a new column for volume in iris_2d, where volume is (pi x petal_length x sepal_length^2)/3.

难度：L2

问题：为 iris_2d 中的 volume 列创建一个新的列，volume 指 (pi x petal_length x sepal_length^2)/3。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris_2d = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Solution 1

>>> volume = (np.pi * iris_2d[:, 2].astype(float) * (iris_2d[:, 0].astype(float))**2) / 3
>>> out = np.c_[iris_2d, volume]
>>> out[:4]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa',
        38.13265162927291],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa',
        35.200498485922445],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa',
        33.238050274980004]], dtype=object)

Solution 2

# Compute volume
>>> sepal_length = iris_2d[:, 0].astype('float')
>>> petal_length = iris_2d[:, 2].astype('float')
>>> volume = (np.pi * petal_length * (sepal_length**2))/3

# Introduce new dimension to match iris_2d's
>>> volume = volume[:, np.newaxis]

# Add the new column
>>> out = np.hstack([iris_2d, volume])

# View
>>> out[:4]
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa',
        38.13265162927291],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa',
        35.200498485922445],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa',
        33.238050274980004]], dtype=object)

42. 如何在 NumPy 中执行概率采样？

English Version

Title: How to do probabilistic sampling in numpy?

Difficulty Level: L3

Question: Randomly sample iris’s species such that setosa is twice the number of versicolor and virginica.

难度：L3

问题：随机采样 iris 数据集中的 species 列，使得 setosa 的数量是 versicolor 和 virginica 数量的两倍。

# Import iris keeping the text column intact
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
iris = np.genfromtxt(url, delimiter=",", dtype=object)

Solution

# Get the species column
>>> species = iris[:, 4]

# Probabilistic sampling
>>> np.random.seed(100)
>>> probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, 0.750, num=50), np.linspace(0.751, 1.0, num=50)]
>>> index = np.searchsorted(probs, np.random.random(150))
>>> species_out = species[index]
>>> np.unique(species_out, return_counts=True)
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
      dtype=object), array([77, 37, 36]))
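
A more direct route to the same kind of sample is to pass the desired probabilities to a random choice. This is a hedged alternative to the `searchsorted` approach above, not the original solution; it draws from the three species names rather than from the rows of the dataset, and uses the newer `default_rng` generator:

```python
import numpy as np

rng = np.random.default_rng(100)

species_names = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# setosa is drawn with probability 0.5, the other two with 0.25 each,
# so setosa is expected to appear about twice as often as either of them.
species_out = rng.choice(species_names, size=150, p=[0.5, 0.25, 0.25])

values, counts = np.unique(species_out, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```

The exact counts vary with the seed; only the expected 2:1:1 ratio is fixed by `p`.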

43. 如何在多维数组中找到一维的第二最大值？

English Version

Title: How to get the second largest value of an array when grouped by another array?

Difficulty Level: L2

Question: What is the value of the second longest petal length of species setosa?

难度：L2

问题：在 species setosa 的 petal length 列中找到第二最大值。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Solution

>>> iris_setosa = iris[iris[:, 4] == b"Iris-setosa", :]
>>> petal_len_setosa = iris_setosa[:, 2].astype(float)
>>> second_large = np.sort(np.unique(petal_len_setosa))[-2]
>>> second_large
1.7

44. 如何用给定列将 2 维数组排序？

English Version

Title: How to sort a 2D array by a column

Difficulty Level: L2

Question: Sort the iris dataset based on sepal length column.

难度：L2

问题：基于 sepal length 列将 iris 数据集排序。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Solution

>>> index = np.argsort(iris[:, 0])
>>> iris_sort = iris[index]
>>> iris_sort[:10]
array([[b'4.3', b'3.0', b'1.1', b'0.1', b'Iris-setosa'],
       [b'4.4', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.4', b'3.0', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.5', b'2.3', b'1.3', b'0.3', b'Iris-setosa'],
       [b'4.6', b'3.6', b'1.0', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'4.6', b'3.2', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)
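
Note that with dtype=object the first column holds byte strings, so `np.argsort` compares them lexicographically. That happens to give the numeric order here because every sepal length has the form d.d, but it would break for values of different widths. A hedged sketch with made-up values, converting to float before sorting:

```python
import numpy as np

# Made-up byte-string measurements; b"10.0" sorts before b"9.5" as text.
col = np.array([b"9.5", b"10.0", b"4.3"], dtype=object)

text_order = np.argsort(col)                    # lexicographic comparison
numeric_order = np.argsort(col.astype(float))   # numeric comparison

print(text_order, numeric_order)
```

For the iris data, `np.argsort(iris[:, 0].astype(float))` would be the corresponding numeric sort key.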

45. 如何在 NumPy 数组中找到最频繁出现的值？

English Version

Title: How to find the most frequent value in a numpy array?

Difficulty Level: L1

Question: Find the most frequent value of petal length (3rd column) in iris dataset.

难度：L1

问题：在 iris 数据集中找到 petal length（第三列）中最频繁出现的值。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)
>>> names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Solution

>>> uniques, counts = np.unique(iris[:, 2], return_counts=True)
>>> uniques[np.argmax(counts)]
b'1.5'

46. 如何找到第一个大于给定值的数的位置？

English Version

Title: How to find the position of the first occurrence of a value greater than a given value?

Difficulty Level: L2

Question: Find the position of the first occurrence of a value greater than 1.0 in the petal width (4th column) of the iris dataset.

难度：L2

问题：在 iris 数据集的 petal width（第四列）中找到第一个值大于 1.0 的数的位置。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> iris = np.genfromtxt(url, delimiter=",", dtype=object)

Solution 1

>>> np.argwhere(iris[:, 3].astype(float) > 1.0)[0][0]
50
>>> np.where(iris[:, 3].astype(float) > 1.0)[0][0]
50

Solution 2

>>> index = np.arange(len(iris))
>>> index = index[iris[:, 3].astype(float) > 1.0]
>>> index[0]
50
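
Another common idiom is `np.argmax` on the boolean mask, which stops at the first True. The sketch below uses made-up petal widths and adds a guard, since `argmax` also returns 0 when no element matches; returning -1 in that case is an arbitrary convention chosen for this example:

```python
import numpy as np

# Made-up petal widths (an assumption for illustration).
petal_width = np.array([0.2, 0.4, 1.0, 1.3, 0.3, 1.5])

mask = petal_width > 1.0

# argmax on a boolean array gives the index of the first True,
# but it also returns 0 when nothing matches, hence the guard.
first_pos = int(np.argmax(mask)) if mask.any() else -1
print(first_pos)
```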

47. 如何将数组中所有大于给定值的数替换为给定的 cutoff 值？

English Version

Title: How to replace all values greater than a given value to a given cutoff?

Difficulty Level: L2

Question: In the array a, replace all values greater than 30 with 30 and all values less than 10 with 10.

难度：L2

问题：对于数组 a，将所有大于 30 的值替换为 30，将所有小于 10 的值替换为 10。

Input:

>>> np.random.seed(100)
>>> a = np.random.uniform(1, 50, 20)

Solution 1

# Cutoff in-place
>>> a[a > 30] = 30
>>> a[a < 10] = 10
>>> a[:5]
array([27.62684215, 14.64009987, 21.80136195, 30.        , 10.        ])

Solution 2

>>> a_cutoff = np.clip(a, a_min=10, a_max=30)
>>> a_cutoff[:5]
array([27.62684215, 14.64009987, 21.80136195, 30.        , 10.        ])

Solution 3

>>> a_cutoff = np.where(a < 10, 10, np.where(a > 30, 30, a))
>>> a_cutoff[:5]
array([27.62684215, 14.64009987, 21.80136195, 30.        , 10.        ])

48. 如何在 NumPy 数组中找到 top-n 数值的位置？

English Version

Title: How to get the positions of top n values from a numpy array?

Difficulty Level: L2

Question: Get the positions of top 5 maximum values in a given array a.

难度：L2

问题：在给定数组 a 中找到 top-5 最大值的位置。

Input:

>>> np.random.seed(100)
>>> a = np.random.uniform(1, 50, 20)
>>> a
array([27.62684215, 14.64009987, 21.80136195, 42.39403048,  1.23122395,
        6.95688692, 33.86670515, 41.466785  ,  7.69862289, 29.17957314,
       44.67477576, 11.25090398, 10.08108276,  6.31046763, 11.76517714,
       48.95256545, 40.77247431,  9.42510962, 40.99501269, 14.42961361])

Solution 1

>>> index = np.argsort(a)[::-1]
>>> index[:5]
array([15, 10,  3,  7, 18])

Solution 2

# The first 5 indices are the top-5 positions; argpartition does not guarantee their relative order
>>> index = np.argpartition(-a, 5)
>>> index[:5]
array([15, 10,  3,  7, 18])
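
`np.argpartition` only guarantees that the selected indices belong to the n largest values, not that they come out sorted. When the order matters, the n candidates can be sorted afterwards, which is still cheaper than sorting the whole array for large inputs. A minimal sketch, with the array written out by hand so that it does not depend on the random seed:

```python
import numpy as np

# The array `a` from the input above, written out so the sketch is self-contained.
a = np.array([27.62684215, 14.64009987, 21.80136195, 42.39403048,  1.23122395,
               6.95688692, 33.86670515, 41.466785  ,  7.69862289, 29.17957314,
              44.67477576, 11.25090398, 10.08108276,  6.31046763, 11.76517714,
              48.95256545, 40.77247431,  9.42510962, 40.99501269, 14.42961361])

n = 5
# argpartition places the n largest values in the last n slots, in no particular order.
top_unsorted = np.argpartition(a, -n)[-n:]
# Sort just those n candidates by value, largest first.
top_sorted = top_unsorted[np.argsort(a[top_unsorted])[::-1]]
print(top_sorted)
```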

49. 如何逐行计算数组中所有值的数量？

English Version

Title: How to compute the row wise counts of all possible values in an array?

Difficulty Level: L4

Question: Compute the counts of unique values row-wise.

难度：L4

问题：逐行计算唯一值的数量。

Input:

>>> np.random.seed(100)
>>> arr = np.random.randint(1, 11, size=(6, 10))
>>> arr
array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
       [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
       [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
       [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
       [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
       [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])

Expected output:

[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
 [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
 [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
 [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
 [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
 [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]

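The output contains 10 columns representing the numbers 1 to 10. The values are the counts of those numbers in each row. For example, cell (0, 2) has the value 2, which means the number 3 occurs exactly twice in the first row.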

Solution 1

# Assume each number is in [1, 10]
>>> results = []
>>> for row in arr:
...     uniques, counts = np.unique(row, return_counts=True)
...     zeros = np.zeros(10, dtype=int)
...     zeros[uniques-1] = counts
...     results.append(zeros.tolist())
...
>>> np.array(results)
array([[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
       [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
       [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
       [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
       [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]])

Solution 2

# More general
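>>> def counts_of_all_values_rowwise(arr2d):
...     # Unique values and their counts, row by row
...     num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]
...     # Counts of all values, row by row (0 if a value is absent from the row)
...     return [[int(b[a == i][0]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array]
...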
>>> np.array(counts_of_all_values_rowwise(arr))
array([[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
       [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
       [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
       [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
       [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]])
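
When the values are known to be small non-negative integers, as in this exercise, `np.bincount` gives a shorter route. The sketch below uses a tiny made-up array with values in [1, 3]; `max_value` is an assumed upper bound that must be known in advance:

```python
import numpy as np

# Small made-up input whose values lie in [1, 3].
arr = np.array([[1, 1, 3],
                [2, 3, 3]])

max_value = 3
# bincount counts occurrences of 0..max_value; drop the slot for 0.
row_counts = np.array([np.bincount(row, minlength=max_value + 1)[1:] for row in arr])
print(row_counts)
```

For the 6x10 input above, `max_value` would be 10.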

50. 如何将 array_of_arrays 转换为平面 1 维数组？

English Version

Title: How to convert an array of arrays into a flat 1d array?

Difficulty Level: L2

Question: Convert array_of_arrays into a flat linear 1d array.

难度：L2

问题：将 array_of_arrays 转换为平面线性 1 维数组。

Input:

>>> arr1 = np.arange(3)
>>> arr2 = np.arange(3, 7)
>>> arr3 = np.arange(7, 10)
# dtype=object is required for ragged input in recent NumPy releases
>>> array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)
>>> array_of_arrays
array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])],
      dtype=object)

Expected output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution 1

>>> arr_1d = np.concatenate([arr for arr in array_of_arrays])
>>> arr_1d
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution 2

>>> arr_1d = np.array([a for arr in array_of_arrays for a in arr])
>>> arr_1d
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

51. 如何为 NumPy 数组生成 one-hot 编码？

English Version

Title: How to generate one-hot encodings for an array in numpy?

Difficulty Level: L4

Question: Compute the one-hot encodings (dummy binary variables for each unique value in the array).

难度：L4

问题：计算 one-hot 编码。

Input:

>>> np.random.seed(101)
>>> arr = np.random.randint(1, 4, size=6)
>>> arr
array([2, 3, 2, 2, 2, 1])

Expected output:

array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])

Solution 1

>>> arr_shift = arr - 1
>>> one_hot = np.eye(3)[arr_shift]
>>> one_hot
array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])

Solution 2

>>> def one_hot_encodings(arr):
...     uniqs = np.unique(arr)
...     out = np.zeros((arr.shape[0], uniqs.shape[0]))
...     for i, k in enumerate(arr):
...         out[i, k-1] = 1
...     return out
...
>>> one_hot_encodings(arr)
array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])
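
Both solutions rely on the values being exactly 1, 2, 3, because they use `k - 1` as the column index. For arbitrary labels, one option is to let `np.unique` compute the column index of each element. A sketch with made-up labels:

```python
import numpy as np

# Made-up labels that are neither contiguous nor starting at 1.
arr = np.array([10, 30, 10, 20])

# `inverse` maps every element to the index of its value in `uniqs`.
uniqs, inverse = np.unique(arr, return_inverse=True)
one_hot = np.eye(len(uniqs))[inverse]
print(one_hot)
```

Column j of the result corresponds to `uniqs[j]`, so the columns are ordered by sorted label value.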

52. 如何创建由类别变量分组确定的一维数值？

English Version

Title: How to create row numbers grouped by a categorical variable?

Difficulty Level: L3

Question: Create row numbers grouped by a categorical variable. Use the following sample from iris species as input.

难度：L3

问题：创建由类别变量分组的行数。使用以下来自 iris species 的样本作为输入。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> species = np.genfromtxt(url, delimiter=",", dtype=str, usecols=4)
>>> np.random.seed(100)
>>> species_small = np.sort(np.random.choice(species, size=20))
>>> species_small
array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica'], dtype='<U15')

Expected output:

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]

Solution 1

>>> groups = []
>>> for val in np.unique(species_small):
...     groups.append(np.arange(len(species_small[species_small == val])))
...
>>> np.concatenate(groups).tolist()
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]

Solution 2

>>> [i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]

53. 如何基于给定的类别变量创建分组 id？

English Version

Title: How to create group ids based on a given categorical variable?

Difficulty Level: L4

Question: Create group ids based on a given categorical variable. Use the following sample from iris species as input.

难度：L4

问题：基于给定的类别变量创建分组 id。使用以下来自 iris species 的样本作为输入。

Input:

>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
>>> species = np.genfromtxt(url, delimiter=",", dtype=str, usecols=4)
>>> np.random.seed(100)
>>> species_small = np.sort(np.random.choice(species, size=20))
>>> species_small
array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica'], dtype='<U15')

Expected output:

[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

Solution

>>> output = np.full(len(species_small), 0)
>>> uniques = np.unique(species_small)
>>> for val in uniques:
...     group_id = np.where(uniques == val)[0][0]
...     output[species_small == val] = group_id
...
>>> output.tolist()
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
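
The loop above can be collapsed into a single call, because `np.unique` with `return_inverse=True` already returns, for each element, the index of its value among the sorted unique values. A short sketch on a made-up, unsorted sample:

```python
import numpy as np

# Made-up sample; it does not need to be sorted.
species_small = np.array(["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                          "Iris-virginica", "Iris-versicolor"])

uniques, group_ids = np.unique(species_small, return_inverse=True)
print(group_ids.tolist())
```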

54. 如何使用 NumPy 对数组中的项进行排序？

English Version

Title: How to rank items in an array using numpy?

Difficulty Level: L2

Question: Create the ranks for the given numeric array a.

难度：L2

问题：为给定的数值数组 a 创建排序。

Input:

>>> np.random.seed(10)
>>> a = np.random.randint(20, size=10)
>>> a
array([ 9,  4, 15,  0, 17, 16, 17,  8,  9,  0])

Expected output:

array([4, 2, 6, 0, 8, 7, 9, 3, 5, 1])

Solution

>>> np.argsort(np.argsort(a))
array([4, 2, 6, 0, 8, 7, 9, 3, 5, 1])
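
The double argsort works because the first call returns the sorting permutation and the second call inverts it. The sketch below spells out that inversion explicitly; it uses `kind="stable"` (an addition not present in the solution above) so that tied values are ranked in their original left-to-right order:

```python
import numpy as np

a = np.array([9, 4, 15, 0, 17, 16, 17, 8, 9, 0])

# order[k] is the index of the k-th smallest element; a stable sort keeps
# tied values in their original left-to-right order.
order = np.argsort(a, kind="stable")

# Invert the permutation: the element at index order[k] has rank k.
ranks = np.empty_like(order)
ranks[order] = np.arange(len(a))
print(ranks)
```

This variant needs one sort instead of two, which can matter for large arrays.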

55. 如何使用 NumPy 对多维数组中的项进行排序？

English Version

Title: How to rank items in a multidimensional array using numpy?

Difficulty Level: L3

Question: Create a rank array of the same shape as a given numeric array a.

难度：L3

问题：给出一个数值数组 a，创建一个形态相同的排序数组。

Input:

>>> np.random.seed(10)
>>> a = np.random.randint(20, size=[2, 5])
>>> a
array([[ 9,  4, 15,  0, 17],
       [16, 17,  8,  9,  0]])

Expected output:

array([[4, 2, 6, 0, 8],
       [7, 9, 3, 5, 1]])

Solution 1

>>> a_flat = a.flatten()
>>> sort_idx = np.argsort(np.argsort(a_flat))
>>> sort_idx.reshape((2, -1))
array([[4, 2, 6, 0, 8],
       [7, 9, 3, 5, 1]])

Solution 2

>>> a.ravel().argsort().argsort().reshape(a.shape)
array([[4, 2, 6, 0, 8],
       [7, 9, 3, 5, 1]])

56. How to find the maximum value in each row of a 2D NumPy array?

English Version

Title: How to find the maximum value in each row of a numpy array 2d?

Difficulty Level: L2

Question: Compute the maximum for each row in the given array.

Input:

>>> np.random.seed(100)
>>> a = np.random.randint(1, 10, [5, 3])
>>> a
array([[9, 9, 4],
       [8, 8, 1],
       [5, 3, 6],
       [3, 3, 3],
       [2, 1, 9]])

Solution 1

>>> np.amax(a, axis=1)
array([9, 8, 6, 3, 9])

Solution 2

>>> np.apply_along_axis(np.max, arr=a, axis=1)
array([9, 8, 6, 3, 9])

57. How to compute the min-by-max for each row of a 2D NumPy array?

English Version

Title: How to compute the min-by-max for each row for a numpy array 2d?

Difficulty Level: L3

Question: Compute the min-by-max for each row for given 2d numpy array.

Input:

>>> np.random.seed(100)
>>> a = np.random.randint(1, 10, [5, 3])
>>> a
array([[9, 9, 4],
       [8, 8, 1],
       [5, 3, 6],
       [3, 3, 3],
       [2, 1, 9]])

Solution

>>> np.apply_along_axis(lambda x: np.min(x)/np.max(x), axis=1, arr=a)
array([0.44444444, 0.125     , 0.5       , 1.        , 0.11111111])
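
The same ratio can probably be computed without a Python-level lambda, by reducing along axis 1 twice and dividing. This is a sketch of that alternative, with the array written out literally so it runs on its own.

```python
import numpy as np

a = np.array([[9, 9, 4],
              [8, 8, 1],
              [5, 3, 6],
              [3, 3, 3],
              [2, 1, 9]])

# Row-wise minimum divided by row-wise maximum, fully vectorized.
ratio = a.min(axis=1) / a.max(axis=1)
print(ratio)
```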

58. How to find duplicate entries in a NumPy array?

English Version

Title: How to find the duplicate records in a numpy array?

Difficulty Level: L3

Question: Find the duplicate entries (2nd occurrence onwards) in the given numpy array and mark them as True. First time occurrences should be False.

Input:

>>> np.random.seed(100)
>>> a = np.random.randint(0, 5, 10)
>>> a
array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])

Expected output:

array([False,  True, False,  True, False, False,  True,  True,  True,
        True])

Solution

>>> out = np.full(a.shape[0], True)
>>> unique_positions = np.unique(a, return_index=True)[1]
>>> out[unique_positions] = False
>>> out
array([False,  True, False,  True, False, False,  True,  True,  True,
        True])

59. How to find the grouped mean in NumPy?

English Version

Title: How to find the grouped mean in numpy?

Difficulty Level: L3

Question: Find the mean of a numeric column grouped by a categorical column in a 2D numpy array.

In this example the numeric column is sepalwidth (column index 1) and the categorical column is species.

Input:

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
iris = np.genfromtxt(url, delimiter=",", dtype=object)
names = ("sepallength", "sepalwidth", "petallength", "petalwidth", "species")

Expected output:

[[b'Iris-setosa', 3.418],
 [b'Iris-versicolor', 2.770],
 [b'Iris-virginica', 2.974]]

Solution

>>> uniques = np.unique(iris[:, 4])
>>> output = []
>>> for v in uniques:
...     group = iris[iris[:, 4] == v]
...     output.append([v, np.mean(group[:, 1].astype(float))])
...
>>> output
[[b'Iris-setosa', 3.418], [b'Iris-versicolor', 2.7700000000000005], [b'Iris-virginica', 2.974]]
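
For larger data, the per-group loop could be replaced by np.unique(..., return_inverse=True) plus np.bincount. The sketch below shows the idea on hypothetical toy data (labels and values are made up for illustration; they stand in for the species and sepalwidth columns).

```python
import numpy as np

# Hypothetical toy data, not the iris file.
labels = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# inverse[i] is the group index of labels[i] within `uniques`.
uniques, inverse = np.unique(labels, return_inverse=True)

# Sum of values per group divided by the number of items per group.
means = np.bincount(inverse, weights=values) / np.bincount(inverse)
print(list(zip(uniques.tolist(), means.tolist())))
```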

60. How to convert a PIL image to a NumPy array?

English Version

Title: How to convert a PIL image to numpy array?

Difficulty Level: L3

Question: Import the image from the following url and convert it to a numpy array.

Input:

>>> url = "https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg"

Solution

>>> import requests
>>> from io import BytesIO
>>> from PIL import Image
>>> response = requests.get(url)
>>> img = Image.open(BytesIO(response.content))
>>> img_arr = np.asarray(img)
>>> img_arr[:5, :5]
array([[[  9,  72, 125],
        [  9,  72, 125],
        [  9,  72, 125],
        [ 10,  73, 126],
        [ 10,  73, 126]],

       [[  9,  72, 125],
        [  9,  72, 125],
        [ 10,  73, 126],
        [ 10,  73, 126],
        [ 10,  73, 126]],

       [[  9,  72, 125],
        [ 10,  73, 126],
        [ 10,  73, 126],
        [ 10,  73, 126],
        [ 11,  74, 127]],

       [[ 10,  73, 126],
        [ 10,  73, 126],
        [ 10,  73, 126],
        [ 11,  74, 127],
        [ 11,  74, 127]],

       [[ 10,  73, 126],
        [ 10,  73, 126],
        [ 11,  74, 127],
        [ 11,  74, 127],
        [ 11,  74, 127]]], dtype=uint8)

61. How to drop all missing values from a NumPy array?

English Version

Title: How to drop all missing values from a numpy array?

Difficulty Level: L2

Question: Drop all nan values from a 1D numpy array.

Input:

>>> arr = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan])
>>> arr
array([ 1.,  2.,  3., nan,  5.,  6.,  7., nan])

Expected output:

array([1., 2., 3., 5., 6., 7.])

Solution

>>> arr[~np.isnan(arr)]
array([1., 2., 3., 5., 6., 7.])

62. How to compute the Euclidean distance between two arrays?

English Version

Title: How to compute the euclidean distance between two arrays?

Difficulty Level: L1

Question: Compute the euclidean distance between two arrays a and b.

Input:

>>> a = np.array([1, 2, 3, 4, 5])
>>> b = np.array([4, 5, 6, 7, 8])
>>> a
array([1, 2, 3, 4, 5])
>>> b
array([4, 5, 6, 7, 8])

Solution 1

>>> np.sqrt(np.sum((a-b)**2))
6.708203932499369

Solution 2

>>> np.linalg.norm(a-b)
6.708203932499369

63. How to find all the local maxima (peaks) in a 1D array?

English Version

Title: How to find all the local maxima (or peaks) in a 1d array?

Difficulty Level: L4

Question: Find all the peaks in a 1D numpy array a. Peaks are points surrounded by smaller values on both sides.

Input:

>>> a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
>>> a
array([1, 3, 7, 1, 2, 6, 0, 1])

Expected output:

array([2, 5])

Here 2 and 5 are the indices of the local maxima 7 and 6.

Solution

>>> double_diff = np.diff(np.sign(np.diff(a)))
>>> peak_locations = np.where(double_diff == -2)[0] + 1
>>> peak_locations
array([2, 5])
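
To see why -2 marks a peak, it may help to print the intermediate arrays: the sign of the first difference is +1 on a rise and -1 on a fall, so a rise followed by a fall produces a second difference of -2. The variable names below are mine. Note that under this scheme a flat-topped peak (equal neighbours) would not be reported.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

slopes = np.sign(np.diff(a))   # +1 where rising, -1 where falling
turns = np.diff(slopes)        # -2 exactly where a rise is followed by a fall
peaks = np.where(turns == -2)[0] + 1  # +1 maps back to indices in `a`

print(slopes)
print(turns)
print(peaks)
```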

64. How to subtract a 1D array from a 2D array, where each item of the 1D array is subtracted from the respective row?

English Version

Title: How to subtract a 1d array from a 2d array, where each item of 1d array subtracts from respective row?

Difficulty Level: L2

Question: Subtract the 1d array b_1d from the 2d array a_2d, such that each item of b_1d subtracts from respective row of a_2d.

Input:

>>> a_2d = np.array([[3, 3, 3],[4, 4, 4],[5, 5, 5]])
>>> b_1d = np.array([1, 2, 3])
>>> a_2d
array([[3, 3, 3],
       [4, 4, 4],
       [5, 5, 5]])
>>> b_1d
array([1, 2, 3])

Expected output:

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

Solution

>>> a_2d - b_1d[:, np.newaxis]
array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])
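
The np.newaxis is what makes the subtraction row-wise: it turns b_1d into a column of shape (3, 1), which broadcasts across each row. Without it, b_1d is aligned with the columns instead. A small sketch contrasting the two (the names row_wise and col_wise are mine):

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

col = b_1d[:, np.newaxis]   # shape (3, 1): one value per row
row_wise = a_2d - col       # b_1d[i] is subtracted from every item of row i
col_wise = a_2d - b_1d      # b_1d[j] is subtracted from every item of column j

print(col.shape)
print(row_wise)
print(col_wise)
```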

65. How to find the index of the n-th repetition of an item in an array?

English Version

Title: How to find the index of n’th repetition of an item in an array?

Difficulty Level: L2

Question: Find the index of 5th repetition of number 1 in x.

Input:

>>> x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

Solution 1

>>> n = 5
>>> [i for i, v in enumerate(x) if v == 1][n-1]
8

Solution 2

>>> n = 5
>>> index = np.arange(len(x))
>>> index[x == 1][n-1]
8

Solution 3

>>> n = 5
>>> np.where(x == 1)[0][n-1]
8

66. How to convert NumPy's datetime64 object to datetime's datetime object?

English Version

Title: How to convert numpy’s datetime64 object to datetime’s datetime object?

Difficulty Level: L2

Question: Convert numpy’s datetime64 object to datetime’s datetime object.

Input:

>>> dt64 = np.datetime64("2018-02-25 22:10:10")

Solution 1

>>> dt64.tolist()
datetime.datetime(2018, 2, 25, 22, 10, 10)

Solution 2

>>> from datetime import datetime
>>> dt64.astype(datetime)
datetime.datetime(2018, 2, 25, 22, 10, 10)
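
One caveat worth checking in your own data: both conversions depend on the unit of the datetime64 value. With nanosecond precision, which datetime cannot represent, tolist() gives back a plain integer count of nanoseconds instead. A sketch of the usual workaround, casting to microseconds first (dt64_ns is a name I introduce for illustration):

```python
import numpy as np
from datetime import datetime

# Same timestamp as above, but stored with nanosecond precision.
dt64_ns = np.datetime64("2018-02-25T22:10:10", "ns")

as_int = dt64_ns.tolist()                           # an int, not a datetime
as_dt = dt64_ns.astype("datetime64[us]").tolist()   # cast down, then convert

print(type(as_int).__name__, as_dt)
```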

67. How to compute the moving average of a NumPy array?

English Version

Title: How to compute the moving average of a numpy array?

Difficulty Level: L3

Question: Compute the moving average of window size 3, for the given 1D array.

Input:

>>> np.random.seed(100)
>>> Z = np.random.randint(10, size=10)
>>> Z
array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

Solution 1

Source: How to calculate moving average using NumPy?

>>> def moving_average(a, n=3):
...     ret = np.cumsum(a, dtype=float)
...     ret[n:] = ret[n:] - ret[:-n]
...     return ret[n-1:] / n
...
>>> moving_average(Z, n=3).round(2)
array([6.33, 6.  , 5.67, 4.67, 3.67, 2.  , 3.67, 3.  ])

Solution 2

>>> np.convolve(Z, np.ones(3)/3, mode="valid").round(2)
array([6.33, 6.  , 5.67, 4.67, 3.67, 2.  , 3.67, 3.  ])
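
A third option, assuming NumPy 1.20 or newer, is sliding_window_view from numpy.lib.stride_tricks: it exposes every length-3 window as a row, and the moving average is then a row mean. This is a sketch of an alternative, not one of the solutions above; Z is written out literally so the block runs on its own.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

Z = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

# Each row is one window of 3 consecutive items; shape is (8, 3).
windows = sliding_window_view(Z, 3)
moving_avg = windows.mean(axis=1)

print(moving_avg.round(2))
```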

68. How to create a NumPy array sequence given only the starting point, length and step?

English Version

Title: How to create a numpy array sequence given only the starting point, length and the step?

Difficulty Level: L2

Question: Create a numpy array of length 10, starting from 5 and has a step of 3 between consecutive numbers.

Solution 1

>>> def seq(start, length, step):
...     end = start + (step*length)
...     return np.arange(start, end, step)
...
>>> seq(5, 10, 3)
array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])

Solution 2

>>> np.arange(5, 5+3*10, 3)
array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])

69. How to fill in missing dates in an irregular series of NumPy dates?

English Version

Title: How to fill in missing dates in an irregular series of numpy dates?

Difficulty Level: L3

Question: Given an array of a non-continuous sequence of dates. Make it a continuous sequence of dates, by filling in the missing dates.

Input:

>>> dates = np.arange(np.datetime64("2018-02-01"), np.datetime64("2018-02-25"), 2)
>>> dates
array(['2018-02-01', '2018-02-03', '2018-02-05', '2018-02-07',
       '2018-02-09', '2018-02-11', '2018-02-13', '2018-02-15',
       '2018-02-17', '2018-02-19', '2018-02-21', '2018-02-23'],
      dtype='datetime64[D]')

Solution 1

>>> out = []
>>> for date, d in zip(dates, np.diff(dates)):
...     out.append(np.arange(date, (date+d)))
...
>>> filled_in = np.array(out).reshape(-1)
>>> output = np.hstack([filled_in, dates[-1]])
>>> output
array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
       '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
       '2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
       '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
       '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
       '2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')

Solution 2

>>> filled_in = np.array([np.arange(date, (date+d)) for date, d in zip(dates, np.diff(dates))]).reshape(-1)
>>> output = np.hstack([filled_in, dates[-1]])
>>> output
array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
       '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
       '2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
       '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
       '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
       '2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')
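
Both solutions above seem to rely on every gap having the same length, since the per-gap arrays are stacked into one array before reshape(-1). If the dates are sorted, a simpler sketch is to build the full range from the first to the last date directly, which should also cope with uneven gaps:

```python
import numpy as np

dates = np.arange(np.datetime64("2018-02-01"), np.datetime64("2018-02-25"), 2)

# np.arange excludes the stop value, so step one day past the last date.
filled = np.arange(dates[0], dates[-1] + np.timedelta64(1, "D"))

print(filled)
```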

70. How to create strides from a given 1D array?

English Version

Title: How to create strides from a given 1D array?

Difficulty Level: L4

Question: From the given 1d array arr, generate a 2d matrix using strides, with a window length of 4 and strides of 2, like [[0,1,2,3], [2,3,4,5], [4,5,6,7]..]

Input:

>>> arr = np.arange(15)
>>> arr
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Expected output:

array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13]])

Solution

>>> def gen_strides(a, stride_len=5, window_len=5):
...     n_strides = ((a.size - window_len) // stride_len) + 1
...     return np.array([a[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)])
...
>>> gen_strides(np.arange(15), stride_len=2, window_len=4)
array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13]])
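
Assuming NumPy 1.20 or newer, the same matrix can also be sketched with sliding_window_view: take every window of length 4, then keep every second one. This is an alternative to gen_strides, not a replacement for it; note that the result is a read-only view of arr unless copied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(15)

# All windows of length 4 (12 of them), then every 2nd window (stride 2).
strided = sliding_window_view(arr, 4)[::2]

print(strided)
```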

Python Iterable Container Implementation

Posted on 2020-01-12 Edited on 2025-03-02

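Implementing an iterable container takes two steps:

The container inherits from collections.abc.Iterable and implements the __iter__() method.
An Iterator class inherits from collections.abc.Iterator and implements the __next__() method.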

The implementation is as follows:

from collections.abc import Iterator, Iterable
from typing import Any


class ConcreteIterator(Iterator):
    def __init__(self, collection):
        self._collection = collection
        self._position = 0

    def __next__(self):
        try:
            value = self._collection._container[self._position]
            self._position = self._position + 1
        except IndexError:
            raise StopIteration()

        return value


class ConcreteCollection(Iterable):
    def __init__(self):
        self._container = []

    def __iter__(self):
        return ConcreteIterator(self)

    def add_item(self, item: Any):
        self._container.append(item)


if __name__ == "__main__":
    collection = ConcreteCollection()
    collection.add_item('Hello')
    collection.add_item('World,')
    collection.add_item('Python.')

    for item in collection:
        print('{} '.format(item), end='')
    print('')
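
When no custom iteration state is needed, the separate Iterator class can arguably be skipped: writing __iter__() as a generator lets Python build the iterator. The sketch below is a variation on the code above, and GeneratorCollection is a name I made up for it.

```python
from collections.abc import Iterable
from typing import Any


class GeneratorCollection(Iterable):
    def __init__(self):
        self._container = []

    def __iter__(self):
        # A generator function: each call returns a fresh iterator,
        # so the collection can be traversed more than once.
        yield from self._container

    def add_item(self, item: Any):
        self._container.append(item)


collection = GeneratorCollection()
collection.add_item('Hello')
collection.add_item('World,')
collection.add_item('Python.')
print(' '.join(collection))
```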