袋熊的树洞

日拱一卒,功不唐捐

0%

Introduction

C++ Snippets

A string tokenizer

字符串分隔

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include <string>
#include <vector>
#include <iostream>

using namespace std;

void tokenize(const string &str, vector<string> &tokens, const string &delimiters = " ")
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);

while (string::npos != pos || string::npos != lastPos) {
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}

int main()
{
vector<string> tokens;
string str("Split me up! Word1 Word2 Word3.");

tokenize(str, tokens);

cout << "String: " << str << endl;
cout << "Tokens: " << endl;
for (auto it = tokens.begin(); it != tokens.end(); it++)
cout << *it << endl;
return 0;
}

问题描述

Given two arrays, write a function to compute their intersection.

Example:

Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2].

Note:

  • Each element in the result must be unique.

  • The result can be in any order.

Related Topics: Hash Table, Two Pointers, Binary Search, Sort

原问题: 349. Intersection of Two Arrays

中文翻译版: 349. 两个数组的交集

解决方案

方案1

可以创建两个哈希表,首先遍历数组 nums1,将数组元素添加到第一个哈希表中,然后遍历数组 nums2,如果数组元素在第一个哈希表中不存在,则将该数组元素添加到第二个哈希表中,遍历完数组 nums2 后,将第二个哈希表的中元素作为结果输出。实现代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

class Solution {
public:
vector<int> intersection(vector<int>& nums1, vector<int>& nums2)
{
unordered_set<int> unique_nums1;
for (auto it=nums1.begin(); it!=nums1.end(); it++)
unique_nums1.insert(*it);

unordered_set<int> unique_nums;
for (auto it=nums2.begin(); it!=nums2.end(); it++) {
if (unique_nums1.find(*it) != unique_nums1.end())
unique_nums.insert(*it);
}

vector<int> ret;
for (auto it=unique_nums.begin(); it!=unique_nums.end(); it++)
ret.push_back(*it);

return ret;
}
};

int main()
{
Solution solu;
vector<int> nums1 = {1, 2, 2, 1};
vector<int> nums2 = {2, 2};
vector<int> ret = solu.intersection(nums1, nums2);

for (auto it=ret.begin(); it!=ret.end(); it++)
cout << *it << " ";
cout << endl;

return 0;
}

方案2

将数组 nums1 和数组 nums2 进行从小到大排序,设置两个指针 p1 和 p2 分别指向数组 nums1 和数组 nums2 的第一个元素。开始遍历数组时,如果指针 p1 和指针 p2 指向的元素不相等,则移动指向元素中较小值的指针,如果指针 p1 和指针 p2 指向的元素相等,则同时移动指针 p1 和 p2,并记录该相等的元素 (找到相等元素时先判断该元素是否和这个记录值相等,如果相等则说明交集已经存在该元素,不需要添加到交集中,否则添加到交集中),依次操作,直到其中一个集合没有元素可比较为止。实现代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#include <algorithm>
#include <vector>
#include <iostream>
using namespace std;

class Solution {
public:
vector<int> intersection(vector<int>& nums1, vector<int>& nums2)
{
vector<int> ret;
if (nums1.size() == 0)
return ret;
else if (nums2.size() == 0)
return ret;

sort(nums1.begin(), nums1.end());
sort(nums2.begin(), nums2.end());

int p1, p2, repeat;
p1 = p2 = 0;
repeat = nums1[0] < nums2[0] ? nums1[0] - 1 : nums2[0] - 1;
while (p1 < nums1.size() && p2 < nums2.size()) {
if (nums1[p1] < nums2[p2]) {
p1++;
} else if (nums1[p1] > nums2[p2]) {
p2++;
} else {
if (repeat != nums1[p1])
ret.push_back(nums1[p1]);
repeat = nums1[p1];
p1++;
p2++;
}
}

return ret;
}
};

int main()
{
Solution solu;
vector<int> nums1 = {1, 2, 2, 1};
vector<int> nums2 = {2, 2};
vector<int> ret = solu.intersection(nums1, nums2);

for (auto it=ret.begin(); it!=ret.end(); it++)
cout << *it << " ";
cout << endl;

return 0;
}

问题描述

Give a string s, count the number of non-empty (contiguous) substrings that have the same number of 0’s and 1’s, and all the 0’s and all the 1’s in these substrings are grouped consecutively.

Substrings that occur multiple times are counted the number of times they occur.

Example 1:

1
2
3
4
5
6
7
Input: "00110011"
Output: 6
Explanation: There are 6 substrings that have equal number of consecutive 1's and 0's: "0011", "01", "1100", "10", "0011", and "01".

Notice that some of these substrings repeat and are counted the number of times they occur.

Also, "00110011" is not a valid substring because all the 0's (and 1's) are not grouped together.

Example 2:

1
2
3
Input: "10101"
Output: 4
Explanation: There are 4 substrings: "10", "01", "10", "01" that have equal number of consecutive 1's and 0's.

Note:

  • s.length will be between 1 and 50,000.

  • s will only consist of “0” or “1” characters.

Related Topics: String

原问题: 696. Count Binary Substrings

中文翻译版: 696. 计数二进制子串

解决方案

如果使用暴力方法进行解决,很容易超时,因为字符串 s 最长为 50000,因此需要考虑一些取巧的方法。这里可以创建一个数组,数组元素存储每个连续子串的长度,例如,将字符串 1111000011010001011 转化为数组 [4, 4, 2, 1, 1, 3, 1, 1, 2],根据这个数组,我们可以很方便地获得满足条件的子串个数,只需要将数组中相邻元素之间最小的那个求和即可,即

1
2
3
4
n = min(4, 4) + min(4, 2) + min(2, 1) + min(1, 1)
+ min(1, 3) + min(3, 1) + min(1, 1) + min(1, 2)
= 4 + 2 + 1 + 1 + 1 + 1 + 1 + 1
= 12

实现代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#include <iostream>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
int countBinarySubstrings(string s)
{
vector<int> trans;
int num_bs = 0;
int count;

count = 0;
for (int i=0; i<s.size(); i++) {
count++;
if (i+1 < s.size() && s[i] != s[i+1]) {
trans.push_back(count);
count = 0;
continue;
}
}
if (count > 0)
trans.push_back(count);

for (int i=0; i<trans.size()-1; i++)
num_bs += trans[i] > trans[i+1] ? trans[i+1] : trans[i];

return num_bs;
}
};

int main()
{
Solution solu;

string str1 = "00110011";
cout << str1 << ": " << solu.countBinarySubstrings(str1) << endl;

string str2 = "10101";
cout << str2 << ": " << solu.countBinarySubstrings(str2) << endl;

return 0;
}

问题描述

Given a string, find the first non-repeating character in it and return it’s index. If it doesn’t exist, return -1.

Examples:

1
2
3
4
5
s = "leetcode"
return 0.

s = "loveleetcode",
return 2.

Note: You may assume the string contain only lowercase letters.

Related Topics: Hash Table, String

原问题: 387. First Unique Character in a String

中文翻译版: 387. 字符串中的第一个唯一字符

解决方案

本题需要找到第一个不重复出现的字符的位置,因此在遍历字符串时需要保存字符的位置。由于字符串中只有小写字母,因此可以创建一个大小为 26 的数组,存储 26 个字母在字符串中出现位置,由于还需要判断该字母是否重复出现,这时可以用一个特殊值表示该字母在字符串中重复出现。数组元素的值分三种情况:

  • -1 表示该字母已经重复出现
  • 0 表示该字母未在字符串中出现
  • 正整数表示该字母第一次在字符串中出现的位置 (位置从 1 开始)

根据输入的字符串构造出这个数组后,我们可以很快找到第一个不重复出现的字符的位置 (即数组元素中值最小的那个,不包括 -1 和 0)。实现代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#include <iostream>
#include <string>
#include <cstring>
using namespace std;

class Solution {
public:
int firstUniqChar(string s)
{
if (s.size() == 0)
return -1;
else if (s.size() == 1)
return 0;

/* letters[i]
* -1 means letter repeat
* 0 means letter not appear
* > 0 means position of letter in string (index starts from 1)
*/
int letters[26];
int pos;

memset(letters, 0, sizeof(letters));
for (int i=0; i<s.size(); i++) {
pos = s[i] - 'a';
if (letters[pos] > 0)
letters[pos] = -1;
else if (letters[pos] == 0)
letters[pos] = i + 1;
}

int ret = -1;
for (int i=0; i<26; i++) {
if (letters[i] > 0) {
if (ret == -1)
ret = letters[i] - 1;
else if (ret > letters[i] - 1)
ret = letters[i] - 1;
}
}

return ret;
}
};

int main()
{
Solution solu;

string str1 = "aadadaad";
cout << str1 << ": " << solu.firstUniqChar(str1) << endl;

string str2 = "loveleetcode";
cout << str2 << ": " << solu.firstUniqChar(str2) << endl;

return 0;
}

介绍

对于K均值算法中聚类个数K的选择,通常有四种方法:

  • 按需选择
  • 观察法
  • 手肘法
  • Gap Statistic方法

本文相关算法的代码以及实验在:GitHub

按需选择

简单地说就是按照建模的需求和目的来选择聚类的个数。比如说,一个游戏公司想把所有玩家做聚类分析,分成顶级、高级、中级、菜鸟四类,那么K=4;如果房地产公司想把当地的商品房分成高中低三档,那么K=3。按需选择法虽然合理,但是未必能保证在做K-Means时能够得到清晰的分界线。

观察法

观察法就是将数据可视化,然后用肉眼去观察这些数据点大概分成几堆。这个方法有个缺点,那就是数据维度不能太高,一般是二维或者三维,否则数据无法可视化出来。

手肘法

手肘法 (Elbow Method) 本质上也是一种间接的观察法。当我们对数据 ${ x_1, \ldots, x_N }$ 进行K均值聚类后,我们将得到 $K$ 个聚类的中心点 $\mu_k, k=1,2, \ldots, K$,以及数据点 $x_i$ 所对应的簇 $C_k, k=1, 2, \ldots, K$,每个簇中有 $n_k$ 个数据点。我们定义每个簇中的所有点相互之间的距离的和为
$$
D_k = \sum_{x_i \in C_k} \sum_{x_j \in C_k} \Vert x_i - x_j \Vert^2
$$
其中 $\Vert \cdot \Vert$ 为2范数。则对于聚类个数为 $K$ 时,我们可以定义一个测度量为
$$
W_K = \sum_{k=1}^K \frac{1}{2n_k} D_k
$$
对于不同的 $K$,经过 K-means 算法后我们会得到不同的中心点和数据点所属的簇,从而得到不同的度量 $W_K$。将聚类个数 $K$ 作为横坐标,$W_K$ 作为纵坐标,我们可以得到类似下面的图像:

图像中图形很像人的手肘,该方法的命名就是从这而来。从图像中我们可以观察到,$K = 1$ 到 $4$ 下降很快,$K = 4$ 之后趋于平稳,因此 $K = 4$ 是一个拐点,手肘法认为这个拐点就是最佳的聚类个数 $K$。

手肘法是一个经验方法,观察结果因人而异,特别是遇到拐点位置不是很明显时。相比于前面的观察法,该方法的优点在于其适用于高维度的数据。

Trick

如果每个簇的中心点为该簇中的所有数据点的平均值时,即
$$
\mu_k = \frac{1}{n_k} \sum_{x_i \in C_k} x_i
$$
则 $D_k$ 可以写为
$$
D_k = \sum_{x_i \in C_k} \sum_{x_j \in C_k} \Vert x_i - x_j \Vert^2 = 2n_k \sum_{x_i \in C_k} \Vert x_i - \mu_k \Vert^2
$$
$W_K$ 写为
$$
W_K = \sum_{k=1}^K \frac{1}{2n_k} D_k = \sum_{k=1}^K \sum_{x_i \in C_k} \Vert x_i - \mu_k \Vert^2
$$

Note: 以下为公式推导内容,可略过。将 $D_k$ 表达式分解可得:
$$
\begin{align}
& \sum_{x_i \in C_k} \sum_{x_j \in C_k} \Vert x_i - x_j \Vert^2 \
= & \sum_{x_i \in C_k} \sum_{x_j \in C_k} \Vert (x_i - \mu_k) - (x_j - \mu_k) \Vert^2 \
= & \sum_{x_i \in C_k} \sum_{x_j \in C_k} \left( \Vert x_i - \mu_k \Vert^2 + \Vert x_j - \mu_k \Vert^2 - 2 \left< x_i - \mu_k, x_j - \mu_k \right> \right) \
= & \sum_{x_i \in C_k} n_k \Vert x_i - \mu_k \Vert^2 + \sum_{x_j \in C_k} n_k \Vert x_j - \mu_k \Vert^2 - 2 \sum_{x_i \in C_k} \sum_{x_j \in C_k} \left< x_i - \mu_k, x_j - \mu_k \right> \
= & 2 \sum_{x_i \in C_k} n_k \Vert x_i - \mu_k \Vert^2 - 2 \sum_{x_i \in C_k} \sum_{x_j \in C_k} \left< x_i - \mu_k, x_j - \mu_k \right>
\end{align}
$$
其中
$$
\begin{align}
& \sum_{x_i \in C_k} \sum_{x_j \in C_k} \left< x_i - \mu_k, x_j - \mu_k \right> \
= & \sum_{x_i \in C_k} \left< x_i - \mu_k, \sum_{x_j \in C_k} (x_j - \mu_k) \right>
\end{align}
$$
由于
$$
\sum_{x_j \in C_k} (x_j - \mu_k) = \sum_{x_j \in C_k} x_j - n_k \mu_k = ; n_k \mu_k - n_k \mu_k = 0
$$

$$
\sum_{x_i \in C_k} \sum_{x_j \in C_k} \left< x_i - \mu_k, x_j - \mu_k \right> = 0
$$
因此可以得到
$$
\sum_{x_i \in C_k} \sum_{x_j \in C_k} \Vert x_i - x_j \Vert^2 = 2 n_k \sum_{x_i \in C_k} \Vert x_i - \mu_k \Vert^2
$$

Gap Statistic方法

这个方法来源于论文[3],详细内容可以去阅读该论文。延用前面关于度量 $W_K$ 的计算方式,该算法的流程如下:

  1. cluster the observed data, varying the total number of clusters from $K=1, 2, \ldots, K_{\text{max}}$, and giving within-dispersion measures $W_K, K=1, 2, \ldots, K_{\text{max}}$

  2. generate $B$ reference data sets and cluster each one giving within-dispersion measures $W_{Kb}$, $b=1,2, \ldots, B$, $K=1, 2, \ldots, K_{\text{max}}$. Compute the gap statistic
    $$
    \text{Gap}(K) =\frac{1}{B} \sum_{b=1}^B \log(W_{Kb}) - \log(W_K)
    $$

  3. let $\bar{w} = \frac{1}{B} \sum_{b=1}^B \log(W_{Kb})$, compute the standard deviation
    $$
    \text{sd}K = \left[ \frac{1}{B} \sum{b=1}^B (\log(W_{Kb}) - \bar{w})^2 \right]^{\frac{1}{2}}
    $$
    and define $s_K = \text{sd}_K \sqrt{1 + \frac{1}{B}}$

  4. choose the number of clusters as the smallest $K$ such that $\text{Gap(K)} \ge \text{Gap(K+1)} - s_{K+1}$

关于 reference data sets 的生成,论文给了两种生成方法,这里只用其中一个,即对于生成的样本 $x_r$,其维度为 $d$,该样本的元素 $x_{ri}$ 的值按照均匀分布 $U(a, b)$ 随机生成,其中 $a$ 为待聚类的所有数据点的第 $i$ 位置元素的最小值, $b$ 为数据点的第 $i$ 位置元素的最大值。经过实验发现,这个算法的 3,4 步骤可以去掉,只用第 2 步计算出的 $\text{Gap}(K)$,然后选择最大 $\text{Gap}(K)$ 所对应的 $K$ 作为最优的聚类个数。这里贴出从一个数据集中得到的 $\text{Gap}(K)$ 的变化曲线图

可以发现,当 $K = 4$ 时,$\text{Gap}(K)$ 取得最大值,因此选择 $K = 4$ 作为最优的聚类个数。该算法实现的代码为:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
import numpy as np


def calculate_Wk(data, centroids, cluster):
K = centroids.shape[0]
wk = 0.0
for k in range(K):
data_in_cluster = data[cluster == k, :]
center = centroids[k, :]
num_points = data_in_cluster.shape[0]
for i in range(num_points):
wk = wk + np.linalg.norm(data_in_cluster[i, :]-center, ord=2) ** 2

return wk


def bounding_box(data):
dim = data.shape[1]
boxes = []
for i in range(dim):
data_min = np.amin(data[:, i])
data_max = np.amax(data[:, i])
boxes.append((data_min, data_max))

return boxes


def gap_statistic(data, max_K, B, cluster_algorithm):
num_points, dim = data.shape
K_range = np.arange(1, max_K, dtype=int)
num_K = len(K_range)
boxes = bounding_box(data)
data_generate = np.zeros((num_points, dim))

''' 写法1
log_Wks = np.zeros(num_K)
gaps = np.zeros(num_K)
sks = np.zeros(num_K)
for ind_K, K in enumerate(K_range):
cluster_centers, labels, _ = cluster_algorithm(data, K)
log_Wks[ind_K] = np.log(calculate_Wk(data, cluster_centers, labels))

# generate B reference data sets
log_Wkbs = np.zeros(B)
for b in range(B):
for i in range(num_points):
for j in range(dim):
data_generate[i][j] = \
np.random.uniform(boxes[j][0], boxes[j][1])
cluster_centers, labels, _ = cluster_algorithm(data_generate, K)
log_Wkbs[b] = \
np.log(calculate_Wk(data_generate, cluster_centers, labels))
gaps[ind_K] = np.mean(log_Wkbs) - log_Wks[ind_K]
sks[ind_K] = np.std(log_Wkbs) * np.sqrt(1 + 1.0 / B)
'''

''' 写法2
'''
log_Wks = np.zeros(num_K)
for indK, K in enumerate(K_range):
cluster_centers, labels, _ = cluster_algorithm(data, K)
log_Wks[indK] = np.log(calculate_Wk(data, cluster_centers, labels))

gaps = np.zeros(num_K)
sks = np.zeros(num_K)
log_Wkbs = np.zeros((B, num_K))

# generate B reference data sets
for b in range(B):
for i in range(num_points):
for j in range(dim):
data_generate[i, j] = \
np.random.uniform(boxes[j][0], boxes[j][1])
for indK, K in enumerate(K_range):
cluster_centers, labels, _ = cluster_algorithm(data_generate, K)
log_Wkbs[b, indK] = \
np.log(calculate_Wk(data_generate, cluster_centers, labels))

for k in range(num_K):
gaps[k] = np.mean(log_Wkbs[:, k]) - log_Wks[k]
sks[k] = np.std(log_Wkbs[:, k]) * np.sqrt(1 + 1.0 / B)

return gaps, sks, log_Wks

参考

  1. K-means怎么选K?
  2. k-means的k值该如何确定?
  3. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2001, 63(2): 411-423.

问题描述

Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words. It is guaranteed there is at least one word that isn’t banned, and that the answer is unique.

Words in the list of banned words are given in lowercase, and free of punctuation. Words in the paragraph are not case sensitive. The answer is in lowercase.

Example:

1
2
3
4
5
6
7
8
9
10
11
12
Input:
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]

Output: "ball"

Explanation:
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph.
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"),
and that "hit" isn't the answer even though it occurs more because it is banned.

Note:

  • 1 <= paragraph.length <= 1000.
  • 1 <= banned.length <= 100.
  • 1 <= banned[i].length <= 10.
  • The answer is unique, and written in lowercase (even if its occurrences in paragraph may have uppercase symbols, and even if it is a proper noun.)
  • paragraph only consists of letters, spaces, or the punctuation symbols !?',;.
  • Different words in paragraph are always separated by a space.
  • There are no hyphens or hyphenated words.
  • Words only consist of letters, never apostrophes or other punctuation symbols.

Related Topics: String

原问题: 819. Most Common Word

中文翻译版: 819. Most Common Word

解决方案

题目要求在字符串 paragraph 中找到出现次数最多的单词,并且单词不能是列表 banned 中的单词。解决该题可以设定两个哈希表,第一个哈希表的 key 为列表 banned 中的单词,用于判断单词是否在列表中,第二个哈希表的 key 为字符串 paragraph 中的单词,value 为该单词的出现次数。依次取出字符串 paragraph 中的单词,由于题目要求单词都是小写,所以要将单词转换成小写,然后在第一个哈希表中查找该单词是否存在,如果不存在,将第二个哈希表中该单词的出现次数加1,否则取字符串下一个单词,依次重复这个操作直到遍历完字符串 paragraph 的所有单词,接着根据第二个哈希表找出出现次数最多的单词。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <unordered_set>
#include <unordered_map>
using namespace std;

class Solution {
public:
bool delimter(char ch)
{
if (ch == ' ' || ch == '!' ||
ch == '?' || ch == '\'' ||
ch == ',' || ch == ';' || ch == '.')
return true;
else
return false;
}

void lowercase(string& str)
{
transform(str.begin(), str.end(), str.begin(), ::tolower);
}

string mostCommonWord(string paragraph, vector<string>& banned)
{
unordered_set<string> banned_list;
for (int i=0; i<banned.size(); i++)
banned_list.insert(banned[i]);

int times, max_times;
string str, common_str;
int start, end;
unordered_map<string, int> word_counts;

start = 0;
max_times = 0;
common_str = "";
for (int i=0; i<paragraph.size(); i++) {
if (delimter(paragraph[i])) {
end = i;

str = string(paragraph, start, end-start);
lowercase(str);

if (banned_list.find(str) == banned_list.end()) {
auto it = word_counts.find(str);
if (it == word_counts.end()) {
word_counts.insert(make_pair(str, 1));
times = 1;
} else {
it->second = it->second + 1;
times = it->second;
}
if (times > max_times) {
common_str = str;
max_times = times;
}
}

while (i<paragraph.size() && delimter(paragraph[i]))
i++;

start = i;
}
}

// the last word
end = paragraph.size();
if (start < end) {
str = string(paragraph, start, end-start);
lowercase(str);
if (banned_list.find(str) == banned_list.end()) {
auto it = word_counts.find(str);
if (it == word_counts.end()) {
word_counts.insert(make_pair(str, 1));
times = 1;
} else {
it->second = it->second + 1;
times = it->second;
}
if (times > max_times) {
common_str = str;
max_times = times;
}
}
}

return common_str;
}
};

int main()
{
Solution solu;
string paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
vector<string> banned = {"hit"};

cout << solu.mostCommonWord(paragraph, banned) << endl;

return 0;
}

问题描述

Initially, there is a Robot at position (0, 0). Given a sequence of its moves, judge if this robot makes a circle, which means it moves back to the original place.

The move sequence is represented by a string. And each move is represent by a character. The valid robot moves are R (Right), L (Left), U (Up) and D (down). The output should be true or false representing whether the robot makes a circle.

Example 1:

1
2
Input: "UD"
Output: true

Example 2:

1
2
Input: "LL"
Output: false

Related Topics: String

原问题: 657. Judge Route Circle

中文翻译版: 657. 判断路线成圈

解决方案

根据题目描述,可以设定一个二维坐标 (x, y),根据指令做相应的改变:

  • Right: x = x + 1
  • Left: x = x - 1
  • Up: y = y + 1
  • Down: y = y - 1

最后判断二维坐标 (x, y) 是否为原点 (0, 0) 即可。实现代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#include <iostream>
using namespace std;

class Solution {
public:
bool judgeCircle(string moves)
{
int x = 0, y = 0;

for (int i=0; i<moves.size(); i++) {
switch (moves[i]) {
case 'U':
y++;
break;
case 'D':
y--;
break;
case 'L':
x--;
break;
case 'R':
x++;
break;
}
}

return (x == 0 && y == 0);
}
};

int main()
{
Solution solu;
string str1 = "UD";
cout << str1 << ": " << solu.judgeCircle(str1) << endl;
string str2 = "LL";
cout << str2 << ": " << solu.judgeCircle(str2) << endl;

return 0;
}

问题描述

Roman numerals are represented by seven different symbols: I, V, X, L, C, D and M.

1
2
3
4
5
6
7
8
Symbol       Value
I 1
V 5
X 10
L 50
C 100
D 500
M 1000

For example, two is written as II in Roman numeral, just two one’s added together. Twelve is written as, XII, which is simply X + II. The number twenty seven is written as XXVII, which is XX + V + II.

Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not IIII. Instead, the number four is written as IV. Because the one is before the five we subtract it making four. The same principle applies to the number nine, which is written as IX. There are six instances where subtraction is used:

  • I can be placed before V (5) and X (10) to make 4 and 9.
  • X can be placed before L (50) and C (100) to make 40 and 90.
  • C can be placed before D (500) and M (1000) to make 400 and 900.

Given a roman numeral, convert it to an integer. Input is guaranteed to be within the range from 1 to 3999.

Example 1:

1
2
Input: "III"
Output: 3

Example 2:

1
2
Input: "IV"
Output: 4

Example 3:

1
2
Input: "IX"
Output: 9

Example 4:

1
2
3
Input: "LVIII"
Output: 58
Explanation: C = 100, L = 50, XXX = 30 and III = 3.

Example 5:

1
2
3
Input: "MCMXCIV"
Output: 1994
Explanation: M = 1000, CM = 900, XC = 90 and IV = 4.

Related Topics: Math, String

原问题: 13. Roman to Integer

中文翻译版: 13. 罗马数字转整数

解决方案

题目难度不大,根据题目描述可以写出代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
#include <iostream>
#include <string>
using namespace std;

class Solution {
public:
int romanToInt(string s)
{
int num = 0;
int j;
char ch;

for (int i=0; i<s.size(); i++) {
ch = s[i];
j = i + 1;
switch (ch) {
case 'I': // value 1
if (j < s.size() && (s[j] == 'V' || s[j] == 'X'))
num -= 1;
else
num += 1;
break;
case 'V': // value 5
num += 5;
break;
case 'X': // value 10
if (j < s.size() && (s[j] == 'L' || s[j] == 'C'))
num -= 10;
else
num += 10;
break;
case 'L': // value 50
num += 50;
break;
case 'C': // value 100
if (j < s.size() && (s[j] == 'D' || s[j] == 'M'))
num -= 100;
else
num += 100;
break;
case 'D': // value 500
num += 500;
break;
case 'M': // value 1000
num += 1000;
break;
}
}

return num;
}
};

int main()
{
Solution solu;
string str1 = "MCMXCIV";

cout << str1 << ": " << solu.romanToInt(str1) << endl;

return 0;
}

问题描述

Given an arbitrary ransom note string and another string containing letters from all the magazines, write a function that will return true if the ransom note can be constructed from the magazines ; otherwise, it will return false.

Each letter in the magazine string can only be used once in your ransom note.

Note:

You may assume that both strings contain only lowercase letters.

1
2
3
canConstruct("a", "b") -> false
canConstruct("aa", "ab") -> false
canConstruct("aa", "aab") -> true

Related Topics: String

原问题:383. Ransom Note

中文翻译版:383. 赎金信

解决方案

这道题要求字符串 ransom note 中的字符都能从字符串 magazine 中找到,而且 ransom note 中的字符的重复次数要小于等于 magazine 相对应字符的重复次数。基于上面的要求,有个很简单的解决方案,对字符串 ransom note 和字符串 magazine 分别构造一个大小为 26 的数组,存储 26 个字母在字符串中出现的次数,然后遍历数组,判断 ransom note 中的字符的重复次数是否小于等于 magazine 相对应字符的重复次数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include <iostream>
#include <cstring>
#include <string>
using namespace std;

class Solution {
public:
bool canConstruct(string ransomNote, string magazine)
{
if (ransomNote.size() > magazine.size())
return false;

int rmap[26], mmap[26];

memset(rmap, 0, sizeof(rmap));
memset(mmap, 0, sizeof(mmap));

int i = 0, j = 0;
while (i < ransomNote.size() || j < magazine.size()) {
if (i < ransomNote.size())
rmap[ransomNote[i++] - 'a'] += 1;
if (j < magazine.size())
mmap[magazine[j++] - 'a'] += 1;
}

for (i=0; i<26; i++)
if (rmap[i] > mmap[i])
return false;

return true;
}
};

int main()
{
Solution solu;

string r1 = "ab";
string m1 = "aba";
cout << r1 << " " << m1 << ": " << solu.canConstruct(r1, m1);

return 0;
}

问题描述

Given a group of two strings, you need to find the longest uncommon subsequence of this group of two strings. The longest uncommon subsequence is defined as the longest subsequence of one of these strings and this subsequence should not be any subsequence of the other strings.

A subsequence is a sequence that can be derived from one sequence by deleting some characters without changing the order of the remaining elements. Trivially, any string is a subsequence of itself and an empty string is a subsequence of any string.

The input will be two strings, and the output needs to be the length of the longest uncommon subsequence. If the longest uncommon subsequence doesn’t exist, return -1.

Example 1:

1
2
3
4
5
Input: "aba", "cdc"
Output: 3
Explanation: The longest uncommon subsequence is "aba" (or "cdc"),
because "aba" is a subsequence of "aba",
but not a subsequence of any other strings in the group of two strings.

Note:

  1. Both strings’ lengths will not exceed 100.
  2. Only letters from a ~ z will appear in input strings.

Related Topics: String

原问题:521. Longest Uncommon Subsequence I

中文翻译版:521. 最长特殊序列 Ⅰ

解决方案

这道题粗看以为是一道很难的题目,但是解决该问题只需要考虑三种情况:

  1. $a = b$
    If both the strings are identical, it is obvious that no subsequence will be uncommon. Hence, return -1.
  2. $length(a) = length(b)$ and $a \ne b$
    Example: $abc$ and $abd$. In this case we can consider any string i.e. $abc$ or $abd$ as a required subsequence, as out of these two strings one string will never be a subsequence of other string. Hence, return $length(a)$ or $length(b)$.
  3. $length(a) \ne length(b)$
    Example: $abcd$ and $abc$. In this case we can consider bigger string as a required subsequence because bigger string can’t be a subsequence of smaller string. Hence, return $\max{(length(a), length(b))}$.

贴上实现代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#include <iostream>
using namespace std;

class Solution {
public:
int findLUSlength(string a, string b)
{
int i = 0, j = 0;

if (a.size() == b.size()) {
while (i < a.size() && j < a.size() && a[i] == b[j]) {
i++;
j++;
}
if (i < a.size() || j < b.size())
return a.size();
else
return -1;
} else {
return a.size() > b.size() ? a.size() : b.size();
}
}
};

int main()
{
Solution solu;
string str1 = "abcd", str2 = "abc";

cout << solu.findLUSlength(str1, str2) << endl;

return 0;
}