pointers and arrays, array, strings, string, pointer

The Way of Programming

Published: 2022-10-07

💡 The material in this post comes from Section 3.5 of C++ Primer.

Define & Initialize

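Declaration

Like pointers and references, an array is a compound type. An array declarator has the form a[d],

where a is the name;
d is the dimension, which must be greater than 0.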

The number of elements in an array is part of the array’s type.

Consequently, the dimension must be known at compile time, which means it must be a constant expression.
```
unsigned cnt = 42; // not a constant expression 
constexpr unsigned sz = 42; // constant expression
```
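📌 Note, however, that g++ and clang++ may accept a non-constant dimension without complaint, because they support it as a compiler extension. It is better to disable such extensions, since code that relies on them may fail to compile with other compilers. For g++ and clang++ the relevant option is -pedantic-errors.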
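As a minimal sketch of the rule that a dimension must be a constant expression: the names `doubled` and `total_elements` below are hypothetical and exist only for illustration. The commented-out lines show the form that standard C++ rejects.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t sz = 42;   // constant expression: usable as a dimension

// Hypothetical helper: a call to a constexpr function is also a constant expression.
constexpr std::size_t doubled(std::size_t n) { return n * 2; }

// Builds two arrays whose dimensions are constants and returns their total element count.
std::size_t total_elements() {
    int scores[sz] = {};            // 42 ints
    int buffer[doubled(sz)] = {};   // 84 ints, dimension computed at compile time

    // unsigned cnt = 42;
    // int bad[cnt];                // not standard C++: cnt is not a constant expression

    std::size_t n1 = sizeof(scores) / sizeof(scores[0]);
    std::size_t n2 = sizeof(buffer) / sizeof(buffer[0]);
    assert(n1 == 42 && n2 == 84);
    return n1 + n2;
}
```

Compiling with -pedantic-errors and uncommenting the two lines should turn the extension into a hard error on g++ and clang++.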

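Initialization

Implicit

How an array is default-initialized depends on the element type (built-in type or class type) and on where the variable is defined (global or local).

An array of string is initialized to empty strings everywhere, whether global or local;
an array of int is initialized to all zeros when global, but its elements have undefined values when it is defined inside a function, and copying or printing those values leads to unpredictable results;

So always initialize explicitly and leave nothing to chance.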
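The cases above can be sketched as follows. This is only an illustration under the assumption of a conforming C++11 compiler; the names `global_counts` and `defaults_as_expected` are made up for the example, and the uninitialized local array is left commented out because reading it would be undefined behavior.

```cpp
#include <string>

int global_counts[3];   // defined outside any function: zero-initialized

// Returns true when every well-defined default has the expected value.
bool defaults_as_expected() {
    std::string names[3];   // class type: each element is an empty string
    int zeros[3] = {};      // empty initializer list: every element is 0
    // int garbage[3];      // built-in type inside a function: values are indeterminate,
    //                      // so reading them would be undefined behavior

    for (int i = 0; i != 3; ++i) {
        if (global_counts[i] != 0 || !names[i].empty() || zeros[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Writing `= {}` costs nothing and removes the one case whose values are not predictable.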

Explicit

List all the initial elements; the dimension can then be omitted;
- the compiler infers it from the number of initializers;
```
int a2[] = {0, 1, 2};
```

If a dimension is given, the number of initializers must not exceed it;

int a5[2] = {0,1,2}; // error: too many initializers

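If there are fewer initializers than the dimension, the remaining elements are value-initialized (0 for int, the empty string for string);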

int a3[5] = {0, 1, 2}; // equivalent to a3[] = {0, 1, 2, 0, 0} 
string a4[3] = {"hi", "bye"}; // same as a4[] = {"hi", "bye", ""}

Access

Elements are accessed by traversing the array. Besides the classic for and while loops, C++11 adds a range for loop similar to Java's for…each loop.

int a[5] = {1, 2, 3, 4, 5};
for(auto ai: a){
    cout << ai << endl;
}

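Accessing an array out of bounds leads to bugs such as buffer overflows. The compiler usually cannot detect them, so they show up only at run time, and because an out-of-range access is undefined behavior the program may crash or silently misbehave.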
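A small sketch of both loop styles, assuming an array of exactly five ints as in the example above (the function name `double_all_and_sum` is hypothetical). It uses `auto &` in the range for so that the loop can modify the elements, and keeps the index of the classic loop strictly below the dimension.

```cpp
#include <cstddef>

// Doubles every element in place and returns the sum of the new values.
int double_all_and_sum(int (&a)[5]) {
    for (auto &ai : a) {      // auto& binds to each element, so writes change the array
        ai *= 2;
    }
    int sum = 0;
    for (std::size_t i = 0; i != 5; ++i) {   // classic loop; i stays inside [0, 5)
        sum += a[i];
    }
    return sum;
}
```

With plain `auto ai` instead of `auto &ai`, the loop would work on copies and the array would stay unchanged.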

Array vs Vector

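| | array | vector |
|---|---|---|
| Element type | all elements must have the same type | all elements must have the same type |
| Element access | elements are unnamed and must be accessed by position | elements are unnamed and must be accessed by position |
| Capacity | fixed; cannot grow; better performance | not fixed; can grow, at some cost in performance |
| Getting the number of elements | no size function: (1) for a null-terminated character array, `strlen` gives the string length; (2) for other arrays, compute `sizeof(array)/sizeof(array[0])`; (3) `end(array) - begin(array)` (C++11) | has a size function |
| Subscript type | may be negative when applied to a pointer (it is pointer arithmetic), but the result must still refer to an element of the original array | must be non-negative |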

Converting between the two:

int int_arr[] = {0, 1, 2, 3, 4, 5}; 
// ivec has six elements; each is a copy of the corresponding element in int_arr 
vector<int> ivec(begin(int_arr), end(int_arr));
// subset
vector<int> subVec(int_arr + 1, int_arr + 4);
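The conversion can be wrapped up as a sketch like the one below. The function names are hypothetical, and the fixed size of six matches int_arr above. The last function is an assumption about how one might go the other way: there is no direct vector-to-array conversion, so it copies element by element with `std::copy_n`.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Copies a whole built-in array into a vector.
std::vector<int> to_vector(const int (&arr)[6]) {
    return std::vector<int>(std::begin(arr), std::end(arr));
}

// Copies arr[1], arr[2], arr[3]: the range is [arr + 1, arr + 4).
std::vector<int> middle_three(const int (&arr)[6]) {
    return std::vector<int>(arr + 1, arr + 4);
}

// Copies at most six elements of v into out; extra elements of out are left untouched.
void copy_into_array(const std::vector<int> &v, int (&out)[6]) {
    std::copy_n(v.begin(), std::min<std::size_t>(v.size(), 6), out);
}
```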

Array & Pointer

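Arrays hold objects, so an array can store pointers. References are not objects, so there are no arrays of references.

Basic usage

Array elements are objects and therefore have addresses (a reference, by contrast, is not an object and has no address of its own), so the address of an element can be assigned to a pointer;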
```
string nums[] = {"one", "two", "three"}; 
string *p = &nums[0]; // p points to the first element in nums
```

Arrays of pointers

int *parr[sz]; // array of 42 pointers to int
// -----
// parr[sz] stores pointers to int.

This is easy to get confused about.

By default, type modifiers bind right to left.
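The binding rule is easier to see with the declarations side by side. The following is a sketch in the spirit of the book's examples; the variable names and the function `read_third` are chosen here for illustration.

```cpp
int arr[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

int *ptrs[10] = {};          // ptrs is an array of ten pointers to int
int (*parray)[10] = &arr;    // parray points to an array of ten ints
int (&arr_ref)[10] = arr;    // arr_ref refers to an array of ten ints
// int &refs[10] = ...;      // error: there are no arrays of references

// Reads arr[2] through each of the three declarations and adds the results.
int read_third() {
    ptrs[2] = &arr[2];
    return *ptrs[2] + (*parray)[2] + arr_ref[2];   // 2 + 2 + 2
}
```

Without parentheses the `[10]` binds first, giving an array; with parentheses the `*` or `&` binds first, giving a pointer or reference to an array.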

Special property

In most contexts, the compiler treats the name of an array as a pointer to its first element;
```
string *p2 = nums; //equivalent to p2 = &nums[0]
```

Operations on arrays can therefore often be thought of as operations on pointers;

When we use an array as an initializer for a variable defined using auto, the deduced type is a pointer, not an array.

int ia[] = {0,1,2,3,4,5,6,7,8,9}; // ia is an array of ten ints 
auto ia2(ia); // ia2 is an int * that points to the first element in ia
ia2 = 42; // error: ia2 is a pointer, and we can’t assign an int to a pointer

This conversion does not happen when we use decltype: the type returned by decltype(ia) is array of ten ints.

// ia3 is an array of ten ints 
decltype(ia) ia3 = {0,1,2,3,4,5,6,7,8,9}; 
ia3 = p; // error: can’t assign an int * to an array 
ia3[4] = i; // ok: assigns the value of i to an element in ia3
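One way to check the two deductions is with `static_assert` and `std::is_same` from the type_traits header. This is a sketch rather than anything from the book, and `points_to_first` is a hypothetical helper.

```cpp
#include <type_traits>

int ia[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

auto ia2 = ia;              // array-to-pointer conversion: ia2 is int*
decltype(ia) ia3 = {};      // no conversion: ia3 is int[10], all elements 0

static_assert(std::is_same<decltype(ia2), int *>::value, "auto deduces a pointer");
static_assert(std::is_same<decltype(ia3), int[10]>::value, "decltype keeps the array type");
static_assert(sizeof(ia3) == 10 * sizeof(int), "ia3 holds ten ints");

// True when ia2 points at the first element of ia.
bool points_to_first() { return ia2 == &ia[0]; }
```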

Pointers are Iterators

off-the-end pointer

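As mentioned above, the name of an array can be treated as a pointer to its first element, so adding to or subtracting from that pointer moves it to other elements: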

int arr[] = {0,1,2,3,4,5,6,7,8,9};
int *p = arr; // p points to the first element in arr 
++p; // p points to arr[1]

Because of this, a pointer can be used to traverse the elements of an array.

int *e = &arr[10]; //pointer just past the last element in arr

for (int * b = arr; b != e; ++b) {
    cout << * b << endl; // print the elements in arr
}

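Here &arr[10] is one of the four kinds of valid pointer value (a pointer just past the end of an object), as described in the notes on pointers. Such an off-the-end pointer must not be dereferenced or incremented, and computing it by hand is very error-prone.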

begin & end

C++11 introduced the begin and end functions:

begin returns a pointer to the first element;
end returns a pointer one past the last element in the given array.
These functions are defined in the iterator header.

Usage:

int arr[] = {0,1,2,3,4,5,6,7,8,9};
int *pbeg = begin(arr);
int *pend = end(arr);

while(pbeg != pend) {
    cout << *pbeg << endl;
    pbeg++;
}

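This approach is essentially the same as the off-the-end pointer technique, since end still returns a pointer past the last element, but it is safer because no pointer arithmetic has to be written by hand.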
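A typical use is a search that stops either at a match or at the off-the-end pointer. The sketch below assumes an array of five ints; the function name `first_negative` is hypothetical.

```cpp
#include <iterator>

// Returns a pointer to the first negative element, or the off-the-end pointer if none.
const int *first_negative(const int (&arr)[5]) {
    const int *pbeg = std::begin(arr);
    const int *pend = std::end(arr);
    while (pbeg != pend && *pbeg >= 0) {   // pend is compared but never dereferenced
        ++pbeg;
    }
    return pbeg;
}
```

The caller compares the result with `std::end(arr)` to learn whether anything was found, the same convention the library containers use.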

Pointer arithmetic

Pointers that address array elements can use all the iterator operations listed in the following tables.

[Image: pointer arithmetic, table 1]

[Image: pointer arithmetic, table 2]

Some typical uses:

Offsetting a pointer to form a new pointer;

int arr[5];
int *ip = arr; // equivalent to int * ip = &arr[0] 
int *ip2 = ip + 4; // ip2 points to arr[4]

// Note: these two expressions mean completely different things
int last = *(arr + 4); // == arr[4]
last = *arr + 4; // == arr[0] + 4

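The new pointer ip2 must point into the same array as ip (or one past its end); otherwise the behavior is undefined.

int *p3 = arr + 5; // ok: off-the-end pointer, but dereferencing it with *p3 is undefined
int *p4 = arr + 10; // undefined: arr has only 5 elements, so arr + 10 is out of range

Subtracting pointers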
```
auto n = end(arr) - begin(arr); //n is 5, the number of elements in arr
```
- The result of subtracting two pointers is a library type named ptrdiff_t.
- Like size_t, the ptrdiff_t type is a machine-specific type and is defined in the cstddef header.
- ptrdiff_t is a signed integral type.
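The result of the subtraction can be negative. Likewise, unlike a vector subscript, a built-in subscript applied to a pointer may be negative, as long as the resulting position is still an element of the original array.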
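A short sketch of signed pointer distances and a negative subscript follows. The array contents and the function names are invented for the example.

```cpp
#include <cstddef>
#include <iterator>

int arr[5] = {10, 20, 30, 40, 50};

// Number of elements, computed by subtracting pointers.
std::ptrdiff_t element_count() { return std::end(arr) - std::begin(arr); }

// Subtracting in the other direction gives a negative distance.
std::ptrdiff_t reversed_distance() { return std::begin(arr) - std::end(arr); }

// A negative subscript is fine when applied to a pointer into the array.
int element_before(int *p) {
    return p[-1];        // same as *(p - 1); p - 1 must still point into arr
}
```

Calling `element_before(&arr[2])` reads arr[1]; calling it with `&arr[0]` would step outside the array and is undefined.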

Comparing pointers

int *b = arr, *e = arr + sz; // sz: the number of elements in arr
while (b < e) {
    // use *b
    ++b; 
}

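The pointers being compared must both point to elements of the same array (or one past its last element); otherwise the comparison is meaningless.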
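As one illustration of a relational comparison between pointers into the same array, here is a sketch of an in-place reversal that walks two pointers toward each other. The function `reverse_in_place` is hypothetical; in real code `std::reverse` does the same job.

```cpp
#include <cstddef>
#include <utility>

// Reverses the n elements starting at first, in place.
void reverse_in_place(int *first, std::size_t n) {
    if (n == 0) return;
    int *b = first;
    int *e = first + n - 1;      // last element
    while (b < e) {              // valid: both point into the same array
        std::swap(*b, *e);
        ++b;
        --e;
    }
}
```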

int *p = nullptr;
cout << p << endl;
cout << ++p << endl;
int *a = nullptr;
cout << p - a << endl;

int i = 10;
int *ia = &i;
cout << i << endl;
cout << *ia << endl;
cout << ia << endl;
// --- output of one run (incrementing a null pointer is undefined behavior; the last address varies) ---
0x0
0x4
1
10
10
0x7ffee03e32fc

Array & String

C-Style Character Strings

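C++ inherits from C, so the C way of working with strings carries over as well.

Initialization

In C, a string is represented as a char[]. The key requirement is that every such character array ends with the null character \0.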

C-style strings are not a type.
- they are a convention for how to represent and use character strings.
- Strings that follow this convention are stored in character arrays and are null terminated.
- By null-terminated we mean that the last character in the string is followed by a null character ('\0').
Ordinarily we use pointers to manipulate these strings.

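In C, strings are manipulated mostly through pointers, which is one reason pointers and strings are covered together here.

With this convention, a string can be initialized in the following ways: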

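char a1[] = {'C', '+', '+'};  // compiles, but has no terminating \0: usable only as a plain char array; using it as a string or passing it to the string functions causes problems
char a2[] = {'C', '+', '+', '\0'}; // correct: explicitly null terminated
char a3[] = "C++";   // correct: the terminating \0 is added automatically
const char a4[6] = "Daniel"; // compile-time error: the string needs 7 chars including \0, so it does not fit
// number of elements
cout << end(a1) - begin(a1) << endl;  // prints 3
cout << end(a2) - begin(a2) << endl;  // prints 4
cout << end(a3) - begin(a3) << endl;  // prints 4
// print as strings
cout << a1 << endl;
cout << a2 << endl;
cout << a3 << endl;
// --- output ---
// C++ followed by garbage bytes (for example "C++PSj"): a1 is not terminated by \0.
// C++: correct; output stops at the first \0.
// C++: correct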

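String functions

C has many string functions, listed below:

[Image: C-style string functions]

When these functions are applied to a char[], they too depend on the \0 terminator. Continuing the previous example: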

cout << strlen(a1) << endl;
cout << strlen(a2) << endl;
cout << strlen(a3) << endl;
// --- output ---
9
3
3

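Why does the first call print 9? strlen() looks for the first \0 and counts the characters before it. a1 contains no \0, so the result is undefined: strlen keeps reading past the end of the array until it happens to hit a zero byte, and the answer could just as well be some other number.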
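The difference between the storage size and the string length can be sketched as below. The function names are hypothetical. `bounded_length` is an assumed helper written here with `std::memchr`; it is not a standard function, though it behaves much like POSIX `strnlen`.

```cpp
#include <cstddef>
#include <cstring>

// Number of chars of storage used by the array "C++", including the terminator.
std::size_t storage_of_literal() {
    const char a3[] = "C++";
    return sizeof(a3);
}

// String length, excluding the terminator; safe because a3 is null terminated.
std::size_t length_of_literal() {
    const char a3[] = "C++";
    return std::strlen(a3);
}

// For a buffer that may lack a terminator, bound the search by the buffer size.
std::size_t bounded_length(const char *s, std::size_t capacity) {
    const void *nul = std::memchr(s, '\0', capacity);
    return nul ? static_cast<std::size_t>(static_cast<const char *>(nul) - s) : capacity;
}
```

Called on an unterminated array such as a1 with `sizeof(a1)` as the capacity, `bounded_length` stops at the end of the array instead of reading past it.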

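Comparing strings

Comparing C-style strings differs from comparing C++ library strings. There are two things one could compare: the addresses of the arrays, or their contents.

// comparing the addresses of two different arrays is meaningless
const char ca1[] = "A string example";
const char ca2[] = "A different string";
// undefined: compares two unrelated addresses
if (ca1 < ca2){
    // ...
}
// to compare the contents, use strcmp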
// same effect as string comparison s1 < s2
if (strcmp(ca1, ca2) < 0){
    // ...
}

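In C style, the contents of two strings can only be compared with strcmp, which returns 0 when the strings are equal, a negative value when the first is smaller, and a positive value when it is larger.

In C++, with library string objects, the same comparison is simply s1 < s2.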
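Both styles side by side, as a sketch using the two strings from the example above (the function names are hypothetical):

```cpp
#include <cstring>
#include <string>

const char ca1[] = "A string example";
const char ca2[] = "A different string";

// C style: strcmp returns a negative value, 0, or a positive value.
bool c_style_less(const char *lhs, const char *rhs) {
    return std::strcmp(lhs, rhs) < 0;
}

// C++ style: operator< on std::string compares the contents.
bool string_less(const std::string &lhs, const std::string &rhs) {
    return lhs < rhs;
}
```

Here ca2 orders before ca1, because the strings first differ at the third character and 'd' is less than 's'.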

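Concatenating and copying strings

C style also does not allow concatenation written like this, which works only for library strings: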

string largeStr = s1 + " " + s2;

Instead, copying and concatenation are done with strcpy and strcat.

strcpy(largeStr, ca1); // copies ca1 into largeStr 
strcat(largeStr, " "); // adds a space at the end of largeStr 
strcat(largeStr, ca2); // concatenates ca2 onto largeStr

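With this approach, largeStr must be checked beforehand to be large enough to hold the final result, including the terminating \0; otherwise the calls overflow the buffer.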
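One way to do that size calculation is sketched below. Using a `std::vector<char>` as the buffer is a choice made for this example, not something the original prescribes, and both function names are hypothetical. The second function shows the library string equivalent for comparison.

```cpp
#include <cstring>
#include <string>
#include <vector>

// C style: size the buffer first, then copy and concatenate.
std::vector<char> join_c_style(const char *s1, const char *s2) {
    // both strings + one space + one terminating '\0'
    std::vector<char> largeStr(std::strlen(s1) + 1 + std::strlen(s2) + 1);
    std::strcpy(largeStr.data(), s1);   // copies s1 into largeStr
    std::strcat(largeStr.data(), " ");  // adds a space at the end
    std::strcat(largeStr.data(), s2);   // concatenates s2
    return largeStr;
}

// C++ style: std::string manages the storage itself.
std::string join_cpp_style(const std::string &s1, const std::string &s2) {
    return s1 + " " + s2;
}
```

Forgetting either `+ 1` in the size calculation would leave the buffer one char short, which is exactly the kind of mistake the library string avoids.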

Summary

alex Li

https://limeya.github.io/2022/10/07/bian-cheng-zhi-dao/c-shu-zu-de-shi-yong-yi-ji-xiang-guan-nei-rong/

Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit alex Li when reposting.
