sqlite3: counting how often each value repeats in a 10-million-row table

Cool_Breeze · posted on 2021-4-23 19:22

Last edited by Cool_Breeze on 2021-4-23 20:01

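Database table view (table student; local is the class, e.g. 三年二班 = Grade 3, Class 2; yuwen, shuxue and english are the Chinese, math and English scores):
number  name   local     yuwen  shuxue  english
1       N4358  三年二班  61     2       65
2       D7005  三年一班  60     35      7
3       F9246  三年五班  36     26      6
4       N8857  三年三班  79     35      40
5       F4186  三年四班  50     95      71
6       J8401  三年二班  6      32      6
7       A7756  三年二班  6      57      6
8       D2809  三年二班  10     45      4
9       C0035  三年三班  25     37      55
10      I5499  三年三班  40     82      94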

Result:
name,count
A0000,43
A0001,39
A0002,41
A0003,35
A0004,28
A0005,31
A0006,34
A0007,36
A0008,28
A0009,46
That is, count how many times each name occurs.
Code:

```python
import sqlite3
import time

con = sqlite3.connect("big_table.db")
cur = con.cursor()
begin = time.monotonic()
cur.execute("select name, count(name) from student group by name")
# with open("res.csv", "w", newline="") as f:
#     writer = csv.writer(f)
#     for n in cur.fetchall():
#         writer.writerow(n)
print(f"Elapsed: {time.monotonic() - begin} s")
```

Elapsed: 6.037000000011176 s
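One caveat about this timing: the clock stops right after cur.execute(), and both the fetch loop and the CSV export are commented out. As far as I know, Python's sqlite3 module only steps the statement up to the first result row inside execute(), so the measured time may not cover producing every group. Without an index, SQLite has to sort all rows before the first group can come out, so most of the work still lands inside the timer; with an index the first group is ready almost at once, which may partly explain the 0.0 s reported for the indexed run. A minimal sketch that fetches every row inside the timed region, on a small in-memory table with made-up names (make_name and timed_group_count are hypothetical helpers, not from the thread):

```python
from __future__ import annotations

import random
import sqlite3
import string
import time


def make_name() -> str:
    # One uppercase letter followed by four digits, like "N4358" in the thread
    return random.choice(string.ascii_uppercase) + "".join(random.choices(string.digits, k=4))


def timed_group_count(cur: sqlite3.Cursor) -> tuple[list[tuple[str, int]], float]:
    """Run the GROUP BY query and fetch every row before stopping the clock."""
    begin = time.monotonic()
    cur.execute("select name, count(name) from student group by name")
    rows = cur.fetchall()  # retrieving all groups is now part of the measurement
    return rows, time.monotonic() - begin


if __name__ == "__main__":
    # Small in-memory demo table (hypothetical size), not the thread's big_table.db
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("create table student(number integer primary key autoincrement, name char(5) not null)")
    cur.executemany("insert into student(name) values (?)", ((make_name(),) for _ in range(100_000)))
    rows, elapsed = timed_group_count(cur)
    print(f"{len(rows)} groups, elapsed: {elapsed:.3f} s")
    con.close()
```

Running the same function against big_table.db, with and without name_index, should give a fairer comparison than timing execute() alone.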

After creating an index, the query takes 0.0 s, but the database grew from 324 MB to 459 MB.
Index creation code:

```python
cur.execute("create index name_index on student (name)")
```
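The extra 135 MB is the index itself: SQLite stores an index as a separate B-tree that holds a copy of every row's name together with its rowid, so it grows with the row count. A minimal sketch that measures this on a small in-memory copy of the student schema with random data, using PRAGMA page_count times PRAGMA page_size as the database size (db_size_bytes and index_overhead are hypothetical helpers; the before/after ratio will not match the thread's 324 MB and 459 MB exactly):

```python
from __future__ import annotations

import random
import sqlite3
import string

LOCALS = ["三年一班", "三年二班", "三年三班", "三年四班", "三年五班"]


def db_size_bytes(con: sqlite3.Connection) -> int:
    # Database size as SQLite sees it: number of pages times page size
    (page_count,) = con.execute("pragma page_count").fetchone()
    (page_size,) = con.execute("pragma page_size").fetchone()
    return page_count * page_size


def index_overhead(n_rows: int) -> tuple[int, int]:
    """Return (size before index, size after index) for a demo student table."""
    con = sqlite3.connect(":memory:")
    con.execute("""create table student(
        number integer primary key autoincrement,
        name char(5) not null,
        local char(5),
        yuwen float,
        shuxue float,
        english float)""")
    rows = (
        (random.choice(string.ascii_uppercase) + "".join(random.choices(string.digits, k=4)),
         random.choice(LOCALS), random.randint(0, 100), random.randint(0, 100), random.randint(0, 100))
        for _ in range(n_rows)
    )
    with con:  # commit all inserts as one transaction
        con.executemany(
            "insert into student(name, local, yuwen, shuxue, english) values (?,?,?,?,?)", rows)
    before = db_size_bytes(con)
    con.execute("create index name_index on student (name)")
    after = db_size_bytes(con)
    con.close()
    return before, after


if __name__ == "__main__":
    before, after = index_overhead(200_000)
    print(f"before: {before / 2**20:.1f} MiB, after: {after / 2**20:.1f} MiB")
```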

I'm new to databases. Is this speed considered fast (on an Intel(R) Celeron(R) CPU G1840 @ 2.80GHz)? Or is there an even faster way to do the count?
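Why the index helps so much here: an index on name keeps the names in sorted order and contains everything this query reads, so SQLite can walk it once and count runs of equal names, instead of scanning the whole table and sorting every row in a temporary B-tree. EXPLAIN QUERY PLAN shows which strategy SQLite picks. A minimal sketch on an empty in-memory table (query_plan and compare_plans are hypothetical helpers; the exact wording of the plan text varies between SQLite versions, e.g. older versions print "SCAN TABLE student"):

```python
from __future__ import annotations

import sqlite3

QUERY = "select name, count(name) from student group by name"


def query_plan(con: sqlite3.Connection, sql: str) -> list[str]:
    # EXPLAIN QUERY PLAN returns 4 columns; the last one is the readable description
    return [row[3] for row in con.execute("explain query plan " + sql)]


def compare_plans() -> tuple[list[str], list[str]]:
    con = sqlite3.connect(":memory:")
    con.execute("create table student(number integer primary key autoincrement, name char(5) not null)")
    without_index = query_plan(con, QUERY)
    con.execute("create index name_index on student (name)")
    with_index = query_plan(con, QUERY)
    con.close()
    return without_index, with_index


if __name__ == "__main__":
    without_index, with_index = compare_plans()
    print("without index:", without_index)
    print("with index:   ", with_index)
```

Without the index the plan should include a "USE TEMP B-TREE FOR GROUP BY" step (the sort); with it, a scan "USING COVERING INDEX name_index", meaning the table rows are never read.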

Test code:

```python
import sqlite3
import random
import csv
import time

con = sqlite3.connect("big_table.db")
cur = con.cursor()

# Create the table
# cur.execute("""create table student(
#     number integer primary key autoincrement,
#     name char(5) not null,
#     local char(5),
#     yuwen float,
#     shuxue float,
#     english float)""")

# 324 MB

x = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
x_n = "0123456789"
local = ["三年一班","三年二班","三年三班","三年四班","三年五班"]

def name():
    # One random uppercase letter followed by four random digits, e.g. "N4358"
    temp = ""
    temp += x[random.randint(0,25)]
    for i in range(4):
        temp += x_n[random.randint(0,9)]
    return temp

# Insert the data
# for n in range(10000000):
#     cur.execute("insert into student(name, local, yuwen, shuxue, english) values(?,?,?,?,?)",
#         (name(),local[random.randint(0,4)],random.randint(0,100),random.randint(0,100),random.randint(0,100)))

# Create the index
# cur.execute("create index name_index on student (name)")

begin = time.monotonic()
cur.execute("select name, count(name) from student group by name")
# for n in cur.fetchall():
#     print(n)
# with open("res.csv", "w", newline="") as f:
#     writer = csv.writer(f)
#     for n in cur.fetchall():
#         writer.writerow(n)
print(f"Elapsed: {time.monotonic() - begin} s")
# Commit
con.commit()
con.close()
```
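On the insertion side, the test code calls execute() once per row, 10 million times. A sketch of the same data generation using executemany() with a generator, so rows are produced lazily and handed to SQLite in one call. Python's sqlite3 module in its default mode already opens one implicit transaction before the first INSERT, so any gain here should come mostly from fewer Python-level calls; I have not measured it against the thread's numbers. make_name, student_rows and fill are hypothetical helpers, and random.choice/random.choices replace the index-by-randint approach with the same uniform distribution:

```python
import random
import sqlite3
import string

LOCALS = ["三年一班", "三年二班", "三年三班", "三年四班", "三年五班"]


def make_name() -> str:
    # One uppercase letter followed by four digits, like the thread's name()
    return random.choice(string.ascii_uppercase) + "".join(random.choices(string.digits, k=4))


def student_rows(n: int):
    # Generator: rows are created on demand instead of building a huge list
    for _ in range(n):
        yield (make_name(), random.choice(LOCALS),
               random.randint(0, 100), random.randint(0, 100), random.randint(0, 100))


def fill(con: sqlite3.Connection, n: int) -> None:
    con.execute("""create table if not exists student(
        number integer primary key autoincrement,
        name char(5) not null,
        local char(5),
        yuwen float,
        shuxue float,
        english float)""")
    with con:  # one transaction, committed when the block succeeds
        con.executemany(
            "insert into student(name, local, yuwen, shuxue, english) values (?,?,?,?,?)",
            student_rows(n),
        )


if __name__ == "__main__":
    # In-memory demo; pass a file such as "big_table.db" and 10_000_000 for the real run
    con = sqlite3.connect(":memory:")
    fill(con, 100_000)
    print(con.execute("select count(*) from student").fetchone()[0])
    con.close()
```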

Clarksh · posted on 2021-4-23 19:38

That's already very fast. You have an index, right?

hate · posted on 2021-4-23 19:39

Did you create an index?

Cool_Breeze · posted on 2021-4-23 19:54

hate posted on 2021-4-23 19:39
Did you create an index?

Holy crap, with the index it takes 0.0 s!
That's way too fast!
But the database grew from 324 MB to 459 MB.

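Cool_Breeze · posted on 2021-4-23 19:55

Clarksh posted on 2021-4-23 19:38
That's already very fast. You have an index, right?

I hadn't created one. With the index it takes only 0.0 s, that's really fast. Awesome!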

RoyPenn · posted on 2021-4-23 19:56

This speed is good enough.

Cool_Breeze · posted on 2021-4-23 20:03

RoyPenn posted on 2021-4-23 19:56
This speed is good enough.

Is it about the same as other databases? I haven't learned any other database!

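RoyPenn · posted on 2021-4-23 20:04

Cool_Breeze posted on 2021-4-23 20:03
Is it about the same as other databases? I haven't learned any other database!

That also depends on how the data is structured, so it's hard to compare. Relatively speaking, this speed is very fast.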

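Cool_Breeze · posted on 2021-4-23 20:05

RoyPenn posted on 2021-4-23 20:04
That also depends on how the data is structured, so it's hard to compare. Relatively speaking, this speed is very fast.

OK. Thanks for the answer!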

richens · posted on 2021-4-23 21:17

Learned something, thanks!
