diff --git a/README.md b/README.md
deleted file mode 100644
index 8b03608..0000000
--- a/README.md
+++ /dev/null
@@ -1,156 +0,0 @@
-
-
-# Mock Exam (Practical Tasks)
-
-**Exam time**: 90 minutes (2 class periods)
-**Format**: open book
-**Total score**: 70 points
-
----
-
-## ⚠️ Important notes
-
-- All code, annotation results, and images for this exam must be uploaded to **this repository**.
-- Folders **must follow the structure required below**; otherwise no points are awarded.
-- The crawler code **must** fetch all required data in a single crawl (the data changes every time the page is refreshed).
-
----
-
-## Part II: Practical Tasks
-
-### 1. Data Crawling (25 points)
-
-**The crawler code in this task must include request headers, and because the data changes when the page is refreshed, all required data must be fetched in a single crawl; otherwise no points are awarded.**
-
-#### Question 1 (10 points)
-
-Use Python code to request `https://exam.detr.top/exam-b/movies`, scrape the page's data number and the information for all 10 movies on it, and save two files:
-
-- `movies.json` (holds the data number and the movie information; each movie has the keys `id, title, director, year, rating, duration, genre, actors_count`)
-- `movies.html` (the raw page source)
-
-#### Question 2 (15 points)
-
-Using the data in `movies.json`:
-
-1. Find the highest-rated and lowest-rated movies and print each title with its rating.
-2. Count the movies in each genre and print the result as a dictionary.
-3. Count the movies by each director and print the result as a dictionary.
-4. Count the movies released in 2020 or later.
-
----
-
-### 2. Data Annotation (20 points)
-
-**Every annotation result in this section must be exported to a file and uploaded to the `q3` folder of this repository, with one subfolder per sub-question. Files not uploaded as required earn no points.**
-
-#### Sub-question 1 (8 points): Image object detection annotation
-
-In Label Studio, open the image `data/images/标注练习1.jpg` (it shows 1 cat, 1 dog, and 1 car against a street background) and mark the 3 target objects with the Rectangle Labels tool. Requirements:
-
-1. Each bounding box must fit tightly around the outline of its object.
-2. The labels must be `cat`, `dog`, and `car` (**lowercase English only**; "猫/狗/车" and "Cat/Dog/Car" are not accepted).
-3. Export the result as a YOLO-format archive.
-4. Name the archive `q3_1_image_labels.zip` and upload it to the `q3/q3_1/` folder.
-5. The unpacked archive must contain `classes.txt` and a `labels/` directory.
-
-#### Sub-question 2 (7 points): Text sentiment classification annotation
-
-The file `data/reviews.json` contains 5 takeout review texts. Annotate their sentiment in Label Studio (labels: 正面/负面, that is, positive/negative) and export the result as JSON. Requirements:
-
-1. Each annotation must include the review's `id` and `text` fields.
-2. Each annotation must have a `sentiment` field whose value is `"正面"` or `"负面"`.
-3. All 5 reviews must be annotated.
-4. Name the exported file `q3_2_takeout_reviews.json` and upload it to the `q3/q3_2/` folder.
-
-#### Sub-question 3 (5 points): Annotation quality self-assessment
-
-In the `q3` folder, create `q3_3_质量自评.md` and write an annotation quality self-assessment report of about 200 characters covering:
-
-1. Preparation before annotating: what annotation guidelines did you set? How many example images did you look at? (2 points)
-2. Annotation process: what difficulties did you run into, and how did you resolve them? How did you handle annotations you were unsure about? (2 points)
-3. Checks after annotating: which checks did you run? Did you import several copies of the same data and compare them against each other? (1 point)
-
----
-
-### 3. Data Visualization (25 points)
-
-**All charts in this section must be drawn with matplotlib, and PNG files must be saved with `plt.savefig`. All Python code and PNG files must be uploaded to the `q4` folder of this repository, with one subfolder per sub-question.**
-
-#### Sub-question 1 (8 points): Bar chart
-
-Read `movies.json` (the file saved in Part II of this exam) and use matplotlib to draw a **bar chart of the number of movies per genre**. Requirements:
-
-1. Draw the bars with `plt.bar`.
-2. The X axis shows the genre names.
-3. The Y axis shows the number of movies.
-4. Set the title to "类型电影数量分布" (with `plt.title`).
-5. Save the figure as `q4_1_bar.png` (with `plt.savefig`, `dpi=150`).
-6. Save the Python code as `q4_1.py`.
-
-#### Sub-question 2 (7 points): Scatter plot
-
-Read `movies.json` and use matplotlib to draw a **rating vs. duration scatter plot**. Requirements:
-
-1. Draw the points with `plt.scatter`.
-2. The X axis is the duration (minutes) and the Y axis is the rating.
-3. Set the title to "时长与评分关系散点图" (with `plt.title`).
-4. Set the axis labels with `plt.xlabel` and `plt.ylabel`.
-5. Color the points red with `alpha=0.6` (semi-transparent).
-6. Save the figure as `q4_2_scatter.png` (with `plt.savefig`, `dpi=150`).
-7. Save the Python code as `q4_2.py`.
-
-#### Sub-question 3 (10 points): Histograms
-
-Read `movies.json` and draw **two separate histograms**:
-
-- **(A) Rating histogram (5 points)**: use `plt.hist` to draw the distribution of the rating field of the 10 movies, with `bins=5`, blue bars, the title "评分分布" (with `plt.title`), and the X-axis label "评分" (with `plt.xlabel`); save the figure as `q4_3a_hist.png` (`dpi=150`) and the code as `q4_3a.py`.
-- **(B) Duration histogram (5 points)**: use `plt.hist` to draw the distribution of the duration field of the 10 movies, with `bins=5`, green bars, the title "时长分布", and the X-axis label "时长(分钟)"; save the figure as `q4_3b_hist.png` (`dpi=150`) and the code as `q4_3b.py`.
-
----
-
-## 📁 Repository folder structure (submissions must follow it)
-
-```
-simulated-examination/
-├── data/                              # mock exam data (already uploaded)
-│   ├── images/
-│   │   └── 标注练习1.jpg
-│   └── reviews.json
-├── q2_1_crawler/
-│   ├── q2_1.py                        # code for data crawling question 1
-│   ├── q2_2.py                        # code for data crawling question 2
-│   ├── movies.json                    # crawl result (generated by q2_1.py)
-│   └── movies.html                    # raw page (generated by q2_1.py)
-├── q3/
-│   ├── q3_1/
-│   │   └── q3_1_image_labels.zip      # image annotation, YOLO export
-│   ├── q3_2/
-│   │   └── q3_2_takeout_reviews.json  # text annotation, JSON export
-│   └── q3_3_质量自评.md
-├── q4/
-│   ├── q4_1/
-│   │   ├── q4_1.py                    # bar chart code
-│   │   └── q4_1_bar.png               # bar chart output
-│   ├── q4_2/
-│   │   ├── q4_2.py                    # scatter plot code
-│   │   └── q4_2_scatter.png           # scatter plot output
-│   ├── q4_3a/
-│   │   ├── q4_3a.py                   # rating histogram code
-│   │   └── q4_3a_hist.png             # rating histogram output
-│   ├── q4_3b/
-│   │   ├── q4_3b.py                   # duration histogram code
-│   │   └── q4_3b_hist.png             # duration histogram output
-```
-
----
-
-## How to submit
-
-1. Create the directories in this repository **following the folder structure above**.
-2. Complete every sub-question of the 3 main tasks in order.
-3. After finishing each question, run `git add`, `git commit`, and `git push` to this repository.
-4. The time of your **last commit** when the exam ends counts as your submission time.
diff --git a/data/.gitkeep b/data/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/data/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/data/images/.gitkeep b/data/images/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/data/images/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q2_1_crawler/.gitkeep b/q2_1_crawler/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q2_1_crawler/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q2_1_crawler/move.html b/q2_1_crawler/move.html
new file mode 100644
index 0000000..667e109
--- /dev/null
+++ b/q2_1_crawler/move.html
@@ -0,0 +1,102 @@
+[
+  {
+    "id": "1",
+    "title": "泰坦尼克号",
+    "director": "Frank Darabont",
+    "year": "2015",
+    "rating": "6.8",
+    "duration": "91",
+    "genre": "科幻",
+    "actors_count": "3"
+  },
+  {
+    "id": "2",
+    "title": "星际穿越",
+    "director": "陈凯歌",
+    "year": "2021",
+    "rating": "6.2",
+    "duration": "113",
+    "genre": "科幻",
+    "actors_count": "2"
+  },
+  {
+    "id": "3",
+    "title": "三傻大闹宝莱坞",
+    "director": "Robert Zemeckis",
+    "year": "2004",
+    "rating": "7.4",
+    "duration": "95",
+    "genre": "悬疑",
+    "actors_count": "4"
+  },
+  {
+    "id": "4",
+    "title": "阿甘正传",
+    "director": "James Cameron",
+    "year": "2013",
+    "rating": "6.9",
+    "duration": "93",
+    "genre": "爱情",
+    "actors_count": "4"
+  },
+  {
+    "id": "5",
+    "title": "放牛班的春天",
+    "director": "宫崎骏",
+    "year": "2005",
+    "rating": "7.1",
+    "duration": "127",
+    "genre": "悬疑",
+    "actors_count": "3"
+  },
+  {
+    "id": "6",
+    "title": "千与千寻",
+    "director": "Christopher Nolan",
+    "year": "2024",
+    "rating": "6.4",
+    "duration": "147",
+    "genre": "动画",
+    "actors_count": "3"
+  },
+  {
+    "id": "7",
+    "title": "忠犬八公的故事",
+    "director": "Lasse Hallström",
+    "year": "2002",
+    "rating": "6.2",
+    "duration": "166",
+    "genre": "剧情",
+    "actors_count": "4"
+  },
+  {
+    "id": "8",
+    "title": "霸王别姬",
+    "director": "Rajkumar Hirani",
+    "year": "2005",
+    "rating": "7.9",
+    "duration": "149",
+    "genre": "冒险",
+    "actors_count": "2"
+  },
+  {
+    "id": "9",
+    "title": "肖申克的救赎",
+    "director": "Christophe Barratier",
+    "year": "2008",
+    "rating": "9.3",
+    "duration": "91",
+    "genre": "冒险",
+    "actors_count": "2"
+  },
+  {
+    "id": "10",
+    "title": "盗梦空间",
+    "director": "Christopher Nolan",
+    "year": "2019",
+    "rating": "7.1",
+    "duration": "132",
+    "genre": "剧情",
+    "actors_count": "5"
+  }
+]
\ No newline at end of file
diff --git a/q2_1_crawler/movie.json b/q2_1_crawler/movie.json
new file mode 100644
index 0000000..667e109
--- /dev/null
+++ b/q2_1_crawler/movie.json
@@ -0,0 +1,102 @@
+[
+  {
+    "id": "1",
+    "title": "泰坦尼克号",
+    "director": "Frank Darabont",
+    "year": "2015",
+    "rating": "6.8",
+    "duration": "91",
+    "genre": "科幻",
+    "actors_count": "3"
+  },
+  {
+    "id": "2",
+    "title": "星际穿越",
+    "director": "陈凯歌",
+    "year": "2021",
+    "rating": "6.2",
+    "duration": "113",
+    "genre": "科幻",
+    "actors_count": "2"
+  },
+  {
+    "id": "3",
+    "title": "三傻大闹宝莱坞",
+    "director": "Robert Zemeckis",
+    "year": "2004",
+    "rating": "7.4",
+    "duration": "95",
+    "genre": "悬疑",
+    "actors_count": "4"
+  },
+  {
+    "id": "4",
+    "title": "阿甘正传",
+    "director": "James Cameron",
+    "year": "2013",
+    "rating": "6.9",
+    "duration": "93",
+    "genre": "爱情",
+    "actors_count": "4"
+  },
+  {
+    "id": "5",
+    "title": "放牛班的春天",
+    "director": "宫崎骏",
+    "year": "2005",
+    "rating": "7.1",
+    "duration": "127",
+    "genre": "悬疑",
+    "actors_count": "3"
+  },
+  {
+    "id": "6",
+    "title": "千与千寻",
+    "director": "Christopher Nolan",
+    "year": "2024",
+    "rating": "6.4",
+    "duration": "147",
+    "genre": "动画",
+    "actors_count": "3"
+  },
+  {
+    "id": "7",
+    "title": "忠犬八公的故事",
+    "director": "Lasse Hallström",
+    "year": "2002",
+    "rating": "6.2",
+    "duration": "166",
+    "genre": "剧情",
+    "actors_count": "4"
+  },
+  {
+    "id": "8",
+    "title": "霸王别姬",
+    "director": "Rajkumar Hirani",
+    "year": "2005",
+    "rating": "7.9",
+    "duration": "149",
+    "genre": "冒险",
+    "actors_count": "2"
+  },
+  {
+    "id": "9",
+    "title": "肖申克的救赎",
+    "director": "Christophe Barratier",
+    "year": "2008",
+    "rating": "9.3",
+    "duration": "91",
+    "genre": "冒险",
+    "actors_count": "2"
+  },
+  {
+    "id": "10",
+    "title": "盗梦空间",
+    "director": "Christopher Nolan",
+    "year": "2019",
+    "rating": "7.1",
+    "duration": "132",
+    "genre": "剧情",
+    "actors_count": "5"
+  }
+]
\ No newline at end of file
diff --git a/q2_1_crawler/q2_1.py b/q2_1_crawler/q2_1.py
new file mode 100644
index 0000000..2d782cf
--- /dev/null
+++ b/q2_1_crawler/q2_1.py
@@ -0,0 +1,38 @@
+import json
+
+import requests
+from bs4 import BeautifulSoup as bs
+
+url = 'https://exam.detr.top/exam-b/movies'
+headers = {
+    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/149.0.0.0 Safari/537.36 Edg/149.0.0.0',
+    'Referer': 'https://exam.detr.top/exam-b/movies',
+}
+
+# Fetch the page once: the data changes on every refresh, so the raw HTML
+# and the parsed JSON below must both come from this single response.
+req = requests.get(url, headers=headers, timeout=10)
+req.raise_for_status()
+req.encoding = "utf-8"
+
+soup = bs(req.text, "html.parser")
+
+# Each table row holds one movie; its 8 cells map to these keys in order.
+# NOTE: the exam also asks for the page's data number, which is not extracted here.
+keys = ["id", "title", "director", "year",
+        "rating", "duration", "genre", "actors_count"]
+movie_list = []
+for tr in soup.select("table tbody tr"):
+    tds = tr.find_all("td")
+    if len(tds) < len(keys):
+        continue  # skip rows that are not complete movie rows
+    movie_list.append({key: td.get_text(strip=True) for key, td in zip(keys, tds)})
+print(movie_list)
+
+# Parsed movie data
+with open('movie.json', 'w', encoding='utf-8') as f:
+    json.dump(movie_list, f, ensure_ascii=False, indent=2)
+
+# Raw page source, exactly as received
+with open("move.html", "w", encoding='utf-8') as f:
+    f.write(req.text)
\ No newline at end of file
diff --git a/q2_1_crawler/q2_2.py b/q2_1_crawler/q2_2.py
new file mode 100644
index 0000000..f30c2f7
--- /dev/null
+++ b/q2_1_crawler/q2_2.py
@@ -0,0 +1,34 @@
+# 1. Find the highest-rated and lowest-rated movies; print each title with its rating.
+# 2. Count the movies in each genre and print the result as a dictionary.
+# 3. Count the movies by each director and print the result as a dictionary.
+# 4. Count the movies released in 2020 or later.
+
+import json
+
+with open('movie.json', 'r', encoding='utf-8') as f:
+    data = json.load(f)
+
+# Ratings are stored as strings, so compare them as floats rather than as text.
+sort_movie = sorted(data, key=lambda x: float(x["rating"]))
+lowest = sort_movie[0]
+highest = sort_movie[-1]
+print("Lowest-rated movie:", lowest["title"], lowest["rating"])
+print("Highest-rated movie:", highest["title"], highest["rating"])
+
+genre_shu = {}
+for g in data:
+    ge = g["genre"]
+    genre_shu[ge] = genre_shu.get(ge, 0) + 1
+print("Movies per genre:", genre_shu)
+
+director_shu = {}
+for d in data:
+    di = d["director"]
+    director_shu[di] = director_shu.get(di, 0) + 1
+print("Movies per director:", director_shu)
+
+a = 0
+for y in data:
+    if int(y["year"]) >= 2020:
+        a += 1
+print("Movies released in 2020 or later:", a)
\ No newline at end of file
diff --git a/q3/.gitkeep b/q3/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q3/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q3/q3_1/.gitkeep b/q3/q3_1/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q3/q3_1/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q3/q3_1/q3_1_image_label.zip b/q3/q3_1/q3_1_image_label.zip
new file mode 100644
index 0000000..5b08d84
Binary files /dev/null and b/q3/q3_1/q3_1_image_label.zip differ
diff --git a/q3/q3_2/.gitkeep
b/q3/q3_2/.gitkeep deleted file mode 100644 index b22e7b9..0000000 --- a/q3/q3_2/.gitkeep +++ /dev/null @@ -1 +0,0 @@ -# 此目录用于存放学生提交的文件,请勿删除 \ No newline at end of file diff --git a/q3/q3_2/q3_2_takeout_reviews.json b/q3/q3_2/q3_2_takeout_reviews.json new file mode 100644 index 0000000..94c7925 --- /dev/null +++ b/q3/q3_2/q3_2_takeout_reviews.json @@ -0,0 +1 @@ +[{"id":32,"annotations":[{"id":20,"completed_by":1,"result":[{"value":{"choices":["正面"]},"id":"Onq13IcOtQ","from_name":"sentiment","to_name":"text","type":"choices","origin":"manual"}],"was_cancelled":false,"ground_truth":false,"created_at":"2026-06-25T07:10:52.845728Z","updated_at":"2026-06-25T07:10:52.845818Z","draft_created_at":"2026-06-25T07:10:22.361376Z","lead_time":12.684999999999999,"prediction":{},"result_count":1,"unique_id":"0dd85e72-c68d-4302-ab92-11c8114bd5bb","import_id":null,"last_action":null,"bulk_created":false,"task":32,"project":12,"updated_by":1,"parent_prediction":null,"parent_annotation":null,"last_created_by":null}],"file_upload":"9ce97d9b-reviews.json","drafts":[],"predictions":[],"data":{"id":1,"text":"外卖小哥送得超快,餐盒还是热的,炸鸡酥脆多汁,酸辣粉也很正宗,分量足,五星好评!"},"meta":{},"created_at":"2026-06-25T07:10:04.754168Z","updated_at":"2026-06-25T07:10:52.903174Z","allow_skip":true,"inner_id":1,"total_annotations":1,"cancelled_annotations":0,"total_predictions":0,"comment_count":0,"unresolved_comment_count":0,"last_comment_updated_at":null,"project":12,"updated_by":1,"comment_authors":[]},{"id":33,"annotations":[{"id":21,"completed_by":1,"result":[{"value":{"choices":["负面"]},"id":"cJxGS89PGV","from_name":"sentiment","to_name":"text","type":"choices","origin":"manual"}],"was_cancelled":false,"ground_truth":false,"created_at":"2026-06-25T07:10:56.177683Z","updated_at":"2026-06-25T07:10:56.177702Z","draft_created_at":"2026-06-25T07:10:40.725815Z","lead_time":4.317,"prediction":{},"result_count":1,"unique_id":"ec19a0ee-0062-4fcb-be7e-3885bbb0e630","import_id":null,"last_action":null,"bulk_created":false,"task":3
3,"project":12,"updated_by":1,"parent_prediction":null,"parent_annotation":null,"last_created_by":null}],"file_upload":"9ce97d9b-reviews.json","drafts":[],"predictions":[],"data":{"id":2,"text":"等了一个半小时才送到,汤全洒了,面坨成一坨,联系客服也不回,太让人失望了。"},"meta":{},"created_at":"2026-06-25T07:10:04.754256Z","updated_at":"2026-06-25T07:10:56.235324Z","allow_skip":true,"inner_id":2,"total_annotations":1,"cancelled_annotations":0,"total_predictions":0,"comment_count":0,"unresolved_comment_count":0,"last_comment_updated_at":null,"project":12,"updated_by":1,"comment_authors":[]},{"id":34,"annotations":[{"id":22,"completed_by":1,"result":[{"value":{"choices":["正面"]},"id":"TEYHnwsP5p","from_name":"sentiment","to_name":"text","type":"choices","origin":"manual"}],"was_cancelled":false,"ground_truth":false,"created_at":"2026-06-25T07:11:02.709432Z","updated_at":"2026-06-25T07:11:02.709456Z","draft_created_at":"2026-06-25T07:10:44.119328Z","lead_time":5.162,"prediction":{},"result_count":1,"unique_id":"e99cc8d6-b0ec-4ba7-95c9-75e600010972","import_id":null,"last_action":null,"bulk_created":false,"task":34,"project":12,"updated_by":1,"parent_prediction":null,"parent_annotation":null,"last_created_by":null}],"file_upload":"9ce97d9b-reviews.json","drafts":[],"predictions":[],"data":{"id":3,"text":"奶茶是用料很扎实的现煮茶,珍珠Q弹有嚼劲,配送员态度也好,下次还会再点。"},"meta":{},"created_at":"2026-06-25T07:10:04.754319Z","updated_at":"2026-06-25T07:11:02.772529Z","allow_skip":true,"inner_id":3,"total_annotations":1,"cancelled_annotations":0,"total_predictions":0,"comment_count":0,"unresolved_comment_count":0,"last_comment_updated_at":null,"project":12,"updated_by":1,"comment_authors":[]},{"id":35,"annotations":[{"id":23,"completed_by":1,"result":[{"value":{"choices":["正面"]},"id":"WSCVtAKzJa","from_name":"sentiment","to_name":"text","type":"choices","origin":"manual"}],"was_cancelled":false,"ground_truth":false,"created_at":"2026-06-25T07:11:05.544799Z","updated_at":"2026-06-25T07:11:05.544821Z","draft_created_at":"2026-06-25T07:10:46
.400039Z","lead_time":3.517,"prediction":{},"result_count":1,"unique_id":"b72bb1ff-9547-471d-8edd-58f5277dca12","import_id":null,"last_action":null,"bulk_created":false,"task":35,"project":12,"updated_by":1,"parent_prediction":null,"parent_annotation":null,"last_created_by":null}],"file_upload":"9ce97d9b-reviews.json","drafts":[],"predictions":[],"data":{"id":4,"text":"配送速度一般,但披萨味道不错,芝士拉丝效果好,性价比高,值得推荐。"},"meta":{},"created_at":"2026-06-25T07:10:04.754376Z","updated_at":"2026-06-25T07:11:05.601129Z","allow_skip":true,"inner_id":4,"total_annotations":1,"cancelled_annotations":0,"total_predictions":0,"comment_count":0,"unresolved_comment_count":0,"last_comment_updated_at":null,"project":12,"updated_by":1,"comment_authors":[]},{"id":36,"annotations":[{"id":24,"completed_by":1,"result":[{"value":{"choices":["负面"]},"id":"qLglBmVk44","from_name":"sentiment","to_name":"text","type":"choices","origin":"manual"}],"was_cancelled":false,"ground_truth":false,"created_at":"2026-06-25T07:11:08.643150Z","updated_at":"2026-06-25T07:11:08.643172Z","draft_created_at":"2026-06-25T07:10:48.222193Z","lead_time":2.6100000000000003,"prediction":{},"result_count":1,"unique_id":"7e05411c-9818-4359-872a-ed1e588d0687","import_id":null,"last_action":null,"bulk_created":false,"task":36,"project":12,"updated_by":1,"parent_prediction":null,"parent_annotation":null,"last_created_by":null}],"file_upload":"9ce97d9b-reviews.json","drafts":[],"predictions":[],"data":{"id":5,"text":"点的麻辣烫食材不新鲜,有股怪味,吃完拉肚子,商家推卸责任,再也不点了。"},"meta":{},"created_at":"2026-06-25T07:10:04.754432Z","updated_at":"2026-06-25T07:11:08.699069Z","allow_skip":true,"inner_id":5,"total_annotations":1,"cancelled_annotations":0,"total_predictions":0,"comment_count":0,"unresolved_comment_count":0,"last_comment_updated_at":null,"project":12,"updated_by":1,"comment_authors":[]}] \ No newline at end of file diff --git a/q3/q3_3_质量自评.md b/q3/q3_3_质量自评.md new file mode 100644 index 0000000..f8dde57 --- /dev/null +++ b/q3/q3_3_质量自评.md @@ -0,0 
+1,13 @@
+1. Preparation before annotating:
+For image annotation:
+(1) Each bounding box must fit tightly around the outline of its object.
+(2) Labels must be cat, dog, or car (lowercase English only; not "猫/狗/车" or "Cat/Dog/Car").
+For text annotation:
+(1) Each annotation must include the review's id and text fields.
+(2) Each annotation must have a sentiment field whose value is "正面" or "负面".
+2. Annotation process:
+In image annotation, the bounding boxes left a lot of empty space around the target; the fix was to pull each box as close to the target outline as possible.
+In text annotation, the sentiment of a review can be ambiguous; the fix was to label according to the key words.
+3. Checks after annotating:
+In image annotation, I rechecked that each bounding box fits the target outline and that the exported YOLO files are complete.
+In text annotation, I checked each sentiment label against the key words and checked that the exported JSON file is complete.
\ No newline at end of file
diff --git a/q4/.gitkeep b/q4/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q4/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q4/q4_1/.gitkeep b/q4/q4_1/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q4/q4_1/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q4/q4_1/q4_1.py b/q4/q4_1/q4_1.py
new file mode 100644
index 0000000..b7526d6
--- /dev/null
+++ b/q4/q4_1/q4_1.py
@@ -0,0 +1,29 @@
+import json
+
+import matplotlib.pyplot as plt
+
+with open('movie.json', 'r', encoding='utf-8') as f:
+    data = json.load(f)
+
+# Count movies per genre
+genre_shu = {}
+for g in data:
+    ge = g["genre"]
+    genre_shu[ge] = genre_shu.get(ge, 0) + 1
+
+genre_lei = list(genre_shu.keys())      # genre names (X axis)
+genre_liang = list(genre_shu.values())  # movie counts (Y axis)
+print(genre_liang)
+
+plt.figure(figsize=(14, 12))
+plt.bar(genre_lei, genre_liang, width=0.6)  # width is the bar width (0 to 1)
+
+# Title and axis labels
+plt.title('类型电影数量分布', fontsize=14)
+plt.xlabel('类型名称', fontsize=12)
+plt.ylabel('电影数量', fontsize=12)
+
+# Save before plt.show(); the task requires plt.savefig with dpi=150
+plt.savefig('q4_1_bar.png', dpi=150)
+plt.show()
+
diff --git a/q4/q4_1/q4_1_bar.png b/q4/q4_1/q4_1_bar.png
new file mode 100644
index 0000000..0ed7e05
Binary files /dev/null and b/q4/q4_1/q4_1_bar.png differ
diff --git a/q4/q4_2/.gitkeep b/q4/q4_2/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q4/q4_2/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q4/q4_2/q4_2.py b/q4/q4_2/q4_2.py
new file mode 100644
index 0000000..a3e4b3d
--- /dev/null
+++
b/q4/q4_2/q4_2.py
@@ -0,0 +1,24 @@
+import json
+
+import matplotlib.pyplot as plt
+
+with open('movie.json', 'r', encoding='utf-8') as f:
+    data = json.load(f)
+
+# The JSON stores numbers as strings; convert them so both axes are numeric
+rating = [float(i["rating"]) for i in data]
+duration = [int(i["duration"]) for i in data]
+
+plt.figure(figsize=(12, 8))
+plt.scatter(duration, rating,
+            c='red',
+            s=80,                # marker size
+            alpha=0.6,           # transparency
+            edgecolors='white')  # marker edge color
+plt.title('时长与评分关系散点图', fontsize=14)
+plt.xlabel('时长(分钟)', fontsize=12)
+plt.ylabel('评分', fontsize=12)
+plt.grid(True, linestyle='--', alpha=0.5)
+plt.savefig('q4_2_scatter.png', dpi=150)  # save before plt.show()
+plt.show()
+
diff --git a/q4/q4_2/q4_2_scatter.png b/q4/q4_2/q4_2_scatter.png
new file mode 100644
index 0000000..5dbac63
Binary files /dev/null and b/q4/q4_2/q4_2_scatter.png differ
diff --git a/q4/q4_3a/.gitkeep b/q4/q4_3a/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q4/q4_3a/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q4/q4_3a/q4_3a.py b/q4/q4_3a/q4_3a.py
new file mode 100644
index 0000000..bc84d8b
--- /dev/null
+++ b/q4/q4_3a/q4_3a.py
@@ -0,0 +1,20 @@
+import json
+
+import matplotlib.pyplot as plt
+
+with open('movie.json', 'r', encoding='utf-8') as f:
+    data = json.load(f)
+
+# Ratings are stored as strings; convert them to floats so they can be binned
+rating = [float(i["rating"]) for i in data]
+
+plt.figure(figsize=(12, 8))
+plt.hist(rating,             # data
+         bins=5,             # number of bins required by the task
+         color='#3498DB',    # blue bars
+         edgecolor='white')  # bar edge color
+plt.title('评分分布', fontsize=14)
+plt.xlabel('评分', fontsize=13)
+plt.grid(True, linestyle='--', alpha=0.5, axis='y')
+plt.savefig('q4_3a_hist.png', dpi=150)  # save before plt.show()
+plt.show()
\ No newline at end of file
diff --git a/q4/q4_3a/q4_3a_hist.png b/q4/q4_3a/q4_3a_hist.png
new file mode 100644
index 0000000..6c68736
Binary files /dev/null and b/q4/q4_3a/q4_3a_hist.png differ
diff --git a/q4/q4_3b/.gitkeep b/q4/q4_3b/.gitkeep
deleted file mode 100644
index b22e7b9..0000000
--- a/q4/q4_3b/.gitkeep
+++ /dev/null
@@ -1 +0,0 @@
-# This directory holds files submitted by students; do not delete it
\ No newline at end of file
diff --git a/q4/q4_3b/q4_3b.py b/q4/q4_3b/q4_3b.py
new file
mode 100644
index 0000000..38865a2
--- /dev/null
+++ b/q4/q4_3b/q4_3b.py
@@ -0,0 +1,20 @@
+import json
+
+import matplotlib.pyplot as plt
+
+with open('movie.json', 'r', encoding='utf-8') as f:
+    data = json.load(f)
+
+# Durations are stored as strings; convert them to integers so they can be binned
+duration = [int(i["duration"]) for i in data]
+
+plt.figure(figsize=(12, 8))
+plt.hist(duration,           # data
+         bins=5,             # number of bins required by the task
+         color='green',      # green bars, as the task requires
+         edgecolor='white')  # bar edge color
+plt.title('时长分布', fontsize=14)
+plt.xlabel('时长(分钟)', fontsize=13)
+plt.grid(True, linestyle='--', alpha=0.5, axis='y')
+plt.savefig('q4_3b_hist.png', dpi=150)  # save before plt.show()
+plt.show()
\ No newline at end of file
diff --git a/q4/q4_3b/q4_3b_hist.png b/q4/q4_3b/q4_3b_hist.png
new file mode 100644
index 0000000..b142813
Binary files /dev/null and b/q4/q4_3b/q4_3b_hist.png differ
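
Every field in `movie.json` is a string, including `rating`, `year`, and `duration`, so any ordering or counting done on the raw values compares text rather than numbers. The sketch below illustrates the difference under stated assumptions: the records are shaped like the ones in this diff, the `summarize` helper is a hypothetical name, and the last sample row (rating `"10.0"`) is made up for illustration and does not come from the scraped page.

```python
from collections import Counter

# Sample records in the same shape as movie.json (all values are strings).
# The last row is a made-up example, added only to expose the string-ordering issue.
movies = [
    {"title": "泰坦尼克号", "rating": "6.8", "genre": "科幻", "year": "2015"},
    {"title": "星际穿越", "rating": "6.2", "genre": "科幻", "year": "2021"},
    {"title": "肖申克的救赎", "rating": "9.3", "genre": "冒险", "year": "2008"},
    {"title": "示例电影", "rating": "10.0", "genre": "剧情", "year": "2024"},
]


def summarize(records):
    """Return (lowest, highest, genre_counts, recent_count) using numeric comparisons."""
    by_rating = sorted(records, key=lambda m: float(m["rating"]))
    genre_counts = dict(Counter(m["genre"] for m in records))
    recent_count = sum(1 for m in records if int(m["year"]) >= 2020)
    return by_rating[0], by_rating[-1], genre_counts, recent_count


lowest, highest, genre_counts, recent_count = summarize(movies)

# Comparing the raw strings instead ranks "10.0" below "6.2", because "1" < "6".
string_max = max(movies, key=lambda m: m["rating"])

print("lowest:", lowest["title"], lowest["rating"])
print("highest:", highest["title"], highest["rating"])
print("highest by string comparison:", string_max["title"], string_max["rating"])
print("genres:", genre_counts)
print("released in 2020 or later:", recent_count)
```

With the ten ratings in this diff (all between 6.2 and 9.3) the string and numeric orderings happen to agree, which is why the issue is easy to miss; it would only show up if a refresh of the page returned a two-digit rating or a three-digit versus two-digit duration.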