A First Small Python Project
Edited: 2024-11-28
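Key Word In Context (KWIC) is the most common format for displaying concordance lines.
Project description: the input is a series of sentences and a target word, and every sentence contains the target word at least once. The goal is to print each occurrence in KWIC format. The basic requirements are: the target word is centered, the preceding pre sequence is right-aligned, the following post sequence is left-aligned, and the text before and after the target word has equal length. If a sentence is too short to meet this requirement, it is padded with spaces.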
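Input parameters: the input sentences sentences, the target word selword, and the sliding window length window_len.
For example, given six input sentences and the target word secure, the output is the following string: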
           pre  keyword  post
 welfare , and  secure   the blessings of
 nations , and  secured  immortal glory with
   , and shall  secure   to you the
cherished . To  secure   us against these
 defense as to  secure   our cities and
      I can to  secure   economy and fidelity
def kwic(sentences: List[str], selword: str, window_len: int) -> str:
    """
    :param sentences: input sentences
    :param selword: selected word
    :param window_len: window length
    """
For more on KWIC display, see:
http://dep.chs.nihon-u.ac.jp/english_lang/tukamoto/kwic_e.html
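The kwic signature above leaves the body open. As a minimal sketch only, and not necessarily the approach taken in the published code, it could be filled in as follows. The sketch assumes that window_len counts the words kept on each side of the keyword, that sentences are already tokenized by spaces, and that a word matches when it starts with selword (so secured matches secure). None of these choices is stated in the project description.

```python
from typing import List


def kwic(sentences: List[str], selword: str, window_len: int) -> str:
    """Sketch: one KWIC line per sentence, joined with newlines.

    Assumptions (not from the original text): window_len is the number of
    words kept on each side, and a word matches if it starts with selword.
    """
    target = selword.lower()
    rows = []
    for sentence in sentences:
        tokens = sentence.split()
        # Index of the first token that starts with the target word
        hit = next((i for i, tok in enumerate(tokens)
                    if tok.lower().startswith(target)), None)
        if hit is None:
            continue  # the task promises a match; skip defensively
        pre = " ".join(tokens[max(0, hit - window_len):hit])
        post = " ".join(tokens[hit + 1:hit + 1 + window_len])
        rows.append((pre, tokens[hit], post))
    if not rows:
        return ""
    # Use one width for both sides so the keyword column sits in the middle
    side_w = max(max(len(pre), len(post)) for pre, _, post in rows)
    key_w = max(len(key) for _, key, _ in rows)
    return "\n".join(
        f"{pre.rjust(side_w)}  {key.ljust(key_w)}  {post.ljust(side_w)}"
        for pre, key, post in rows
    )
```

Padding both sides to the same width keeps the keyword at the same character offset on every line, which is what makes the column readable.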
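The complete code and analysis for this project have been published on Python中文网.
The code below has been tested and runs as-is. Mistakes may still remain, and corrections are welcome. Submit them at: https://github.com/jackzhenguo/python-small-examples/issues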
"""
@file: kwic_service.py
@desc: providing functions about KWIC presentation
@author: group3
@time: 5/9/2021
"""
import logging
import re
from typing import List

# Module-level logger used by kwic_show
log = logging.getLogger(__name__)
Get the window of words around the keyword sel_word. The default window length is 5:
def get_keyword_window(sel_word: str, words_of_sentence: List, length=5) -> List[str]:
    """
    Find the index of sel_word in the sentence, then take a window of `length`
    words by looking backward and forward from it.
    For example, for "I am very happy to this course of psd" with sel_word "happy",
    the function returns [am, very, happy, to, this].
    If length is even (here 4), it returns [very, happy, to, this].
    Remember: sel_word is treated as a word root.
    """
    if length <= 0 or len(words_of_sentence) <= length:
        return words_of_sentence
    # Escape sel_word so that it is matched literally as a word root
    pattern = re.escape(sel_word.lower())
    index = -1
    for iw, word in enumerate(words_of_sentence):
        if re.search(pattern, word.lower()):
            index = iw
            break
    if index == -1:
        return words_of_sentence
    # Keyword near the start: extend the window to the right
    if index < length // 2:
        back_slice = words_of_sentence[:index]
        if (length - index) >= len(words_of_sentence):
            return words_of_sentence
        else:
            return back_slice + words_of_sentence[index: index + length - len(back_slice)]
    # Keyword near the end: extend the window to the left
    if (index + length // 2) >= len(words_of_sentence):
        forward_slice = words_of_sentence[index:len(words_of_sentence)]
        if index - length <= 0:
            return words_of_sentence
        else:
            return words_of_sentence[index - (length - len(forward_slice)):index] + forward_slice
    # Keyword in the middle: odd lengths are symmetric, even lengths keep one word fewer on the left
    return words_of_sentence[index - length // 2: index + length // 2 + 1] if length % 2 \
        else words_of_sentence[index - length // 2 + 1: index + length // 2 + 1]
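The edge handling in get_keyword_window can be read as "center the window on the keyword, then shift it back inside the sentence". The sketch below expresses that idea with a single clamped start index. It is a simplified illustration, not a drop-in replacement: clamped_window is a hypothetical name, it takes the keyword position directly instead of searching for it, and it always returns exactly length words, so it can differ from the listing above for keywords close to the end of a short sentence.

```python
from typing import List


def clamped_window(words: List[str], index: int, length: int) -> List[str]:
    """Return up to `length` words around words[index], shifted at the edges."""
    if length <= 0 or length >= len(words):
        return list(words)
    # Odd lengths centre the keyword; even lengths keep one word fewer on the left
    start = index - (length - 1) // 2
    # Clamp the start so the whole window stays inside the sentence
    start = max(0, min(start, len(words) - length))
    return words[start:start + length]


if __name__ == "__main__":
    demo = "I am very happy to this course of psd".split()
    print(clamped_window(demo, demo.index("happy"), 5))
```

With the sentence from the docstring, a window of 5 around happy gives the same five words as the docstring example.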
KWIC display logic:
def kwic_show(sel_language, words_of_sentence, sel_word, window_size=9, align_param=70, token_space_param=1):
    """Return the KWIC string for words_of_sentence, with sel_word as the key token.

    :param sel_language: selected language
    :param words_of_sentence: all words in one sentence
    :param sel_word: key token
    :param window_size: size of the KWIC window, in words
    :param align_param: total character width used to align the display
    :param token_space_param: number of spaces before and after the keyword
    :return: (kwic_line, pre_kwic), or (None, None) when no line can be built

    Changing the default values of window_size and align_param is not recommended.
    """
    if window_size < 1:
        # Return a pair, like every other exit, so callers can always index the result
        return None, None
    if window_size >= len(words_of_sentence):
        window_size = len(words_of_sentence)
    words_in_window = get_keyword_window(sel_word, words_of_sentence, window_size)
    sent = ' '.join(words_in_window)
    # Locate the keyword by character offset inside the joined window
    key_index = sent.lower().find(sel_word.lower())
    if key_index == -1:
        return None, None
    # Width left for the two context sides after the keyword and its spacing
    align_param = align_param - len(sel_word) - 2 * token_space_param
    if align_param < 0:
        log.warning('align_param must be larger than the length of the input word')
        return None, None
    pre_part = sent[:key_index].rstrip()
    pre_words = pre_part.split(' ')
    # Drop one leading word at a time until the left context fits, keeping at least one word
    while len(pre_words) > 1 and len(pre_part) > align_param // 2:
        pre_words = pre_words[1:]
        pre_part = " ".join(pre_words)
    pre_kwic = pre_part.rjust(align_param // 2)
    key_kwic = token_space_param * ' ' + sent[key_index: key_index + len(sel_word)].lstrip() + token_space_param * ' '
    post_kwic = sent[key_index + len(sel_word):].lstrip()
    n_post_words = len(post_kwic.split(' '))
    i = n_post_words - 1
    # Drop one trailing word at a time until the right context fits, keeping at least one word
    while i > 0 and len(post_kwic) > align_param // 2:
        post_kwic_words = post_kwic.split(' ')
        post_kwic_words = post_kwic_words[:i]
        post_kwic = " ".join(post_kwic_words)
        i -= 1
    sel_word_kwic = pre_kwic + key_kwic + post_kwic
    return sel_word_kwic, pre_kwic
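The two while loops in kwic_show fit each context side into a character budget by discarding whole words. The same idea can be isolated in two small helpers, shown here as a sketch under assumptions: fit_left_context and fit_right_context are hypothetical names that do not appear in the project, and both keep at least one word even if that word alone is wider than the budget.

```python
def fit_left_context(text: str, width: int) -> str:
    """Drop whole words from the left until the text fits, then right-align it."""
    words = text.split()
    while len(words) > 1 and len(" ".join(words)) > width:
        words.pop(0)  # the words farthest from the keyword go first
    return " ".join(words).rjust(width)


def fit_right_context(text: str, width: int) -> str:
    """Drop whole words from the right until the text fits, then left-align it."""
    words = text.split()
    while len(words) > 1 and len(" ".join(words)) > width:
        words.pop()  # the words farthest from the keyword go first
    return " ".join(words).ljust(width)


if __name__ == "__main__":
    left = fit_left_context("promote the general welfare , and", 14)
    right = fit_right_context("the blessings of liberty", 16)
    print(left + " secure " + right)
```

Trimming by words rather than by characters avoids cutting a word in half at the edge of the display.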
Test code
"""
@file: test_kwic_show.py
@desc:
@author: group3
@time: 5/3/2021
"""
from src.feature.kwic import kwic_show
if __name__ == '__main__':
    words = ['I', 'am', 'very', 'happy', 'to', 'this', 'course', 'of', 'psd']
    print(kwic_show('English', words, 'I', window_size=1)[0])
    print(kwic_show('English', words, 'I', window_size=5)[0])
    print(kwic_show('English', words, 'very', token_space_param=5)[0])
    print(kwic_show('English', words, 'very', window_size=6, token_space_param=5)[0])
    print(kwic_show('English', words, 'very', window_size=1, token_space_param=5)[0])
    print(kwic_show('English', words, 'stem', align_param=20)[0])
    print(kwic_show('English', words, 'stem', align_param=100)[0])
    print(kwic_show('English', words, 'II', window_size=1)[0])
    print(kwic_show('English', words, 'related', window_size=10000)[0])
Printed output
                                  I
                                  I am very happy to
                        I am     very     happy to this course of psd
                        I am     very     happy to this
                                 very
None
None
None
None