About
Published:
Now my own website is open!
Published in Proceedings of the 4th Workshop on Argument Mining (ArgMining2017), 2017
We propose a method for the annotation of Japanese civil judgment documents, with the purpose of creating flexible summaries of these. The first step, described in the current paper, concerns content selection, i.e., the question of which material should be extracted initially for the summary. In particular, we utilize the hierarchical argument structure of the judgment documents. Our main contributions are a) the design of an annotation scheme that stresses the connection between legal issues (called issue topics) and argument structure, b) an adaptation of rhetorical status to suit the Japanese legal system and c) the definition of a linked argument structure based on legal sub-arguments. In this paper, we report agreement between two annotators on several aspects of the overall task.
Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2017. Annotation of argument structure in Japanese legal documents. In Proceedings of the 4th Workshop on Argument Mining (ArgMining2017). pages 22-31. http://www.aclweb.org/anthology/W17-5103
Published in Proceedings of the 9th International Conference on Knowledge and Systems Engineering (KSE 2017), 2017
We propose an annotation scheme for the summarization of Japanese judgment documents. This paper reports the details of the development of our annotation scheme for this task. We also conduct a human study in which we compare the annotations of independent annotators. The end goal of our work is summarization, and our categories and link system are a consequence of this. We propose three types of generic summaries which are focused on specific legal issues relevant to a given legal case.
Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2017. Designing an annotation scheme for summarizing Japanese judgment documents. In Proceedings of the 9th International Conference on Knowledge and Systems Engineering (KSE 2017). pages 275-280. http://ieeexplore.ieee.org/document/8119471/
Published in The 41st annual Language Testing Research Colloquium, 2019
Recommended citation: Sawaki, Y., Ishii, Y., & Yamada, H. (2019). Japanese university students' paraphrasing strategies in L2 summary writing. The 41st annual Language Testing Research Colloquium.
Published in The 6th Competition on Legal Information Extraction/Entailment (COLIEE-2019), 2019
Deep learning based approaches have achieved significant advances in various Natural Language Processing (NLP) tasks. However, such approaches have not yet been evaluated as extensively in the legal domain as in other domains such as news articles and colloquial texts. Since creating annotated data in the legal domain is expensive, applying deep learning models to the domain has been challenging. A fine-tuning approach can alleviate the situation: it allows a model trained on a large out-of-domain data set to be retrained on a smaller in-domain data set. The fine-tunable language model “BERT” was proposed and achieved state-of-the-art results in various NLP tasks. In this paper, we explored the fine-tuning based approach in the legal textual entailment task using the COLIEE Task 2 data set. The experimental results show that the fine-tuning approach improves performance, achieving F1 = 0.50 on the COLIEE Task 2 dry run data.
Recommended citation: Hiroaki Yamada and Takenobu Tokunaga. 2019. A performance study on fine-tuned large language models in the Legal Case Entailment task. In Proceedings of the 6th Competition on Legal Information Extraction/Entailment (COLIEE-2019). https://www.cl.c.titech.ac.jp/tokunaga/_media/publication/yamada_2019aa.pdf
Published in Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications, BEA@ACL 2019, 2019
This paper discusses the computer-assisted content evaluation of summaries. We propose a method to establish correspondences between segments of a source text and its summary. As the segment unit, we adopt the “Idea Unit (IU)”, which was proposed in Applied Linguistics. Introducing IUs enables us to establish correspondences even for sentences that contain multiple ideas. The IU correspondence is made based on the similarity between vector representations of IUs. An evaluation experiment with two source texts and 20 summaries showed that the proposed method is more robust against rephrased expressions than the conventional ROUGE-based baselines. The proposed method also outperformed the baselines in recall. We implemented the proposed method in a GUI tool, “Segment Matcher”, that helps teachers establish links between corresponding IUs across the summary and source text.
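As a rough illustration of similarity-based IU correspondence, the Python sketch below links each summary IU to the source IU whose vector has the highest cosine similarity. It assumes the IU vectors already exist; the embedding model, the greedy best-match rule and the `threshold` value are illustrative assumptions, not the method reported in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_ius(summary_vecs, source_vecs, threshold=0.5):
    """Link each summary IU to its most similar source IU.

    Returns (summary_index, source_index or None, score) triples; a summary IU
    whose best score falls below the hypothetical `threshold` is left unlinked.
    """
    links = []
    for i, s in enumerate(summary_vecs):
        if not source_vecs:
            links.append((i, None, 0.0))
            continue
        scores = [cosine(s, t) for t in source_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        target = best if scores[best] >= threshold else None
        links.append((i, target, scores[best]))
    return links


# Toy 2-D "embeddings": summary IU 0 should match source IU 0, IU 1 should match IU 1.
print(align_ius([[1, 0], [0, 1]], [[0.9, 0.1], [0.1, 0.9], [-1, 0]]))
```

A one-to-many greedy match is the simplest choice; a real tool might instead allow several source IUs per summary IU or enforce one-to-one links.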
Recommended citation: Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga and Yasuyo Sawaki. 2019. Supporting content evaluation of student summaries by Idea Unit embedding. In Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications (BEA@ACL2019). https://www.aclweb.org/anthology/W19-4436
Published in Legal Knowledge and Information Systems - JURIX 2019: The Thirty-second Annual Conference, 2019
We address the legal text understanding task, and in particular we treat Japanese judgment documents in civil law. Rhetorical status classification (RSC) is the task of classifying sentences according to the rhetorical functions they fulfil; it is an important preprocessing step for our overall goal of legal summarisation. We present several improvements over our previous RSC classifier, which was based on CRF. The first is a BiLSTM-CRF based model which improves performance significantly over previous baselines. The BiLSTM-CRF architecture can additionally take into account the context provided by neighbouring sentences. The second improvement is the inclusion of section heading information, which resulted in the overall best classifier. Explicit structure in the text, such as headings, is an information source which is likely to be important to legal professionals during the reading phase; this makes the automatic exploitation of such information attractive. We also considerably extended the size of our annotated corpus of judgment documents.
Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2019. Neural network based Rhetorical status classification for Japanese judgement documents. In The proceedings of the 32nd International Conference on Legal Knowledge and Information Systems (JURIX 2019). pages 133–142. https://doi.org/10.3233/FAIA190314
Published in The 2022 International Conference on Language Resources and Evaluation (LREC2022), 2022
This paper describes a comprehensive annotation study on Japanese judgment documents in civil cases. We aim to build an annotated corpus designed for Legal Judgment Prediction (LJP), especially for torts. Our annotation scheme contains annotations of whether a tort is accepted by the judges, as well as the corresponding rationales for explainability purposes. Our annotation scheme extracts decisions and rationales at the character level. Moreover, the scheme can capture the explicit causal relation between a judge’s decisions and their corresponding rationales, allowing multiple decisions in a document. To obtain high-quality annotation, we developed the annotation scheme with legal experts and confirmed its reliability through agreement studies using Krippendorff’s alpha metric. The result of the annotation study suggests that the proposed annotation scheme can produce a Japanese LJP dataset with reasonable reliability.
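For readers unfamiliar with the agreement metric, here is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix as alpha = 1 - D_o / D_e. How units were defined for the character-level annotation is not described here, so treating each unit as a list of labels from different annotators is an assumption.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: the labels given to one item by the annotators
    who labelled it (missing annotations are simply omitted). Units with fewer
    than two labels are not pairable and are skipped.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels within a unit contributes 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    n_c = Counter()
    for (a, _b), w in coincidences.items():
        n_c[a] += w
    n = sum(n_c.values())
    if n <= 1:
        raise ValueError("not enough pairable values")
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b) / (n * (n - 1))
    # With a single category alpha is formally undefined; this sketch reports 1.0.
    return 1.0 if expected == 0 else 1.0 - observed / expected


print(krippendorff_alpha_nominal([["tort", "tort"], ["no_tort", "no_tort", "no_tort"], ["tort"]]))
```

The labels `tort` and `no_tort` are hypothetical placeholders chosen for the example.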
Recommended citation: Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Keisuke Takeshita, and Mihoko Sumida. 2022. Annotation Study of Japanese Judgments on Tort for Legal Judgment Prediction with Rationales. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 779–790, Marseille, France. European Language Resources Association. https://aclanthology.org/2022.lrec-1.83
Published in The 2022 International Conference on Language Resources and Evaluation (LREC2022), 2022
In this paper, we approach summary evaluation from an applied linguistics (AL) point of view. We provide computational tools to AL researchers to simplify the process of Idea Unit (IU) segmentation. The IU is a segmentation unit that can identify chunks of information. These chunks can be compared across documents to measure the content overlap between a summary and its source text. We propose a full revision of the annotation guidelines to allow machine implementation. The new guidelines also improve inter-annotator agreement, raising it from 0.547 to 0.785 (Cohen’s Kappa). We release L2WS 2021, an IU gold standard corpus composed of 40 manually annotated student summaries. We propose IUExtract, the first automatic segmentation algorithm based on the IU. The algorithm was tested on the L2WS 2021 corpus. Our results are promising, achieving a precision of 0.789 and a recall of 0.844. We tested an existing approach to IU alignment via word embeddings with the state-of-the-art model SBERT. The recorded precision for the top-1 aligned pair of IUs was 0.375. We deemed this result insufficient for effective automatic alignment. We propose “SAT”, an online tool to facilitate the collection of alignment gold standards for future training.
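Cohen's Kappa corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each annotator's label frequencies. A minimal Python sketch follows; framing IU segmentation agreement as per-token boundary labels (here a hypothetical `B` = starts a new IU, `I` = continues one) is an illustration only, since the exact unit of comparison is not given here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Hypothetical boundary labels for four tokens from two annotators.
print(cohens_kappa(["B", "I", "B", "I"], ["B", "I", "I", "I"]))  # → 0.5
```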
Recommended citation: Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga, Yasuyo Sawaki, and Mika Ishizuka. 2022. Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4663–4673, Marseille, France. European Language Resources Association. https://aclanthology.org/2022.lrec-1.498
Published in Findings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022), 2022
This paper investigates pretrained language models (PLMs) specialised in the Japanese legal domain. We create PLMs using different pretraining strategies and investigate their performance across multiple domains. Our findings are that (i) a PLM built with general-domain data can be improved by further pretraining with domain-specific data, (ii) domain-specific PLMs can learn domain-specific and general word meanings simultaneously and can distinguish them, (iii) domain-specific PLMs work better on their target domain, yet they retain the information learnt in the original PLM even after being further pretrained with domain-specific data, and (iv) PLMs sequentially pretrained with corpora of different domains show high performance on the domains learnt later.
Recommended citation: Keisuke Miyazaki, Hiroaki Yamada and Takenobu Tokunaga. 2022. Cross-domain Analysis on Japanese Legal Pretrained Language Models. In Findings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, pages 274–281, Online. https://aclanthology.org/2022.findings-aacl.26
Published in Legal Knowledge and Information Systems - JURIX 2023: The Thirty-sixth Annual Conference, 2023
With the increasing demand for summarizing Japanese judgment documents, large language models (LLMs) are expected to generate high-quality summaries automatically. We propose a method that selects the exemplar for one-shot learning by nearest neighbor search. The experiments showed that our method outperforms baseline methods.
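One way to picture nearest-neighbour exemplar selection for one-shot prompting: embed the input judgment, pick the training (judgment, summary) pair with the most similar embedding, and place it before the input in the prompt. This is a sketch under assumptions; the embedding model, the cosine measure and the prompt template are hypothetical, not the paper's.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_exemplar(query_vec, pool):
    """Return the (document, summary) pair whose embedding is closest to query_vec.

    `pool` is a non-empty list of (embedding, document, summary) triples.
    """
    best = max(pool, key=lambda item: cosine(query_vec, item[0]))
    return best[1], best[2]


def build_one_shot_prompt(query_doc, exemplar_doc, exemplar_summary):
    # Hypothetical prompt layout; the paper's actual template is not reproduced here.
    return (
        f"Judgment:\n{exemplar_doc}\nSummary:\n{exemplar_summary}\n\n"
        f"Judgment:\n{query_doc}\nSummary:\n"
    )


pool = [([1, 0], "doc A", "sum A"), ([0, 1], "doc B", "sum B")]
doc, summ = select_exemplar([0.2, 0.9], pool)
print(build_one_shot_prompt("new judgment", doc, summ))
```

The returned prompt would then be sent to whichever LLM is being evaluated; the call itself is left out because it depends on the provider.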
Recommended citation: Akito Shimbo, Yuta Sugawara, Hiroaki Yamada, Takenobu Tokunaga. 2023. Nearest Neighbor Search for Summarization of Japanese Judgment Documents. In Legal Knowledge and Information Systems - JURIX 2023: The Thirty-sixth Annual Conference, pages 225–340, Maastricht, The Netherlands. https://doi.org/10.3233/FAIA230984
Published in The 16th International Conference on Computer Supported Education, 2024
This paper introduces our ongoing research project that aims to generate multiple-choice questions for the Japanese National Nursing Examination using large language models (LLMs). We report the progress and prospects of our project. A preliminary experiment assessing the LLMs’ potential for question generation in the nursing domain led us to focus on distractor generation, which is a difficult part of the entire question-generation process. Therefore, our problem is generating distractors given a question stem and key (correct choice). We prepare a question dataset from past National Nursing Examinations for the training and evaluation of LLMs. The generated distractors are evaluated by comparison with the reference distractors in the test set. We propose reference-based evaluation metrics for distractor generation by extending recall and precision, which are popular in information retrieval. However, as the reference distractors are not the only acceptable answers, we also conduct human evaluation. We evaluate four LLMs: GPT-4 with few-shot learning, ChatGPT with few-shot learning, ChatGPT with fine-tuning and JSLM with fine-tuning. Our future plans include improving the LLMs’ performance by integrating question writing guidelines into the prompts and conducting a large-scale administration of automatically generated questions.
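As a baseline for the reference-based metrics, plain set-based precision and recall can be computed as below. The paper extends these measures in ways not described here, so this sketch shows only the unextended starting point, with exact string matching after whitespace stripping as an assumed matching rule.

```python
def distractor_precision_recall(generated, references):
    """Plain set-based precision and recall of generated distractors.

    Matching is exact string equality after stripping whitespace; this is
    only the starting point that extended metrics build on, not those
    metrics themselves.
    """
    gen = {g.strip() for g in generated}
    ref = {r.strip() for r in references}
    hits = len(gen & ref)
    precision = hits / len(gen) if gen else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical distractors: two of three generated ones match the references.
print(distractor_precision_recall(["A", "B", "C"], ["B", "C", "D"]))
```

Exact matching is exactly the weakness that motivates both the extended metrics and the human evaluation: a valid distractor worded differently from every reference scores zero here.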
Recommended citation: Yusei Kido, Hiroaki Yamada, Takenobu Tokunaga, Rika Kimura, Yuriko Miura, Yumi Sakyo and Naoko Hayashi. 2024. Automatic Question Generation for the Japanese National Nursing Examination Using Large Language Models. In Proceedings of the 16th International Conference on Computer Supported Education (CSEDU 2024), pages 821-829. https://doi.org/10.5220/0012729200003693
Published in The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024
Interpretation methods provide saliency scores indicating the importance of input words for neural summarization models. Prior work has analyzed models by comparing them to human behavior, often using eye-gaze as a proxy for human attention in reading tasks such as classification. This paper presents a framework to analyze model behavior in summarization by comparing it to human summarization behavior using eye-gaze data. We examine two research questions: RQ1) whether model saliency conforms to human gaze during summarization and RQ2) how model saliency and human gaze affect summarization performance. For RQ1, we measure conformity by calculating the correlation between model saliency and human fixation counts. For RQ2, we conduct ablation experiments removing words/sentences considered important by models or humans. Experiments on two datasets with human eye-gaze during summarization partially confirm that model saliency aligns with human gaze (RQ1). However, ablation experiments show that removing the words/sentences most attended to in human gaze does not significantly degrade performance compared with removing those highlighted by model saliency (RQ2).
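The conformity measure in RQ1 is a correlation between per-word model saliency and human fixation counts. The abstract does not say which correlation coefficient is used; the sketch below assumes Spearman's rank correlation (Pearson correlation on tie-averaged ranks), with toy values.

```python
import math


def rank(values):
    """Ranks starting at 1, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(rank(x), rank(y))


# Toy per-word saliency scores and fixation counts for a four-word input.
print(spearman([0.1, 0.5, 0.3, 0.9], [0, 3, 1, 5]))
```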
Recommended citation: Fariz Ikhwantri, Hiroaki Yamada, and Takenobu Tokunaga. 2024. Analyzing Interpretability of Summarization Model with Eye-gaze Information. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 939–950, Torino, Italia. ELRA and ICCL. https://aclanthology.org/2024.lrec-main.84
Published:
We present an annotation scheme describing the argument structure of judgement documents, a central construct in Japanese law. To support the final goal of this work, namely summarisation aimed at the legal professions, we have designed blueprint models of summaries of various granularities, and our annotation model in turn is fitted around the information needed for the summaries. In this paper we report results of a manual annotation study, showing that the annotation is stable. The annotated corpus we created contains 89 documents (37,673 sentences; 2,528,604 characters). We also designed and implemented the first two stages of an algorithm for the automatic extraction of argument structure, and present evaluation results.
Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2019. Building a Corpus of Legal Argumentation in Japanese Judgement Documents: Towards Structure-Based Summarisation. Artificial Intelligence and Law. Springer Netherlands, 27(2):141–170. https://doi.org/10.1007/s10506-019-09242-3
Published:
This paper provides the first broad overview of the relation between different interpretation methods and human eye-movement behaviour across different tasks and architectures. The interpretation methods of neural networks provide the information the machine considers important, while human eye gaze is believed to be a proxy for human cognitive processes. Thus, comparing them can explain machine behaviour in terms of human behaviour, which can lead to improved machine performance by minimising their difference. We consider three types of natural language processing (NLP) tasks: sentiment analysis, relation classification and question answering; four interpretation methods based on simple gradient, integrated gradient, input perturbation and attention; and three architectures: LSTM, CNN and Transformer. We leverage two corpora annotated with eye-gaze information: the ZuCo dataset and the MQA-RC dataset. This research sets up two research questions. First, we investigate whether the saliency (importance) of input words conforms with human eye-gaze features. To this end, we compute a saliency distance (SD) between the saliency of input words (given by an interpretation method) and an eye-gaze feature. SD is defined as the KL divergence between the saliency distribution over input words and an eye-gaze feature. We found that the SD scores vary depending on the combinations of tasks, interpretation methods and architectures. Second, we investigate whether models with good saliency conformity to human eye-gaze behaviour have better prediction performance. To this end, we propose a novel evaluation device called the “SD-performance curve” (SDPC), which represents the cumulative model performance against the SD scores. SDPC enables us to analyse underlying phenomena that were overlooked using only macroscopic metrics, such as average SD scores and rank correlations, that were typically used in past studies.
We observe that the impact of good saliency conformity between humans and machines on task performance varies among the combinations of tasks, interpretation methods and architectures. Our findings should be considered when introducing eye-gaze information for model training to improve the model performance.
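The saliency distance can be sketched as follows: both the saliency scores and the eye-gaze feature are normalised into distributions over the same input words and compared with KL divergence. The direction KL(saliency || gaze), the use of absolute saliency values and the smoothing constant `eps` are assumptions made for this illustration.

```python
import math


def to_distribution(scores, eps=1e-12):
    """Normalise scores into a probability distribution over words.

    Absolute values are used because gradient-based saliency can be signed;
    `eps` keeps every entry strictly positive so the KL term stays finite.
    """
    shifted = [abs(s) + eps for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]


def saliency_distance(saliency, gaze):
    """KL(saliency || gaze) over the words of one input (lower = closer to human gaze)."""
    p = to_distribution(saliency)
    q = to_distribution(gaze)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy example: three words, saliency vs. total fixation duration per word.
print(saliency_distance([0.7, -0.2, 0.1], [300, 120, 80]))
```

KL divergence is asymmetric, so swapping the arguments generally gives a different SD; which direction is used matters when comparing scores across methods.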
Recommended citation: Fariz Ikhwantri, Jan Wira Gotama Putra, Hiroaki Yamada, Takenobu Tokunaga. 2023. Looking deep in the eyes: Investigating interpretation methods for neural models on reading tasks using human eye-movement behaviour. Information Processing & Management. Elsevier Ltd, 60(2023) 103195. https://doi.org/10.1016/j.ipm.2022.103195
Published:
This paper presents the first dataset for Japanese Legal Judgment Prediction (LJP), the Japanese Tort-case Dataset (JTD), which features two tasks: tort prediction and its rationale extraction. The rationale extraction task identifies the arguments accepted by the court from among those alleged by plaintiffs and defendants, which is a novel task in the field. JTD is constructed from 3,477 Japanese Civil Code judgments annotated by 41 legal experts, resulting in 7,978 instances with 59,697 alleged arguments from the involved parties. Our baseline experiments show the feasibility of the two proposed tasks, and our error analysis by legal experts identifies sources of errors and suggests future directions for LJP research.
Recommended citation: Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Akira Tokutsu, Keisuke Takeshita, and Mihoko Sumida. 2024. Japanese tort-case dataset for rationale-supported legal judgment prediction. Artificial Intelligence and Law. https://doi.org/10.1007/s10506-024-09402-0
Published:
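Emotion Recognition in Conversations (ERC), the task of recognizing the emotion of utterances in a dialogue, has been attracting attention with a view to sentiment analysis on social media and the construction of emotional and empathetic dialogue systems. In ERC, utterances with similar content are known to express different emotions depending on the content of the surrounding sequence of utterances (the context). A representative way of capturing context is to concatenate a sequence of utterances and feed it into a classification model. This conventional approach takes the target utterance and its preceding context (dialogue) as input, and the classification model alone predicts the emotion label of the target utterance. In this study, we propose a method that reinforces the conventional classification model with a database external to the model. Specifically, we retrieve utterances semantically close to the target utterance from the training set, build a probability distribution from the emotion labels assigned to the retrieved utterances (nearest-neighbor instances), and combine it with the probability distribution of the conventional classification model through a weighted linear sum. Beyond a weighted linear sum with a constant weight, we also propose a way to change the weighting coefficient dynamically for each target utterance. In evaluation experiments on three ERC benchmark datasets, the proposed method with dynamic weighting achieved state-of-the-art recognition performance, surpassing conventional methods.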
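The retrieve-and-interpolate idea in this entry can be sketched like this: find the k training utterances nearest to the target, turn their emotion labels into a distribution, and mix it with the classifier's distribution via lam * p_knn + (1 - lam) * p_model. Cosine retrieval, the count-based neighbour distribution and the toy labels are assumptions; the per-utterance dynamic weighting is not reproduced, only the constant-weight case.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_label_distribution(query_vec, train_vecs, train_labels, labels, k=3):
    """Label distribution from the k training utterances most similar to the query."""
    scored = sorted(
        zip(train_vecs, train_labels),
        key=lambda item: cosine(query_vec, item[0]),
        reverse=True,
    )[:k]
    counts = Counter(label for _vec, label in scored)
    total = sum(counts.values())
    return [counts[l] / total for l in labels]


def interpolate(p_model, p_knn, lam):
    """Weighted linear sum: lam * p_knn + (1 - lam) * p_model (lam is a constant here)."""
    return [lam * b + (1 - lam) * a for a, b in zip(p_model, p_knn)]


labels = ["joy", "anger"]
p_knn = knn_label_distribution(
    [1, 0], [[1, 0], [0.9, 0.1], [0, 1]], ["joy", "joy", "anger"], labels, k=2
)
print(interpolate([0.4, 0.6], p_knn, lam=0.5))
```

A dynamic variant would compute `lam` separately for each target utterance instead of passing one constant.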
Recommended citation: 石渡 太智, 後藤 淳, 山田 寛章, 徳永 健伸. 近傍事例を用いた対話における感情認識, 自然言語処理, 2024, 31 巻, 2 号, p. 504-533, 2024/06/15, Online ISSN 2185-8314, Print ISSN 1340-7619. https://doi.org/10.5715/jnlp.31.504
Published:
The present paper provides an overview of an online module for formative assessment of summary writing skills in second language (L2) introductory academic writing instruction in Japan and presents initial empirical results on how Japanese undergraduate students’ summary writing performance changed with a series of automated feedback on summary content delivered through the module. A key feature of this module was the provision of fine-grained feedback, delivered as scaffolding during revisions, on two key aspects of summary content: main idea representation and paraphrasing. Participants were 64 Japanese undergraduate engineering majors in introductory academic writing courses at a private university in Tokyo. The students completed two summary writing tasks provided through the online module. Results of a multivariate analysis of variance showed that the content analytic score improved significantly upon revision of the initial summary task, and that this improved performance level was retained on a transfer task. The language use analytic score also improved significantly on the transfer task. Detailed analyses of learner-produced summaries based on descriptive statistics further suggested that the learners made substantively meaningful changes concerning main idea coverage and verbatim copying of the source text while still meeting the length requirement, although the results differed somewhat across the source texts assigned. Despite some study limitations, these results provide initial support for immediate content feedback provision for the development of basic summary writing skills.
Recommended citation: Yasuyo Sawaki, Yutaka Ishii, Hiroaki Yamada, Takenobu Tokunaga. Developing and validating an online module for formative assessment of summary writing with automated content feedback for EFL academic writing instruction, Language Testing in Asia 14, 50 (2024). https://doi.org/10.1186/s40468-024-00325-w
Published:
We are accumulating electronic files of English compositions by university freshmen. For the past ten years or so, about 50 to 90 students enrolled in three English classes taught by the last author have been submitting 15 essays per year, each in three different versions. The first is what the students come up with in class in half an hour or so after engaging in what we call "oral response practice", in which a group of three students take turns reading a question card aloud, responding to the question and video recording the interaction. Sets of 10 question cards around one topic are prepared by the teacher and distributed to the groups along with a video camera. After class, students spend some time "completing" their compositions, submit them during the next class and review other students' essays within groups of six. The students are asked to revise the essays and submit the final version during the following class. In addition to the students' peer review comments, it would be desirable for the students to get feedback from statistical analysis of parsing and other processing of their own essays, but the submitted files have to go through some pre-processing for the parser and analyzers to work properly. In this presentation, we report on our preliminary study and experimentation.
Recommended citation: 山田寛章, 石井雄隆, 原田康也. 日本人大学生の英語作文からの特徴量の自動抽出に向けて : 予備実験と今後の課題 (思考と言語). 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 114(100). pages 55-60, Jun 2014. ISSN 0913-5685. https://ci.nii.ac.jp/naid/110009925596
Published:
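Aiming to make information access to Japanese judgment documents easier and more efficient, we seek to automatically generate judgment summaries that can serve as cues for search. If the common argument structure inherent in judgment documents can be exploited appropriately, automatic summaries of very high quality become achievable. The goal of this study is the extraction of that argument structure. In this paper, using our own corpus of judgment documents annotated with argument structure, we introduce features such as modality expressions, functional expressions, names of laws and cue phrases in addition to morpheme bigrams, and perform automatic classification of rhetorical roles, the basis of the argument structure, by machine learning with SVMs. We found that argumentative text in judgment documents can be distinguished even with basic features alone.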
Recommended citation: 山田寛章, Simone Teufel, 徳永健伸. 判決書自動要約のための修辞役割分類. 言語処理学会第24回年次大会発表論文集, pp. 785-788, 2018年3月. http://anlp.jp/proceedings/annual_meeting/2018/pdf_dir/P7-8.pdf
Published:
The goal of foreign language education and/or learning is the attainment of proficiency in the target language: learners should not only acquire knowledge of vocabulary, expressions and grammar but also achieve automatization of the mental processing of that language. To what extent are English language education and learning in Japan achieving these goals?
Recommended citation: 原田康也, 森下美和, 鈴木正紀, 横森大輔, 遠藤智子, 前坊香菜子, 鍋井理沙, 桒原奈な子, 山田寛章, 河村まゆみ. 自律的相互学習の記録と分析からインタラクションの楽しさへ ~ 外国語としての英語自動処理の難しさを超えて ~ (思考と言語). 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 118(516). pages 17-22, Mar 2019. ISSN 2432-6380.
Published:
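In this paper, we discuss the automation of rhetorical role classification for Japanese judgment documents and how to improve its performance. Previous work on rhetorical role classification for Japanese judgment documents built a classifier based on Conditional Random Fields (CRF) and achieved F=0.63 (macro-average), but classification performance was relatively low for the important roles BACKGROUND (F=0.32) and CONCLUSION (F=0.39), leaving room for improvement. We show that a model based on a hierarchical RNN, which can take inter-sentence context into account, improves rhetorical role classification performance on Japanese judgment documents compared with the previous CRF-based classifier. We also show that adding to the hierarchical RNN a dedicated network that handles the headings appearing in judgment documents further improves rhetorical role classification performance.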
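The F=0.63 macro average reported in this entry is the unweighted mean of per-class F1 scores, so weak classes such as BACKGROUND and CONCLUSION pull it down regardless of how frequent they are. A minimal sketch follows; the `FACT` label and the toy predictions are hypothetical, and averaging over the labels seen in gold or predictions is an assumption about the evaluation setup.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 plus per-class F1 over the labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s), dict(zip(labels, f1s))


gold = ["BACKGROUND", "CONCLUSION", "FACT", "FACT"]
pred = ["FACT", "CONCLUSION", "FACT", "FACT"]
print(macro_f1(gold, pred))
```

Here a single missed BACKGROUND sentence zeroes that class and drags the macro average to 0.6, even though three of four sentences are correct.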
Recommended citation: 山田寛章, Simone Teufel, 徳永健伸. 見出し情報を考慮した階層型RNNによる日本語判決書のための修辞役割分類. 言語処理学会第26回年次大会発表論文集, pp. 37-40, 2020年3月. https://www.anlp.jp/proceedings/annual_meeting/2020/pdf_dir/P1-10.pdf
Published:
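In this study, for the task of extracting important passages from judgment documents, we used a BERT model pretrained only on legal-domain documents and a BERT model obtained by further pretraining a BERT pretrained on Japanese Wikipedia, and compared their performance with that of a general-purpose Japanese BERT. Experiments confirmed that BERT models specialised for the legal domain outperform the general-purpose Japanese BERT.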
Recommended citation: 菅原祐太, 宮崎桂輔, 山田寛章, 徳永健伸. 日本語法律BERTを用いた判決書からの重要箇所抽出. 言語処理学会第28回年次大会発表論文集, pp. 838-841, 2022年3月. https://www.anlp.jp/proceedings/annual_meeting/2022/pdf_dir/PT1-10.pdf
Published:
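In this paper, we propose BERT models specialised for the Japanese legal domain. Using a corpus of civil case judgment documents, we created one model that pretrains BERT from scratch and another that further pretrains an existing general-purpose Japanese BERT. Experiments showed that, on the Masked Language Model and Next Sentence Prediction tasks over civil judgment documents, further pretraining the existing general-purpose Japanese BERT achieved the best accuracy.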
Recommended citation: 宮崎桂輔, 菅原祐太, 山田寛章, 徳永健伸. 日本語法律分野文書に特化したBERT の構築. 言語処理学会第28回年次大会発表論文集, pp. 1546-1551, 2022年3月. https://www.anlp.jp/proceedings/annual_meeting/2022/pdf_dir/PT3-7.pdf
Published:
Legal judgment prediction (LJP) is the task of predicting the outcome of a court case based on input facts. Predicting legal judgments can help not only legal professionals but also members of the general public who are not legal specialists. An LJP system allows anyone involved in a legal dispute to predict and foresee the outcome of litigation. This article provides a simple introduction to artificial intelligence and natural language processing research in the field of LJP, reviews recent advances in legal judgment prediction and related topics, and discusses the challenges and possible directions for developing a smarter and more trustworthy LJP system.
Recommended citation: 山田寛章. 法と人工知能の接点. 情報法制研究, 11 巻 p. 27-33, 2022年5月. https://www.jstage.jst.go.jp/article/alis/11/0/11_27/_article/-char/ja
Published:
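Emotion Recognition in Conversations (ERC), the task of recognizing the emotion of each utterance in a dialogue, has been attracting attention with a view to sentiment analysis on social media and the construction of emotional and empathetic dialogue systems. In ERC, not only the content of utterances but also the relations between utterances are known to strongly affect speakers' emotions. Many existing methods extract inter-utterance relations and achieve high recognition performance. Such methods often perform well on their own, but further improvement can be expected by combining models with different characteristics. In this study, we propose a method that combines the probability distribution over emotion labels output by a model that performs well on its own with a probability distribution built from nearest-neighbor instances retrieved using a separate model with different characteristics. In evaluation experiments, the proposed method outperformed the base model alone on two of three ERC benchmark datasets. In a permutation test, the proposed method also showed statistically significant results against the base model alone.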
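A paired permutation test, as used in this entry, can be approximated by randomly swapping the two systems' per-example scores and counting how often the swapped difference is at least as large as the observed one. This sketch assumes per-example correctness scores and a two-sided test; the number of rounds and the seed are arbitrary choices, not the paper's settings.

```python
import random


def paired_permutation_test(scores_a, scores_b, rounds=10000, seed=0):
    """Two-sided paired approximate permutation test on per-example scores.

    scores_a / scores_b hold per-example scores (e.g. 1 if the prediction is
    correct, else 0) for two systems evaluated on the same examples.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    at_least_as_extreme = 0
    for _ in range(rounds):
        diff = 0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the two systems' labels are exchangeable.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (rounds + 1)


print(paired_permutation_test([1] * 20, [0] * 20, rounds=1000))
```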
Recommended citation: 石渡太智, 美野秀弥, 後藤淳, 山田寛章, 徳永健伸. 近傍事例を用いた対話における感情認識. 言語処理学会第29回年次大会発表論文集, pp. 567-571, 2023年3月. https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/A3-1.pdf
Published:
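Knowledge distillation (KD) is a method for compressing large neural networks. The best-performing KD method for language models is CILDA, which introduces intermediate-layer outputs and contrastive learning into adversarial training. CILDA training is divided into a maximisation step and a minimisation step, but intermediate-layer outputs and contrastive learning are used only in the maximisation step. In this study, we aimed to improve performance by introducing intermediate-layer distillation and contrastive learning into the minimisation step. However, no significant difference from the existing method was observed, so we reproduced CILDA alone to analyse the cause; contrary to the claims of the original study, CILDA did not outperform earlier methods on several GLUE tasks.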
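For context, the standard soft-target distillation loss that KD methods build on compares temperature-softened teacher and student distributions: T^2 * KL(p_teacher || p_student). This is the generic formulation, not CILDA, whose adversarial maximisation and minimisation steps, intermediate-layer terms and contrastive terms are not reproduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation loss: T^2 * KL(teacher || student) at temperature T."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical logits give zero loss
print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this term is usually combined with the ordinary cross-entropy on gold labels and computed with an autodiff framework rather than plain Python.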
Recommended citation: 鈴木偉士, 山田寛章, 徳永健伸. 敵対的学習を用いた知識蒸留への中間層蒸留と対照学習の導入. 言語処理学会第29回年次大会発表論文集, pp. 783-788, 2023年3月. https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q3-2.pdf
Published:
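Offline human evaluation of catchphrases, a type of advertising copy, is costly. An automatic evaluator is needed to make research on automatic catchphrase generation faster and more efficient. Since no dataset for building such an evaluator existed, we constructed a dataset of 23,641 catchphrases and their ratings, the first of its kind for Japanese. Using this dataset, we built a reference-free evaluator with BERT and contrastive learning; in evaluation experiments, its correlation coefficient with the ratings of the test data exceeded 0.28 on average. We also compared it with training without contrastive learning and confirmed the usefulness of contrastive learning.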
Recommended citation: 新保彰人, 山田寛章, 徳永健伸. 参照例を使わないキャッチコピーの自動評価. 言語処理学会第29回年次大会発表論文集, pp. 1557-1562, 2023年3月. https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/A7-2.pdf
Published:
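In the legal domain, annotation is costly, which leads to a shortage of training data. In this paper, using COLIEE Task 4, we examine the effects of rule-based augmentation of labelled training data and of pseudo training-data augmentation during language model pretraining. Experimental results showed that our proposed data augmentation method based on a contrario interpretation (反対解釈) performed best.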
Recommended citation: 伊藤光一, 山田寛章, 徳永健伸. 低資源な法ドメイン含意タスクにおけるデータ拡張. 言語処理学会第29回年次大会発表論文集, pp. 990-885, 2023年3月. https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/P4-2.pdf
Published:
This paper presents the first dataset for Japanese Legal Judgment Prediction (LJP), the Japanese Tort-case Dataset (JTD), which features two tasks: tort prediction and its rationale extraction. The rationale extraction task identifies the arguments accepted by the court from among those alleged by plaintiffs and defendants, which is a novel task in the field. JTD is constructed from 3,477 Japanese Civil Code judgments annotated by 41 legal experts, resulting in 7,978 instances with 59,697 alleged arguments from the involved parties. Our baseline experiments show the feasibility of the two proposed tasks, and our error analysis by legal experts identifies sources of errors and suggests future directions for LJP research.
Recommended citation: Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Akira Tokutsu, Keisuke Takeshita, Mihoko Sumida. Japanese Tort-case Dataset for Rationale-supported Legal Judgment Prediction. ArXiv Preprint, 2023/12/1. https://arxiv.org/abs/2312.00480
Published:
This paper reports MONETECH's participation in the Argument Unit Identification in Earnings Conference Call subtask of FinArg-1. Our experiments are based on the BERT and FinBERT models, with additional experimentation on Large Language Model-based data augmentation, data filtering, and layer freezing. Our best-performing submission, which is based on data filtering and layer freezing, scores 75.54% in micro F1. Results from additional runs also show that layer freezing and data filtering could further improve model performance beyond our best submission.
Recommended citation: Supawich Jiarakul, Hiroaki Yamada and Takenobu Tokunaga. MONETECH at the NTCIR-17 FinArg-1 Task: Layer Freezing, Data Augmentation, and Data Filtering for Argument Unit Identification. The 17th NTCIR Conference on Evaluation of Information Access Technologies, 2023/12/12. https://doi.org/10.20736/0002001314
Published:
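Some claims subject to fact verification can only be judged correctly by consulting multiple documents as information sources and reasoning over several steps. In this study, we propose a summarize-then-verify architecture as a fact verification mechanism suited to such claims. In the first stage, the proposed method produces a multi-document summary of the parts of the source documents that support the claim. In the second stage, it judges the veracity of the claim using the generated summary. By condensing the source documents into a short summary in the first stage, we aim to make effective use of the reasoning ability of large language models when judging the claim. Evaluated on the HoVer dataset, the proposed method outperformed conventional approaches.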
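The two-stage flow can be sketched with the summarizer and the verifier passed in as functions. The abstract indicates that the method relies on large language models, but `summarize` and `verify` here are hypothetical callables. The keyword-matching stand-ins are toy placeholders that only make the sketch runnable, and the label strings are illustrative.

```python
from typing import Callable, List

Summarizer = Callable[[str, List[str]], str]
Verifier = Callable[[str, str], str]


def summarize_then_verify(claim: str, documents: List[str],
                          summarize: Summarizer, verify: Verifier) -> str:
    # Stage 1: condense the parts of the source documents relevant to the claim.
    summary = summarize(claim, documents)
    # Stage 2: judge the claim against the short summary, not the full documents.
    return verify(claim, summary)


def toy_summarize(claim: str, documents: List[str]) -> str:
    # Placeholder: keep sentences sharing at least one word with the claim.
    claim_words = set(claim.lower().split())
    kept = [s for doc in documents for s in doc.split(". ")
            if claim_words & set(s.lower().split())]
    return ". ".join(kept)


def toy_verify(claim: str, summary: str) -> str:
    # Placeholder: "SUPPORTED" only if every claim word occurs in the summary.
    words = set(summary.lower().split())
    return "SUPPORTED" if set(claim.lower().split()) <= words else "NOT_SUPPORTED"


docs = ["Alice was born in Paris. She likes tea", "Paris is in France"]
print(summarize_then_verify("alice born in paris", docs, toy_summarize, toy_verify))
```

Injecting the two stages as parameters keeps the pipeline shape separate from the models. Either stage could be swapped for an LLM call without changing `summarize_then_verify`.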
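Recommended citation: 伊藤悠馬, 山田寛章, 徳永健伸. 複数文書要約を用いた事実性の検証 [Fact verification using multi-document summarization]. IPSJ SIG Technical Report on Natural Language Processing (NL), Volume 2024-NL-259, Issue 17, pp. 1-9, March 2024. http://id.nii.ac.jp/1001/00232766/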
Published:
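Building models that generate candidate natural, human-like interpretations (e.g., "a bright smile") of similes (e.g., ひまわりのような笑顔, "a smile like a sunflower") is one of the tasks attracting attention in natural language processing. In this study, we generate interpretations of similes using the pre-trained masked language model BERT. We also propose an extension of the masked language model (MLM) suited to completing adjectives, together with a training framework that focuses on adjective-noun modification relations. With the proposed methods, Recall@5, the score used to evaluate simile interpretation, reached 0.296, outperforming the other methods compared.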
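The abstract does not define Recall@5 in detail. The sketch below assumes a common reading: the share of test similes for which at least one gold interpretation appears among the model's top-5 ranked candidates. The English glosses and gold labels are made up for illustration.

```python
from typing import List, Sequence, Set


def recall_at_k(candidates: Sequence[List[str]], gold: Sequence[Set[str]], k: int = 5) -> float:
    """Share of items whose top-k generated candidates include a gold interpretation."""
    if len(candidates) != len(gold):
        raise ValueError("candidates and gold must be aligned item by item")
    if not gold:
        return 0.0
    hits = sum(1 for cands, answers in zip(candidates, gold)
               if answers & set(cands[:k]))
    return hits / len(gold)


# Illustrative data: ranked adjective candidates for two similes.
candidates = [
    ["bright", "big", "yellow", "cheerful", "warm", "round"],  # "a smile like a sunflower"
    ["cold", "hard", "clear", "white", "sharp", "cruel"],      # "a heart like ice"
]
gold = [{"bright"}, {"cruel"}]
print(recall_at_k(candidates, gold, k=5))  # 0.5: "cruel" is ranked 6th
```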
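Recommended citation: 鈴木颯仁, 山田寛章, 徳永健伸. 事前学習済みモデルを用いた日本語直喩表現の解釈 [Interpretation of Japanese similes using pre-trained models]. Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing, pp. 3137-3142, March 2024. https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/D11-6.pdf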
Published:
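Distinguishing between semantic relations is not an easy task for either humans or machines. In this study, we ask whether pre-trained language models, which have shown outstanding performance on diverse downstream tasks, can distinguish between semantic relations. We address this question by proposing a measure called "confusion degree" (混淆度) and comparing the models with humans. The results show that pre-trained language models are less able than humans to distinguish between semantic relations. We also observed a bias toward misrecognizing non-antonymous relations as antonymy.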
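Recommended citation: Cao Zhihan, 山田寛章, 徳永健伸. 対義関係バイアス: 事前訓練済み言語モデルと人間の意味関係間の弁別能力に関する分析 [Antonymy bias: An analysis of the ability of pre-trained language models and humans to discriminate between semantic relations]. Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing, pp. 2194-2198, March 2024. https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/D8-3.pdf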
Published:
In this paper, we explore the application of Generative Pre-trained Transformers (GPTs) in cross-lingual legal Question-Answering (QA) systems using the COLIEE Task 4 dataset. In COLIEE Task 4, given a statement and a set of related legal articles that serve as context, the objective is to determine whether the statement is legally valid, i.e., whether it can be inferred from the provided articles. This is an entailment task. By benchmarking four combinations of English and Japanese prompts and data, we provide insights into GPTs' performance in multilingual legal QA scenarios, contributing to the development of more efficient and accurate cross-lingual QA solutions in the legal domain.
Recommended citation: Nguyen Ha Thanh, 山田寛章, 佐藤健. GPTs and Language Barrier: A Cross-Lingual Legal QA Examination. Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing, pp. 1062-1066, March 2024. https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/E4-5.pdf
Published:
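With growing demand for automatic summarization of Japanese judgment documents, large language models (LLMs) are expected to produce high-quality summaries of judgments. In this study, we propose a method that uses nearest-neighbor retrieval to select the example for one-shot in-context learning. We show that the proposed method improves the accuracy of judgment summarization compared with a baseline method.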
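Nearest-neighbor selection of a one-shot example can be sketched as follows. This minimal version assumes each training judgment already has an embedding vector; the abstract does not say how the embeddings are produced. It also assumes cosine similarity is the distance measure. The prompt layout in `build_prompt` is hypothetical and is not the paper's template.

```python
import math
from typing import List, Sequence, Tuple

# (embedding, judgment text, reference summary); the structure is an assumption
Example = Tuple[List[float], str, str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_one_shot(query_vec: Sequence[float], pool: List[Example]) -> Example:
    """Return the pool example whose embedding is most similar to the query."""
    return max(pool, key=lambda ex: cosine(query_vec, ex[0]))


def build_prompt(query_text: str, example: Example) -> str:
    # Hypothetical one-shot layout: one solved example, then the new judgment.
    _, ex_text, ex_summary = example
    return (f"Judgment:\n{ex_text}\nSummary:\n{ex_summary}\n\n"
            f"Judgment:\n{query_text}\nSummary:\n")


pool = [([1.0, 0.0], "judgment A", "summary A"), ([0.0, 1.0], "judgment B", "summary B")]
best = select_one_shot([0.9, 0.1], pool)
print(best[1])  # judgment A
```

A fixed example gives every query the same demonstration. Retrieval instead supplies a demonstration that is close to the input, and the abstract credits this difference with the improvement over the baseline.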
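Recommended citation: 新保彰人, 菅原祐太, 山田寛章, 徳永健伸. 大規模言語モデルを用いた日本語判決書の自動要約 [Automatic summarization of Japanese judgment documents using large language models]. Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing, pp. 1056-1061, March 2024. https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/E4-4.pdf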
Published:
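This study proposes the Japanese Tort-case Dataset (JTD), a dataset for legal judgment prediction research on Japanese-language judgments under Japanese law. JTD is designed for a tort prediction task and its rationale extraction task. The rationale extraction task extracts, from the arguments of the plaintiff or defendant, those that served as important grounds for the court's decision on whether a tort was established. JTD is built from 3,477 civil judgments annotated by 41 legal experts and contains 7,978 instances, which together comprise 59,697 arguments by plaintiffs and defendants. Baseline experiments confirm the feasibility of each JTD task. They further show that training the tort prediction and rationale extraction tasks jointly improves performance.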
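Recommended citation: 山田寛章, 徳永健伸, 小原隆太郎, 得津晶, 竹下啓介, 角田美穂子. 日本語不法行為事件データセットの構築 [Construction of the Japanese Tort-case Dataset]. Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing, pp. 1045-1050, March 2024. https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/E4-2.pdf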
Published:
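We constructed the Japanese Tort-case Dataset (JTD) for research on Legal Judgement Prediction, which has recently become a major area of research on applying AI to the legal field. JTD is built from 3,477 civil judgments annotated by 41 legal experts and contains 7,978 instances, which together comprise 59,697 arguments by plaintiffs and defendants. Because JTD handles legal data in the form of court judgments, we had to consider a variety of social issues while building it. These include whether it is appropriate for computers to make predictions about judicial decisions and what biases may be latent in the constructed dataset. They also include a circumstance peculiar to Japan: the release of court decision data as open data is still under deliberation. In particular, judgments can affect the interests of many stakeholders, and the environment for sharing datasets built from such data with other researchers can hardly be called sufficiently developed. This paper therefore reports the issues we examined during construction, with the aim of contributing to further discussion.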
Recommended citation: 山田寛章, 小原隆太郎, 角田美穂子. 不法行為判断予測データセット構築におけるELSI課題 [ELSI issues in constructing a tort judgment prediction dataset]. Proceedings of the Annual Conference of JSAI, Volume JSAI2024, 38th (2024), Online ISSN 2758-7347, May 2024. https://doi.org/10.11517/pjsai.JSAI2024.0_3K1OS2a02
Published:
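This thesis proposes a method for automatically extracting the argument structure of Japanese judgment documents, with the aim of applying it to the automatic summarization of judgments. People involved in the practice of law, such as judges, prosecutors, and lawyers, spend an enormous amount of time researching related past cases. Judgment documents, the most important records of a trial, often run to dozens of pages and use long, complex sentences, so even experts need time to analyze them. Support for experts is therefore indispensable, and computer-based summarization of judgments is of great significance. A judgment document is the judge's written record of the legal argument. An important characteristic is its hierarchical argument structure, with the judge's final decision at the top. In a hierarchical argument structure, one argument supports another as its ground. Arguments about individual points of dispute, called Issue Topics, support the final decision, and the conclusion of each Issue Topic is in turn supported by arguments at lower levels of the hierarchy. This thesis therefore proposes a framework for a system that automatically extracts this structure and applies it to summarizing judgments. The contributions of this thesis are the four components of this framework: 1) a formulation of argument structure extraction and criteria for human annotation, 2) a corpus of Japanese judgment documents built on the formulated extraction task, 3) models for automatic argument structure extraction, and 4) the application of argument structure to automatic summarization of judgments.
The proposed argument structure extraction task consists of four subtasks: rhetorical role classification, argumentative support relation extraction, Issue Topic identification, and Issue Topic linking. Rhetorical role classification assigns each sentence the role it plays in the document; this thesis defines seven categories, including "Conclusion" and "Citation of or reference to legal provisions". Argumentative support relation extraction identifies support relations between sentences, in which one sentence provides a ground and the other advances a claim based on it. Issue Topic identification finds the sentences that contain the topics presented as the central issues of a given judgment. Issue Topic linking associates each sentence in the judgment with one of the identified Issue Topics. To verify that humans can annotate each task stably, we measured inter-annotator agreement with measures including Cohen's kappa and confirmed that each task can be annotated stably.
We annotated the documents for each proposed task and built the first judgment corpus with argument structure annotation in the Japanese legal domain. The corpus consists of 120 civil judgments, totalling about 45,000 sentences and 3.2 million characters. Each judgment in the corpus is also accompanied by a summary written by experts.
Based on this corpus, we proposed automatic extraction methods for each argument structure subtask. A notable contribution of this thesis is its focus on the relation between section headings in judgments and argument structure, and its incorporation of heading information into the extraction methods. For rhetorical role classification, we started from a model that uses a hierarchical recurrent neural network (RNN) to capture inter-sentence context. Our method predicts each sentence's rhetorical role using additional features from an independent heading encoder dedicated to the heading the sentence belongs to. We also proposed using the heading encoder to jointly learn an auxiliary task: predicting, from a heading, the set of rhetorical roles that sentences under that heading can take. Both proposed models performed significantly better than the conventional hierarchical RNN model. For argumentative support relation extraction, the supporting and supported sentences take particular rhetorical roles. We therefore compared a model trained on support relation extraction alone with one trained jointly on rhetorical role classification. The results showed that joint learning with rhetorical role classification significantly improves support relation extraction. For Issue Topic extraction and linking, we fine-tuned the pre-trained model BERT on each task. For Issue Topic extraction, prepending to each input sentence its own heading and the chain of headings above it during training significantly improved performance. For Issue Topic linking, we exploited the fact that sentences under the same heading are linked to the same Issue Topic. This reduces the task to binary classification of Issue Topic-heading pairs.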
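Cohen's kappa corrects observed agreement p_o for the agreement p_e expected by chance from each annotator's label distribution: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal version for two annotators. The labels "Conclusion", "Citation", and "Other" are illustrative stand-ins; the thesis's seven rhetorical-role categories are not all listed here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equal-length label sequences")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # chance agreement: product of the two annotators' marginal label rates
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        return 1.0  # kappa is undefined here; both used one identical label, so return 1.0 by convention
    return (p_o - p_e) / (1 - p_e)


ann1 = ["Conclusion", "Citation", "Other", "Other", "Conclusion", "Other"]
ann2 = ["Conclusion", "Citation", "Other", "Conclusion", "Conclusion", "Other"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.739
```

Here p_o = 5/6 and p_e = 13/36, so kappa = 17/23. Raw agreement alone would overstate reliability, because both annotators use "Conclusion" and "Other" often enough to agree by chance.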
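To verify that taking argument structure into account improves summarization performance, we compared an ordinary summarizer with one equipped with a mechanism that uses argument structure to guide the summary content. The guidance mechanism has two parts. A pre-processing step uses rhetorical roles and heading information to control the summarizer's input. A post-processing step uses Issue Topic information and argumentative support relations to edit the summarizer's output. With automatically extracted argument structure, the guidance mechanism gave a significant improvement in ROUGE-1. With the argument structure manually annotated in the corpus, it gave significant improvements in ROUGE-1, ROUGE-2, and ROUGE-L. In sum, this thesis formulates an argument structure extraction task consisting of four subtasks, provides stable annotation criteria for it, and proposes heading-aware models that automate each subtask. We conclude that structure-based automatic summarization improves summarization performance, although argument structure extraction needs to become more accurate.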
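ROUGE-1 measures unigram overlap between a generated summary and a reference. Counts are clipped so that a token cannot match more often than it occurs in the reference. The sketch below assumes the text is already tokenized. Japanese would need a morphological analyzer or character-level units, and the thesis's exact ROUGE configuration is not stated here. The English tokens are illustrative only.

```python
from collections import Counter
from typing import Dict, List


def rouge_1(candidate: List[str], reference: List[str]) -> Dict[str, float]:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per token
    p = overlap / len(candidate) if candidate else 0.0
    r = overlap / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}


reference = ["the", "court", "dismissed", "the", "claim"]
candidate = ["the", "court", "rejected", "the", "claim", "entirely"]
scores = rouge_1(candidate, reference)
print(round(scores["f1"], 3))  # 0.727
```

ROUGE-2 follows the same pattern over bigrams. ROUGE-L instead uses the longest common subsequence, so it is not a drop-in variant of this function.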
Recommended citation: Hiroaki Yamada. 2021. Extracting argument structure from Japanese judgment documents for structure-based summarisation. Doctoral thesis.