Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Website open!

Published:

Now my own website is open!

publications

Annotation of argument structure in Japanese legal documents

Published in Proceedings of the 4th Workshop on Argument Mining (ArgMining2017), 2017

We propose a method for the annotation of Japanese civil judgment documents, with the purpose of creating flexible summaries of them. The first step, described in the current paper, concerns content selection, i.e., the question of which material should be extracted initially for the summary. In particular, we utilize the hierarchical argument structure of the judgment documents. Our main contributions are a) the design of an annotation scheme that stresses the connection between legal issues (called issue topics) and argument structure, b) an adaptation of rhetorical status to suit the Japanese legal system, and c) the definition of a linked argument structure based on legal sub-arguments. In this paper, we report agreement between two annotators on several aspects of the overall task.

Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2017. Annotation of argument structure in Japanese legal documents. In Proceedings of the 4th Workshop on Argument Mining (ArgMining2017). pages 22-31. http://www.aclweb.org/anthology/W17-5103

Designing an annotation scheme for summarizing Japanese judgment documents

Published in Proceedings of the 9th International Conference on Knowledge and Systems Engineering (KSE 2017), 2017

We propose an annotation scheme for the summarization of Japanese judgment documents. This paper reports the details of the development of our annotation scheme for this task. We also conduct a human study in which we compare the annotations of independent annotators. The end goal of our work is summarization, and our categories and link system are a consequence of this. We propose three types of generic summaries, which are focused on specific legal issues relevant to a given legal case.

Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2017. Designing an annotation scheme for summarizing Japanese judgment documents. In Proceedings of the 9th International Conference on Knowledge and Systems Engineering (KSE 2017). pages 275-280. http://ieeexplore.ieee.org/document/8119471/

Japanese university students’ paraphrasing strategies in L2 summary writing

Published in The 41st annual Language Testing Research Colloquium, 2019

Recommended citation: Sawaki, Y., Ishii, Y., & Yamada, H. (2019). Japanese university students’ paraphrasing strategies in L2 summary writing. The 41st annual Language Testing Research Colloquium.

A performance study on fine-tuned large language models in the Legal Case Entailment task

Published in The 6th Competition on Legal Information Extraction/Entailment (COLIEE-2019), 2019

Deep learning-based approaches have achieved significant advances in various Natural Language Processing (NLP) tasks. However, such approaches have not yet been evaluated in the legal domain as extensively as in other domains such as news articles and colloquial texts. Since creating annotated data in the legal domain is expensive, applying deep learning models to the domain has been challenging. A fine-tuning approach can alleviate the situation: it allows a model trained on a large out-of-domain data set to be retrained on a smaller in-domain data set. The fine-tunable language model “BERT” was proposed and achieved state-of-the-art results in various NLP tasks. In this paper, we explored the fine-tuning-based approach in the legal textual entailment task using the COLIEE Task 2 data set. The experimental results show that the fine-tuning approach improves performance, achieving F1 = 0.50 on the COLIEE Task 2 dry run data.

Recommended citation: Hiroaki Yamada and Takenobu Tokunaga. 2019. A performance study on fine-tuned large language models in the Legal Case Entailment task. In Proceedings of the 6th Competition on Legal Information Extraction/Entailment (COLIEE-2019). https://www.cl.c.titech.ac.jp/tokunaga/_media/publication/yamada_2019aa.pdf

Supporting content evaluation of student summaries by Idea Unit embedding

Published in Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications, BEA@ACL 2019, 2019

This paper discusses the computer-assisted content evaluation of summaries. We propose a method to establish correspondences between the segments of a source text and its summary. As the segment unit, we adopt the “Idea Unit (IU)”, a concept proposed in Applied Linguistics. Introducing IUs enables us to establish correspondences even for sentences that contain multiple ideas. The IU correspondence is made based on the similarity between vector representations of IUs. An evaluation experiment with two source texts and 20 summaries showed that the proposed method is more robust against rephrased expressions than the conventional ROUGE-based baselines. The proposed method also outperformed the baselines in recall. We implemented the proposed method in a GUI tool, “Segment Matcher”, that helps teachers establish links between corresponding IUs across the summary and source text.

Recommended citation: Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga and Yasuyo Sawaki. 2019. Supporting content evaluation of student summaries by Idea Unit embedding. In Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications (BEA@ACL2019). https://www.aclweb.org/anthology/W19-4436

Neural network based Rhetorical status classification for Japanese judgement documents

Published in Legal Knowledge and Information Systems - JURIX 2019: The Thirty-second Annual Conference, 2019

We address the legal text understanding task, and in particular we treat Japanese judgment documents in civil law. Rhetorical status classification (RSC) is the task of classifying sentences according to the rhetorical functions they fulfil; it is an important preprocessing step for our overall goal of legal summarisation. We present several improvements over our previous RSC classifier, which was based on a CRF. The first is a BiLSTM-CRF based model which improves performance significantly over previous baselines. The BiLSTM-CRF architecture can additionally take into account the context provided by neighbouring sentences. The second improvement is the inclusion of section heading information, which resulted in the overall best classifier. Explicit structure in the text, such as headings, is an information source which is likely to be important to legal professionals during the reading phase; this makes the automatic exploitation of such information attractive. We also considerably extended the size of our annotated corpus of judgment documents.

Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2019. Neural network based Rhetorical status classification for Japanese judgement documents. In Proceedings of the 32nd International Conference on Legal Knowledge and Information Systems (JURIX 2019). pages 133–142. https://doi.org/10.3233/FAIA190314

publications_jn

Building a corpus of legal argumentation in Japanese judgement documents: towards structure-based summarisation

Published:

We present an annotation scheme describing the argument structure of judgement documents, a central construct in Japanese law. To support the final goal of this work, namely summarisation aimed at the legal professions, we have designed blueprint models of summaries of various granularities, and our annotation model in turn is fitted around the information needed for the summaries. In this paper we report results of a manual annotation study, showing that the annotation is stable. The annotated corpus we created contains 89 documents (37,673 sentences; 2,528,604 characters). We also designed and implemented the first two stages of an algorithm for the automatic extraction of argument structure, and present evaluation results.

Recommended citation: Hiroaki Yamada, Simone Teufel and Takenobu Tokunaga. 2019. Building a Corpus of Legal Argumentation in Japanese Judgement Documents: Towards Structure-Based Summarisation. Artificial Intelligence and Law. Springer Netherlands, 27(2):141–170. https://doi.org/10.1007/s10506-019-09242-3

publications_nr

Toward Automatic Extraction of Characteristic Features of English Composition by Japanese College Students: Preliminary Experiments and Issues to be Addressed

Published:

We are accumulating electronic files of English compositions by university freshmen. For the past ten years or so, about 50 to 90 students enrolled in three English classes taught by the last author have been submitting 15 essays per year, each in three different versions. The first is what the students produce in about half an hour in class after engaging in what we call "oral response practice," in which a group of three students take turns reading a question card aloud, responding to the question, and video recording the interaction. Sets of 10 question cards around one topic are prepared by the teacher and distributed to the groups along with a video camera. After class, students spend some time "completing" their compositions, submit them during the next class, and review other students' essays within groups of six. The students are then asked to revise the essays and submit the final version during the following class. In addition to the students' peer review comments, it would be desirable for the students to get feedback from statistical analysis, such as parsing and other processing, of their own essays, but the submitted files have to go through some pre-processing for the parser and analyzers to work properly. In this presentation, we report on our preliminary study and experimentation.

Recommended citation: 山田寛章, 石井雄隆, 原田康也. 日本人大学生の英語作文からの特徴量の自動抽出に向けて : 予備実験と今後の課題 (思考と言語). 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 114(100). pages 55-60, Jun 2014. ISSN 0913-5685. https://ci.nii.ac.jp/naid/110009925596

Rhetorical Role Classification for Automatic Summarization of Judgment Documents

Published:

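Aiming to make information access to Japanese judgment documents easier and more efficient, we seek to automatically generate summaries of judgments that can serve as cues for search. If the common argument structure inherent in judgment documents can be exploited appropriately, very high-quality automatic summarization becomes possible. The aim of this study is argument structure extraction for that purpose. In this paper, using our own corpus of judgment documents annotated with argument structure, we introduce features such as modality expressions, functional expressions, names of laws, and cue phrases in addition to morpheme bigrams, and perform automatic classification of rhetorical roles, the basis of argument structure, by machine learning with SVMs. We find that argumentative text in judgment documents can be distinguished even with basic features alone.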

Recommended citation: 山田寛章, Simone Teufel, 徳永健伸. 判決書自動要約のための修辞役割分類. 言語処理学会第24回年次大会発表論文集, pp. 785-788, 2018年3月. http://anlp.jp/proceedings/annual_meeting/2018/pdf_dir/P7-8.pdf

Autonomous Mutual Learning through Interaction – Difficulties in Automatization of Language Processing for Japanese EFL Learners

Published:

The goal of foreign language education and/or learning is the attainment of proficiency in the target language; learners should not only acquire knowledge of vocabulary, expressions, and grammar but also achieve automatization of the mental processing of that language. To what extent are English language education and learning in Japan achieving these goals?

Recommended citation: 原田康也, 森下美和, 鈴木正紀, 横森大輔, 遠藤智子, 前坊香菜子, 鍋井理沙, 桒原奈な子, 山田寛章, 河村まゆみ. 自律的相互学習の記録と分析からインタラクションの楽しさへ ~ 外国語としての英語自動処理の難しさを超えて ~ (思考と言語). 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 118(516). pages 17-22, Mar 2019. ISSN 2432-6380.

Rhetorical Role Classification for Japanese Judgment Documents Using a Hierarchical RNN That Considers Heading Information

Published:

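In this paper, we discuss the automation of rhetorical role classification for Japanese judgment documents and the improvement of its performance. Previous research on rhetorical role classification of Japanese judgment documents built a classifier using a Conditional Random Field (CRF) and achieved a performance of F=0.63 (macro average); however, classification performance was relatively low for the important roles BACKGROUND (F=0.32) and CONCLUSION (F=0.39), leaving room for improvement. We show that a model based on a hierarchical RNN, which can take inter-sentence context into account, improves rhetorical role classification performance on Japanese judgment documents compared with the previous CRF-based classifier. We also show that adding to the hierarchical RNN a dedicated network that handles the heading information appearing in judgment documents improves rhetorical role classification performance.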

Recommended citation: 山田寛章, Simone Teufel, 徳永健伸. 見出し情報を考慮した階層型RNNによる日本語判決書のための修辞役割分類. 言語処理学会第26回年次大会発表論文集, pp. 37-40, 2020年3月. https://www.anlp.jp/proceedings/annual_meeting/2020/pdf_dir/P1-10.pdf

talks