| Title | Authors | Topic |
| --- | --- | --- |
| Can Language Models Be Tricked by Language Illusions? Easier with Syntax, Harder with Semantics | Yuhan Zhang, Edward Gibson and Forrest Davis | Computational Psycholinguistics, Cognition and Linguistics |
| ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind | Xiaomeng Ma, Lingyu Gao and Qihui Xu | Computational Psycholinguistics, Cognition and Linguistics |
| The Zipfian Challenge: Learning the statistical fingerprint of natural languages | Christian Bentz | Computational Psycholinguistics, Cognition and Linguistics |
| On the Effects of Structural Modeling for Neural Semantic Parsing | Xiang Zhang, Shizhu He, Kang Liu and Jun Zhao | Lexical, Compositional and Discourse Semantics |
| The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks | Kaiser Sun, Adina Williams and Dieuwke Hupkes | Theoretical Analysis and Interpretation of ML Models for NLP |
| Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning | Lucas Weber, Elia Bruni and Dieuwke Hupkes | Theoretical Analysis and Interpretation of ML Models for NLP |
| Med-HALT: Medical Domain Hallucination Test for Large Language Models | Ankit Pal, Logesh Kumar Umapathi and Malaikannan Sankarasubbu | Resources and Tools for Scientifically Motivated Research |
| Revising with a Backward Glance: Regressions and Skips during Reading as Cognitive Signals for Revision Policies in Incremental Processing | Brielen Madureira, Pelin Çelikkol and David Schlangen | Theoretical Analysis and Interpretation of ML Models for NLP |
| ChiSCor: A Corpus of Freely-Told Fantasy Stories by Dutch Children for Computational Linguistics and Cognitive Science | Bram van Dijk, Max van Duijn, Suzan Verberne and Marco Spruit | Resources and Tools for Scientifically Motivated Research |
| HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities | Esra Dönmez, Pascal Tilli, Hsiu-Yu Yang, Ngoc Thang Vu and Carina Silberer | Interaction and Grounded Language Learning |
| Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests | Max van Duijn, Bram van Dijk, Tom Kouwenhoven, Werner de Valk, Marco Spruit and Peter van der Putten | Computational Psycholinguistics, Cognition and Linguistics |
| A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation | Jarad Forristal, Fatemehsadat Mireshghallah, Greg Durrett and Taylor Berg-Kirkpatrick | Natural Language Generation |
| Title | Authors |
| --- | --- |
| Not all layers are equally as important: Every Layer Counts BERT | Lucas Georges Gabriel Charpentier and David Samuel |
| Towards more Human-like Language Models based on Contextualizer Pretraining Strategy | Chenghao Xiao, G Thomas Hudson and Noura Al Moubayed |
| Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures | Julius Steuer, Marius Mosbach and Dietrich Klakow |
| CLIMB – Curriculum Learning for Infant-inspired Model Building | Richard Diehl Martinez, Hope McGovern, Zebulon Goriely, Christopher Davis, Andrew Caines, Paula Buttery and Lisa Beinborn |
| Humans and language models diverge when predicting repeating text | Aditya Vaidya, Javier Turek and Alexander Huth |
| Investigating the Nature of Disagreements on Mid-Scale Ratings: A Case Study on the Abstractness-Concreteness Continuum | Urban Knuples, Diego Frassinelli and Sabine Schulte im Walde |
| ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages | Mohammad Akbari, Saeed Ranjbar Alvar, Behnam Kamranian, Amin Banitalebi-Dehkordi and Yong Zhang |
| A Comparative Study on Textual Saliency of Styles from Eye Tracking, Annotations, and Language Models | Karin de Langis and Dongyeop Kang |
| PROPRES: Investigating the Projectivity of Presupposition with Various Triggers and Environments | Daiki Asami and Saku Sugawara |
| A Minimal Approach for Natural Language Action Space in Text-based Games | Dongwon Ryu, Meng Fang, Gholamreza Haffari, Shirui Pan and Ehsan Shareghi |
| Structural Ambiguity and its Disambiguation in Language Model Based Parsers: the Case of Dutch Clause Relativization | Gijs Wijnholds and Michael Moortgat |
| Quirk or Palmer: A Comparative Study of Modal Verb Frameworks with Annotated Datasets | Risako Owan, Maria Gini and Dongyeop Kang |
| Quantifying Information of Tokens for Simple and Flexible Simultaneous Machine Translation | DongHyun Lee, Minkyung Park and Byung-Jun Lee |
| Enhancing Code-mixed Text Generation Using Synthetic Data Filtering in Neural Machine Translation | Dama Sravani and Radhika Mamidi |
| Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization | Ondrej Skopek, Rahul Aralikatte, Sian Gooding and Victor Carbune |
| Syntactic Inductive Bias in Transformer Language Models: Especially Helpful for Low-Resource Languages? | Luke Gessler and Nathan Schneider |
| Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue | Aron Molnar, Jaap Jumelet, Mario Giulianelli and Arabella Sinclair |
| On the utility of enhancing BERT syntactic bias with Token Reordering Pretraining | Yassir El Mesbahi, Atif Mahmud, Abbas Ghaddar, Mehdi Rezagholizadeh, Phillippe Langlais and Prasanna Parthasarathi |
| Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty | Inar Timiryasov and Jean-Loup Tastet |
| BabyLM Challenge: Curriculum learning based on sentence complexity approximating language acquisition | Miyu Oba, Akari Haga, Akiyo Fukatsu and Yohei Oseki |
| Can training neural language models on a curriculum with developmentally plausible data improve alignment with human reading behavior? | Aryaman Chobey, Oliver Smith, Anzi Wang and Grusha Prasad |
| CogMemLM: Human-Like Memory Mechanisms Improve Performance and Cognitive Plausibility of LLMs | Lukas Thoma, Ivonne Weyers, Erion Çano, Stefan Schweter, Jutta L Mueller and Benjamin Roth |
| McGill BabyLM Shared Task Submission: The Effects of Data Formatting and Structural Biases | Ziling Cheng, Rahul Aralikatte, Ian Porada, Cesare Spinoso-Di Piano and Jackie CK Cheung |
| On the effect of curriculum learning with developmental data for grammar acquisition | Mattia Opper, J Morrison and Siddharth N |
| ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding | Ömer Veysel Çağatan |
| Title | Authors |
| --- | --- |
| How Fragile is Relation Extraction under Entity Replacements? | Yiwei Wang, Bryan Hooi, Fei Wang, Yujun Cai, Yuxuan Liang, Wenxuan Zhou, Jing Tang, Manjuan Duan and Muhao Chen |
| JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models | Yuiga Wada, Kanta Kaneda and Komei Sugiura |
| MuLER: Detailed and Scalable Reference-based Evaluation | Taelin Karidi, Leshem Choshen, Gal Patel and Omri Abend |
| The Impact of Familiarity on Naming Variation: A Study on Object Naming in Mandarin Chinese | Yunke He, Xixian Liao, Jialing Liang and Gemma Boleda |
| PSST! Prosodic Speech Segmentation with Transformers | Nathan Roll, Calbert Graham and Simon Todd |
| Alignment via Mutual Information | Shinjini Ghosh, Yoon Kim, Ramon Fernandez Astudillo, Tahira Naseem and Jacob Andreas |
| Challenging the "One Single Vector per Token" Assumption | Mathieu Dehouck |
| Strategies to Improve Low-Resource Agglutinative Languages Morphological Inflection | Gulinigeer Abudouwaili, Wayit Ablez, Kahaerjiang Abiderexiti, Aishan Wumaier and Nian Yi |
| Exploring Transformers as Compact, Data-efficient Language Models | Clayton Fields and Casey Kennington |
| Tree-shape Uncertainty for Analyzing the Inherent Branching Bias of Unsupervised Parsing Models | Taiga Ishii and Yusuke Miyao |
| Future Lens: Anticipating Subsequent Tokens from a Single Hidden State | Koyena Pal, Jiuding Sun, Andrew Yuan, Byron Wallace and David Bau |
| Cross-Document Event Coreference Resolution: Instruct Humans or Instruct GPT? | Jin Zhao, Nianwen Xue and Bonan Min |
| Implications of Annotation Artifacts in Edge Probing Test Datasets | Sagnik Ray Choudhury and Jushaan Kalra |
| REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization | MohammadReza GhasemiMadani and Pasquale Minervini |
| A surprisal oracle for active curriculum language modeling | Xudong Hong, Sharid Loáiciga and Asad B. Sayeed |
| Baby’s CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models | Zheyu Zhang, Han Yang, Bolei Ma, David Rügamer and Ercong Nie |
| Byte-ranked Curriculum Learning for BabyLM Strict-small Shared Task 2023 | Justin DeBenedetto |
| ChapGTP, ILLC’s Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation | Jaap Jumelet, Michael Hanna, Marianne De Heer Kloots, Anna Langedijk, Charlotte Pouw and Oskar van der Wal |
| GPT-wee: How Small Can a Small Language Model Really Get? | Bastian Bunzeck and Sina Zarrieß |
| Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings | David Samuel |
| Tiny Language Models Enriched with Multimodal Knowledge from Multiplex Networks | Clayton Fields, Osama Natouf, Andrew McMains, Catherine Henry and Casey Kennington |