
Final Project Ideas

The primary purpose of the final project is to allow students to pursue ideas that they are curious about, and to build on the homework and lectures. We have suggested several final project ideas below. If you are interested in an idea but unsure exactly how to proceed, your group can get in touch with the professor or TAs to discuss it.

The key is that the project cannot be a simple ML/AI project and must touch on something related to cognition and cognitive science (that is to say, modeling human behavior or mental representations/processes).

Projects that appear promising at the end of the class may be invited to be extended into a full conference paper. Working toward a publication is not a requirement for a good grade, but it is a possibility if your group is up for it. Several past projects have been published in conferences with additional work done by the students and faculty mentors after the class.

Flexible and Abstract Reasoning in Humans and Machines

The Abstraction and Reasoning Corpus (ARC) is a visual program-synthesis benchmark designed to evaluate broad generalization in machines. The benchmark was released along with a preprint in 2019 by François Chollet (Chollet 2019). Unlike many other benchmarks in the machine learning world, this benchmark has seen little progress since its release and has proved very difficult for current AI. Recent results (LeGris et al. 2024) have demonstrated that human performance far exceeds that of state-of-the-art models. Moreover, to spur progress towards achieving human performance, an ARC Prize was founded this year, offering a million dollars in prize money. Check out this website to learn more about a recent human dataset on ARC called H-ARC.

Given the nature of the tasks in ARC, multiple aspects of people's cognitive mechanisms can be examined. Here are a few preliminary ideas:

  • Hypothesis generation: H-ARC contains free-form natural language descriptions of people's solutions to ARC problems. Recent work (Wang et al. 2023) has examined to what extent a hypothesis-generation model built on an LLM can solve ARC tasks. The authors demonstrate that one potential bottleneck for SOTA LLMs like GPT-4 in solving ARC tasks is an inability to generate correct natural language hypotheses. The ability to come up with high-level natural language goals and descriptions of ARC grids seems likely to underlie people's ability to solve ARC problems.

    • Project 1: Fine-tune a small LLM (Llama / Gemma / Mistral family) to improve hypothesis generation using H-ARC data. The main idea in this project is to reproduce the results from Wang et al. (2023) and try to improve the model's performance by improving its ability to generate plausible natural language hypotheses through fine-tuning (a minimal fine-tuning sketch appears after this list).

    • Project 2: Previous work (Vong and Lake, 2023; Bramley et al. 2018) has suggested that bottom-up feature-based processes may be involved in generating hypotheses. Preliminary results suggest that LLM performance increases when provided with natural language descriptions of ARC grids. Current SOTA vision LLMs are not very good at simply describing the grids, since the grids are very far removed from naturalistic image distributions. Conversely, people use all sorts of interesting abstractions to describe the grids in their natural language solutions. The main idea in this project would be to try to improve / fine-tune a VLM or simple captioning model to give more human-like descriptions of ARC grids. The LARC dataset could also be useful here.

  • Action traces: H-ARC contains step-by-step action traces of people's generation process. This data is likely to be revealing of people's underlying mental representations when solving ARC tasks.

    • Project 1: Representing knowledge and tasks at multiple levels of abstraction (goals, subgoals, low-level actions) is an interesting feature of human cognition and is likely to explain how people can do complex planning (Ho et al. 2019). One way to think about solving ARC problems is as a decomposition problem: how can one decompose the output into the steps needed to transform the input grids into their corresponding outputs? "Options" are a key idea from the RL literature that maps naturally to this idea of abstractions and hierarchical plans (Sutton et al. 1999; Stolle and Precup 2002). Looking at people's action traces for each problem in H-ARC, there is likely to be structure in how people generate output grids. One way to analyze this is to look at "bottleneck states": states that are visited by multiple participants when solving a given task. The main idea in this project is to try to develop an RL model of people's action traces. Some potential ideas:

      • Perform imitation learning with the human data and compare to humans on a held-out set. One way to do this could be to train a decision transformer (Chen et al. 2021) on the action traces from humans and compare to humans on a held-out set of ARC tasks.
      • Train an RL agent to solve some ARC tasks (the ARCLE environment may be useful: https://github.com/ConfeitoHS/arcle) and compare to people.
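
As a concrete starting point for the fine-tuning project above, here is a minimal sketch of supervised fine-tuning with the Hugging Face stack. Treat it as a sketch under assumptions: the model name is just one example of a small open model, and the `grids`/`hypothesis` fields are hypothetical stand-ins for whatever preprocessing of the H-ARC records your group settles on.

```python
# Minimal sketch: fine-tune a small causal LM on (grid description -> hypothesis) pairs.
# Assumes H-ARC records have been preprocessed into `examples`; field names are hypothetical.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-3.2-1B"  # any small open model works here
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

examples = [  # hypothetical preprocessing of H-ARC free-text solutions
    {"grids": "<text rendering of the demo input/output grids>",
     "hypothesis": "Copy the shape and recolor it to match the largest object."},
]

def format_example(ex):
    prompt = f"Task grids:\n{ex['grids']}\nDescribe the transformation:\n{ex['hypothesis']}"
    return tok(prompt, truncation=True, max_length=1024)

ds = Dataset.from_list(examples).map(format_example, remove_columns=["grids", "hypothesis"])
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="harc-sft", per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```

For a model of this size you would likely also want parameter-efficient fine-tuning (e.g., LoRA via the peft library) on a single GPU; the prompt format above is a design choice you should iterate on.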

Intuitive Physics

Intuitive physics refers to our ability to predict the behavior of physical objects in the world. This ability is crucial for interacting with our environment, and it remains a unique challenge for machines (e.g., driverless cars, robots) to understand and predict the physical world. Here are a few ideas for final projects in this area:

  • Explaining how people reason about (physical) tools: The virtual tools game is a task, dataset, and model of how humans quickly learn to solve physical reasoning problems. The task involves manipulating and choosing tools to create a desired end state of a system (e.g., getting a ball into a bucket). The task is designed to be a simple, yet challenging, test of physical reasoning. The dataset is available for download along with the code for several models. The accompanying model suggests that people explore using an “object-based prior” for where tools could be placed near an object, simulate the effects, and update their sampling distribution from what they’ve learned. One final project idea could be extending their model to include other ways of sampling, simulating, or updating, based on ideas we’ve covered in class or your own intuition about how people might be performing the task. Be creative!

  • Explaining errors in human physical reasoning - Although humans are quite good at reasoning about the physical world, we make many systematic errors (Ludwin-Peery, Bramley, Davis, & Gureckis, 2021). Recently there have been proposals in the literature that the human mind uses mental simulation akin to the operation of a video game engine (Battaglia, Hamrick, & Tenenbaum, 2013). More recent refinements of this model have proposed the idea of partial simulation (Bass, Smith, Bonawitz, & Ullman, 2022): people might not simulate the entire physical world but only the parts that are relevant to the task at hand. This project might explore the idea of partial simulation in the context of physical reasoning.

    • One way to do this might be to explore the consequences of partial simulation in the virtual tools game. Could a partial simulation model better explain those results?
    • Another might be to set up a physical simulation to see if you can replicate and extend the results of Bass et al. (The model makes several surprising predictions in their experiments!)
    • Another might be to start with the empirical findings in Ludwin-Peery et al. and try to build a computational model that can explain the results. This might involve a model that combines partial simulation with other ideas from the class.
  • Understanding predictions about physical stability and safety - A very popular domain for studying intuitive physical reasoning is the so-called "block towers" task (e.g., Lerer, Gross, and Fergus, 2016). In this experiment, humans (or machines) are shown a picture of a stacked set of blocks and need to judge if the structure is "stable" or "likely to fall over." Convolutional neural networks have shown some success at predicting stability in these tasks (e.g., the Lerer paper), and human judgements in this task have been studied by psychologists in many experiments (e.g., Battaglia, Hamrick, & Tenenbaum, 2013). The specific judgement about towers is indicative of many situations where we have to judge physical stability. For example, every year hundreds of thousands of people are injured at work by falling or being hit by objects (e.g., see the National Safety Council report), and one potential cause might be errors in physical reasoning. One project would be to create a new task domain of images of stability that are like the block towers but are more realistic to safety events in everyday life (e.g., ladders, steps, stacks of plates, placement of scissors/knives, etc.). You could then collect human judgements about these images and try to build a model that can predict these judgements. This project would be a good way to explore the relationship between physical reasoning and safety judgements (a minimal noisy-simulation baseline for tower stability is sketched below).
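
For either of the last two ideas, a useful baseline is a Battaglia-style "noisy intuitive physics engine": run a physics simulation many times with perceptual noise on the initial scene and read off the proportion of runs where the tower falls. Here is a minimal sketch assuming the pymunk 2D physics library; the geometry, noise level, and fall criterion are illustrative choices, not fit to data.

```python
# Sketch of a noisy "intuitive physics engine" judging tower stability (Battaglia et al. style).
# Assumes pymunk; geometry and noise levels are illustrative, not fit to data.
import random
import pymunk

def tower_falls(jitter_sd=5.0, n_blocks=4, steps=300):
    space = pymunk.Space()
    space.gravity = (0, -900)
    floor = pymunk.Segment(space.static_body, (-500, 0), (500, 0), 1)
    floor.friction = 1.0
    space.add(floor)
    blocks = []
    for i in range(n_blocks):
        body = pymunk.Body(mass=1, moment=pymunk.moment_for_box(1, (60, 30)))
        # perceptual uncertainty: jitter each block's horizontal position
        body.position = (random.gauss(0, jitter_sd), 15 + i * 30)
        shape = pymunk.Poly.create_box(body, (60, 30))
        shape.friction = 1.0
        space.add(body, shape)
        blocks.append(body)
    start = [b.position.y for b in blocks]
    for _ in range(steps):
        space.step(1 / 60.0)
    # "fell" if any block dropped well below where it started
    return any(b.position.y < y0 - 20 for b, y0 in zip(blocks, start))

p_fall = sum(tower_falls() for _ in range(100)) / 100  # Monte Carlo estimate
print(f"P(tower falls) ~ {p_fall:.2f}")
```

A partial-simulation variant could, for example, simulate only a subset of the blocks or stop the simulation early and extrapolate, then compare the resulting judgements to human data.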

Real-world physical reasoning errors

Most studies of human cognition rely on carefully designed but artificial tasks (e.g., memory games, IQ tests). However, such approaches sometimes lack applicability to the "real world." One approach is to study cognition outside the laboratory -- studying cognition “in the wild”. One example of a difficult but important reasoning problem that thousands of people in New York City face every day is how to securely lock their bikes. Bike racks come in various shapes, sometimes none are available and one has to make do with a lamp post or scaffolding, and there are safer and less safe ways to use whatever locks one has at hand (ideally, both wheels and the frame are secured). Do people use their locks optimally? Do they exert effort “rationally” (so that their bikes are closer to optimally locked in places where they are more likely to be stolen)? The Gureckis lab has a dataset of images of locked bikes from around NYU, and this project could try to answer these and other questions about how people reason in this real-world setting. The project in this case would be analyzing the dataset and perhaps trying to apply models to detect when errors are made in locking. If your group is interested in this project reach out to Professor Gureckis and he'll put you in touch with a PhD student who has the data.

Question Asking: Can LLMs ask good questions?

Question asking is an important way that humans learn about the world around them. Asking a good question balances several factors such as potential information gained, the complexity of the question, and the cost of asking. Clearly LLMs can be prompted to ask questions, but it is unknown how "good" those questions are. Interestingly, cognitive scientists have considered this question on the human side -- do humans ask good questions? (Rothe, Lake, & Gureckis, 2018). This project would set up an information-seeking task that both humans and LLMs can reasonably perform and then use measures from information theory (including expected information gain, sketched below) to see if LLMs ask questions in ways that strategically acquire information. A bonus element would be to see if open-source LLMs can be fine-tuned to ask better questions.
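
To make the information-theoretic measure concrete, here is a minimal sketch of expected information gain (EIG) over a toy hypothesis space; the hypotheses and answer likelihoods are illustrative placeholders for whatever task you design.

```python
# Sketch: score candidate questions by expected information gain (EIG) over a
# toy hypothesis space, as in Rothe et al. (2018). Questions/answers are illustrative.
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def eig(prior, likelihood):
    """prior: P(h); likelihood[h][a] = P(answer a | h). Returns expected info gain."""
    answers = range(len(likelihood[0]))
    gain = 0.0
    for a in answers:
        p_a = sum(prior[h] * likelihood[h][a] for h in range(len(prior)))
        if p_a == 0:
            continue
        posterior = [prior[h] * likelihood[h][a] / p_a for h in range(len(prior))]
        gain += p_a * (entropy(prior) - entropy(posterior))
    return gain

# three hypotheses, uniform prior; answers are yes/no (columns)
prior = [1/3, 1/3, 1/3]
q_good = [[1, 0], [0, 1], [0, 1]]   # splits the hypotheses 1 vs 2
q_bad  = [[1, 0], [1, 0], [1, 0]]   # every hypothesis answers "yes": uninformative
print(eig(prior, q_good), eig(prior, q_bad))  # ~0.918 bits vs 0.0 bits
```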

LLM Information-seeking and reasoning behavior in games

Humans are capable of playing games that involve information seeking, such as Mastermind or 20 Questions. Prior work has established that people don't necessarily ask optimal questions in an information-theoretic sense, but focus on acquiring easily interpretable information (Cheyette et al., 2023). Other prior work has explored a variation called Entropy Mastermind, introduced by Schulz et al. (see also Taylor et al., 2020). From some very preliminary examinations, LLMs don't excel at this game, both generating suboptimal queries (compared to, say, an ideal Bayesian model) and making reasoning mistakes (failing to correctly assimilate information). This project would study this more rigorously by prompting LLMs to play Mastermind, analyzing their mistakes, comparing them to human behavior and to ideal algorithms (such as the entropy-based baseline sketched below), and optionally trying to fine-tune or prompt LLMs to improve their ability to play these games.
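
An ideal-observer baseline for Mastermind is easy to prototype: score each candidate guess by the entropy of the feedback distribution over the codes still consistent with play so far. A minimal sketch, on a deliberately small game for speed:

```python
# Sketch: an entropy-based "ideal" Mastermind query scorer to compare LLM and
# human guesses against. Game size is kept small so exhaustive scoring is fast.
import math
from collections import Counter
from itertools import product

COLORS, PEGS = 4, 3
ALL_CODES = list(product(range(COLORS), repeat=PEGS))

def feedback(guess, code):
    exact = sum(g == c for g, c in zip(guess, code))
    common = sum(min(guess.count(v), code.count(v)) for v in set(guess))
    return exact, common - exact  # (black pegs, white pegs)

def query_entropy(guess, candidates):
    counts = Counter(feedback(guess, code) for code in candidates)
    n = len(candidates)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

candidates = ALL_CODES  # codes consistent with feedback so far
best = max(ALL_CODES, key=lambda g: query_entropy(g, candidates))
print(best, query_entropy(best, candidates))
```

After each round you would filter `candidates` to codes consistent with the observed feedback; an LLM's guesses can then be scored with `query_entropy` to quantify how far they fall from the ideal.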

Representing Goals as Programs for Goal-Conditioned Reinforcement Learning

A line of work from the lab has examined how people represent and generate goals, proposing that we use program-like representations in a language of thought, and that such representations might solve some shortcomings in goal representations for reinforcement learning (Davidson et al., 2022; Davidson et al., 2024; Davidson & Gureckis, 2024). So far in the lab we've studied this in people and built computational models of how people generate goals; we haven't yet trained RL agents to pursue goal programs. This project would take steps in that direction. Pick a reinforcement learning task with sufficient complexity for program goals to be interesting to study (e.g., something like Crafter/Craftax, the NetHack Learning Environment, or perhaps Minecraft/MineRL), create a distribution of goal programs and a way to evaluate/interpret them in the context of the game (a minimal version is sketched below), train RL agents to pursue these goals, and study their behavior.
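
One very simple way to think of a goal program is as an executable predicate over symbolic game state that doubles as a goal-conditioned reward. Here is a minimal sketch; the state schema and predicates are hypothetical stand-ins for whatever your chosen environment (e.g., Crafter) actually exposes, not the DSL from the Davidson et al. papers.

```python
# Sketch: a goal represented as an executable program over symbolic game state,
# used as a sparse goal-conditioned reward. State schema/predicates are hypothetical.
from dataclasses import dataclass, field

@dataclass
class State:
    inventory: dict = field(default_factory=dict)
    nearby: set = field(default_factory=set)

def has(item, n=1):
    return lambda s: s.inventory.get(item, 0) >= n

def near(obj):
    return lambda s: obj in s.nearby

def all_of(*preds):
    return lambda s: all(p(s) for p in preds)

# a goal program: "hold 2 wood while standing near water"
goal = all_of(has("wood", 2), near("water"))

def reward(state: State) -> float:
    return 1.0 if goal(state) else 0.0  # sparse goal-conditioned reward

print(reward(State(inventory={"wood": 2}, nearby={"water"})))  # 1.0
```

Sampling goals would then mean sampling compositions of such predicates from a grammar, which also gives you a natural distribution over tasks for training and evaluation.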

Social Decision Making and Helping Behavior

Helping is a universal human behavior and is a core aspect of a functioning society. However, the decision to provide help and what type of help to provide is a complex cognitive calculation that simultaneously weighs many costs and benefits.

Recently Osborn Popp & Gureckis (2024) reported a large experiment in which two people played a game online where it was possible to altruistically help the other player accomplish their task. Helping decisions in the task likely rely on a combination of local heuristics and strategic cost-benefit analyses. Models that would be useful to explore in this game include those involving elements of multistep planning, theory of mind, and reinforcement learning. The data are available for modeling along with some preliminary model-fitting code, but the details of the models would need to be articulated for a successful final project. If your group is interested in this project reach out to Professor Gureckis and he'll put you in touch with a PhD student who has the data.

Language and concept learning: Bridging symbolic and neural network models

In our lecture on probabilistic programs we discussed how people might learn concepts by using a kind of "Bayesian program induction" where they learn a program that can generate the data they see (e.g., Goodman et al., 2008). This is a kind of "symbolic" model of concept learning. However, many recent advances in AI have been driven by neural network models which learn to predict data directly from input-output pairs without symbolic intermediate representations. One interesting project would be to try to bridge these two approaches. For example, you could train a neural network (e.g., a basic transformer model) to predict the output of a symbolic generative model (essentially sampling 'data' from the Bayesian generative model and using it as training data for a more flexible transformer model; a minimal sampler is sketched below). The ability of the transformer model to learn the structure of the Bayesian model, and then to further learn from humans, could highlight how these two approaches might be combined.
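
The data-generation half of that pipeline can be prototyped in a few lines. This sketch samples rule-based concepts from a tiny DNF-style grammar (in the spirit of Goodman et al., 2008, but deliberately minimal and not their actual grammar) and emits labeled examples a transformer could be trained on.

```python
# Sketch: sample rule-based concepts (disjunctions of conjunctions over binary
# features) and emit labeled examples as transformer training data.
import random

N_FEATURES = 4

def sample_literal():
    f = random.randrange(N_FEATURES)
    val = random.choice([0, 1])
    return lambda x, f=f, val=val: x[f] == val

def sample_conjunction(max_lits=2):
    lits = [sample_literal() for _ in range(random.randint(1, max_lits))]
    return lambda x: all(l(x) for l in lits)

def sample_rule(max_terms=2):  # disjunction of conjunctions (DNF)
    terms = [sample_conjunction() for _ in range(random.randint(1, max_terms))]
    return lambda x: any(t(x) for t in terms)

rule = sample_rule()
dataset = []
for _ in range(32):
    x = tuple(random.choice([0, 1]) for _ in range(N_FEATURES))
    dataset.append((x, int(rule(x))))  # (features, category label)
print(dataset[:4])
```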

Another approach would be to leverage the language abilities of LLMs to try to learn the rule-based categories that people learn. For example, instead of sampling from a grammar (as in Goodman et al. 2008) you could try to sample rules from a language model and then translate them into runnable code (similar to some ARC Prize entries). Does a model like this capture the learning patterns of people described by Goodman et al. (2008)? Could the model make novel predictions for a new task which could be tested in a human experiment? For example, Vong and Lake (2022) propose a model of few-shot rule learning which doesn't rely on a pre-specified domain language (like the one in Goodman et al. 2008). Could a model like this be used to learn concepts in a way similar to humans?

Planning and neural networks

As mentioned in the lecture, planning is a key aspect of human cognition, particularly in domains like games where the branching structure of the state space requires thinking multiple steps ahead. Planning models propose that people sequentially enumerate states and step through them. An alternative approach would use flexible neural network architectures that plan in a single pass through a network. In this project you would implement a planning algorithm (e.g., MCTS; a skeleton appears after the questions below) in a cognitively interesting task (e.g., the 4-in-a-row task from Ma et al.). Then you would implement a neural network model (e.g., a transformer) that is trained to mimic the behavior of the planning algorithm. You could then compare the performance of the two models on a set of held-out tasks. This project would be a good way to explore the relationship between planning and neural network models in a cognitively interesting domain. Key questions would be:

  • How does the performance of the neural network model compare to the planning model?
  • How does the performance of the neural network model compare to people?
  • What are the limitations of the neural network model compared to the planning model?
  • What data would be most useful for telling the models apart?
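
To make the first step concrete, here is a generic UCT/MCTS skeleton. The `Game` interface (`actions`, `step`, `is_terminal`, `reward`) is a hypothetical stand-in you would implement for 4-in-a-row or your chosen task; as written it assumes a single-agent reward, so for two-player games you would negate values for the opponent during backup.

```python
# Skeleton of MCTS (UCT) against a generic game interface; a starting point for
# generating planner behavior that a transformer could be trained to imitate.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}        # action -> Node
        self.visits, self.value = 0, 0.0

def uct_search(game, root_state, n_iters=1000, c=1.4):
    root = Node(root_state)
    for _ in range(n_iters):
        node = root
        # 1. selection: descend while the node is fully expanded
        while node.children and len(node.children) == len(game.actions(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ch.value / (ch.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
        # 2. expansion: add one untried action
        untried = [a for a in game.actions(node.state) if a not in node.children]
        if untried and not game.is_terminal(node.state):
            a = random.choice(untried)
            child = Node(game.step(node.state, a), parent=node)
            node.children[a] = child
            node = child
        # 3. rollout: random playout to a terminal state
        state = node.state
        while not game.is_terminal(state):
            state = game.step(state, random.choice(game.actions(state)))
        reward = game.reward(state)
        # 4. backup
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)
```

Running this planner over many positions yields (state, chosen action) pairs, which is exactly the imitation dataset the transformer half of the project needs.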

Another approach is to explore the computational implications of novel methods for combining planning and neural networks (e.g., MCTSnet; Guez et al., 2018). Could a model like this be used to learn planning strategies in a way similar to humans?

Language, Action, and Reinforcement Learning

Humans learn by directly interacting with the environment (as in reinforcement learning -- experiencing rewards after making decisions), but we also learn indirectly by communicating with other people — most notably, via language. How do people combine experience and language to learn? Here are a few ideas for final projects in this area:

  • Combining language and experience using Bayesian models: One way to combine language and reinforcement learning is through a Bayesian approach that combines both sources of information. For example, one recent abstract modeled the way that "verbal hints" about a task can help improve RL learning. The model works by assuming a Bayesian prior over programs and combining that with Bayesian RL (Ho & Gureckis, 2023). This work was highly preliminary, so extending the model to account for more complex tasks would make a good final project. One issue is that the tasks in the current work were very simple, so the hints often gave away the answer. A more interesting direction would be to see if the model could learn to solve more complex tasks with more subtle hints. A related idea is to see how this Bayesian approach compares to approaches using neural networks. For example, instruction-following neural networks use seq2seq models to learn to follow instructions in a grid world. How do these models compare to the Bayesian approach?

  • Language and exploration: As mentioned in the lecture, many approaches designed to encourage exploration in RL depend on "exploration bonuses". Some of these might come from intrinsic rewards like curiosity. One interesting idea is whether you can use language to help exploration. For example, one recent paper (Mu et al., 2022) describes an algorithm which uses a language model to effectively "caption" novel states and then explores over novel descriptions. Related work has looked at how, for humans, RL task states which are easy to "caption"/"label" are learned more easily (Radulescu, Vong, & Gureckis). This project could explore via simulation how "labeling" states with language might aid exploration and learning in RL tasks (see the sketch after this list).

  • Communicating action policies: People don't simply take in language (e.g., during instruction following). They can also describe, using words, the policies they are currently following or have learned via RL. One interesting idea is to explore how people use words to describe a policy they are following in an RL task. Those same descriptions can then be given to other participants, who can try to follow the same policy. One interesting project would be to see if you can build a model that can learn to communicate policies in a simple RL task and then see if other participants can follow those instructions.
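
For the exploration idea, the core mechanism is small enough to sketch: a count-based novelty bonus computed over language labels of states, in the spirit of Mu et al. (2022). The `describe` function here is a hypothetical stand-in for an actual captioner.

```python
# Sketch: a count-based exploration bonus over language labels of states.
# `describe` stands in for an LLM/VLM captioner; here it just serializes a state.
from collections import defaultdict

caption_counts = defaultdict(int)

def describe(state):
    # stand-in for a captioner producing e.g. "holding key near door"
    return " ".join(sorted(state))

def exploration_bonus(state, beta=0.1):
    cap = describe(state)
    caption_counts[cap] += 1
    return beta / caption_counts[cap] ** 0.5  # novelty decays with repeated captions

print(exploration_bonus({"near door", "holding key"}))   # first visit: large bonus
print(exploration_bonus({"holding key", "near door"}))   # same caption: smaller
```

Because the bonus is computed over captions rather than raw states, many perceptually distinct states collapse onto the same description, which is exactly the abstraction the human "labelability" findings suggest matters.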

Reinforcement learning and Deep Q-Learning

  • Extend the homework to combine RL with convolutional nets. In the RL homework you will use the OpenAI Gym to solve some dynamic control problems from simplified featural representations of the current world state. However, recent advances in Deep RL allow you to use raw pixel inputs as features. One project idea would be to extend the approach in the homework to model learning from the raw pixels of the images (a sketch of a pixel-based Q-network appears below). If you take that approach, it is important to maintain a human comparison or element in your project. For instance, one interesting psychological angle is to consider how altering or obscuring parts of the game makes it easier or harder for people (while not changing the difficulty for your RL agent), as in the Dubey et al. paper below. To do this might require a little bit of hacking of the OpenAI environment, and also asking your friends to play a few weird video games to measure their performance.
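
Here is a minimal PyTorch sketch of the convolutional Q-network that would replace the featural state in the homework. The architecture follows the classic DQN design (Mnih et al., 2015); the input shape assumes a stack of 4 grayscale 84x84 frames, which you should adjust to your environment.

```python
# Sketch: a DQN-style convolutional Q-network over raw pixels.
import torch
import torch.nn as nn

class PixelQNet(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # one Q-value per action
        )

    def forward(self, x):          # x: (batch, 4, 84, 84), values in [0, 1]
        return self.head(self.conv(x))

q = PixelQNet(n_actions=6)
print(q(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 6])
```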

References

The OpenAI Gym: https://gym.openai.com/envs/#atari

Dubey, R., Agrawal, P., Pathak, D., Griffiths, T. L., & Efros, A. A. (2018). Investigating human priors for playing video games. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018). https://rach0012.github.io/humanRL_website/ (paper and project website)

Neural networks - Memory

  • Implement the IAC model using the equations from the original paper for the Jets/Sharks example from Homework 1 in Python, with a visualization tool comparable to the Java runtime one we used in class (the core update equations are sketched after this list). Open source and document the GitHub code in a way that would be useful for other researchers. The code could run in a Jupyter notebook or just use Jupyter as a front end.

  • Extending the Interactive Activation model to a new domain. Implement the Interactive Activation and Competition (IAC) model (McClelland, 1981) from scratch and study a domain of your choice, replacing the characters from West Side Story with different items. That is, rather than using a network to encode the people and properties from the “Jets and Sharks” example (as in Homework 1 - Part A), create a network that encodes information about a different domain of your choice (items and properties). Study the phenomena covered in class, such as content addressability, graceful degradation, and spontaneous generalization, in your new domain. Study how the parameters of the IAC model affect the results.
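
For the first idea, the heart of the model is a few lines of numpy. This sketch uses what I believe are the standard default parameter values from the PDP handbook software (an assumption -- check them against the original paper), and it is the dynamics only; the visualization layer is up to you.

```python
# Sketch of the core IAC activation update. W is a symmetric weight matrix
# (+ excitatory, - inhibitory); only positive activations propagate.
import numpy as np

MAX, MIN, REST, DECAY = 1.0, -0.2, -0.1, 0.1  # assumed PDP-handbook defaults

def iac_step(a, W, ext, alpha=0.1):
    net = alpha * (W @ np.clip(a, 0, None) + ext)   # net input from active units
    da = np.where(net > 0,
                  (MAX - a) * net - DECAY * (a - REST),
                  (a - MIN) * net - DECAY * (a - REST))
    return np.clip(a + da, MIN, MAX)

# tiny demo: two mutually excitatory units, one clamped by external input
W = np.array([[0.0, 1.0], [1.0, 0.0]])
a = np.full(2, REST)
for _ in range(100):
    a = iac_step(a, W, ext=np.array([1.0, 0.0]))
print(a)  # both units settle above rest; unit 1 is driven by unit 0
```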

References:
McClelland, J. L. (1981). Retrieving general and specific information from stored knowledge of specifics. In Proceedings of the third annual meeting of the cognitive science society.

Probabilistic graphical models - Memory

  • Interactive activation model as Bayesian inference. Study the interactive activation model reimagined as a Bayesian network rather than a neural network (a directed graphical model). In particular, use a “naive Bayesian classifier,” using the hidden/instance value as the latent “class” and the properties such as name, age, occupation, etc. as a vector of “observations.” Encode the Jets and Sharks example from Homework 1 (Part A) inside the Bayesian network, add noise to the conditional distributions, and then study phenomena discussed in class such as content addressability, graceful degradation, spontaneous generalization, etc. in this probabilistic model. Discuss the relationship between the interactive activation model and probabilistic modeling, as outlined in McClelland (2013).
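
A minimal version of this naive Bayes reframing fits in a few lines. The two individuals and three attributes below are illustrative; fill in the full Jets/Sharks roster and property set from Homework 1, and vary the noise parameter to study degradation.

```python
# Sketch: Jets/Sharks as naive Bayes. Individuals are the latent "class";
# properties are noisy observations. Roster shown is a tiny illustrative subset.
import numpy as np

people = ["Art", "Rick"]
# feature values per person: (gang, age, occupation)
facts = {"Art": ("Jets", "40s", "pusher"), "Rick": ("Sharks", "30s", "burglar")}
NOISE = 0.05  # probability a property is misreported

def posterior(observed):
    """observed: dict like {"gang": "Jets"}; returns P(person | observations)."""
    logp = []
    for p in people:
        gang, age, occ = facts[p]
        vals = {"gang": gang, "age": age, "occupation": occ}
        lp = 0.0  # uniform prior over people
        for attr, v in observed.items():
            lp += np.log(1 - NOISE if vals[attr] == v else NOISE)
        logp.append(lp)
    probs = np.exp(logp - np.max(logp))
    return dict(zip(people, probs / probs.sum()))

print(posterior({"gang": "Jets"}))                 # content addressability
print(posterior({"gang": "Jets", "age": "30s"}))   # graceful degradation
```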

References:
McClelland, J. L. (1981). Retrieving general and specific information from stored knowledge of specifics. In Proceedings of the third annual meeting of the cognitive science society.
McClelland, J. L. (2013). Integrating probabilistic models of perception and interactive neural networks: a historical and tutorial review. Frontiers in Psychology, 4, 503.

Neural networks - Semantic Cognition

  • Extending the semantic cognition model to a new domain. Extend the Rogers and McClelland (2003) model of semantic cognition (Homework 1 - Part C) to a much larger dataset of semantic knowledge about objects and their properties, or to a new domain altogether (for instance, you could train the network on hundreds of objects and their properties). Study the dynamics of differentiation in development (Lecture 2 Slide, Slide) or degradation when noise is added (Lecture 2 Slide).
  • Question answering for semantic cognition. Reimagine the Rogers and McClelland network for semantic cognition (Homework 1 - Part C) using a more contemporary neural network architecture for question answering. Rather than taking an “Item” layer and “Relation” layer as separate inputs and producing all of the appropriate properties on the “Attributes” layer, you would use a recurrent neural network (RNN) for question answering. This alternative architecture could simply take a yes/no question in natural language as input, such as “Can a canary sing?”, encode the question as a vector with an RNN, and produce an answer using a single binary output unit (“Yes” vs. “No”; a minimal architecture is sketched below). You could train this model to learn all of the same facts as the Rogers and McClelland model, or a different set of facts. Study the dynamics of differentiation in development (Lecture 2 Slide, Slide) or degradation when noise is added (Lecture 2 Slide).
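
The architecture described in the second bullet is small; here is a PyTorch sketch. Token ids, vocabulary size, and dimensions are placeholders; training would use binary cross-entropy on (question, yes/no) pairs built from the semantic-cognition facts.

```python
# Sketch of the yes/no QA architecture: embed question tokens, encode with an
# LSTM, read out a single yes/no probability.
import torch
import torch.nn as nn

class YesNoQA(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)  # single binary output unit

    def forward(self, tokens):                 # tokens: (batch, seq_len) word ids
        _, (h, _) = self.rnn(self.embed(tokens))
        return torch.sigmoid(self.out(h[-1]))  # P("yes")

model = YesNoQA(vocab_size=100)
question = torch.tensor([[5, 12, 7, 3]])       # e.g., "can a canary sing"
print(model(question))  # untrained: ~0.5; train with BCE loss on fact triples
```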

References
McClelland, J. L., & Rogers, T. T. (2003). The parallel distributed processing approach to semantic cognition. Nature Reviews Neuroscience, 4(4), 310.

Neural networks - Language

  • Large-scale learning of lexical classes. Can you discover lexical classes with a large-scale recurrent neural network (RNN)? Train a more contemporary recurrent network architecture (such as an LSTM) on a next-word prediction task given a sizeable corpus of text, including thousands of sentences. Can you replicate some of the results from Elman (1990) on a bigger corpus, especially the hierarchical clustering results for discovering lexical categories (Lecture 3 Slide)?

  • Exploring lexical and grammatical structure in BERT or GPT-2. What do powerful pre-trained models learn about lexical and grammatical structure? Explore the learned representations of a state-of-the-art language model (BERT, GPT-2, etc.) in systematic ways (a minimal probing sketch appears below), and discuss the learned representations in relation to how children may acquire this structure through learning.
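
One simple probe, which also connects back to the Elman-style clustering in the first bullet, is to extract word representations from a pretrained model and hierarchically cluster them. A sketch using the transformers and scipy libraries; the word list is a toy example, and plotting the dendrogram assumes matplotlib is available.

```python
# Sketch: extract GPT-2 word representations and hierarchically cluster them,
# Elman-style. Word list is illustrative.
import torch
from transformers import GPT2Model, GPT2Tokenizer
from scipy.cluster.hierarchy import linkage, dendrogram

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

words = ["dog", "cat", "mouse", "eat", "chase", "break", "cookie", "glass"]
vecs = []
for w in words:
    ids = tok(" " + w, return_tensors="pt")  # leading space: GPT-2 BPE convention
    with torch.no_grad():
        hidden = model(**ids).last_hidden_state  # (1, n_tokens, 768)
    vecs.append(hidden[0].mean(dim=0).numpy())   # average over subword tokens

Z = linkage(vecs, method="ward")  # do nouns and verbs separate?
dendrogram(Z, labels=words)       # inspect with matplotlib: plt.show()
```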

References
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Radford et al. (2019). Language models are unsupervised multitask learners. OpenAI technical report.

Bayesian modeling / Probabilistic programming - Number game

  • Probabilistic programming and the number game. In Homework 3 (Part A), you explored a Bayesian model of concept learning with the number game. A potential weakness of this model is that the hypotheses need to be explicitly defined and enumerated as a list. Can you devise a more compact language or grammar for defining the space of possible number concepts (see the sketch below)? It may be convenient to define the prior distribution as a probabilistic program. Feel free to change the hypothesis space: the prior defined by the probabilistic program may include some or most of the current hypotheses, but also others. How does model performance change as you change the hypothesis space? Does this new prior help us better understand where people’s priors may come from?
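
A minimal sketch of what such a generative grammar could look like; the three production types and their probabilities are illustrative choices, and the point is that sampling replaces the hand-enumerated hypothesis list.

```python
# Sketch: a compact generative grammar over number concepts (hypotheses for the
# number game). Each sample returns a name and a membership test.
import random

def sample_concept():
    r = random.random()
    if r < 0.4:                      # multiples of k
        k = random.randint(2, 12)
        return f"multiples of {k}", lambda n: n % k == 0
    elif r < 0.7:                    # interval concepts
        lo = random.randint(1, 90)
        hi = random.randint(lo, 100)
        return f"between {lo} and {hi}", lambda n: lo <= n <= hi
    else:                            # powers of k
        k = random.randint(2, 10)
        members = set()
        v = k
        while v <= 100:
            members.add(v)
            v *= k
        return f"powers of {k}", lambda n: n in members

name, in_concept = sample_concept()
extension = [n for n in range(1, 101) if in_concept(n)]
print(name, extension[:10])
```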

References
Gerstenberg, T., & Goodman, N. (2012). Ping Pong in Church: Productive use of concepts in human probabilistic inference. In Proceedings of the Annual Meeting of the Cognitive Science Society.

Bayesian modeling -- Categorical perception

  • Applying the perceptual magnet model to a new domain. Pick a new domain to apply the Bayesian account of the perceptual magnet effect to, such as objects, image data, or audio data. Collect behavioral judgments of discrimination between pairs of stimuli, or similarity ratings between pairs of stimuli. Can you fit the Bayesian model (its core computation is sketched below) to explain the behavioral data?
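
The core computation in Feldman & Griffiths (2007) is posterior-mean shrinkage: a noisy stimulus is perceived as pulled toward the category prototype. A minimal sketch with illustrative numbers:

```python
# Sketch: percepts as posterior means under a Gaussian category prior and
# Gaussian perceptual noise (Feldman & Griffiths, 2007). Numbers are illustrative.
def perceived(stimulus, mu_c, var_c, var_noise):
    # E[target | stimulus] for a single Gaussian category
    return (var_c * stimulus + var_noise * mu_c) / (var_c + var_noise)

mu_c, var_c, var_noise = 0.0, 1.0, 1.0
for pair in [(-0.5, 0.5), (2.0, 3.0)]:
    a, b = (perceived(s, mu_c, var_c, var_noise) for s in pair)
    print(pair, "-> perceived distance", abs(a - b))
# with one category the shrinkage is linear (uniform compression); the
# prototype-local "magnet" pattern emerges once you mix multiple categories
```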

References
Feldman, N. H., & Griffiths, T. L. (2007). A rational account of the perceptual magnet effect. In Proceedings of the Annual Meeting of the Cognitive Science Society. (http://ling.umd.edu/~nhf/papers/PerceptualMagnet.pdf)

Decision Making

  • Modeling human decision making - The choice prediction challenge is a Kaggle-style challenge problem wherein human participants were given a huge number of choices between gambles. The space of gambles aims to expose many of the irrationalities of human decision making. Each year a competition is built around these problems, and a number of models compete to be determined the “winner.” You can attempt to use your data science and cognitive modeling skills to enter a new model into this competition (or replicate an existing one), evaluating its performance according to the same measures used in the original choice prediction challenge.

  • Generative models of tasks - One feature that enabled research on the CPC is that it is easy to generate new gambles with various properties. When researchers can generate a family of tasks, it is possible to sample widely in that task space, comparing humans and algorithms. One interesting project would be to create a generative model for a new task domain. For example, Icarte et al. (2022) propose the idea of a "reward machine," a finite state machine for specifying rewards in simple MDPs (a minimal version is sketched below). Could you use a reward machine to generate a family of tasks for a new domain, such as bandit tasks, grid world tasks, or other tasks that might be interesting to study in the context of human decision making? Even if it isn't possible to run a full human study, elaborating the space of tasks expressible in the generative model would be a valuable contribution (see this paper for instance).
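
A reward machine can be prototyped in a few lines. This sketch is a minimal illustration with hypothetical events; randomly generating objects like this (states, events, transition structure) is what would define a task family.

```python
# Sketch of a "reward machine" (Icarte et al., 2022): a finite state machine
# whose transitions fire on propositional events and emit rewards.
class RewardMachine:
    def __init__(self, transitions, start=0):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = start

    def step(self, events):
        for e in events:  # events true in the current environment step
            if (self.state, e) in self.transitions:
                self.state, r = self.transitions[(self.state, e)]
                return r
        return 0.0

# task: "get the key, then reach the door"
rm = RewardMachine({(0, "has_key"): (1, 0.0), (1, "at_door"): (2, 1.0)})
print(rm.step({"at_door"}))  # 0.0 -- door before key does nothing
print(rm.step({"has_key"}))  # 0.0 -- progress, but no reward yet
print(rm.step({"at_door"}))  # 1.0 -- task complete
```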

References
The Choice Prediction Challenge website: https://cpc-18.com

Plonsky, O., Erev, I., Hazan, T., & Tennenholtz, M. (2016). Psychological forest: Predicting human behavior. Available at SSRN: https://ssrn.com/abstract=2816450 or http://dx.doi.org/10.2139/ssrn.2816450

Joshua Peterson, David Bourgin, Mayank Agrawal, Daniel Reichman, Thomas Griffiths (2021). Using large-scale experiments and machine learning to discover theories of human decision-making. Science. https://www.science.org/doi/abs/10.1126/science.abe2629

Categorization and Category Learning

Contribute to open science! As mentioned in the lecture on categorization, there are a variety of different theories of how people learn categories and concepts from examples, and many of these models draw from approaches similar to machine classification. Currently there is a community-led effort to implement all existing category learning models in R so they can be simultaneously compared on the same sets of human data patterns. While the project is well developed and documented, there are many example models which have not yet been implemented (some of which are actually somewhat easy). One nice final project would be to read one of the papers listed below, which describe famous formal models of human categorization, and implement one for submission to the catlearn R package. This is best for a group with some expertise in R (as opposed to Python). If your group writes a report showing how the model does on the existing datasets in catlearn, it would make a nice final paper, and by making a pull request against the catlearn package your work for class might live forever to help advance science! You could also choose to implement these models in Python, in which case you could still make an impact by helping to verify previous results (see below).

References:

The catlearn R package: https://ajwills72.github.io/catlearn/

Examples:

  • RULEX is a simple type of decision tree algorithm - Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101(1), 53-79.

  • ATRIUM is a hybrid rule and nearest neighbor/exemplar algorithm that is trained using backpropagation and gradient descent - Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology: General, 127(2), 107-140.

Replicate and verify!

Psychology is largely an empirical science, and thus findings need to be independently replicated before they are widely accepted. This is true for computational cognitive modeling as well. A line of interesting final projects would be to pick a recent computational modeling paper and re-implement and verify the results reported by the authors. In doing this exercise you might come up with your own ideas about a feature to alter or change in their simulations that would be interesting. A couple of good sources for papers would be the most recent Proceedings of the Cognitive Science Society or the new journal Computational Brain & Behavior.
