Using AI Tools for Research
What is AI?
Artificial Intelligence (AI) has been part of our digital world for decades, often working behind the scenes in technologies like search engines, email spam filters, and voice assistants. However, in recent years, AI has become much more visible—thanks to highly accessible tools like ChatGPT, DALL·E, and Grammarly, which can generate text, create images, assist with coding, or refine your writing. These tools are changing the way we interact with information and shaping how we learn, research, and create.
At its core, AI refers to systems that simulate human intelligence to perform tasks such as problem-solving, language understanding, learning from data, and pattern recognition. Some AI systems are rule-based and follow strict logic, while others, like machine learning models, "learn" from large amounts of data to improve performance over time.
According to UNESCO,
“Artificial intelligence systems refer to machine-based systems that can, for a given set of human-defined objectives, make predictions, recommendations or decisions influencing real or virtual environments. They are designed to operate with varying levels of autonomy.”
AI can be a powerful support tool: it can enhance research, power new forms of scholarship, and help individuals work more efficiently. However, it also raises important questions about authorship, accuracy, equity, bias, and the ethics of using machine-generated content.
This guide is designed to help you:
- Understand what AI is and how it works
- Explore common AI tools you might encounter in your coursework or research
- Learn about the opportunities and risks associated with AI use in education
- Navigate responsible, ethical, and effective use of AI tools
Whether you’re curious about how AI might assist with writing, wondering if it’s okay to use an AI chatbot for brainstorming, or concerned about privacy and bias, this guide is here to support you as you explore and engage with this rapidly evolving technology.
Glossary
- AI literacy: The ability to understand, use, and critically evaluate AI tools and technologies.
- Artificial Intelligence (AI): Any method or technology that uses data to allow machines to autonomously mimic human cognitive functions such as learning from mistakes, making decisions, and communicating. Artificial Intelligence is a broad field that powers many familiar features, such as finding the fastest route on Google Maps, speech recognition in Siri, and personalized recommendations for shows on Netflix.
- Deep learning: A type of machine learning built on neural networks. Neural networks are inspired by the structure of the human brain and are composed of nodes (or “neurons”) organized into layers. Each node performs one small piece of the computation needed for the task the model is learning, and each connection between nodes carries a “weight” that determines how much influence one node has on the next. Before training, the weights are typically set to random values. During training, data is passed through all of the layers, the model’s output is compared with the expected result, and the weights are adjusted to reduce the error. Repeating this process many times over the training data calibrates the network to give more weight to the computations that help it perform better. Deep learning is so called because of the many layers of nodes, most of them “hidden” between the input and the output, that can make up a neural network. Because of this complexity, even some of the people who develop tools based on deep learning do not fully understand how a model arrives at a particular result, which is why you may have heard AI called a “black box.” Since AI tools are increasingly used for decision-making, there has been a drive toward developing Explainable AI, which relies on transparent and understandable decision-making processes.
- Generative AI: A subset of artificial intelligence technology that can generate new content such as text, images, code, music, and more based on patterns learned from existing data, in response to prompts from humans. Generative AI tools include ChatGPT for text generation and Midjourney for image generation.
- Hallucination: A piece of information provided by a Generative AI tool that is factually incorrect, irrelevant, or nonsensical. Hallucinations can occur because the training data lacks relevant information or because of limitations in the model architecture.
- Large Language Model (LLM): Large language models form the foundation for Generative AI tools like ChatGPT. LLMs are developed by applying deep learning techniques to massive amounts of text data, with the goal of learning how to translate text prompts into a certain action (e.g., generating an image) and how to generate coherent text themselves.
- Machine Learning: A branch of AI focused on teaching computers how to apply rules and patterns found in training data to make decisions about previously unseen data. An example of a machine learning model in action is the “People and Pets” section in the iPhone Photos app, which groups images that the algorithm believes represent the same person or animal. The algorithm may never have seen the person or animal in its training data, but because it has “learned” from that data which factors determine whether images represent the same thing, it can make decisions about new images.
- Natural Language Processing (NLP): A subfield of AI that aims to teach computers to interpret the meaning of human language and mimic human communication. Natural language processing algorithms are initially developed by feeding the computer a vast body of text, called a “corpus,” and teaching the computer how to identify aspects of the underlying structure of language in order to make meaning of the text. Generative AI tools like ChatGPT use natural language processing techniques to “understand” the prompts they are given, and to produce responses in a way that mimics human communication.
- Prompt Engineering: The design of prompts that result in the desired output from a Generative AI tool. Because Generative AI tools are designed to guess the response you want based on the prompt you give them (whether the information they provide you is correct or not), they can benefit from prompt engineering techniques such as giving specific instructions, or building in some verification of the information they give you.
- Training Data: All artificial intelligence tools and technologies require training data, which is the information that a model “gets to know” in order to “learn” how to mimic the human cognitive functions it needs to perform tasks. Training data can take the form of text, images, or structured data like spreadsheets, and be labeled or unlabeled.
In what is called supervised learning, training data is labeled by a human with the “correct” classification based on the task the model is being trained to perform. For example, a model developed to detect brain tumors in MRIs would be trained on a large set of MRI images, all labeled by a human with whether the MRI contains a tumor. The model uses the label to develop an approximation of what factors are statistically correlated with an MRI image containing a tumor, enabling it to apply this logic to other, unlabeled images that were not in the training data.
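To make the idea of learning from labeled examples more concrete, here is a minimal sketch in Python. It is not how a real tumor-detection model works: the two numeric features, the four training examples, and the function names train and predict are all invented for illustration. The sketch averages the labeled examples for each class and assigns a new, unlabeled example to whichever average it sits closest to.

```python
from statistics import mean

def train(examples):
    """Compute the average feature vector (centroid) for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical labeled training data: (features, human-assigned label).
# The two made-up numbers stand in for measurements taken from a scan.
labeled_scans = [
    ([0.9, 0.8], "tumor"),
    ([0.8, 0.9], "tumor"),
    ([0.1, 0.2], "no tumor"),
    ([0.2, 0.1], "no tumor"),
]

model = train(labeled_scans)

# Classify an example that was not in the training data and has no label.
print(predict(model, [0.85, 0.75]))
```

The human-supplied labels are what make this supervised: without them, the train step would have no way to know which examples belong together.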
In what is called unsupervised learning, training data is not labeled, and the model uses calculations to map out the inherent structure of the data it is trained on, pulling out patterns and anomalies. For example, in a simple natural language model developed to write emails, a large body of email text would be provided as training data, and the model would conduct statistical calculations about what word is likely to come next in an email that mimics the ones found in the training data.
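The email example above can be sketched in a few lines of Python. This is a deliberately simplified illustration, far smaller than anything behind a real writing tool: the three sample emails and the function names train_bigram_model and most_likely_next are assumptions made up for this guide. The sketch counts which word follows each word in the unlabeled text, then uses those counts to guess the most likely next word.

```python
from collections import Counter, defaultdict

def train_bigram_model(texts):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following

def most_likely_next(model, word):
    """Return the word seen most often after `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical, unlabeled training data: a tiny "corpus" of email text.
emails = [
    "thank you for your email",
    "thank you for your time",
    "thank you for your email and your patience",
]

model = train_bigram_model(emails)
print(most_likely_next(model, "thank"))
print(most_likely_next(model, "your"))
```

No one tells the model which word is "correct"; it relies only on how often words appear next to each other in the text it was given.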