CS 57100: Artificial Intelligence

Assignment 5: Logic-based Reasoning, Reinforcement Learning

Due 11:59pm EDT Tuesday, 15 November 2022

Please turn in a PDF through Gradescope. ITaP provides these instructions. Make sure that you mark the start and end of each question in Gradescope; assignments will be graded based on what you mark as the start/end of each question. Please typeset your answers (LaTeX/Word/OpenOffice/etc.)

1. Propositional Logic: Modeling

Consider representing satisfiability (SAT) problems as CSPs.

  1. (2 points) Draw the constraint graph corresponding to the SAT problem
    (¬X1 ∨ X2) ∧ (¬X2 ∨ X3) ∧ (¬X3 ∨ X4)
  2. (2 points) Generalizing this problem to n variables, i.e., (¬X1 ∨ X2) ∧ (¬X2 ∨ X3) ∧ ⋯ ∧ (¬Xn−1 ∨ Xn), how many solutions are there as a function of n? Explain.
  3. (1 point) Give a knowledge base (e.g., truth values for X1, X2, ...) that entails this statement being true.
  4. (1 point) Give a knowledge base (e.g., truth values for X1, X2, ...) that contradicts this statement being true.

2. Propositional Logic: Reasoning/Inference

[(Workout ⇒ Healthy) ∨ (Sleep ⇒ Healthy)] ⇒ [(Workout ∧ Sleep) ⇒ Healthy]

Given the above sentence, answer the following questions.

  1. (2 points) Use enumeration to identify whether this statement is valid, satisfiable, or unsatisfiable.
  2. (2 points) Convert the LHS and RHS of the main implication into Conjunctive Normal Form. Explain each step and describe how the results match your answer to part 1.
  3. (2 points) Use resolution to prove your answer to part 1.
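Enumeration here just means building the full truth table. Below is a minimal helper sketch (the formula is passed as a Python function); it is demonstrated on De Morgan's law, a sentence known to be valid, rather than on the assignment's sentence:

```python
from itertools import product

def classify(formula, n_vars):
    """Enumerate all truth assignments and classify the formula as
    'valid', 'satisfiable', or 'unsatisfiable'."""
    results = [formula(*vals) for vals in product([False, True], repeat=n_vars)]
    if all(results):
        return "valid"
    if any(results):
        return "satisfiable"
    return "unsatisfiable"

# Demonstrated on De Morgan's law, not on the assignment's sentence:
demorgan = lambda p, q: (not (p and q)) == ((not p) or (not q))
print(classify(demorgan, 2))  # valid
```

You can encode an implication p ⇒ q as `(not p) or q` when writing your own formula as a function.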

3. First-Order Logic: Modeling

In each of the following we give an English sentence and candidate logical expressions. For each of the logical expressions, state whether it (1) correctly expresses the English sentence; (2) is syntactically invalid and therefore meaningless; or (3) is syntactically valid but does not express the meaning of the English sentence.

If a logical expression is invalid, identify where it becomes syntactically invalid. If it is valid but expresses a different meaning, give an English sentence for what it does express. A correct answer and explanation for each item will be worth 2 points.

  1. All students have their phone or laptop.
  2. Some students who play games on one of their machines fail in a course.

4. Power of First-Order Logic vs. Propositional Logic

First-order logic is supposed to give us greater capabilities than propositional logic.

  1. (2 points) Name one feature of first-order logic that enables something that would be difficult or impossible to do with propositional logic, and explain why it is more difficult or impossible.
  2. (2 points) Give an example of a real-world problem that you can model with first-order logic (and sketch how you would model it) and explain why it would be much more difficult to do with propositional logic than with first-order logic. (This may, but does not need to, use the feature you named in part 1.)

5. First-Order Logic: Reasoning/Inference

This question considers Horn KBs, such as the following:

P(F(x)) ⇒ P(x)
Q(x) ⇒ P(F(x))
P(A)
Q(B)

Let FC be a breadth-first forward-chaining algorithm that repeatedly adds all consequences of currently satisfied rules; let BC be a depth-first left-to-right backward-chaining algorithm that tries clauses in the order given in the KB. For each of the following five statements, state whether it is true or false and explain your answer:

  1. FC will infer the literal Q(A).
  2. FC will infer the literal P(B).
  3. If FC has failed to infer a given literal, then it is not entailed by the KB.
  4. BC will return true given the query P(B).
  5. If BC does not return true given a query literal, then it is not entailed by the KB.
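The KB above is first-order (FC and BC there must handle the function symbol F), but the control flow of forward chaining is easiest to see in the propositional case. Here is a minimal sketch on a made-up propositional Horn KB; the rule and fact names are illustrative, not from the assignment:

```python
def forward_chain(rules, facts):
    """Propositional Horn-clause forward chaining.
    rules: list of (premises, conclusion) pairs; facts: set of known atoms.
    Repeatedly fires any rule whose premises are all known, until fixpoint."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in inferred and all(p in inferred for p in premises):
                inferred.add(conclusion)
                changed = True
    return inferred

# Illustrative KB:  A ∧ B ⇒ C,   C ⇒ D
rules = [({"A", "B"}, "C"), ({"C"}, "D")]
print(forward_chain(rules, {"A", "B"}))  # inferred atoms: A, B, C, D
```

Note that in the first-order setting, the function symbol F can generate an unbounded supply of new terms (A, F(A), F(F(A)), ...), which is exactly what makes the termination questions above interesting.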

6. Reinforcement Learning: Q-Learning

Q-Learning: Suppose you have an urn containing 2 balls. At each step you may remove a ball or add a ball, subject to the constraint 0 ≤ urn size ≤ 2 after any action you choose; think of this as an underflow/overflow situation. We denote states by the size of the urn, i.e., 0, 1, 2. Acting as a TD agent, show the Q-learning updates with α = 0.1 and γ = 0.9; include all steps. For each transition, the current state, the action taken, and the reward observed are given. Assume Q(s,a) = 0 initially for all state-action pairs.
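As a reminder, the tabular Q-learning (TD) update is Q(s,a) ← Q(s,a) + α[r + γ max over a′ of Q(s′,a′) − Q(s,a)]. A minimal sketch with the assignment's α and γ follows; the example transition at the end is invented for illustration, so substitute the transitions given with the assignment:

```python
ALPHA, GAMMA = 0.1, 0.9
STATES, ACTIONS = [0, 1, 2], ["add", "remove"]

# All Q-values start at 0, as the assignment specifies.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def q_update(s, a, r, s_next):
    """One TD(0) Q-learning update for the transition (s, a, r, s_next)."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical transition: in state 1, take "add", observe reward 1, land in state 2.
print(q_update(1, "add", 1, 2))  # 0.1 on the first update, since all Q start at 0
```

Showing "all steps" by hand means writing out this arithmetic for each transition in sequence, since later updates depend on the Q-values produced by earlier ones.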

7. Bandit Problems

Consider the Uniform Bandit PAC Problem:

With probability 1 − δ, bounding the error by ε, and for k bandits, we get the equation:
  1. How many examples per arm do you need to see if RMAX = 1, δ = 0.1, ε = 0.01, and k = 100?
  2. What happens to the number of examples if RMAX = 1, δ = 0.001, ε = 0.01, and k = 100? What do you notice with respect to part 1? Explain your observation.
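If the course's uniform-bandit PAC bound is the usual Hoeffding-based one, each arm needs on the order of (RMAX/ε)² · ln(k/δ) pulls. The exact constants below are an assumption, so substitute the equation from lecture; the qualitative behavior the sketch shows (shrinking δ grows the bound only logarithmically) holds either way:

```python
import math

def pulls_per_arm(r_max, delta, epsilon, k):
    """Hoeffding-style per-arm sample bound. The form assumed here is
    (2 * r_max**2 / epsilon**2) * ln(2k / delta); check the constants
    against the equation given in class."""
    return math.ceil((2 * r_max**2 / epsilon**2) * math.log(2 * k / delta))

n1 = pulls_per_arm(1, 0.1, 0.01, 100)    # part 1 parameters
n2 = pulls_per_arm(1, 0.001, 0.01, 100)  # part 2: delta shrunk by 100x
print(n1, n2, n2 / n1)  # the bound grows only logarithmically in 1/delta
```

This is why, when comparing parts 1 and 2, a 100-fold increase in confidence costs far less than a 100-fold increase in samples.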

8. Reinforcement Learning - Modeling

You've identified four different ways to get to class (e.g., two CityBus routes and two different walking routes). You are trying to find the most efficient way (fewest minutes) to get to class on time. Unfortunately, it isn't the same every day: sometimes you miss lights, it is crowded, buses are late, etc.

  1. (3 points) Describe how you could frame this as a reinforcement learning problem if your goal were to minimize your total travel time over the course of the semester. What you'll need to provide depends on how you choose to model it, but likely includes states, actions, utility function, policy, whether the learning is active or passive, algorithms used, etc.
  2. (1 point) Would your answer change if your goal were to make sure you got to class on time on exam days? If not, explain why your answer to part 1 also addresses this question. If you think the answer is different, explain what changes.
  3. (2 points) Do you think the semester is long enough for you to figure out the best route? Briefly explain how you derived this answer.
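One common way to act in a problem like this is ε-greedy selection over the four routes. Below is a minimal simulation sketch; the route names, travel-time distributions, and 150-day semester are invented purely for illustration and are not part of the assignment:

```python
import random

random.seed(0)
ROUTES = ["bus A", "bus B", "walk A", "walk B"]
# Hypothetical mean travel times in minutes (unknown to the agent).
TRUE_MEANS = {"bus A": 18, "bus B": 22, "walk A": 25, "walk B": 20}

def travel_time(route):
    """Noisy observed travel time for one trip (invented distribution)."""
    return random.gauss(TRUE_MEANS[route], 3)

def epsilon_greedy(days=150, eps=0.1):
    """Track each route's average observed time; with probability eps
    explore a random route, otherwise exploit the current best estimate."""
    totals = {r: 0.0 for r in ROUTES}
    counts = {r: 0 for r in ROUTES}
    for _ in range(days):
        if random.random() < eps or min(counts.values()) == 0:
            route = random.choice(ROUTES)
        else:
            route = min(ROUTES, key=lambda r: totals[r] / counts[r])
        totals[route] += travel_time(route)
        counts[route] += 1
    return counts

counts = epsilon_greedy()
print(counts)  # the fastest route should be chosen most often
```

Running this for different numbers of days is one way to build intuition for part 3: how many noisy samples per route it takes before the best route reliably separates from the rest.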
