IFB105 Database Management — Complete Study Guide

Built from your course materials. Written to teach you from scratch. Read it top to bottom once, then use the cheat sheets for fast revision.

The exam is open book. You can bring unlimited paper notes (no digital). It is 2 hours, a mix of multiple choice (Section A, 15 × 1 mark, on a bubble sheet in pen) and short answer (Section B — writing SQL, drawing ORM, doing RMaps and normalisation, in the boxes on the paper). It covers the whole semester. Steps 6–7 of the CSDP are not examined.

PART 0 — The big picture (how the course fits together)

A database stores facts about the real world. This unit teaches you the full pipeline for designing one properly:

Understand the domain → write it as plain-English elementary facts (ORM Step 1).
Model it → draw an ORM diagram (conceptual schema) using the 7-step CSDP.
Convert the model to tables → Relational Mapping (RMap) turns the ORM diagram into table definitions.
Check the tables are clean → Normalisation removes redundancy (1NF → 2NF → 3NF).
Build and query it → SQL (DDL to create tables, DML to insert/query data).
Run it safely → security & administration (views, roles, access control, backups, transactions).
Do all of this ethically → legal & ethical issues, privacy law, and Indigenous data principles.

Keep that order in your head. Most exam questions are "do step X" or "spot the mistake in step X".

PART 1 — Databases and the relational model

What a database is

A database is an organised collection of related data. A DBMS (e.g. MySQL) is the software that stores, manages and queries it.

In the relational model, data lives in tables (also called relations).

A row (also called a tuple or record) = one instance, e.g. one student.
A column (also called an attribute or field) = one property, e.g. studentName.
Every row should be unique.
Each cell holds a single atomic value (one value, not a list).

Why use multiple tables? (Redundancy)

If you cram everything into one table you get data redundancy — the same data stored many times. Example: storing a doctor's name next to every appointment means the name is repeated on every row.

Redundancy is bad for two reasons:

Integrity — if the data changes (doctor changes their name) you must update every copy. Miss one and the data is now inconsistent.
Efficiency — wasted storage and slower entry.

The guiding principle: each row is an elementary fact, and we do not repeat elementary facts. Splitting data across linked tables fixes this.

Data anomalies (what redundancy causes)

Update/modification anomaly — you have to change the same fact in many rows.
Insertion anomaly — you can't add a new fact without also supplying unrelated data.
Deletion anomaly — deleting a row accidentally destroys other facts you wanted to keep.

Normalisation (Part 6) exists to eliminate these.

Three-level ANSI-SPARC architecture

Databases are described at three levels so that the storage can change without breaking the programs that use it:

External level — what individual users/applications see (their own views; many external schemas).
Conceptual level — the whole logical structure of the database (entities, relationships, constraints). This is the level ORM models.
Internal level — how data is physically stored (files, indexes).

Data independence is the payoff:

Logical data independence — change the conceptual schema without changing external views.
Physical data independence — change physical storage without changing the conceptual schema.

Information systems modelling

A conceptual model must be powerful enough to express the universe of discourse (UoD) — the slice of the real world you're modelling — and must come with a procedure to build and validate the model. That procedure is the CSDP (Part 4).

PART 2 — Legal, ethical and Indigenous data issues

This is heavily examinable as short-answer (the practice exam's biggest written question, Q16, is entirely CARE principles). Learn the four CARE letters cold.

Legal vs ethical

Ethics = "a set of principles of correct conduct" / a system of moral values. Or, memorably: "knowing the difference between what you have the right to do and what is right to do."
Laws always lag behind technology. Ethics fills the gap between a new problem appearing and a law being written.
Not all unethical behaviour is illegal, and not all legal behaviour is ethical.
Ethics is subjective: personal values differ, and the concept of privacy and "acceptable" data use varies across cultures and nations.

Why organisations care (compliance)

Notifiable Data Breaches (NDB) scheme (Australia, from Feb 2018) — organisations with turnover ≥ $3M must report eligible breaches; fines up to ~$2.1M for companies.
GDPR (EU) — applies to anyone handling EU residents' data; fine up to 4% of global turnover or €20M, whichever is greater; introduced the "right to be forgotten."
Australian Privacy Act 1988 and the Data Retention Law.
A data breach = a security violation where sensitive/protected/confidential data is copied, transmitted, viewed, stolen or used by someone unauthorised.

Key ethical concern areas

Data collection — what to collect and how; secretive/disingenuous "Terms & Conditions" (e.g. the "Herod clause"); how anonymised the data really is (re-identifiability).
Data usage — surge pricing on personal data; reusing data collected unethically; the line between "enhanced service" and "creepy"; disclosing sensitive (e.g. medical) data.
Big data ethics — privacy, informed consent (future uses unknown), de-/re-identification, the digital divide, social consent bypassing individuals.
Guidelines — organisational ethics policy + professional codes of ethics (ACM, ACS, BCS).

Securing data (the security toolkit)

Views — a view is the dynamic result of a stored query (a "virtual table"). It's a security mechanism: give a user access to a view that hides sensitive columns/rows, without giving them the base table. (e.g. secretaries see name/position/branch but not salary/DOB.) Created with CREATE VIEW.
Backup & recovery — backup = periodically copy the database + log to offline storage; recovery = restore from a backup after a crash/error. MySQL has logical/hot backups (server stays online, e.g. mysqldump) and physical/cold backups (server offline).
Integrity — constraints keep data valid.
Encryption — protects data at rest/in transit.
RAID — redundant disks for availability/fault tolerance.

Indigenous data

Indigenous Data Sovereignty (IDS) — the inherent right of Indigenous peoples to govern data about their communities, lands and knowledge. It's cultural, ethical and community-driven, not just legal.
Indigenous Data Governance (IDG) — the right to autonomously decide what, how and why Indigenous data is collected, accessed and used.
BADDR data (what to avoid): Blaming, Aggregated, Decontextualised, Deficit-focused, Restricted access. Indigenous data needs the opposite: lifeworld/narrative, disaggregated, contextualised, Indigenous-priority, available & amenable.
Design considerations — respect local language/terms (e.g. "mob" not "client"); co-design with community and elders; tiered access; consent workflows; metadata describing cultural context.

CARE principles (GIDA 2019) — MEMORISE

Use these to assess a scenario (the exam asks "does this comply with C/A/R/E?" — answer yes/no and justify).

C — Collective Benefit: data use must support community outcomes/benefit.
A — Authority to Control: the community decides what data is collected, accessed and shared (they control it).
R — Responsibility: those using the data must maintain respectful relationships, respect cultural protocols and diversity, and avoid harm.
E — Ethics: honour Indigenous worldviews, rights and community standards; consult on benefits/harms and future use.

CARE complements FAIR (Findable, Accessible, Interoperable, Reusable) — FAIR is about making data usable, CARE is about using it ethically/responsibly.

Worked example (the AIBS scenario from the practice exam):

C ✓ — data was used to improve health outcomes (community benefit).
A ✗ — the communities no longer have access or any say over management/use → Authority to Control fails.
R ✗ — relationships with the communities were not maintained after collection → Responsibility fails.
E ✗ — communities weren't consulted on benefits/harms/future use; potential to be used in ways that violate their rights → Ethics fails.

PART 3 — Object-Role Modelling (ORM) & elementary facts

ORM is how IFB105 models the conceptual schema. It's built on elementary facts.

Elementary facts

An elementary fact is a simple, atomic assertion that is taken to be true, stating that particular objects play particular roles. Test for "elementary":

It cannot be split further without losing information. (If you can break it into two smaller facts, it isn't elementary.)
It asserts something is true (not "is not", not a negation).
It contains no conditions ("if/then", "so").

Standard verbalisation format:

The EntityType with ReferenceMode "value" ... [plays role] ... the EntityType with ReferenceMode "value".

Example: The Employee with EmployeeID "E1235" has the Name "Elizabeth Smith".

Spotting non-elementary facts (exam favourite):

"...has Phone and Email..." → splittable → not elementary.
"...is not a patient..." → a negation → not a valid elementary fact.
"...visited Sydney, so...visited Melbourne" → a condition → not elementary.

Objects: entities vs values

A value is self-identifying — a constant needing no description. Character strings ("Australia", "123ABC") and numbers (17, 3.14) are values. A value type has no reference mode.
An entity needs a definite description to identify it (a tangible thing like the City "Brisbane", or abstract like the Unit "IAB201"). Entities are identified by a reference mode (e.g. StudentID, EmployeeID).

In "Student N2345 received a Grade of 5 in Unit ABC123": Grade is a value type object (no reference mode); Student/Unit are entity types; StudentID is a reference mode; "ABC123" is a data value (a specific data point).

Entity types and reference modes

An entity type = the set of all possible instances of a kind of thing (e.g. City = all cities in the UoD).
A reference mode tells you how an entity is identified, written in parentheses: Student (.StudentNr), Device (.AssetID).
Entities are treated as having no internal structure; values are lexical (they have internal structure — strings, numbers). You can portray an entity's structure externally as roles (e.g. Car(regNr) contains Engine(engineNr)).

Fact types & semantic equivalence (Step 2 detail)

Two verbalisations of the same relationship are semantically equivalent — e.g. "Employee Mary Smith works in Department Sales" and "Department Sales employs Employee Mary Smith". The role works in implies the inverse role employs; together the two roles form one fact type (which you can name, e.g. DepartmentLocation).
Each real example is a fact type instance of that fact type.
A population check = filling the fact type with sample rows to confirm it holds.

Step 3 detail — combining entity types & subtypes

Four questions to ask in Step 3: (Q1) Can the same entity belong to two entity types? (Q2) Can entities of two types be meaningfully compared? (Q3) Is the same info recorded for different entity types (might we list them together)? (Q4) Is a fact type arithmetically derivable?

Primitive entity types partition the UoD — every instance belongs to exactly one (e.g. a Person is not a City). Subtypes of a primitive type need not be mutually exclusive (someone can be both a Woman and a Manager).
Derived fact types (e.g. a total computed from other facts) are noted here and not stored in the final tables.

NULLs and the mandatory dot (Step 5 detail)

A NULL (shown as "?" in a populated table) means a value is not recorded. If a role is optional, its value may be NULL; if mandatory (● dot), every instance must have a value.
The mandatory dot can be drawn at either end of the role line. "Each Patient has exactly one PatientName" = uniqueness (at most one) plus mandatory (at least one).

Counting objects in a fact (exam Q17 style)

Count the distinct objects (entities + value types), not the values.

"Employee E0123456 has the Job Title 'Associate Lecturer'." → 2 (Employee, Job Title).
"Student N014156 received a Mark of 65 on Exam ABC123." → 3 (Student, Mark, Exam).
"Device D854125 is unallocated." → 1 (Device only — unary).
"Device D893665 is allocated to Employee E0123456." → 2 (Device, Employee).

Arity (how many roles in a fact type)

Unary — one object plays one role: "Hotel Room 1123 is vacant." (Room is the only object.)
Binary — two roles: "Employee has Name."
Ternary — three roles.

Associations (multiplicity) — read these from the diagram

One-to-one (1:1): each side links to at most one of the other. "Every patient has one medical record and each record belongs to one patient."
One-to-many (1:n): "A professor teaches many classes, but each class has one professor."
Many-to-many (m:n): "A student studies many subjects and a subject has many students." (Both sides "many".)

ORM diagram notation (how to read/draw)

Entity type = solid rounded rectangle with reference mode, e.g. Patient (.ID).
Value type = dashed rounded rectangle, e.g. Patient Name.
Fact type / role = a box (or boxes joined) between objects; each small box is one role. A binary fact type has two role boxes; ternary has three.
Predicate reading = the label, e.g. "has", "is being seen by", with a ► arrow showing reading direction.
Uniqueness constraint (UC) = a bar/line over role box(es) showing which role(s) are unique.
Mandatory role constraint = a dot (●) where the object connects to the role — means every instance of that object must play this role.
Reference scheme = the (.ID) style identifier.

Reading multiplicity from uniqueness bars

UC over one role of a binary fact = that side is "one" (functional/many-to-one).
UC over both roles (spanning) = many-to-many.
UC over each role separately = one-to-one.
A dot on the object's connection = mandatory (must participate).

So for the practice-exam hospital diagram: a single UC arrangement giving Patient↔Bed a 1:1 means "a patient is assigned to exactly one bed and each bed has one patient"; an m:n between Patient and Clinician means "a patient can be seen by many clinicians and a clinician sees many patients."

The CSDP — Conceptual Schema Design Procedure (7 steps)

You only need Steps 1–5 for the exam (6–7 are excluded).

Transform familiar examples into elementary facts. Take sample data / English statements and write atomic facts in the standard format.
Draw the fact types and apply a population check. Draw each fact type (objects + roles) and fill it with sample data to confirm it works.
Check for entity types to be combined, and note arithmetic derivations. Merge duplicate entity types; mark any derived values (e.g. a value computed from others — don't store it).
Add uniqueness constraints and check the arity of fact types. Decide which roles are unique. The n–1 rule: in a fact type with n roles, a uniqueness constraint must span at least n–1 roles (you can't have a UC on fewer). A binary (n=2) UC must cover at least 1 role; if it covers 0 or only a partial pattern that breaks this, the diagram is wrong. (This is why Q5's option B is the incorrect diagram.)
Add mandatory role constraints and check for logical derivations. Mark which roles every instance must play (the ● dot).
(Not examined) Add value, set-comparison and subtyping constraints.
(Not examined) Add other constraints and final checks.

Constraints you must recognise

Uniqueness constraint (UC): values in that role (or combination) can't repeat.
Mandatory role constraint: every object instance must play that role (the ● dot). "No X can be without a Y."
Constraint-violation questions (Q19 style): given a populated ORM schema, decide if a command Executes or is "XX Violated":
- Adding a duplicate value to a single-role UC → uniqueness violated.
- Removing/omitting a mandatory role so an object plays another role without its mandatory one → mandatory violated.
- Adding a duplicate combination to a spanning UC → that UC violated.

PART 4 — Relational Mapping (RMap)

RMap turns an ORM conceptual schema into a set of relational table definitions in Optimal Normal Form (minimum number of normalised relations). You apply four rules in strict order.

RMap syntax (write it exactly like this)

TableName {column1, column2, [column3], column4}

Underline = uniqueness constraint (there can be more than one UC per table).
Double underline = primary key (use this when a table has more than one UC, to show which is the PK).
[Square brackets] = optional column (allows NULL — i.e. not mandatory).
Dotted arrow = foreign key, pointing from the child's FK toward the parent's PK.

Keys (learn the definitions precisely)

Super key: any set of columns that is unique (uniquely identifies a row). Property: Unique.
Candidate key: a super key with no removable subset — unique and irreducible (can't drop a column and stay unique). Properties: Unique, Irreducible.
Primary key (PK): a chosen candidate key with no missing values (entity integrity). Properties: Unique, Irreducible, No nulls.
Alternate key: any candidate key not chosen as the PK.
Foreign key (FK): a column in the child table whose values must match a PK in the parent table → enforces referential integrity.

Useful truths:

Every candidate key is a super key, but not every super key is a candidate key (a super key may be reducible). So "every super key is a candidate key" is FALSE.
"A primary key is used in the child table to reference the parent" is FALSE — that's a foreign key.

The four mapping rules

Rule 1 — Many-to-many and ternary fact types → their own table. The new table's PK is the combination of the participating object identifiers. e.g. beingSeenBy {patientID, clinicianEmpID} (both underlined together), integratedDevice {bedID, deviceID}.

Rule 2 — One-to-many (and unary, and 1:1 grouped here) fact types → group onto the "one" side's entity table. Each entity type becomes a table keyed on its identifier; its functional (single-valued) attributes become columns. e.g. patient {patientID, patientName}, clinician {empID, clinicianName, medicalSpecialty}, bed {bedID, wardName}.

Rule 3 — Nested fact types (objectified relationships). If a fact type is objectified (treated as an object that itself plays roles), map it to a table whose PK is the underlying fact type's key, plus its own attributes. e.g. Installation {bedID, assetID, installationDate} (PK = bedID + assetID). If there's no nesting, Rule 3 does not apply.

Rule 4 — One-to-one fact types → put the fact in only ONE relation. Which one depends on functional roles and mandatory constraints. Four sub-rules:

4.1 If only one object has another functional role, group on its side. Map the other object's column into that existing relation. (PK = existing relation's key.)
4.2 If both have another functional role but only one has an explicit mandatory constraint, group on the mandatory side. (Map the non-mandatory object into the mandatory side's relation.)
4.3 If neither has another functional role, map the 1:1 fact to its own separate table (PK = the fact type's UC).
4.4 If both have a functional role and (neither mandatory, or both mandatory) → your discretion; choose the mapping that minimises NULLs.

Foreign keys: after applying the rules, add FK arrows wherever a column in one table references the PK of another (e.g. patient.bedID → bed.bedID; installation.assetID → device.assetID).

Worked example (practice-exam Q20 → answer)

ORM: Device tracked-by RFID (1:1), manufactured-by Manufacturer (n:1), has Technician Notes (optional), and a nested "installation" (Bed has integrated Device) that was installed-on a Date and requires-maintenance-on a Date where maintenanceDate = installationDate + 1 year (derived).

Rule 1: none (no m:n).
Rule 2: Device {assetID, RFIDTagNo, MFID, [technicianNotes]} — technicianNotes optional (dashed = optional), RFIDTagNo & MFID are functional single-valued attributes.
Rule 3: Installation {bedID, assetID, installationDate} — PK is the nested fact key; maintenanceDate is excluded because it is derived, not stored.
Rule 4: none.
FK: Installation.assetID → Device.assetID.

The lesson worth memorising: derived/computed values are NOT stored (note them in Step 3, leave them out of the tables).

PART 5 — Normalisation

Normalisation improves data integrity by removing redundancy so that each non-key attribute is fully dependent on the entire key, a fact is stored only once, and insert/update/delete anomalies disappear.

Functional dependencies (FDs)

For a relation, Y is functionally dependent on X (written X → Y) if for each value of X there is at most one value of Y.

X = determinant ("X determines Y"); Y = determinee ("Y depends on X").
It does not mean causation — just that X lets you look up Y.
Can be composite: StudentID, UnitID → Grade.
Does not necessarily work in reverse.
A key determines all non-key attributes.

Dependency types (precise lecture definitions)

Fully functional dependency: Y is fully functionally dependent on X if Y depends on X and not on any proper subset of X. (e.g. with key (Part, Warehouse): Part, Warehouse → Quantity is full.)
Partial (functional) dependency: a non-key attribute is functionally dependent on part of a candidate key (only possible with a composite key). (e.g. Warehouse → Warehouse_address where the key is (Part, Warehouse).) → breaks 2NF.
Transitive (functional) dependency: a non-key attribute depends on another non-key attribute (key → A → B, so B is transitively dependent on the key). (e.g. Employee → Department, Department → Location.) → breaks 3NF. Equivalently: in 3NF all non-key attributes are mutually independent.

Course scope: IFB105 normalises only up to 3NF. BCNF is not taught; 4NF/5NF appear only on a diagram and are not examined.

Identifying keys (method)

A column/combination is a candidate key if it's always unique across all rows AND can't be reduced. Test by checking the sample data for duplicates.
If a single column repeats (e.g. PatientID appears twice), it's not a key on its own.
The combination that's unique but has a removable column is a super key, not candidate.

(Q12 worked: with PatientID repeating and (DoctorID, WardName) repeating, only (PatientID, WardName) is unique and irreducible — DoctorID can be dropped — so it's the candidate key.)

The normal forms

0NF (unnormalised): has repeating groups / non-atomic cells / no proper key.

1NF — First Normal Form. A table is in 1NF when:

It has a primary key (every row uniquely identifiable).
Every column holds one data type.
All values are atomic — no repeating groups or lists.

2NF — Second Normal Form. In 1NF AND no partial dependencies (no non-key attribute depends on only part of a composite key).

Only at risk if the PK is composite. If the PK is a single column, a 1NF table is automatically 2NF.

3NF — Third Normal Form. In 2NF AND all non-key attributes are mutually independent (no transitive dependencies — no non-key attribute determines another non-key attribute).

Rule of thumb to find "highest normal form" (Q9 style):

Atomic + has key? If not → 0NF. If yes → at least 1NF.
Composite key with a partial dependency? → stuck at 1NF.
A non-key → non-key dependency? → stuck at 2NF.
Neither problem → 3NF.

(Q9 worked: the StoreID/ProductID table is 1NF; ManagerID & ManagerName depend on StoreID alone (part of the composite key) → partial dependency → highest form is 1NF.)

How to normalise to 3NF (the procedure)

Confirm 1NF (atomic values, a key). If repeating groups exist, split them out.
Remove partial dependencies (→ 2NF): for each part of the key that determines some attributes, pull those attributes into a new table keyed on that part.
Remove transitive dependencies (→ 3NF): if a non-key attribute determines another non-key attribute, move that pair into its own table keyed on the determinant.
Write it in RMap format with PKs underlined and foreign keys added back to link the tables.

Worked example (practice-exam Q21): Original PK = (ProviderID, AppointmentTime).

providerName, role depend on ProviderID alone → partial dependency.
patientName depends on patientID (a non-key) → transitive dependency.

3NF result:

Provider {providerID, providerName, role}
Appointment {providerID, appointmentTime, patientID}
        FK providerID → Provider.providerID
        FK patientID  → Patient.patientID
Patient {patientID, patientName}

(Underline the PKs; the composite key in Appointment is providerID + appointmentTime.)

Another (Q10): A book can have multiple authors → a Book–Author m:n needs an Authorship bridge table; LoanedTo is optional ([LoanedTo]):

Book {BookID, BookTitle, [LoanedTo]}
Author {AuthorID, AuthorName}
Authorship {BookID, AuthorID}

PART 6 — SQL

SQL has sub-languages: DDL (define structure), DML (manipulate data), DQL (SELECT), DCL (permissions).

Data types & integrity (DDL building blocks)

Common types: INT, VARCHAR(n), CHAR(n), DECIMAL(p,s), DATE, DATETIME, ENUM('a','b',...) (value restricted to a list).
Integrity rules:
- Entity integrity — PK columns can't be NULL.
- Referential integrity — an FK must match an existing PK in the parent (or be NULL if allowed).
- Domain/column constraints — NOT NULL, UNIQUE, CHECK, DEFAULT, AUTO_INCREMENT.

DDL — CREATE / ALTER / DROP

CREATE TABLE Ticket (
    ticketID INT PRIMARY KEY AUTO_INCREMENT,
    eventID  INT NOT NULL,
    orderID  INT NOT NULL,
    category ENUM('Child','Adult','VIP','Concession'),
    seatNo   VARCHAR(15),
    FOREIGN KEY (orderID) REFERENCES Orders(orderID),
    FOREIGN KEY (eventID) REFERENCES Event(eventID)
);

This is exactly the practice-exam Q22 answer: A=PRIMARY KEY, B/C=NOT NULL, D=ENUM, E/G=FOREIGN KEY, F/H=REFERENCES.

ALTER TABLE t ADD column ... / MODIFY ... / DROP COLUMN ... — changes structure.
DROP TABLE t — deletes the whole table.
Common trap: ALTER/MODIFY change structure, not row data. To change data you use UPDATE (below).

DML — INSERT / UPDATE / DELETE

INSERT INTO Customers VALUES ('CU1','Name','Address','Phone');
INSERT INTO Customers (CustomerID, CustomerName) VALUES ('CU2','Name');

UPDATE Customers
SET Address = '19 Albert Street'
WHERE CustomerID = 'CU091214';      -- correct UPDATE form (Q15 answer = C)

DELETE FROM Customers WHERE CustomerID = 'CU2';

The only correct update syntax is UPDATE table SET col = val WHERE .... ALTER ... UPDATE, SET ... FROM, and MODIFY are wrong.
Will an INSERT fail? (Q23 style) It fails if it violates a constraint:
- Duplicate PK → fails.
- Duplicate value in a UNIQUE column (e.g. email) → fails.
- FK references a non-existent parent row → referential integrity violation → fails.
- Otherwise → Executes.

SELECT — the core

General syntax (clause order cannot change; only SELECT and FROM are mandatory):

SELECT [DISTINCT | ALL] {* | columnExpression [AS newName] [,...]}
FROM TableName [alias] [,...]
WHERE condition
GROUP BY columnList
HAVING condition
ORDER BY columnList [ASC|DESC];

* = all columns. DISTINCT removes duplicate rows. AS newName renames a column in the output. You can use calculated/derived fields (e.g. salary*12 AS annualSalary) and table aliases.
Logical order of evaluation: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.
WHERE filters individual rows. Operators: =, <>, <, >, <=, >=, AND, OR, NOT, BETWEEN, IN, IS NULL, IS NOT NULL.
NULL means "unknown"; test with IS NULL / IS NOT NULL (never = NULL).

Pattern matching

LIKE with wildcards: % = any sequence of characters, _ = exactly one character.
- WHERE name LIKE 'A%' → starts with A. LIKE '%son' → ends with son. LIKE '_a%' → second letter a.

Ordering

ORDER BY col ASC (default) or DESC. You can order by an aliased column (e.g. ORDER BY total_loans DESC).

Aggregate functions

COUNT(*), COUNT(col), SUM(col), AVG(col), MIN(col), MAX(col).

COUNT(*) counts rows. With GROUP BY bookID, COUNT(*) counts how many times each bookID appears (Q14 answer).

GROUP BY / HAVING

GROUP BY collapses rows into groups; use it with aggregates to get per-group results.
HAVING filters groups after aggregation (WHERE filters rows before).

SELECT customerID, SUM(totalPrice)
FROM Orders
GROUP BY customerID
HAVING SUM(totalPrice) > 100;

Set operations

Combine results of two SELECTs (same columns):

UNION — rows in either (duplicates removed); UNION ALL keeps duplicates.
INTERSECT — rows in both.
EXCEPT (a.k.a. MINUS) — rows in the first but not the second.

Subqueries (nested queries)

A query inside another. Solve inside-out.

SELECT firstName, lastName, email FROM Customer
WHERE customerID IN (
    SELECT customerID FROM Orders
    GROUP BY customerID
    HAVING SUM(totalPrice) > 100
);

This is the Q24 answer: find each customer's total spend with GROUP BY + SUM + HAVING, then return their details.

Joins

Combine columns from multiple tables on a matching condition.

INNER JOIN — only rows that match in both tables.
LEFT (OUTER) JOIN — all rows from the left table, plus matches from the right (NULLs where no match).
RIGHT (OUTER) JOIN — all rows from the right table.
FULL OUTER JOIN — all rows from both.

SELECT c.firstName, o.totalPrice
FROM Customer c
INNER JOIN Orders o ON c.customerID = o.customerID;

You can combine a subquery and a join, and join a table to itself (self-join) using aliases.

Reading query results (Q25 skill)

Work strictly inside-out, track the intermediate sets, and write the output exactly as MySQL would (right capitalisation/punctuation). Examples from the practice exam:

Nested IN chain → VIP order 4 → customer 1 → Holly Richardson.
GROUP BY ... HAVING count(*)>2 EXCEPT (totalPrice<100) → {1,3,5} minus {1,4,5} = 3.
INNER JOIN ... WHERE category='Child' over Extras orderIDs {2,5} → 5.
LEFT JOIN ... WHERE notes IS NULL AND quantity IS NOT NULL → order 5 → 59.00.

DCL — permissions (Data Control Language)

DCL grants/revokes privileges so users can only do their job (least privilege).

GRANT {privilegeList | ALL PRIVILEGES} ON objectName
TO {userList | PUBLIC} [WITH GRANT OPTION];

REVOKE {privilegeList | ALL PRIVILEGES} ON objectName FROM userList;

Object privileges (on a table/view): SELECT, INSERT, UPDATE, DELETE, REFERENCES, USAGE. INSERT/UPDATE can be restricted to specific columns.
The user who creates an object automatically has all privileges on it.
PUBLIC = grant to all present and future users. WITH GRANT OPTION = the grantee can pass the privilege on.
Combined with roles (Part 7), this is how access control is implemented.

PART 7 — Database security & administration

Roles

A role = a named collection of privileges that can be assigned to users (or to another role). Roles simplify maintenance: grant privileges to a role once, then assign the role to many users.

CREATE ROLE 'order_payment_role', 'inventory_role';
GRANT SELECT, INSERT, UPDATE, DELETE ON customer TO order_payment_role;
-- then assign the role to users

Access control

Discretionary Access Control (DAC): the most common approach — based on granting/revoking privileges with GRANT/REVOKE. Two privilege categories:
- Object privileges — actions on a table/view (SELECT, INSERT, UPDATE, DELETE, REFERENCES, USAGE).
- System privileges — managing the database/server itself (global, e.g. administrative privileges like creating users).
Role-Based Access Control (RBAC): privileges attach to roles; users get roles.
Mandatory Access Control (MAC): access decided by system-wide classifications/labels.
Views (Part 2) are a practical access-control tool — expose only what a user needs.

Database users

The DBA creates users, assigns them roles/privileges, and manages authentication. Principle of least privilege: give each user only what they need.

Transactions

A transaction = a logical unit of work: a series of actions by one user/program that reads or updates the database, which must succeed or fail together.

Why we need them — concurrency problems (when transactions run at once):

Lost update — two transactions overwrite each other's changes.
Uncommitted dependency (dirty read) — one transaction reads data another hasn't committed (and may roll back).
Inconsistent analysis — a transaction reads some values before and others after another transaction updates them. Transactions also give resilience to system failure (all-or-nothing recovery).

ACID properties:

A — Atomicity: all operations happen or none do (rollback on failure).
C — Consistency: a transaction moves the DB from one valid state to another (constraints preserved).
I — Isolation: concurrent transactions don't interfere; results are as if run in sequence.
D — Durability: once committed, changes survive crashes (written to disk/log).

Transaction Control Language (TCL):

COMMIT; — permanently save the transaction.
ROLLBACK; / ROLLBACK TO savepoint; — undo back to the last committed state (or a savepoint).
SAVEPOINT name; — mark a point you can roll back to.
In the SQL standard a transaction begins on the first statement and ends at COMMIT or session end. AUTOCOMMIT makes each statement its own transaction.

PART 8 — Exam technique (open-book, 2 hours)

Bring focused paper notes. Print the cheat-sheet file. Don't bring everything — you'll waste time flicking. Tab the sections (SQL syntax, RMap rules, normal forms, CARE).
Section A (MCQ, bubble sheet, pen): 15 questions, 1 mark each. These mirror the Canvas practice quizzes and Kahoots. Don't overthink — eliminate wrong options. Watch for "which is incorrect".
Section B (short answer, in the boxes): SQL, ORM diagrams, RMaps, normalisation. Stay inside the answer boxes — papers are scanned and anything outside the margin may not be captured.
Show your working for RMap/normalisation: state which Rule applies, name the dependency type, name the constraint being violated.
For SQL — get the clause order right (SELECT/FROM/WHERE/GROUP BY/HAVING/ORDER BY); use IS NULL not =NULL; remember HAVING for grouped conditions.
For "will it execute" — check PK uniqueness, UNIQUE columns, and FK referential integrity.
For CARE/ethics — always give a yes/no and a one-line justification per principle.
Time budget: ~15 marks in Section A, ~35 in Section B → spend roughly 35 min on A, 75 min on B, 10 min checking.

End of study guide. Pair this with the cheat sheets (fast lookup), the worked solutions (see the reasoning), and the extra practice questions (test yourself).

APPENDIX — Verified-from-Lectures Additions

These points were confirmed by reading every lecture slide visually (page by page). They extend, and are consistent with, the main guide above.

Security (Week 1 — "Securing Data") — extra detail for the security exam questions

Views as a security mechanism: a view is a named/stored query ("virtual table"); grant a user SELECT on the view (e.g. StaffInfo) so they see only safe columns and never the base table. CREATE VIEW StaffInfo AS SELECT fName, lName, position, branchNo FROM Staff;
Backup vs Recovery: backup = periodic copy of DB and log file to offline media; recovery = replace a damaged DB with a backup. MySQL: Logical/Hot (server stays online; mysqldump, SELECT … INTO OUTFILE) vs Physical/Cold (server offline; faster, file-copy via cp/scp/tar/rsync).
Journaling: keep a log/journal of all changes → minor failure = automatic recovery from journal; major failure = restore from backup.
Auditing (Oracle): statement (AUDIT UPDATE TABLE BY JACK), privilege (AUDIT CREATE TABLE), object (AUDIT SELECT ON …).
Encryption: a cipher renders data unreadable without the key. Public/asymmetric (encrypt with public key, decrypt with private) vs Private/symmetric (same key both ways).
RAID = Redundant Array of Independent Disks → fault tolerance; disk drives are the most failure-prone component.

Ethics — the Havasupai CARE worked example (this is the exact style of the exam ethics question)

Case: In 1989 the Havasupai Tribe gave blood to ASU consenting only to diabetes study; researchers later reused it for mental-illness, alcoholism and migration studies without further consent.

Assess against each CARE letter (the answer for all four is does NOT comply):

C — Collective Benefit (inclusive development, governance, equitable outcomes): the tribe was excluded from the reuse; later studies gave them no benefit and were traumatic → does not comply.
A — Authority to Control (recognising rights, data for/of governance): no fully-informed consent, no say in storage/reuse, data not accessible to them → does not comply.
R — Responsibility (positive relationships, capability, Indigenous worldviews): research disrespected dignity/beliefs; no responsibility for a positive relationship → does not comply.
E — Ethics (minimise harm/maximise benefit, justice, future use): data used in stigmatising ways (inbreeding/alcoholism stereotypes); ignored power imbalance and appropriate future use → does not comply.

Exam tip: for any "does this comply with [letter] of CARE?" question — (1) state what that letter means, (2) point to the specific facts, (3) conclude comply / does not comply.

ORM — extra confirmations

No connectives / no quantifiers in elementary facts: reject anything with not, and, or, if, all, some (e.g. "All people who jog get sore feet" ✗; "Sue is funny" ✓).
Legal/illegal CS diagrams (very examinable): a role must connect to exactly one object type; you may not connect two object types directly (objects must play a role); a role may not be played by more than one object type; no dangling/empty role.
Lossless join vs spurious rows: joining two binaries on a key is lossless; joining on a non-key produces extra (spurious) rows — that's why a ternary often can't be split and why objectification/nesting ("Enrolment") is used.
Derived facts are marked with * and a derivation rule (e.g. For each Playlist, runtime = sum(runtimes)) and are never stored (storing derivable data wastes space and can give wrong results when base facts change).
n‑1 rule (n‑ary fact type): a uniqueness constraint must span at least n−1 roles.
"Exactly one" = uniqueness (at most one) + mandatory dot (at least one).

RMapping — Rule 4 decision tree (1:1 fact types)

Does either object play another functional role?
- No → Rule 4.3: map the 1:1 fact to its own table (PK = the UC).
- Yes, only one → Rule 4.1: group the fact onto the table of the object with the functional role.
- Yes, both → check mandatory:
  - One is explicitly mandatory → Rule 4.2: group onto the mandatory side.
  - Neither / both mandatory → Rule 4.4: your discretion — choose the mapping that minimises NULLs.

Order of all rules is strict: Rule 1 (UC spanning >1 role → m:n & ternary tables) → Rule 2 (simple UC / functional role → entity tables) → Rule 3 (nested fact types) → Rule 4 (1:1).