Learn the whole unit — no PDFs required
This page teaches every examinable topic in IFB105 from scratch, in the order the unit builds them: model the world → map it to tables → clean the tables → query them → run them safely → do it ethically.
Open book · paper notes only 2 hours whole semester CSDP steps 6–7 not examined
0 How it all fits together
A database stores facts about the real world. The unit is one pipeline for building one properly. Keep this order in your head — most exam questions are "do step X" or "spot the mistake in step X".
- Understand the domain → write it as plain-English elementary facts (ORM Step 1).
- Model it → draw an ORM diagram (the conceptual schema) using the 7-step CSDP.
- Convert to tables → Relational Mapping (RMap) turns the diagram into table definitions.
- Clean the tables → Normalisation removes redundancy (1NF → 2NF → 3NF).
- Build & query → SQL (DDL creates tables, DML inserts/queries data).
- Run it safely → security & administration (views, roles, access control, transactions).
- Do it ethically → legal & ethical issues, privacy law and Indigenous data principles.
1 Databases & the relational model
A database is an organised collection of related data. A DBMS (e.g. MySQL) is the software that stores, manages and queries it. In the relational model, data lives in tables (also called relations).
- A row (a tuple / record) = one instance, e.g. one student.
- A column (an attribute / field) = one property, e.g. studentName.
- Every row should be unique.
- Each cell holds a single atomic value — one value, not a list.
Why multiple tables? Redundancy & anomalies
If you cram everything into one table you get data redundancy — the same fact stored many times. That's bad for two reasons:
- Integrity — if the fact changes you must update every copy; miss one and the data is inconsistent.
- Efficiency — wasted storage and slower entry.
Redundancy causes three anomalies (memorise these — common MCQ):
| Anomaly | Meaning |
|---|---|
| Update / modification | You must change the same fact in many rows. |
| Insertion | You can't add a new fact without also supplying unrelated data. |
| Deletion | Deleting a row accidentally destroys other facts you wanted to keep. |
The three-level ANSI-SPARC architecture
Databases are described at three levels so storage can change without breaking the programs that use it:
| Level | What it is |
|---|---|
| External | What individual users/apps see — their own views (many external schemas). |
| Conceptual | The whole logical structure — entities, relationships, constraints. This is the level ORM models. |
| Internal | How data is physically stored — files and indexes. |
Data independence is the payoff: you can change one level without affecting the others (logical & physical independence).
2 Legal, ethical & Indigenous data issues
Ethics = a set of principles of correct conduct / a system of moral values (Connolly & Begg). As Potter Stewart put it: "knowing the difference between what you have the right to do and what is right to do."
Legal vs ethical: they're not the same. Laws lag behind technology, so ethics fills the gap. Not all unethical behaviour is illegal, and not all legal behaviour is ethical.
Why organisations care (compliance)
| Regime | Key facts |
|---|---|
| NDB scheme (Notifiable Data Breaches) | From 22 Feb 2018; applies at turnover ≥ $3M; fines up to $420K (individuals) / $2.1M (companies). |
| GDPR (EU) | Max fine = 4% of global turnover or €20M, whichever is greater. Adds the "right to be forgotten". |
| Privacy Act 1988 (Aus) | Plus the Data Retention Law (2017). |
Two ethical concern areas
- Data collection — what/how you collect; secretive or disingenuous practices (vague "Terms & Conditions", the 'Herod clause'); re-identifiability of "anonymised" data.
- Data usage — how collected data is used; grey areas like surge pricing from personal data, reusing data collected unethically, "enhanced service" vs "creepy".
Professional codes you can cite: ACM, ACS (Australian Computer Society), BCS (British Computer Society).
Indigenous data & the CARE principles
Indigenous Data Sovereignty (IDS) = the right of Indigenous peoples to govern the collection, ownership and application of data about their communities. Indigenous Data Governance (IDG) is how that data is managed to support IDS. The unit's exam ethics question is almost always: "does this scenario comply with [a letter] of CARE?"
CARE (GIDA 2019) — learn all four:
| Letter | Principle | Means |
|---|---|---|
| C | Collective Benefit | Data use must support community outcomes — inclusive development, governance, equitable outcomes. |
| A | Authority to Control | Communities decide what data is collected, accessed and shared. |
| R | Responsibility | Respect cultural protocols, avoid harm, build positive relationships, support Indigenous worldviews. |
| E | Ethics | Honour worldviews; minimise harm, maximise benefit; justice; consider future use. |
CARE complements FAIR (Findable, Accessible, Interoperable, Reusable).
The Havasupai case (the model answer pattern). A tribe gave blood consenting only to a diabetes study; researchers later reused it for mental-illness/alcoholism/migration studies without consent, and published stigmatising results. Assess against CARE — every letter is "does NOT comply":
- C — the tribe was excluded from the reuse; later studies gave no benefit and were harmful.
- A — no informed consent for reuse, no say in storage/use, data not accessible to them.
- R — research disrespected dignity/worldviews; no positive relationship maintained.
- E — stigmatising use, ignored power imbalance and appropriate future use.
Securing data (the security toolkit)
- Views — a view is a stored/named query ("virtual table"). Grant a user SELECT on a view that exposes only safe columns → hides sensitive columns and the base table. The main security mechanism examined.
- Backup vs recovery — backup = periodic copy of the DB and log file offline; recovery = replace a damaged DB with a backup. MySQL: Hot/Logical (server stays online,
mysqldump) vs Cold/Physical (server offline, faster file copy). - Journaling — a log of all changes; minor failure → auto-recover from the journal, major failure → restore from backup.
- Auditing (Oracle) — statement, privilege, object auditing.
- Encryption — a cipher makes data unreadable without the key. Public/asymmetric (encrypt with public key, decrypt with private) vs Private/symmetric (same key both ways).
- RAID — Redundant Array of Independent Disks → fault tolerance (disks are the most failure-prone component).
3 Object-Role Modelling (ORM)
ORM models the conceptual schema as objects playing roles. You start with plain-English facts and end with a diagram, following the 7-step CSDP (Conceptual Schema Design Procedure).
Elementary facts (CSDP Step 1)
An elementary fact is a simple, atomic assertion that particular objects play particular roles — it can't be split into smaller facts about the same objects.
✗ "Mary and David work in Sales" (split into two) · ✗ "All staff earn $50k" · ✓ "The Student with ID 'n7' is enrolled".
Objects: entities vs values
- Value — a self-identifying constant (e.g. "Australia", 17). Drawn as a dashed rectangle.
- Entity — needs a definite description. Drawn as a solid rounded rectangle.
Fact types & reference modes
An entity is identified by three things: entity type + reference mode + value — e.g. Employee (.ID) 'E0002'. (In "The Lecturer with StaffID 'E22'…", StaffID is the reference mode — a classic MCQ.)
A predicate is a sentence template with 'object holes'; its arity = the number of roles: unary (1), binary (2), ternary (3), quaternary (4). A fact type = object type(s) + a predicate, e.g. Employee works in Department.
Surface vs deep structure: "Mary works in Sales" and "Sales employs Mary" look different but are the same fact type (same deep structure).
Diagram notation — how to read & draw it
Constraints & reading multiplicity
A uniqueness constraint (UC) bar means the values under that role (or combination) are unique. Read multiplicity from where the bar sits on a binary:
| Where the UC bar is | Multiplicity |
|---|---|
| Under one role only | 1:N (functional) — that side is the "one" side |
| Spanning both roles | m:n (many-to-many) |
| A separate bar on each role | 1:1 (one-to-one) |
A mandatory-role dot means every instance of that object type must play that role (no NULLs there). A "?" in a population = NULL = optional role.
The n−1 rule & legal diagrams
- n−1 rule: in an n-ary fact type, a uniqueness constraint must span at least n−1 roles.
- Legal-diagram rules: objects must play a role (you can't join two object types directly); each role connects to exactly one object type (no role played by two types; no dangling role).
* and a rule (e.g. "for each Article, markup = retailPrice − wholesalePrice") and are never stored — storing derivable data wastes space and can go wrong when base facts change.The 7-step CSDP
The procedure that turns facts into a diagram. Steps 6–7 are not examined, but know the first five cold.
- Transform familiar examples into elementary facts.
- Draw the fact types and apply a population check (populate with sample instances).
- Check for entity types to be combined; note arithmetic derivations. (Combine when the same entity belongs to two types — e.g. Stephen Chow is actor and director → one Person type.)
- Add uniqueness constraints and check arity.
- Add mandatory role constraints and check for logical derivations.
- Add value, set-comparison, subtyping constraints. not examined
- Add other constraints; final checks. not examined
4 Relational mapping (ORM → tables)
RMap converts the ORM diagram into table definitions in Optimal Normal Form (a minimal set of normalised relations). First you must know keys precisely.
| Key | Properties |
|---|---|
| Super key | Unique. |
| Candidate key | Unique + Irreducible (no proper subset is unique). |
| Primary key | Unique + Irreducible + Entity integrity (no nulls). |
| Alternate key | Any candidate key not chosen as the primary key. |
| Foreign key | A column sharing a domain with a candidate key in another table; references that table. |
Three inherent integrity constraints: Entity (PK unique & not null), Referential (a FK must match an existing row; deletion rules: cascade, restrict, set null, set default), Domain (values fit a data type).
RMap syntax — write it exactly like this
| Notation | Means |
|---|---|
| Single underline | a uniqueness constraint (can be more than one) |
| Double underline | the primary key (when other UCs exist) |
[ brackets ] | optional column (NULLs allowed) |
| dotted arrow → | foreign key, pointing toward the primary key it references |
Student {studentID, name, email, [phone]}
Studies {studentID, unitID} FK studentID → Student, FK unitID → Unit
Unit {unitID, title}
The four mapping rules (apply in strict order 1→2→3→4)
| Rule | Applies to | Result |
|---|---|---|
| 1 | UC spanning >1 role (m:n & ternary+) | its own table; PK = the UC |
| 2 | entity with a simple UC (functional role) | one table per entity; PK = reference mode |
| 3 | nested fact type | treat as an entity (like Rule 2); PK = the nested UC |
| 4 | 1:1 fact type | place the fact in one relation — sub-rules 4.1–4.4 |
Rule 4 decision tree (1:1)
- Does either object play another functional role? No → 4.3: the 1:1 fact gets its own table.
- Only one does → 4.1: group the fact onto that object's table.
- Both do → is exactly one role mandatory? Yes → 4.2: group onto the mandatory side. Neither/both → 4.4: your discretion, minimise NULLs.
Worked RMap. Member (.memberID) has Name (1:1, mandatory) & [email]; Member enrolled in Class (.classID) is m:n; Class has ClassName; Class led by Trainer (.trainerID) is n:1; Trainer has TrainerName.
Member {memberID, name, [email]}
Class {classID, className, trainerID} → trainerID FK → Trainer
Trainer {trainerID, trainerName}
EnrolledIn {memberID, classID} → FKs to Member, Class (Rule 1)
5 Normalisation
Normalisation improves integrity by minimising redundancy — every non-key attribute fully depends on the whole key, and each fact is stored once. It's driven by functional dependencies.
Functional dependencies (FDs)
X → Y ("X determines Y") means: for each value of X there is at most one value of Y. X is the determinant, Y the determinee. (It does not imply causation — just identification.) A key determines the non-key attributes.
| Dependency | Definition | Example |
|---|---|---|
| Partial | a non-key attribute depends on only part of a composite key (only with composite keys) | (Part, Warehouse) is the key but Warehouse → WarehouseAddress |
| Transitive | a non-key attribute depends on another non-key attribute | Employee → Department → Location |
| Full | depends on the whole key | (Part, Warehouse) → Quantity |
The normal forms — 1NF → 2NF → 3NF
| Form | Requirement | Fix to get there |
|---|---|---|
| 1NF | atomic values only (no repeating groups/arrays); ≥1 candidate key | split comma-lists into separate rows |
| 2NF | 1NF + no partial dependencies (only matters with a composite PK) | move the partially-dependent attribute into a new table keyed on the part it depends on |
| 3NF | 2NF + no transitive dependencies | move the transitively-dependent attribute into a new table keyed on its non-key determinant |
Normalise to 3NF. ENROLMENT, PK (StudentID, UnitCode): StudentName, UnitName, Grade, TutorID, TutorName. FDs: StudentID→StudentName, UnitCode→UnitName, (StudentID,UnitCode)→Grade,TutorID, TutorID→TutorName.
Partial deps: StudentName (on StudentID) and UnitName (on UnitCode). Transitive dep: TutorID→TutorName. 3NF result:
Student {StudentID, StudentName}
Unit {UnitCode, UnitName}
Tutor {TutorID, TutorName}
Enrolment {StudentID, UnitCode, Grade, TutorID} → FKs to Student, Unit, Tutor
6 SQL
SQL has two sub-languages here: DDL (define structure) and DML (manipulate data), plus DCL (permissions). Data integrity is enforced by constraints.
DDL — creating tables
CREATE TABLE Ticket (
ticketID INT PRIMARY KEY AUTO_INCREMENT,
eventID INT NOT NULL,
customerID INT NOT NULL,
ticketType ENUM('GA','VIP','Backstage'),
price DECIMAL(6,2),
FOREIGN KEY (eventID) REFERENCES Event(eventID),
FOREIGN KEY (customerID) REFERENCES Customer(customerID)
);
NOT NULL= value required;UNIQUE= a candidate key (no duplicates).- Restrict values with
ENUM(...)or aCHECKconstraint. ALTER TABLE t ADD COLUMN col TYPE;changes structure.DROP TABLEremoves the whole table;DELETEremoves rows only.
DML — insert / update / delete
INSERT INTO Ticket (eventID, customerID, ticketType, price) VALUES (2, 3, 'GA', 45);
UPDATE Ticket SET price = 50 WHERE ticketID = 4; -- omit WHERE → every row!
DELETE FROM Ticket WHERE ticketID = 4;
SELECT & filtering
SELECT firstName, lastName FROM Customer
WHERE city = 'Brisbane'
ORDER BY lastName DESC;
- Pattern matching with
LIKE:%= any number of characters,_= exactly one. So'a%'=starts with a,'_a%'=2nd char is a,'%a'=ends with a. - ORDER BY is ascending by default; add
DESCto reverse. - DISTINCT removes duplicate rows.
Grouping & aggregates
Aggregates: COUNT, SUM, AVG, MIN, MAX. GROUP BY collapses rows into groups so an aggregate is computed per group.
SELECT e.eventName, COUNT(*) AS ticketsSold
FROM Event e JOIN Ticket t ON e.eventID = t.eventID
GROUP BY e.eventID, e.eventName
HAVING COUNT(*) > 2;
Set operations: UNION (combine, drop duplicates), UNION ALL (keep duplicates), INTERSECT (in both), EXCEPT (in the first but not the second).
Joins & subqueries
| Join | Returns |
|---|---|
| INNER JOIN | only rows matching in both tables |
| LEFT JOIN | all rows from the left table + matches (NULLs where none) |
| RIGHT JOIN | all rows from the right table + matches (NULLs where none) |
| CROSS JOIN | Cartesian product (every left row × every right row) |
SELECT c.firstName, c.lastName
FROM Customer c JOIN Ticket t ON c.customerID = t.customerID
WHERE t.ticketType = 'VIP';
A subquery is a query nested in another; the inner runs first. Use = for a single-row subquery, IN for a multi-row one.
SELECT eventName FROM Event
WHERE venueID = (SELECT venueID FROM Venue WHERE venueName = 'The Triffid');
DCL — permissions
GRANT SELECT, INSERT ON Orders TO clerk_role;
REVOKE INSERT ON Orders FROM clerk_role;
7 Security & administration
A role is a named bundle of privileges granted to users (simplifies admin).
- DAC (Discretionary Access Control) — the owner grants privileges directly.
- RBAC (Role-Based Access Control) — privileges are assigned via roles.
CREATE ROLE clerk_role;
GRANT SELECT, INSERT ON Orders TO clerk_role;
GRANT clerk_role TO user;
Transactions (ACID)
A transaction is a logical unit of work — a sequence of operations treated as one all-or-nothing action. It must be ACID:
| Property | Guarantee |
|---|---|
| Atomicity | all operations happen, or none do |
| Consistency | moves the DB from one valid state to another |
| Isolation | concurrent transactions don't interfere (appear serial) |
| Durability | once committed, changes survive crashes |
TCL commands: COMMIT (save), ROLLBACK (undo), SAVEPOINT (partial-rollback marker).
8 Exam technique
It's open book (unlimited paper notes, no digital), 2 hours, on-campus and handwritten, covering the whole semester. The lecturer warns: be selective — don't waste time flicking through too many pages.
- Section A (MCQ): do these first, fast — they're 1 mark each. Don't overthink.
- Section B (short answer): writing SQL, drawing ORM, doing RMaps, normalising, ethics. Show working — partial credit is real.
- ORM: always start from elementary facts, then add UC bars, then mandatory dots. Reject any "fact" with and/or/not/if/all/some.
- RMap: apply rules 1→2→3→4 in order; underline PKs, bracket optional columns, point FKs at PKs.
- Normalisation: name the partial deps (→2NF) and transitive deps (→3NF) explicitly, then give the split tables.
- SQL "will it run?": check entity integrity (duplicate PK/unique) and referential integrity (FK with no parent).
- Ethics: for CARE, define the letter → cite the facts → conclude.