Set Operations

A set is a basic concept in mathematics and has no direct definition.

A set is a collection of items (elements). We often use capital letters to denote a set. To define a set we can simply list all its elements in curly brackets; for example, to define a set A that consists of the two elements ♣ and ♦, we write A = { ♣, ♦ }. To say that ♦ belongs to A, we write ♦ ∈ A, where "∈" is pronounced "belongs to." To say that an element does not belong to a set, we use ∉. For example, we may write ♥ ∉ A.

Note that ordering does not matter, so the two sets { ♣, ♦ } and { ♦, ♣ } are equal. We often work with sets of numbers. Some important sets are given in the following example.

Example: We list some sets that will be used later.

We can also define a set by mathematically stating the properties satisfied by the elements in the set. In particular, we may write

\[ A = \left\{ x \, | \, x \mbox{ satisfies some property } \right\} \]
or
\[ A = \left\{ x \, : \, x \mbox{ satisfies some property } \right\} . \]
The symbols "|" and ":" are pronounced "such that." It is customary to write [a..b] for the set of integers between a and b inclusive. For example, [1..5] = { 1, 2, 3, 4, 5 }.
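In R, a set defined by a property can be sketched by filtering a finite range of candidates. The following is a minimal illustration in base R; the names `candidates`, `is_even` and `A`, and the property "x is even", are chosen here for the example and do not come from the text.

```r
# [1..10] as an integer vector: the candidates we filter
candidates <- 1:10

# Property that defines the set (a hypothetical example property)
is_even <- function(x) x %% 2 == 0

# A = { x in [1..10] : x is even }
A <- Filter(is_even, candidates)
print(A)

# [1..5] = { 1, 2, 3, 4, 5 }
print(1:5)
```

Because R vectors are finite, this only works when the candidates can be enumerated; the mathematical notation has no such restriction.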
Example: Here are some examples of sets defined by stating the properties satisfied by the elements:
Set A is a subset of set B if every element of A is also an element of B. We write A ⊆ B, where "⊆" indicates "subset." Equivalently, we say B is a superset of A, or B ⊇ A. If A is a subset of B, but A is not equal to B (i.e., there exists at least one element of B which is not an element of A), then A is also a proper (or strict) subset of B; this is written as A ⊂ B.
Example: Here are some examples of sets and their subsets:
Two sets are equal if they have exactly the same elements. Thus, A = B if and only if A ⊆ B and B ⊆ A.
For example, {3,4,7}={7,3,4}, and {a,a,b}={a,b}.
The set with no elements, i.e., ∅ = {}, is the null set or the empty set. For any set A, ∅ ⊆ A. The universal set is the set of all objects that we could possibly consider in the context we are studying. Thus every set A is a subset of the universal set. There is no standard notation for the universal set of a given set theory. Common symbols include S, Σ, and Ω.
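The subset and equality relations can be checked in base R with `%in%` and `setequal()`. This is only a sketch: the helper names `is_subset` and `is_proper_subset` are hypothetical, and R vectors stand in for sets.

```r
A <- c(3, 4, 7)
B <- c(7, 3, 4, 9)

# A is a subset of B when every element of A is also an element of B
is_subset <- function(x, y) all(x %in% y)

# Proper subset: a subset that is not equal to the larger set
is_proper_subset <- function(x, y) is_subset(x, y) && !setequal(x, y)

print(is_subset(A, B))                          # TRUE
print(is_proper_subset(A, B))                   # TRUE
print(setequal(c(3, 4, 7), c(7, 3, 4)))         # TRUE: order does not matter
print(setequal(c("a", "a", "b"), c("a", "b")))  # TRUE: repeats do not matter
print(is_subset(numeric(0), A))                 # TRUE: the empty set is a subset of any set
```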

Russell's paradox rules out the existence of a universal set in Zermelo-Fraenkel set theory and in other set theories that include Zermelo's axiom of comprehension. We therefore use the term in the more restrictive sense given above: the universal set contains only the objects relevant to the problem at hand. In probability theory, the universal set is called the sample space and is usually denoted by Ω.

The union of two sets A and B is the set of elements which are in A, in B, or in both A and B. In symbols,
\[ A \cup B = \left\{ x\,:\, x\in A \mbox{ or } x\in B \right\} . \]
The intersection A ∩ B of two sets A and B is the set that contains all elements of A that also belong to B (or equivalently, all elements of B that also belong to A), but no other elements.

Similarly, we can define the union of three or more sets. In particular, if A1, A2, ..., An are n sets, their union A1 ∪ A2 ∪ ... ∪ An is the set containing all elements that are in at least one of the sets. We can write this union more compactly as

\[ \bigcup_{i=1}^n A_i . \]
The complement of a set A, denoted by A^c or Ā or A', is the set of all elements that are in the universal set but are not in A.
The difference (subtraction) is defined as follows. The set A − B consists of elements that are in A but not in B.
Two sets A and B are mutually exclusive or disjoint if they do not have any shared elements; i.e., their intersection is the empty set, A∩B=∅. More generally, several sets are called disjoint if they are pairwise disjoint, i.e., no two of them share a common element.
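Base R provides `union()`, `intersect()` and `setdiff()` for vectors treated as sets, which is enough to try out the operations defined above. The sets `omega`, `A`, `B` and `D` below are made up for illustration; the complement is computed as a difference from an explicitly chosen universal set.

```r
omega <- 1:10        # universal set (hypothetical, for illustration)
A <- c(1, 2, 3, 4)
B <- c(3, 4, 5, 6)
D <- c(8, 9)

print(union(A, B))                    # 1 2 3 4 5 6
print(intersect(A, B))                # 3 4
print(setdiff(A, B))                  # difference A - B: 1 2
print(setdiff(omega, A))              # complement of A: 5 6 7 8 9 10
print(length(intersect(A, D)) == 0)   # TRUE: A and D are disjoint

# Union of three or more sets, folding union() over a list
print(Reduce(union, list(A, B, D)))   # 1 2 3 4 5 6 8 9
```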

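Theorem: (De Morgan's law) For two sets A and B
\[ \left( A \cup B \right)^c = A^c \cap B^c \qquad \mbox{and} \qquad \left( A \cap B \right)^c = A^c \cup B^c . \]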

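Theorem: (Distributive law) For any three sets A, B and C
\[ A \cap \left( B \cup C \right) = \left( A \cap B \right) \cup \left( A \cap C \right) \qquad \mbox{and} \qquad A \cup \left( B \cap C \right) = \left( A \cup B \right) \cap \left( A \cup C \right) . \]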

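Example: Let the universal set be Ω = { 1, 2, 3, 4, 5, 6, 7, 8, 9 } and let A = { 1, 2, 3, 4, 5 }, B = { 2, 4, 6, 8 }, C = { 2, 3, 6, 7, 9 }. Then, for instance, A ∪ B = { 1, 2, 3, 4, 5, 6, 8 }, A ∩ B = { 2, 4 }, A^c = { 6, 7, 8, 9 }, and A − B = { 1, 3, 5 }.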
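De Morgan's laws, (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c, and the distributive laws can be checked numerically on the sets of this example. The sketch below uses base R only; the helper `comp` (complement relative to Ω) and the result names are introduced here for illustration.

```r
omega <- 1:9
A <- c(1, 2, 3, 4, 5)
B <- c(2, 4, 6, 8)
C <- c(2, 3, 6, 7, 9)

# Complement relative to the universal set omega
comp <- function(x) setdiff(omega, x)

# De Morgan's laws
demorgan1 <- setequal(comp(union(A, B)), intersect(comp(A), comp(B)))
demorgan2 <- setequal(comp(intersect(A, B)), union(comp(A), comp(B)))

# Distributive laws
dist1 <- setequal(intersect(A, union(B, C)),
                  union(intersect(A, B), intersect(A, C)))
dist2 <- setequal(union(A, intersect(B, C)),
                  intersect(union(A, B), union(A, C)))

print(c(demorgan1, demorgan2, dist1, dist2))
```

A check on one example is of course not a proof; it only confirms that the identities hold for these particular sets.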
The cardinality of a set A is a measure of the "number of elements of the set", which is denoted by |A|.

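Theorem: (Inclusion-exclusion principle) For any finite sets A, B and C
\[ |A \cup B| = |A| + |B| - |A \cap B| \]
and
\[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| . \]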
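For two finite sets the principle reads |A ∪ B| = |A| + |B| − |A ∩ B|, and for three sets the pairwise intersections are subtracted and the triple intersection is added back. The following base R sketch evaluates both sides for the sets A, B, C of the example above; the helper `card` is a name introduced here.

```r
A <- c(1, 2, 3, 4, 5)
B <- c(2, 4, 6, 8)
C <- c(2, 3, 6, 7, 9)

# Cardinality: number of distinct elements
card <- function(x) length(unique(x))

# Two sets
lhs2 <- card(union(A, B))
rhs2 <- card(A) + card(B) - card(intersect(A, B))

# Three sets
lhs3 <- card(Reduce(union, list(A, B, C)))
rhs3 <- card(A) + card(B) + card(C) -
  card(intersect(A, B)) - card(intersect(A, C)) - card(intersect(B, C)) +
  card(Reduce(intersect, list(A, B, C)))

print(c(lhs2, rhs2))   # 7 7
print(c(lhs3, rhs3))   # 9 9
```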

It is very convenient to illustrate the interrelation of sets using Venn diagrams. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. Venn diagrams were conceived around 1880 by the English mathematician, logician and philosopher John Venn (1834-1923). Venn himself did not use the term "Venn diagram" and referred to his invention as "Eulerian Circles." The term "Venn diagram" was later used by the American academic philosopher Clarence Irving Lewis in 1918, in his book "A Survey of Symbolic Logic."

Venn diagrams can be created in R using the limma package, which is part of the Bioconductor Project. Older instructions installed it with the biocLite function from the biocLite.R script; that mechanism has been retired, and Bioconductor packages are now installed through the BiocManager package. To install limma from the R command line, type

 
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("limma", "statmod"))
The output from these calls indicates the installation of the limma package.
 
library(limma)
We can now use the commands in this package for generating Venn diagrams. The data needed for a Venn diagram consists of a set of binary variables indicating membership. We will be using the hsb2 (https://stats.idre.ucla.edu/hsb2-3.csv) dataset from the Institute for Digital Research and Education consisting of data from 200 students including scores from writing, reading, and math tests. We will create indicators for “high” values in each of these variables and generate Venn diagrams that tell us about the degree of overlap in high math, writing, and reading scores.
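The indicator matrix can be built along the following lines. This is a minimal sketch under stated assumptions: the cutoff of 60 for a "high" score is an assumption made here, the column names `write`, `math` and `read` are assumed to match the hsb2 file, and the five rows of `scores` are made-up stand-in values rather than real data.

```r
# Assumed cutoff: a score of 60 or more counts as "high"
cutoff <- 60

# Made-up stand-in with the assumed hsb2 column names (not real data)
scores <- data.frame(write = c(65, 52, 59, 61, 44),
                     math  = c(70, 41, 66, 57, 39),
                     read  = c(63, 47, 50, 68, 42))

# One logical membership indicator per test
hw <- scores$write >= cutoff   # high writing
hm <- scores$math  >= cutoff   # high math
hr <- scores$read  >= cutoff   # high reading

# One row per student, one column per set
c3 <- cbind(hw, hm, hr)
print(colSums(c3))
```

To work with the real data, read the hsb2 file into `scores` (for instance with `read.csv()`) instead of typing values by hand; the rest of the sketch stays the same.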
Next, we can use the vennCounts command to impose the structure needed to generate the Venn diagram.
 
a <- vennCounts(c3)
a
We can now generate our Venn diagram with the vennDiagram command:
 
vennDiagram(a)
While some of the options for the vennDiagram command are specific to tests run on microarray data, we can change some of the formatting. Below, we add names to the groups, we change the relative size of the labels and counts, and we opt for the counts to appear in red.
 
vennDiagram(a, include = "both", 
  names = c("High Writing", "High Math", "High Reading"), 
  cex = 1, counts.col = "red")
One can make other plots:
 
hsb2 <- read.table('https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb2-2.csv', header=T, sep=",")
attach(hsb2)
library(lattice)
#defining ses.f to be a factor variable
hsb2$ses.f = factor(hsb2$ses, labels=c("low", "middle", "high")) 
#histograms
histogram(~write, hsb2)
 
#conditional plot
histogram(~write | ses.f, hsb2)
Density plot:
 
densityplot(~socst, hsb2)
 
#conditional plot
densityplot(~socst | ses.f, hsb2)
Quantile-quantile plots
 
qqmath(~write, hsb2)
#conditional plot
qqmath(~write | ses.f, hsb2)
Box and whiskers plots
 
bwplot(~math, hsb2)

There is also a dedicated package, VennDiagram, by Hanbo Chen. Let us start by drawing a simple circle or ellipse:
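A minimal sketch of this first step, assuming the VennDiagram package is already installed: `draw.single.venn()` draws one circle for a set of a given size. The size 22 and the label "Dog People" are borrowed from the examples that follow.

```r
# Attaching VennDiagram also attaches grid, which provides grid.newpage()
library(VennDiagram)

# Start a fresh page on the graphics device
grid.newpage()

# One circle representing a set with 22 elements
venn.plot <- draw.single.venn(area = 22, category = "Dog People")
```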

Next, we add colors and remove the outline.
 
# attaches VennDiagram together with grid, which provides grid.newpage()
library(VennDiagram)
grid.newpage()
draw.single.venn(22, category = "Dog People", lty = "blank", fill = "cornflower blue", 
    alpha = 0.5)
Creating a Venn Diagram with two circles
 
grid.newpage()
draw.pairwise.venn(area1 = 22, area2 = 20, cross.area = 11, category = c("Dogs", 
    "Cats"))
Adding colour & moving titles
 
grid.newpage()
draw.pairwise.venn(22, 20, 11, category = c("Dogs", "Cats"), lty = rep("blank", 
    2), fill = c("light blue", "pink"), alpha = rep(0.5, 2), cat.pos = c(0, 
    0), cat.dist = rep(0.025, 2))
or
 
grid.newpage()
venn.plot <- draw.pairwise.venn(area1           = 100,
                                area2           = 70,
                                cross.area      = 68,
                                category        = c("First", "Second"),
                                fill            = c("blue", "red"),
                                lty             = "blank",
                                cex             = 2,
                                cat.cex         = 2,
                                cat.pos         = c(285, 105),
                                cat.dist        = 0.09,
                                cat.just        = list(c(-1, -1), c(1, 1)),
                                ext.pos         = 30,
                                ext.dist        = -0.05,
                                ext.length      = 0.85,
                                ext.line.lwd    = 2,
                                ext.line.lty    = "dashed"
                               )
We remove scaling
 
grid.newpage()
draw.pairwise.venn(22, 20, 11, category = c("Dog People", "Cat People"), lty = rep("blank", 
    2), fill = c("light blue", "pink"), alpha = rep(0.5, 2), cat.pos = c(0, 
    0), cat.dist = rep(0.025, 2), scaled = FALSE)
We make two non-overlapping circles
 
grid.newpage()
draw.pairwise.venn(area1 = 22, area2 = 6, cross.area = 0, category = c("Dog People", 
    "Snake People"), lty = rep("blank", 2), fill = c("light blue", "green"), 
    alpha = rep(0.5, 2), cat.pos = c(0, 180), euler.d = TRUE, sep.dist = 0.03, 
    rotation.degree = 45)
Creating a Venn Diagram with three circles
 
grid.newpage()
draw.triple.venn(area1 = 22, area2 = 20, area3 = 13, n12 = 11, n23 = 4, n13 = 5, 
    n123 = 1, category = c("Dogs", "Cats", "Lizards"), lty = "blank", 
    fill = c("skyblue", "pink1", "mediumorchid"))
Let's speed up the nrow(subset(...)) process for the area counts. This "likes" function finds the total area for a circle or an overlap, i.e., how many people like those animals. It takes the first letter(s) of the animal(s) in lower case, e.g. c("d", "c").
 
likes <- function(animals) {
    # d is assumed to be an existing data frame with five columns:
    # a person identifier followed by four TRUE/FALSE columns, one per animal
    ppl <- d
    names(ppl) <- c("p", "d", "c", "s", "l")
    # keep only the rows where every requested animal column is TRUE
    for (a in animals) {
        ppl <- subset(ppl, ppl[[a]] == TRUE)
    }
    nrow(ppl)
}

# How many people like dogs?
likes("d")
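The function above relies on a data frame d that is not shown here. As a self-contained sketch of the same idea, the block below counts the people who like every animal in a given list and collects the numbers that a three-set diagram needs. The data frame is made up for illustration, and `count_likes` is a hypothetical variant that takes the data as an argument instead of reading a global variable.

```r
# Made-up survey answers, for illustration only
d <- data.frame(person  = c("Ann", "Bob", "Cy", "Di", "Ed"),
                dogs    = c(TRUE, TRUE, FALSE, TRUE, FALSE),
                cats    = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                snakes  = c(FALSE, FALSE, FALSE, TRUE, TRUE),
                lizards = c(FALSE, TRUE, FALSE, TRUE, FALSE))

# Hypothetical variant of likes(): a row counts when every listed column is TRUE
count_likes <- function(data, animals) {
    sum(rowSums(data[, animals, drop = FALSE]) == length(animals))
}

area1 <- count_likes(d, "dogs")                         # 3
area2 <- count_likes(d, "cats")                         # 3
n12   <- count_likes(d, c("dogs", "cats"))              # 2
n123  <- count_likes(d, c("dogs", "cats", "lizards"))   # 1
print(c(area1, area2, n12, n123))
```

Counts obtained this way can be passed as the area and overlap arguments of the draw.pairwise.venn and draw.triple.venn calls shown in this section.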
 
grid.newpage()
venn.plot <- draw.triple.venn(area1           = 4,
                              area2           = 3,
                              area3           = 4,
                              n12             = 2,
                              n23             = 2,
                              n13             = 2,
                              n123            = 1,
                              category        = c('A', 'B', 'C'),
                              fill            = c('red', 'blue', 'green'),
                              cat.col         = c('red', 'blue', 'green'),
                              cex             = c(1/2,2/2,3/2,4/2,5/2,6/2,7/2),
                              cat.cex         = c(1,2,3),
                              euler           = TRUE,
                              scaled          = FALSE
                             )

A different package is venneuler, which is a lot cleaner to code: it can take the actual dataset and work out the areas itself. However, you have to install this package first, along with rJava:

 
install.packages("venneuler")
install.packages("rJava")
rJava requires Java; you should choose the 64-bit version, which can be downloaded from this page: https://www.java.com/en/download/manual.jsp

The package eulerr plots Venn diagrams. It is quite similar to venneuler but without its inconsistencies.

 
library(eulerr)
fit <- euler(c(A = 450, B = 1800, "A&B" = 230))
plot(fit)
One can use the venn() function from the gplots package: http://www.inside-r.org/packages/cran/gplots/docs/venn
 
require(gplots) 
## construct some fake gene names..
oneName <- function() paste(sample(LETTERS,5,replace=TRUE),collapse="")
geneNames <- replicate(1000, oneName())

## 
GroupA <- sample(geneNames, 400, replace=FALSE)
GroupB <- sample(geneNames, 750, replace=FALSE)
GroupC <- sample(geneNames, 250, replace=FALSE)
GroupD <- sample(geneNames, 300, replace=FALSE)

venn(list(GrpA=GroupA,GrpB=GroupB,GrpC=GroupC,GrpD=GroupD))