The post Game Theory For Competitive Programming appeared first on StepUp Analytics.

Very few competitive programmers are aware of Game Theory, mainly because there is a lack of good resources about it on the internet. But don’t worry: through this blog you will clear up all your doubts related to game theory.

This is largely an intuitive topic, and I shall try my best to develop your intuition for it.

**Combinatorial games** are two-person games with perfect information and no chance moves (no randomization like a coin toss that can affect the game). These games have a win, lose, or tie outcome and are determined by a set of positions, including an initial position, and by the player whose turn it is to move.

Players move from one position to another, usually alternating moves, until a terminal position is reached. A terminal position is one from which no moves are possible. Then one of the players is declared the winner and the other the loser, or there is a tie (depending on the rules of the game).

The one thing that can be stated about a combinatorial game is that it should end at some point and not get stuck in a loop. One game that can loop, however, is **chess**.

In order to prevent such looping situations in chess (consider both players just moving their queens to-and-fro between two squares), there is a **“50-move rule”**: the game is declared drawn if the last 50 moves by each player have been completed without the movement of any **pawn** and without any **capture**. Source: Stackexchange.

The coding part of Combinatorial Game Theory (CGT) is relatively small and easy. The key to **Game Theory** problems is a hidden observation, which can sometimes be very hard to find.

The following games come under the category of **Combinatorial Game Theory**:

- Chess game.
- Tic-Tac-Toe.
- Game of Nim.

I know that you are well aware of the __first__ and __second__ games, but you are probably wondering about the third:

**What is this game?**

**How to play this game?**

But don’t worry, I will clear up all these doubts later in this blog. Let us leave that for now and move forward. We can divide combinatorial games into **two** categories, as shown below:

**Impartial Games:** In Impartial Games, the possible moves from any position of the game are the same for both players.

**Partisan Games:** In Partisan Games, the possible moves from any position of the game are different for the two players.

Let’s understand these games (Impartial and Partisan) with an example of each.

**1.** Given a number of piles, each containing some number of stones/coins. In each turn, a player chooses one pile and removes any number of stones (at least one) from it. The player who cannot move loses the game (i.e., the one who takes the last stone is the winner).

As can be clearly seen from the rules above, the moves are the same for both players; there is no restriction on one player over the other. Such a game is considered an Impartial Game.

The above-mentioned game is famous as the **Game of Nim**, which will be discussed in detail later in this blog.

**2.** Let us take chess as an example. In this game, one player can move only the black pieces and the other only the white ones. Thus, there is a restriction on both players; their sets of moves are different, and hence such a game is classified as a Partisan Game.

Partisan Games are much harder to analyze than Impartial Games, since for such games we can’t use the **Sprague-Grundy Theorem** (explained later in this blog).

We already know what the **Game of Nim** is (given in the previous section). Here, I will explain how to solve a **Game of Nim** problem in competitive programming.

Let me take an **example**. Consider two players, **Alice** and **Bob**, and initially three piles of coins having **3, 4, 5** coins respectively, as shown below. We assume that the first move is made by **Alice**. See the figure below for a clear understanding of the whole gameplay.

Both **Alice** and **Bob** are experts at this game; they will make no mistakes.

We will consider both scenarios: Alice making the first move, and Bob making the first move.

**Alice makes the first move:**

In the figure, **A** stands for Alice and **B** for Bob.

**Bob makes the first move:**

In the figure, **A** stands for Alice and **B** for Bob.

After seeing both figures, it must be clear that the game depends on one important factor: **who starts the game?**

Here, one question may come to your mind: **will the player who starts first win every time?**

Let us play the game again, starting with **Alice**, and this time with a different initial configuration of piles.

The piles have **1, 4, 5** coins initially.

Will **Alice** win again, as she started first? Let us see.

Here, we can see in the figure that **Alice** has lost. But how? The starting player did not win this time, so there must be another factor which dominates the result of this simple-yet-interesting game. That factor is the initial configuration of the **piles**, which this time was different from the previous one.

So, we can conclude that this game depends on two factors:

- The player who starts first.
- The initial configuration of the piles/heaps.

**But wait, how do we solve this problem and find the winner when it appears in competitive programming?**

**In fact, we can predict the winner before even playing the game! This is exactly what a competitive programmer needs.**

To solve this problem, we need to calculate the __Nim sum.__

__Nim sum__**:** The cumulative XOR of the number of coins/stones in each pile/heap at any point of the game is called the Nim-Sum at that point.

*“If both Alice and Bob play optimally (i.e., they don’t make any mistakes), then the player starting first is guaranteed to win if the Nim-Sum at the beginning of the game is non-zero. Otherwise, if the Nim-Sum evaluates to zero, the first player will definitely lose.”*

**For the proof of the above theorem, see: Wikipedia**

Let us apply the theorem to the games played above. In the first game, **Alice** started first and the **Nim-Sum** at the beginning of the game was **3 XOR 4 XOR 5 = 2**, a non-zero value, hence **Alice** won. In the second game, the initial configuration of the piles was 1, 4 and 5, with **Alice** starting first; the **Nim-Sum** was **1 XOR 4 XOR 5 = 0**, so by the theorem **Alice loses the game**.

**C++ implementation of the above Theorem:**

But in good programming contests you will rarely find **Game Theory** problems as simple as the one above. To solve harder problems, I will cover some more important **Game Theory** topics below.

**Grundy Number** is a number that defines a state of a game. We can define any **impartial game** (example: **nim game**) in terms of Grundy Number.

**Grundy Numbers or Nimbers** determine how any Impartial Game (**not only the Game of Nim**) can be solved, once we have calculated the Grundy Numbers associated with that game, using the **Sprague-Grundy Theorem** (explained later in this blog).

But before calculating Grundy Numbers, we need to learn about another term: **Mex**.

**What is Mex?**

The **Mex** (“minimum excludant”) of a set of non-negative integers is the smallest non-negative integer not present in the set. For example, Mex(0, 1, 3) = 2 and Mex(1, 2) = 0.

The **Grundy Number** is equal to 0 for a game that is lost immediately by the first player, and is equal to the Mex of the **Grundy Numbers** of all possible next positions for any other game.

Below are **three example** games and **programs** to calculate the **Grundy Number** and **Mex** for each of them. The calculation is done by a recursive function **calculate_Grundy()**, which uses **calculate_Mex()** as its sub-routine.

Through these examples, you will see how **Grundy Numbers** and **Mex** help in solving problems.

**Example 1:** The game starts with a pile of n stones, and the player to move may take any positive number of stones. Calculate the Grundy Numbers for this game.

**Answer:** Since a player facing 0 stones (n = 0) loses immediately, **Grundy(0) = 0**.

If a player has 1 stone, he can take all the stones and win. So the next possible position of the game (for the other player) is (0) stones.

Hence, **Grundy (1)** = **Mex (0)** = **1 [**According to the definition of Mex]

Similarly, if a player has 2 stones, then he can take only 1 stone or he can take all the stones and win. So the next possible position of the game (for the other player) is **(1, 0)** stones respectively.

Hence,** Grundy (2) = Mex (0, 1) = 2** [According to the definition of Mex]

Similarly, if a player has ‘n’ stones, he can take 1 stone, or 2 stones, …, or all n stones and win. So the next possible positions of the game (for the other player) are (n-1, n-2, …, 1, 0) stones respectively.

Hence, **Grundy(n) = Mex (0, 1, 2, …. n-1) = n** [According to the definition of Mex]

**We summarize the Grundy values for n from 0 to 10 in the table below:**

| n | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|----|
| Grundy(n) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |

**Optimized Dynamic Programming Code in (C++)**:

**Example 2:** The game starts with a pile of n stones, and the player to move may take any positive number of stones up to 3 only. The last player to move wins. Calculate the Grundy Numbers for this game.

**Answer:** Since a player facing 0 stones loses immediately, **Grundy(0) = 0**.

If a player has 1 stone, then he can take all the stones and win. So the next possible position of the game (for the other player) is (0) stone

Hence,** Grundy (1) = Mex (0) = 1 **[According to the definition of Mex]

Similarly, if a player has 2 stones, then he can take only 1 stone or he can take 2 stones and win. So the next possible position of the game (for the other player) is (1, 0) stones respectively.

Hence, **Grundy (2) = Mex (0, 1) = 2** [According to the definition of Mex]

Similarly, **Grundy (3) = Mex (0, 1, 2) = 3** [According to the definition of Mex]

**But what about 4 stones?**

If a player has 4 stones, then he can take 1 stone or he can take 2 stones or 3 stones, but he can’t take 4 stones (see the constraints of the game). So the next possible position of the game (for the other player) is (3, 2, 1) stones respectively.

Hence, **Grundy (4) = Mex (1, 2, 3) = 0** [According to the definition of Mex]

So we can define Grundy Number of any n >= 4 recursively as-

**Grundy(n) = Mex [Grundy (n-1), Grundy (n-2), Grundy (n-3)]**

We summarize the Grundy values for n from 0 to 10 in the table below (Grundy(n) = n mod 4):

| n | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|----|
| Grundy(n) | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 |

**Optimized Dynamic Programming Code in (C++)**:

**Example 3:** The game starts with a number ‘n’, and the player to move divides ‘n’ by 2, 3 or 6 and takes the floor of the result. The player facing 0 has no move and loses. Calculate the Grundy Numbers for this game.

**Answer:** Suppose we take n = 7. The first player can divide n by 2, 3 or 6:

If the first player divides n by 2, n = floor(7/2) = 3.

If the first player divides n by 3, n = floor(7/3) = 2.

If the first player divides n by 6, n = floor(7/6) = 1.

So for the second player, n can be 3, 2 or 1, whose Grundy Numbers are 2, 2 and 1 respectively.

So **Grundy(7) = Mex(2, 2, 1) = 0** [According to the definition of Mex]

We summarize the **Grundy values** for n from 0 to 10 in the table below:

| n | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|----|
| Grundy(n) | 0 | 1 | 2 | 2 | 3 | 3 | 0 | 0 | 0 | 0 | 0 |

**Optimized Dynamic Programming Code in (C++)**:

Above, we have learned how to find **Grundy Numbers** through examples. For solving **tougher problems**, we have to learn the **Sprague-Grundy Theorem**.

Suppose there is a composite game made up of N sub-games and two players, Alice and Bob. The **Sprague-Grundy Theorem** says that if both Alice and Bob play optimally (i.e., they don’t make any mistakes), then the player starting first is guaranteed to win if the XOR of the **Grundy numbers** of the positions in the **sub-games** at the beginning of the game is **non-zero**. Otherwise, if the XOR evaluates to zero, the first player will definitely lose, no matter what.

**How to apply the Sprague-Grundy Theorem?**
We can apply the Sprague-Grundy Theorem to any composite impartial game by following four steps:

- Break the composite game into sub-games.
- Then for each sub-game, calculate the Grundy Number at that position.
- Then calculate the XOR of all the calculated Grundy Numbers.
- If the XOR value is non-zero, then the player who is going to make the turn (First Player) will win else he is destined to lose, no matter what.

Now, let us take an example and understand how to apply the **Sprague-Grundy Theorem** to find the winner, following the four steps one by one.

**Example:** The game starts with 3 piles having 3, 4 and 5 stones, and the player to move may take any positive number of stones, up to 3, from any one of the piles [provided that the pile has that many stones]. The last player to move wins. Which player wins the game, assuming that both players play optimally?

**Answer:** We will follow each step.

**First Step:** The sub-games can be considered as each pile.

**Second Step:** Calculate the Grundy Number of each pile. We have already seen above in this blog how to calculate the Grundy Numbers of this game (take up to 3 stones): Grundy(n) = n mod 4.

**Grundy(3)=3
Grundy(4)=0
Grundy(5)=1**

**Third Step:** The XOR of the Grundy Numbers: 3 XOR 0 XOR 1 = 2.

**Fourth Step:** Since the XOR is non-zero, we can say that the **first player will win**.

**C++ program that implements above all four steps:**

**References:** Wikipedia

I will explain one more very good problem based on the **Sprague-Grundy Theorem**.

**Example (composite game):**
An N x N chessboard has K knights on it. Unlike a knight in a traditional game of chess, these can move only as shown in the picture below (so the sum of coordinates decreases with every move). More than one knight can occupy the same square at the same time. Two players take turns moving; on his turn, a player chooses one of the knights and moves it. A player who is not able to make a move is declared the loser.

**Answer:**
This is the same as if we had K chessboards with exactly one knight on each. This is an ordinary sum of K games, and it can be solved using Grundy numbers: we assign a Grundy number to every subgame, and each subgame behaves exactly like a Nim pile whose size equals that Grundy number.

Here is pseudocode for generating the **Grundy numbers for each position on the chessboard**:

int grundy_Number(position pos) {
    moves[] = possible positions to which I can move from pos
    set s;
    for (all x in moves) insert grundy_Number(x) into s;
    // return the smallest non-negative integer not in the set s
    int ret = 0;
    while (s.contains(ret)) ret++;
    return ret;
}

**How to find the Grundy numbers in this game?**
We use the same concept to find the Grundy number of each position on the chessboard.

Suppose you are calculating the Grundy number for position x:

**G(x) = Mex(G(x1), G(x2), …, G(xm))**

where m is the number of possible moves of the knight from position x, and x1, x2, …, xm are the positions the knight can move to from x. Their **Grundy Numbers** G(x1), G(x2), …, G(xm) have already been calculated, since every move decreases the sum of coordinates.

The following table shows Grundy numbers for an **8 x 8 **board:

A better approach is to compute the Grundy numbers for the whole **N x N** chessboard in **O(N^2)** time and then XOR the **K** values (one for every knight). If their XOR is 0, we are in a losing position; otherwise, we are in a winning position.

**Why is the pile of Nim equivalent to the subgame if its size is equal to the Grundy number of that subgame?**

- If we decrease the size of the pile in Nim from A to B, we can move also in the sub-game to the position with the Grundy number B. (Our current position had Grundy number A so it means we could move to positions with all smaller Grundy numbers, otherwise the Grundy number of our position would not be A.)
- If we are in the subgame at a position with a Grundy number higher than 0, by moving in it and decreasing its Grundy number we can also decrease the size of the pile in the Nim.
- If we are in the subgame at a position with Grundy number 0, any move from it leads to a position with a Grundy number higher than 0. Because of that, from such a position it is always possible to move back to 0; by doing so we can nullify every move made from a position with Grundy number 0.

**Other composite games:
**It doesn’t happen often, but you can occasionally encounter games with a slightly different set of rules. For example, you might see the following changes:

**Q.** What if, when it is a player’s move, he can choose several of the knights (at least one) and move all of the chosen ones?

**Solution:** You are in a losing position if and only if every knight is in a losing position on its own chessboard (so the Grundy number of every square where a knight stands is 0).

**Problems for Practice:**

GAME3 — Yet Another Fancy Game [SPOJ]

GAME31 — The game of 31 [SPOJ]

For more advanced competitive programming material, check my GitHub repo:

Awesome-competitive-programming

Happy coding!


The post Competitive Programming: Algorithms and Data Structure appeared first on StepUp Analytics.

This blog is divided into three parts:

- Basic Mathematics
- Data Structures (Beginner to Expert)
- Algorithms.

This blog is the continuation of How to Start with Competitive Programming.

In each part, I will introduce you to important concepts used in competitive programming (without going into detail) and provide good references to read about these topics in depth.

If you want to be a serious competitive programmer, you should have knowledge of some mathematical concepts and a good command of number theory.

**Number theory**
There are many concepts in number theory. Let me introduce you to them one by one; knowing them will save a lot of time and effort while programming in contests.

**1. Modular arithmetic** When one number is divided by another, the modulo operation finds the remainder. It is denoted by the % symbol.

*Example:* Assume you have two numbers, 9 and 2. 9 % 2 is 1 because when 9 is divided by 2, the remainder is 1. For more details, visit: Modular arithmetic

**2. Modular exponentiation** Exponentiation is a mathematical operation expressed as x^n and computed as x^n = x*x*…*x (n times). In modular exponentiation, given three numbers x, y and p, the task is to compute (x^y) % p.

**Example:**

Input: x = 2, y = 3, p = 5

Output: 3

Explanation: 2^3 % 5 = 8 % 5 = 3.

More details visit this: Modular exponentiation

**3. Greatest Common Divisor (GCD)**

The GCD of two or more numbers is the largest positive number that divides all the numbers that are considered.

**Example:
**The GCD of 20 and 12 is 4 because it is the largest positive number that can divide both 20 and 12.

More details visit this: GCD

**4. Euclidean algorithm**

The idea behind this algorithm is GCD(A, B)=GCD(B, A%B). It will recurse until A%B=0.

**5. Extended Euclidean algorithm
**This algorithm is an extended form of Euclid’s algorithm. GCD(A, B) has a special property so that it can always be represented in the form of an equation i.e. Ax+By=GCD(A, B).

The coefficients (x and y) of this equation will be used to find the modular multiplicative inverse. The coefficients can be zero, positive or negative in value. This algorithm takes two inputs as A and B and returns GCD(A, B) and coefficients of the above equation as output.

Example If A=30 and B=20,

then 30∗(1)+20∗(−1)=10 where 10 is the GCD of 20 and 30.

More details visit this: Extended Euclidean algorithm

**6. Modular multiplicative inverse**

What is a multiplicative inverse? If A.B=1, you are required to find B such that it satisfies the equation. The solution is simple. The value of B is 1/A. Here, B is the multiplicative inverse of A.

What is a modular multiplicative inverse? Given two numbers A and M, you are required to find B such that it satisfies the equation (A.B) % M = 1. Here, B is the modular multiplicative inverse of A under modulo M.

More details visit this: Modular multiplicative inverse

**7. Sieve of Eratosthenes**

Given a number n, print all primes smaller than or equal to n. It is also given that n is a small number. The sieve of Eratosthenes is one of the most efficient ways to find all primes smaller than n when n is smaller than 10 million.

**Example:**

Input : n = 20

Output: 2 3 5 7 11 13 17 19

More details visit this: Sieve of Eratosthenes

**8. Euler’s Totient Function
**Euler’s Totient function fun(n) for an input n is a count of numbers in {1, 2, 3, …, n} that are relatively prime to n, i.e., the numbers whose GCD (Greatest Common Divisor) with n is 1.

**Example:**

fun(6) = 2

gcd(1, 6) is 1 and gcd(5, 6) is 1.

More details visit this: Euler’s Totient Function

**9. Convex Hull**

Given a set of points in the plane, the convex hull of the set is the smallest convex polygon that contains all of the points.

More details visit this: Convex Hull

Which data structure you should use depends on the problem you are trying to solve. If a problem is mapped to the most efficient data structure that captures its essence, an elegant solution follows.

The “right” choice of data structure depends not only on the representation of the inputs but also on the queries it is supposed to be optimal for. E.g., if asked to find a number in a list of numbers efficiently, a BST (Binary Search Tree) is a choice that effectively represents the input data for point-search queries.

If the query is for a range of numbers, and not just a single number, then a BST is no longer the optimal choice; the data structure to choose may be a B+ Tree.

Here I will categorize the important data structures by competitive programming skill level.

**Beginner:**

1. Linked List

2. Stack

3. Queue

4. Binary Search Tree

**Intermediate:**

1. Heap

2. Priority Queue

3. Huffman Tree

4. Union-Find

5. Trie

6. Hash Table

7. TreeMap

**Proficient :**

1. Segment Tree

2. Binary Indexed Tree

3. Suffix Array

4. Sparse Table

5. Lowest Common Ancestor

6. Range Tree

**Expert:**

1. Suffix Automaton

2. Suffix Tree

3. Heavy Light Decomposition

4. Treap

5. Aho-Corasick Algorithm

6. K Dimensional Tree

7. Link-Cut Tree

8. Splay Tree

9. Palindromic Tree

10. Ropes Data Structure

11. Dancing Links

12. Radix tree aka Prefix tree

13. Dynamic Suffix Array

I have seen all of the listed data structures being used in various programming contests.

Many of them are given in language libraries. But it is very important to understand their dynamics. Otherwise, understanding related higher-level structures will be difficult (if possible).

One may find some higher-level data structures easier than lower ones (it happened to me).

Programmers who use C++ for competitive programming can use these data structures from the STL:

1. Vector

2. List

3. Deque

4. Queue

5. Priority_queue

6. Stack

7. Set

8. Multiset

9. Map

10. Multimap

To be a good competitive programmer you must have a good understanding of the algorithms and the way your code works. The best algorithms are the ones which are small (fewer lines of code) and efficient.

You can develop your skill in building great algorithms by reading code and practicing writing it.

Here I will introduce you to some of the standard algorithms that we use in competitive programming.

**Searching algorithms**

- Linear Search.
- Binary Search.
- Jump Search.
- Interpolation Search.
- Exponential Search.
- Ternary Search.

**Sorting algorithms**

- Selection Sort
- Bubble Sort
- Insertion Sort
- Merge Sort
- Heap Sort
- QuickSort
- Radix Sort
- Counting Sort
- Bucket Sort
- ShellSort
- Comb Sort
- Pigeonhole Sort
- Cycle Sort.

**Greedy Algorithms**
A greedy algorithm is an algorithm that always makes the choice that seems best “right now”, without considering the future implications of this choice.

As the name implies, a greedy algorithm is greedy in taking decisions at each step of the process: it chooses the best solution at that step (the local optimum, either maximum or minimum, in technical terms), assuming that you will end up with the best solution for the whole problem (the global optimum) in the end.

Here are some algorithms where the **Greedy approach** is used:

- Activity Selection Problem
- Kruskal’s Minimum Spanning Tree Algorithm
- Huffman Coding
- Efficient Huffman Coding for Sorted Input
- Prim’s Minimum Spanning Tree Algorithm
- Prim’s MST for Adjacency List Representation
- Dijkstra’s Shortest Path Algorithm
- Dijkstra’s Algorithm for Adjacency List Representation

**Pattern Searching Algorithms**
In pattern searching, we look for a pattern that occurs one or more times in a sequence or string. Here I will introduce some efficient pattern searching algorithms that find a pattern in a given sequence or string, and count the number of occurrences of the pattern, in optimal time.

- Naive Pattern Searching
- KMP Algorithm
- Rabin-Karp Algorithm
- A Naive Pattern Searching Question
- Suffix Array
- Z algorithm (Linear time pattern searching Algorithm)
- Pattern Searching using a Trie of all Suffixes

**Graph Algorithms
**Some of the most famous graph algorithms are given below:

**Introduction DFS and BFS:**

- Graph and its representations
- Breadth First Traversal for a Graph
- Depth First Traversal for a Graph
- Applications of Depth First Search
- Detect Cycle in a Directed Graph
- Detect Cycle in an Undirected Graph
- Longest Path in a Directed Acyclic Graph
- Topological Sorting
- Check whether a given graph is Bipartite or not
- Snake and Ladder Problem
- Biconnected Components
- Check if a given graph is tree or not

**Minimum Spanning Tree:**

- Prim’s Minimum Spanning Tree (MST)
- Applications of Minimum Spanning Tree Problem
- Prim’s MST for Adjacency List Representation
- Kruskal’s Minimum Spanning Tree Algorithm
- Boruvka’s algorithm for Minimum Spanning Tree

**Shortest Paths:**

- Dijkstra’s shortest path algorithm
- Dijkstra’s Algorithm for Adjacency List Representation
- Bellman-Ford Algorithm
- Floyd Warshall Algorithm
- Johnson’s algorithm for All-pairs shortest paths
- Shortest Path in Directed Acyclic Graph
- Some interesting shortest path questions
- Shortest path with exactly k edges in a directed and weighted graph

**Connectivity:**

- Find if there is a path between two vertices in a directed graph
- Connectivity in a directed graph
- Articulation Points (or Cut Vertices) in a Graph
- Biconnected graph
- Bridges in a graph
- Strongly Connected Components
- Biconnected Components

**Maximum Flow:**

- Ford-Fulkerson Algorithm for Maximum Flow Problem
- Find the maximum number of edge-disjoint paths between two vertices
- Maximum Bipartite Matching

**Dynamic Programming**
In Dynamic Programming, a problem is divided into sub-problems, and the solutions of these sub-problems are combined to reach an overall solution for the main problem. With approaches like Divide-and-Conquer, a sub-problem may be solved multiple times, so such methods may have to perform extra work.

Dynamic Programming solves each of these sub-problems just once and then saves the result, reducing the number of computations by avoiding the work of recalculating it at a later stage where the solution for that sub-problem is required.

Here are some of the most famous problems where dynamic programming is used.

- Overlapping Subproblems Property
- Optimal Substructure Property
- Longest Increasing Subsequence
- Longest Common Subsequence
- Edit Distance
- Min Cost Path
- Coin Change
- Matrix Chain Multiplication
- Binomial Coefficient
- 0-1 Knapsack Problem
- Egg Dropping Puzzle
- Longest Palindromic Subsequence
- Cutting a Rod
- Maximum Sum Increasing Subsequence
- Longest Bitonic Subsequence
- Floyd Warshall Algorithm
- Palindrome Partitioning
- Partition problem

**Backtracking Algorithms**
Backtracking = {try a possible solution; if it works, return it; otherwise go back and try another, and so on}.

In backtracking, we start with one possible move out of many available moves and try to solve the problem. If we are able to solve the problem with the selected move, we print the solution; otherwise we backtrack, select some other move, and try to solve it. If none of the moves works out, we claim that there is no solution to the problem.

Here are some most famous problems where Backtracking approach is used.

- Print all permutations of a given string
- The Knight’s tour problem
- Hamiltonian Cycle

For more advanced competitive programming material, check my GitHub repo: Awesome-competitive-programming


The post Anaplan – Cloud Based Business Planning Platform appeared first on StepUp Analytics.

Anaplan is a cloud-based business planning platform that acts as a single hub where business planning models are created and used. Anaplan is a new platform challenging existing corporate performance and management tools in the market. It is a cloud-based business modelling and planning platform, an alternative to ERP systems and systems-integrator offerings.

The main aim of Anaplan is to provide a fully cloud-based platform for financial services. And it is not only for financial purposes: it also provides solutions for Workday and Salesforce to hold as much data as is available.

Anaplan runs on an in-memory, 64-bit, multicore platform and is delivered through the cloud. Anaplan has already started providing its services to large and small enterprises like McAfee, Aviva, Kimberly-Clark and many similar firms.

Due to Anaplan’s flexibility, it can be used to serve financial, commercial and planning operating models, and these models easily integrate with one another. Anaplan models can be built from scratch or customized based on pre-delivered models like:

- Cash flow planning
- Balance sheet
- Financial consolidation
- Insurance premium modelling and claims analysis
- Traditional P&L
- Sales forecasting
- Territory planning
- Quota management etc.

Anaplan can adapt to the market, trying to reduce cost and risk and increase efficiency. This is possible because Anaplan brings S&OP, supply planning and demand management together, enabling decision-making throughout the supply chain. Anaplan can connect other models, like sales and finance, through the network, and provides features that improve visibility and collaboration in your business with ease.

With the help of the Anaplan cloud-based platform, implementing the following tasks is made easy:

- Supply chain
- Strategic management
- Demand planning management
- Product portfolio management
- Sales and operation

Compared with the old days, today’s supply chain is better. Years ago, many companies used ERP for their business tracking and collaboration; later, companies used a combination of ERP and planning spreadsheets. These spreadsheets caused a lot of issues down the road as planning became messy. To overcome these kinds of problems, we can go with effective supply chain management, i.e., Anaplan.

Supply chain management is the cyclic process of goods and services, from the initial stage of raw materials to the final stage of finished goods delivered as products or services. It covers supply chain activities like planning, execution, monitoring and control of operations. In supply chain management, one of Anaplan’s best features is that it provides total visibility among suppliers, distributors and the production team. Due to this, purchases of products or services can be accurate and risk is avoided. Anaplan can also reduce the purchase cost of a product, prevent product shortages, and execute prepared test scenarios without fail. The Anaplan cloud-based platform connects manufacturers, suppliers and distributors.

We know that supply chain management is crucial for any business in providing services that satisfy users. With Anaplan, we can improve the following in the best possible way:

- Providing advanced customer services
- Reducing the operational cost
- Improving financial status, etc.

Supply chain management makes sure that you deliver the appropriate products to the customer while working with an array of products in varying quantities. It helps deliver products to the right place at the right time. Through supply chain management, we can reduce the operating cost of a product, which includes purchasing cost, production cost, and total supply chain cost. This reduces the value of fixed assets and increases the cash inflow of the business, resulting in high profit leverage.

Supply chain management plays a significant role in society as well, e.g., medical trips, natural disaster relief activities, etc. When an unexpected natural disaster occurs, the SCM team comes into action and takes the necessary steps to deliver products and services to customers on time.

Present-day markets need an agile view for monitoring the business line. The Anaplan platform provides this kind of view to the business team for tracking the product lifecycle, the effects of new product introductions, product end-of-life, rates, and more. With Anaplan's cloud-based platform, enterprises can efficiently optimize their business lines to achieve their organizational goals, resulting in maximum profits and a larger share of the market.

The Anaplan platform collects cross-functional business information. Using this information, planners can create or develop new products and predict product outcomes based on performance and revenue; products that show poor results can be removed. With demand planning, enterprises can also prepare for unexpected issues like natural disasters and labor problems.

With the Anaplan platform, you can easily connect planning, the business line, product innovation, and trade partners. It also helps you synchronize planning models (recalibrated plans, test scenarios, operational plans) with fluctuating business conditions.

Every successful supply chain requires the right foundation. With Anaplan's strategic management capabilities, a business can create policies for multiple groups of end users and products. Anaplan connects the network, inventory, products, and customer segmentation, which helps you define rules and policies that support supply chain management.

For the next few generations of supply chain management, Anaplan provides better models that enable transparency in the product delivery system, improving customers' trust in the organization. The cloud-based platform and agile view are currently the most appreciated features among organizations that have already adopted Anaplan for supply chain management.

The post Anaplan – Cloud Based Business Planning Platform appeared first on StepUp Analytics.

The post 11 Reasons Why You Should Learn R Programming appeared first on StepUp Analytics.

R programming is a **statistical** programming language developed by **scientists**, with open-source libraries for statistics, machine learning, and data science. **R lends itself well to business because of its depth of topic-specific packages and its communication infrastructure.** R has packages covering a wide range of topics such as econometrics, finance, and time series.

R has best-in-class tools for visualization, reporting, and interactivity, which are as important to business as they are to science. Because of this, R is well-suited for scientists, engineers and business professionals.

- Business Capability (1 = Low, 10 = High)
- Ease of Learning (1 = Difficult, 10 = Easy)
- Cost (Free/Minimal, Low, High)
- Trend (0 = Fast Decline, 5 = Stable, 10 = Fast Growth)

**REASON 01: R IS AN OPEN-SOURCE, FREELY AVAILABLE TOOL.**

Unlike SAS and Matlab, you can freely install, use, update, clone, modify, redistribute, and resell R. This not only saves money but also allows for easy upgrades, which is useful for a statistical programming language.

**REASON 02: R IS CROSS-PLATFORM AND OS-COMPATIBLE.**

R runs on Windows, Mac OS X, and Linux. It can also import data from Microsoft Excel, Microsoft Access, MySQL, SQLite, Oracle, and many other programs.

**REASON 03: R IS A POWERFUL SCRIPTING LANGUAGE.**

As such, R can handle large, complex data sets. R is also the best language to use for heavy, resource-intensive simulations and it can be used on high-performance computer clusters.

**REASON 04: R HAS WIDESPREAD ACCLAIM.**

With an estimated 2 million users, R is one of the top programming languages of 2017.

**REASON 05: R IS HIGHLY FLEXIBLE AND EVOLVED.**

Many new developments in statistics first appear as R packages.

**REASON 06: PUBLISHERS LOVE R.**

R integrates easily with document preparation systems like LaTeX, so statistical output and graphics from R can be embedded into word-processing documents.

**REASON 07: R HAS A HUGE, VIBRANT COMMUNITY AND RESOURCE BANK**

R has a global community of passionate users who regularly interact on discussion forums and attend conferences. In addition, about 2,000 free libraries are available for your unlimited use, covering statistical areas such as finance, cluster analysis, high-performance computing, and more.

**REASON 08: LEARNING R IS EASY WITH THE TIDYVERSE.**

Learning R used to be a major challenge: base R was a complex and inconsistent programming language, and structure and formality were not the top priority as in other programming languages. This all changed with the "tidyverse", a set of packages and tools that share a consistently structured programming interface.

**REASON 09: R COMMUNITY SUPPORT**

Being a powerful language alone is not enough. To be successful, a language needs community support. We'll hit on two ways that R excels in this respect: CRAN and the R community.

**REASON 10: R HAS HEART**

We already talked about the infrastructure, the tidyverse, that enables an ecosystem of applications to be built using a consistent approach. It's this infrastructure that brings life into your data analysis. The tidyverse enables:

- Data manipulation (dplyr, tidyr)
- Working with data types (stringr for strings, lubridate for date/datetime, forcats for categorical/factors)
- Visualization (ggplot2)
- Programming (purrr, tidyeval)
- Communication (Rmarkdown, shiny)

When tools such as dplyr and ggplot2 came to fruition, they made the learning curve much gentler by providing a consistent and structured approach to working with data. As Hadley Wickham and many others continued to evolve R, the tidyverse came to be, which includes a series of commonly used packages for data manipulation, visualization, iteration, modelling, and communication. The end result is that R is now much easier to learn (we'll show you in our next article!).
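As a small illustrative sketch (the data frame and column names here are made up), a typical tidyverse workflow chains dplyr verbs into a pipeline and hands the result to ggplot2:

```r
library(dplyr)
library(ggplot2)

# Hypothetical sales data, for illustration only
sales <- data.frame(
  region = c("East", "West", "East", "West"),
  amount = c(100, 150, 200, 120)
)

# dplyr: group and summarise in a readable pipeline
totals <- sales %>%
  group_by(region) %>%
  summarise(total = sum(amount))

# ggplot2: declare the aesthetic mapping, then add a layer
p <- ggplot(totals, aes(x = region, y = total)) +
  geom_col()
```

The same consistent verbs (filter(), mutate(), arrange(), and so on) compose in exactly this way, which is what makes the tidyverse approachable.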

**REASON 10: R for Business**

**RMARKDOWN**

Rmarkdown is a framework for creating reproducible reports that has since been extended to building blogs, presentations, websites, books, journals, and more. It's the technology behind this blog, and it allows us to include the code with the text so that anyone can follow the analysis and see the output right alongside the explanation. The technology has evolved substantially; here are a few examples of its capability:

- rmarkdown for generating HTML, Word and PDF reports
- rmarkdown for generating presentations
- flexdashboard for creating web apps via the user-friendly Rmarkdown format.
- blogdown for building blogs and websites
- bookdown for creating online books
- Interactive documents
- Parameterized reports for generating custom reports (e.g. reports for a specific geographic segment, department, or segment of time)
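As an illustrative sketch (the title and contents are hypothetical), a minimal Rmarkdown document is just a YAML header followed by prose and executable R chunks; rendering it produces the HTML, Word, or PDF report:

````markdown
---
title: "Monthly Sales Report"
output: html_document
---

## Summary

The chunk below is executed when the document is rendered, and its
output is embedded right next to this text.

```{r}
summary(cars)  # `cars` is a built-in R dataset
```
````

Swapping `output: html_document` for `word_document`, `pdf_document`, or a presentation format is all it takes to retarget the same source.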

**REASON 11: THE R COMMUNITY IS AWESOME!**


**How companies are using R**

- Ford uses R to improve the design of its vehicles.
- Twitter uses R to monitor user experience.
- The US National Weather Service uses R to predict severe flooding.
- The Human Rights Data Analysis Group uses R to quantify the impact of war.
- R is being used by The New York Times to create infographics.
- Google uses R to calculate the ROI of advertising campaigns.
- Facebook uses R to analyze status updates and its social network graph.

**WHAT SHOULD YOU DO?**

Don't make the decision tougher than it is. Think about where you are coming from:

**Are you a computer scientist or software engineer?** If yes, choose Python.

**Are you an analytics professional or mechanical/industrial/chemical engineer looking to get into data science?** If yes, choose R.

**Think about what you are trying to do:**

**Are you trying to build a self-driving car?** If yes, choose Python.

**Are you trying to communicate business analytics throughout your organization?** If yes, choose R.

**R can also be used in a big data context.** You often hear that Scala and Python are great, and that is true, but you could also consider R when you're working on visualization or data exploration. See this question and its answers for more information: Is R considered unsuitable for Big Data when compared to Python?

Of course, tools like Mahout will always be worth your time, and for the professional goals you're talking about, it's an "and-and" story. My advice would be to check some companies and/or industries that you would like to work for and then see how much Mahout is actually used versus R, so you can prioritize your learning.


The post stringR package in R for Handling Strings appeared first on StepUp Analytics.


The functions in this package support four main kinds of string manipulation:

- Allow us to manipulate individual characters within a string in character vectors.
- Whitespace tools to add, remove, and manipulate whitespace.
- Locale-sensitive operations, whose results vary from locale to locale.
- Pattern matching functions can recognize four engines of pattern description.

**Installation:**

install.packages('stringr')

library(stringr)

There are a large number of functions incorporated in this package. A few important ones are discussed below.

There are three functions that are used to convert the case of strings.

**str_to_upper()** – This converts the entered string into upper case. The syntax is:

**str_to_upper(string, locale = “en”)**

Here “en” stands for English, which is by default.

str_to_upper("I am a good girl") ## [1] "I AM A GOOD GIRL"

We see immediately all the characters in the strings turn into upper case.

**str_to_lower()** – This converts the entered string into lower case. The syntax is:

**str_to_lower(string, locale = “en”)**

Here “en” stands for English, which is by default.

str_to_lower("I am a good girl") ## [1] "i am a good girl"

**str_to_title()** – This converts the entered string into title case: the first character of each word is capitalized and the rest are in lower case. The syntax is:

**str_to_title(string, locale = “en”)**

Here “en” stands for English, which is by default.

str_to_title("I am a good girl") ## [1] "I Am A Good Girl"

**str_c()** – This is used to concatenate multiple strings into a single string. The syntax is:

**str_c(…, sep = “”)**

The “sep” stands for the separator. It is used when we want to concatenate the strings keeping any kind of separator between each term.

str_c("I", "am", "a", "good", "girl") ## [1] "Iamagoodgirl"

Here we see the output, but it is too congested to read: there is no space between the words. To make it more readable, we include a separator.

str_c("I", "am", "a", "good", "girl", sep = " ") ## [1] "I am a good girl"

Now, this looks good.

**str_length()** – This is used to find the length of the input string, i.e., the total number of characters in it. Note that spaces between words are also counted. The syntax is:

**str_length(string)**

str_length("I am a good girl") ## [1] 16

This works well as the number of characters in the above string is 16, which includes spaces also.

**str_count()** – This is used to count the number of occurrences of a specified pattern in the given string. The syntax is:

**str_count(string, pattern = “”)**

The pattern to be mentioned can be anything, characters, numbers or special characters.

str_count(c("apple", "pears", "orange", "banana"), pattern = "p")## [1] 2 1 0 0

Here we entered the names of four fruits and asked how many times "p" occurs in each. The output shows "p" occurs twice in "apple", once in "pears", and not at all in the other names.

**str_detect()** – Similar to the function above, except that it returns output of boolean type: it tells us whether each string contains the given pattern, so the output is "TRUE" or "FALSE". The syntax is:

**str_detect(string, pattern)**

The arguments remain the same as before, so we keep the same example and compare the two results.

str_detect(c("apple", "pears", "orange", "banana"), pattern = "p") ## [1] TRUE TRUE FALSE FALSE

Here the output is "TRUE" or "FALSE": the first two strings contain the pattern and the rest don't.

**str_split()** – This does exactly the opposite of str_c(): it splits the given string by the separator given. The syntax is:

**str_split(string, pattern, n = Inf, simplify = FALSE)**

str_split("I am a good girl", " ") ## [[1]] ## [1] "I" "am" "a" "good" "girl"

Here we split the string by the space. So we get all the individual words.
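The syntax above also accepts `n` (the maximum number of pieces) and `simplify` (return a character matrix instead of a list), which the examples so far do not demonstrate; a quick sketch:

```r
library(stringr)

# Limit the split to at most 2 pieces; the remainder stays in the last piece
pieces <- str_split("I am a good girl", " ", n = 2)[[1]]
# pieces is c("I", "am a good girl")

# simplify = TRUE returns a one-row character matrix instead of a list
mat <- str_split("I am a good girl", " ", simplify = TRUE)
```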

- str_replace() – This replaces the first occurrence of the pattern by some other given pattern. All the other occurrences remain as it is. The syntax is:

**str_replace(string, pattern, replacement)**

str_replace("apple", "p", "b") ## [1] "abple"

We see only the first occurrence of “p” gets replaced by “b”, the next one remains.

- str_replace_all() – This replaces all the occurrence of the pattern by some other given pattern. The syntax is:

**str_replace_all(string, pattern, replacement)**

str_replace_all("apple", "p", "b") ## [1] "abble"

So we see that both the “p” got replaced by “b”.

**str_order()** – This orders the given strings, either increasing or decreasing, and returns the indices of the elements in sorted order. The syntax is:

**str_order(x, decreasing = FALSE, na_last = TRUE, locale = “”, …)**

Here decreasing = FALSE means the order is increasing.

str_order(c("apple", "pears", "orange", "banana"), decreasing = T) ## [1] 2 3 4 1

As the inputs are strings, the arrangement is alphabetical. Because decreasing is TRUE, "pears" comes first ("p" is last alphabetically, so it leads in decreasing order), then "orange", and so on. Accordingly, the 2nd element comes first, then the 3rd, then the 4th, and lastly the 1st; the output shows these indices.

**str_sort()** – It does the same work as str_order(), but returns the sorted strings themselves instead of their indices. This is often more convenient, since it is difficult to keep track of indices when the input set is large.

str_sort(c("apple", "pears", "orange", "banana"), decreasing = T) ## [1] "pears" "orange" "banana" "apple"

**str_pad()** – It pads the string with the given character, on the side specified, until the string reaches the given width. The syntax is:

**str_pad(string, width, side = c(“left”, “right”, “both”), pad = ” “)**

Here "width" denotes the minimum total width of the padded string, and "side" denotes on which side the padding is added.

str_pad("abc", width = 5, side = "left", pad = " ") ## [1] "  abc"

So here two spaces are added to the left of "abc", bringing it to a total width of 5, which was desired.

**str_trim()** – It does exactly the opposite of str_pad(): it removes the extra whitespace around the string. The syntax is:

**str_trim(string, side = c(“both”, “left”, “right”))**

str_trim(" abc ", side = "both") ## [1] "abc"

So here we remove the extra spaces that were present on both sides of "abc", as desired.

Now let us import a data set and imply these string operations on the fields.

setwd("C:/Users/Prithac/Desktop/step up analytics")
cust <- read.csv("Customer_Info.csv")

This dataset contains personal information for 50 customers across 5 fields. One by one, I apply the functions above to this dataset.

1. Suppose we want to keep the State names in all capital letters,

cust$State <- str_to_upper(cust$State)
head(cust)
##   Customer      State Education Gender
## 1  BU79786 WASHINGTON  Bachelor      F
## 2  QZ44356    ARIZONA  Bachelor      F
## 3  AI49188     NEVADA  Bachelor      F
## 4  WW63253 CALIFORNIA  Bachelor      M
## 5  HB64268 WASHINGTON  Bachelor      M
## 6  OC83172     OREGON  Bachelor      F
##   EmploymentStatus
## 1         Employed
## 2       Unemployed
## 3         Employed
## 4       Unemployed
## 5         Employed
## 6         Employed

2. Suppose we want to concatenate the Education and Employment fields into one whole field, separated by “-”,

cust$Concat <- str_c(cust$Education, cust$EmploymentStatus, sep = "-")
head(cust)
##   Customer      State Education Gender
## 1  BU79786 WASHINGTON  Bachelor      F
## 2  QZ44356    ARIZONA  Bachelor      F
## 3  AI49188     NEVADA  Bachelor      F
## 4  WW63253 CALIFORNIA  Bachelor      M
## 5  HB64268 WASHINGTON  Bachelor      M
## 6  OC83172     OREGON  Bachelor      F
##   EmploymentStatus
## 1         Employed
## 2       Unemployed
## 3         Employed
## 4       Unemployed
## 5         Employed
## 6         Employed
##                 Concat
## 1   Bachelor- Employed
## 2 Bachelor- Unemployed
## 3   Bachelor- Employed
## 4 Bachelor- Unemployed
## 5   Bachelor- Employed
## 6   Bachelor- Employed

3. Suppose we want to count the length of the individual states,

cust$Leng <- str_length(cust$State)
head(cust)
##   Customer      State Education Gender
## 1  BU79786 WASHINGTON  Bachelor      F
## 2  QZ44356    ARIZONA  Bachelor      F
## 3  AI49188     NEVADA  Bachelor      F
## 4  WW63253 CALIFORNIA  Bachelor      M
## 5  HB64268 WASHINGTON  Bachelor      M
## 6  OC83172     OREGON  Bachelor      F
##   EmploymentStatus
## 1         Employed
## 2       Unemployed
## 3         Employed
## 4       Unemployed
## 5         Employed
## 6         Employed
##                 Concat Leng
## 1   Bachelor- Employed   10
## 2 Bachelor- Unemployed    7
## 3   Bachelor- Employed    6
## 4 Bachelor- Unemployed   10
## 5   Bachelor- Employed   10
## 6   Bachelor- Employed    6

4. Suppose we want to match the pattern that the corresponding customers are Male,

cust$count <- str_count(cust$Gender, pattern = "M")
head(cust)
##   Customer      State Education Gender
## 1  BU79786 WASHINGTON  Bachelor      F
## 2  QZ44356    ARIZONA  Bachelor      F
## 3  AI49188     NEVADA  Bachelor      F
## 4  WW63253 CALIFORNIA  Bachelor      M
## 5  HB64268 WASHINGTON  Bachelor      M
## 6  OC83172     OREGON  Bachelor      F
##   EmploymentStatus
## 1         Employed
## 2       Unemployed
## 3         Employed
## 4       Unemployed
## 5         Employed
## 6         Employed
##                 Concat Leng
## 1   Bachelor- Employed   10
## 2 Bachelor- Unemployed    7
## 3   Bachelor- Employed    6
## 4 Bachelor- Unemployed   10
## 5   Bachelor- Employed   10
## 6   Bachelor- Employed    6
##   count
## 1     0
## 2     0
## 3     0
## 4     1
## 5     1
## 6     0

A "1" indicates that the customer is male; a "0" means not male.

5. Suppose we want to replace the first occurrence of "W" in the customer IDs with "Y",

cust$replace <- str_replace(cust$Customer, pattern = "W", replacement = "Y")

6. Suppose we want to sort the customers in ascending order,

cust1 <- str_sort(cust$Customer, decreasing = F) head(cust1) ## [1] "AI49188" "AO98601" "BQ94931" "BU27331" "BU79786" "BW63560"

7. We can see there are lots of unwanted extra spaces in the EmploymentStatus column. To remove them,

cust$EmploymentStatus <- str_trim(cust$EmploymentStatus, "both")

Hence we can see how easy it is to manipulate strings in R using the **stringr** package.


The post Power BI Tutorial – An Interactive Visualisation Tool appeared first on StepUp Analytics.

Power BI is a business analytics tool from Microsoft that delivers insights by connecting to many data sources. It produces beautiful reports and publishes them for the organization to consume on the web and across mobile devices. It provides interactive visualizations with self-service business intelligence capabilities, where end users can create reports and dashboards by themselves, without having to depend on IT staff or database administrators. In this Power BI tutorial series, we will demonstrate a real-time dashboard with an example dataset.

To install this tool, we have to go to the link given below,

https://powerbi.microsoft.com/en-us/downloads/

We can see "Microsoft Power BI Desktop" written in bold. Click the "Download" option given below; the "PBIDesktop_x64.msi" file will be downloaded. Then continue with the installation steps as the dialog boxes pop up.

The four major building blocks of Power BI are Dashboard, Report, Workbook, and Dataset.

- **Dataset** – A dataset is a collection of data that you import or connect to, bringing all of it together in one place. A single dataset can be part of many workspaces.
- **Report** – A Power BI report is one or more pages of visualizations that bring out meaningful insight from a dataset. All of the visualizations in a report come from a single dataset.
- **Dashboard** – A single canvas that contains tiles and widgets pinned from a report, each showing a single visualization that was created from a dataset. From here we can see all the information needed to make decisions in one glance, and monitor the most important information in a business.
- **Workbook** – Workbooks are a special type of dataset. When we use Get Data with Excel files and choose Connect, our workbook appears in Power BI just as it would in Excel Online. Workbooks can't be edited in Power BI. These are the blocks we get under "My Workspace".

Microsoft Power BI has features that can provide powerful insights through interactive visualizations at a lower cost of ownership.

- Power BI Desktop supports both on-premises and cloud data sources, which may be structured or unstructured; more sources are added each month.
- Its Content Packs include dashboard reports, data models, and embedded queries (from SQL and R).
- Allows the user to ask questions; the software responds with relevant content and answers, which can then be manipulated to match the user's needs.
- The feature of printing dashboards can be useful in any board meetings and discussions.
- It has more than 200 functions and counting. The rich DAX formulas help in creating powerful analytical data models.

When we open the Power BI Desktop tool, let's take a look at the interface and get to know the pieces that make up the landing page for the Power BI service.

1. Report – It is the workspace where we can create visualizations as we need. This is the first icon in the left pane.

2. Data – It shows the full dataset that is imported into Power BI, with the fields and their records. This is the second icon in the left pane.

3. Relationship – It is useful when we import more than one related dataset into Power BI. Clicking on this icon, we see the relationships between the tables; we can even create relationships if needed. This is the third icon in the left pane.

4. Fields – To the extreme right we have this icon. This shows all the tables and their corresponding fields (or, attributes) of the data set imported. We drag and drop the fields on the visualizations from this place.

5. Visualizations – This is seen just beside Field panel. It contains all the visualization charts available to the users.

We can select any chart with a single click and work on the data accordingly. Below it is a "Filters" section, where we can drag the fields to be filtered and set the filtering criteria.

6. Page 1 – To the extreme bottom, this is the place where we can enter the name of the particular page. “Page 1” is the default name provided by the tool. Right click on it and click rename, to change the page name according to our choice.

1. Navigation Pane – This is the left-most vertical window in the workspace. It can be used to locate and move between our workspaces, dashboards, reports, and even datasets.

- The “Get Data” option mentioned below, can be used to add data sets, reports and dashboards to power bi.
- By “Favorites” we can mark and later open our favorite content.
- By “Recent”, we can view and open our recently opened content.
- View, open, or delete an app by selecting “Apps”.
- By “Shared with me”, we can search and sort the content we want to find.
- Display and open your work spaces by selecting “Work spaces”.

2. Canvas – The canvas area displays visualization tiles, if we open a dashboard. Again, if we open the report editor, the canvas area would display a report page.

3. Q & A question box – This can be used to add content to a dashboard or report. We can ask a question, and Power BI Q&A gives us an answer in the form of a visualization. It looks for the answer in the datasets connected to the dashboard; a connected dataset is one with at least one tile pinned to that dashboard.

4. Icon buttons – The icons in the upper right corner are the resource for settings, notifications, downloads, getting help, and providing feedback to the Power BI team.

5. Dashboard title – It is the navigation path that Power BI creates for us, to figure out which workspace and dashboard are active.

6. Office 365 app launcher – From here we can quickly launch our emails, documents, calendar, and all our Office 365 apps.

7. Labelled icon buttons – This area contains additional options for interacting with the contents, i.e., options for duplicating, printing, refreshing the dashboard, etc.

**Video Links**

Below are a few links to Power BI tutorial videos, which would be useful for beginners.

Clicking this link, you would find a list of 148 videos on power bi, from scratch. It’s an authentic site of Microsoft.

This link has 21 videos that would help you understand step by step knowledge on Power BI

Explaining the visualizations with the help of a small case study.

A sample data set is attached to the link below. Click on the link to view the full data set.

https://drive.google.com/file/d/10MdSVb46-8QMRwgyP3YVDDsFczjCGkXt/view?usp=sharing

It is a dataset of salary structures for employees in the few countries taken into consideration.

We will analyse the dataset and its variables and try to bring out some insights with the help of the Power BI tool.

- Install Power BI Desktop as instructed above.
- Download the data set from the link provided.
- Open Power BI.
- From the Home tab, click on the "Get Data" option and select the type of file you want to load. (Here, the example dataset is in .xlsx format; hence, choose "Excel".)
- A pop-up box appears. Select the tables you want to load from the mentioned dataset and click on “Load”.
- The data gets loaded.
- Under “Visualization” tab on the right side we see all the types of visualizations available to us, to work on (like, bar chart, column chart, line chart, stacked area chart, pie chart, etc.). Select the visualization you need. Also, many “Custom Visualizations” are available to be downloaded freely and used.
- Drag and drop the fields you want to analyse, on relevant boxes thus provided.
- You can edit the visualization and make it more colourful and attractive, according to choice.
- After the visualizations are all ready, go to “Publish” option on the top right. Sign in with your official power bi account, and then click on “Publish on Web”. The dashboard is created and published on the web.
- You can generate a link to the dashboard and share with colleagues, friends, or in public.


The post Understanding The Difference Between Blockchain and Bitcoin appeared first on StepUp Analytics.

You must have read about the debate on blockchain vs bitcoin. Bitcoin has been in the news for years, for both amazing and terrible reasons. Though many people don't know how bitcoin works, everyone has an opinion on it. In this article, I will explain the differences between blockchain technology and bitcoin.

Blockchain and Bitcoin are not the same thing; however, they are closely related. Getting straight to the point, blockchain is the underlying technology of bitcoin: it functions as a database for all bitcoin transactions and maintains all the records right from the first transaction.

Blockchain was bundled together with bitcoin in the same solution when bitcoin was released as open-source code. And since bitcoin was blockchain's first application, people mistakenly used "bitcoin" to mean "blockchain". This is how the misunderstanding began. Let us look at bitcoin and blockchain in detail.

A blockchain is a distributed, permissionless database, based on the blockchain protocol, that maintains a continuously expanding catalogue of transactional data records hardened against revision and tampering, even by the operators of the data-store nodes. The best-known and first application of blockchain technology is the public ledger of transactions for bitcoin and altchains (the similarly distributed ledgers it inspired). Every blockchain record is cryptographically enforced and hosted on machines functioning as data-store nodes.

Bitcoin is a payment system and digital asset, released as open-source software in 2009. The system is peer-to-peer: users can transact directly without the need for an intermediary such as a bank, a clearinghouse, or a credit card company. Network nodes verify bitcoin transactions, and these transactions are recorded in a public distributed ledger referred to as the blockchain.

As we explore the technology behind blockchain, it is important to understand what role bitcoin plays. As explained earlier, bitcoin is a digital currency launched in 2009 with the intention of simplifying online transactions by bypassing government control of currency.

It does this by storing and transacting the currency over a peer-to-peer network, a blockchain, rather than using a central monetary repository. It is important to make the distinction that bitcoin is not a blockchain itself: bitcoin is transacted over an open, public, anonymous blockchain network. In many ways, you can think of blockchain as the operating system and bitcoin as one of the many applications that run on that system.

The blockchain that underlies bitcoin has some fundamental similarities but also key differences to a blockchain built for business such as the Linux Foundation’s Hyperledger Fabric. To gain insights into the difference between bitcoin and blockchain, let’s understand the fundamental similarities and differences between a bitcoin blockchain and a blockchain built for business.

Both are cost-effective, as they increase the speed of transactions and reduce overhead costs. Both are highly efficient, as each transaction is recorded once and is then visible to all parties through the distributed network. And both are tamper-evident: a transaction cannot be modified, only reversed with another transaction, in which case both transactions remain visible.

However, a bitcoin blockchain is limited in a few ways. It is primarily designed to transact cryptocurrency, and it is also open and public, meaning anyone can join and view every transaction that has ever happened on the network. It is anonymous, meaning it is nearly impossible to know the identity of those involved in a transaction. Because of this, it requires heavyweight cryptography to deter fraudulent activity, which in turn requires significant computing power.

These characteristics lead to many issues around efficiency, confidentiality, security, and trust when conducting business, especially in regulated industries.

On the other hand, a blockchain built for business enables you to exchange anything of value, whether tangible (like a car or house), intangible (like a patent or copyright), or digital (like videos or photos). It is private, so the invited members know exactly who they are doing business with. It is permissioned, so participants are only given access to data relevant to them, and it runs on smart contracts: business logic embedded into the network, reducing disputes and increasing trust.

The blockchain for business also utilizes selective endorsement, which allows participants to control exactly who verifies transactions. Thus, from this section, you have obtained a detailed picture of the difference between blockchain and bitcoin.

Since there is no official definition of blockchain, various network technologies claim the name. From the sections above, it is evident that bitcoin refers to the currency token present on the blockchain network, while various other networks utilize a blockchain of some type or another. Though the financial sector and technology companies consider blockchain technology a revolution, they still have to utilize bitcoin. The blockchain and bitcoin complement each other as components of a greater system.

The post Understanding The Difference Between Blockchain and Bitcoin appeared first on StepUp Analytics.

The post How can I apply for financial aid for multiple classes at Coursera? appeared first on StepUp Analytics.

Fill these out as genuinely as you can, stating (though not too overtly) your goals, achievements, and motivations. Also, be sure to mention your financial status, as that is a major basis for the decision.

After you do that, sit back and relax. After a few days, go to the **My Purchases** section of your Coursera account and open **Transactions** there. If you see your course listed with “$0 USD”: congratulations!

It’s really easy, trust me. I applied for 3 courses and got a scholarship for each.

P.S. – There’ll be a deadline by which you’ve got to earn the Certificate; after that, you’ll have to pay for it.

**Step-by-Step Application Form for Financial Aid:**

**Step 1:**

You need to check all three options and type the exact sentence in the text box below.

**Step 2:** Your ID should be verified on Coursera, and the same name will appear on the Certificate of Accomplishment.

**Step 3:** You need to fill in the details below. If you are a student, mention your annual income as zero and proceed with the application.

**Step 4:** This is what matters most and becomes the deciding factor in the approval or rejection of your financial aid. You need to clearly elaborate your genuine need: why you need financial aid and how it will help you. Try to throw some light on your previous association and learning experience with Coursera.

If you ever participated in events where these topics or concepts were used, talk about that. You need to write a minimum of 150 words, and try to be as genuine as possible.

**Step 5:** If you are a student, mark this as NO and proceed with submission. In case you have a job and can afford to pay some amount, check Yes and submit the application.

Good luck and happy learning!

The post How can I apply for financial aid for multiple classes at Coursera? appeared first on StepUp Analytics.

The post Ethical Hacking using Linux-Information gathering using Dmitry appeared first on StepUp Analytics.

Ethical Hacking is basically hacking done for a good purpose, under the authorization of proper authorities, to bring down rogue or criminal hacking groups. There are various tools for hacking, like Nmap, Dmitry, Wireshark, and Aircrack, and they are provided mostly in Linux-based operating systems, which are the ideal choice for both “good” and “bad” hackers, as these tools are very powerful and “real” hacking, as you know it, is only done using Linux.

I shall post some pictures of these tools (nmap) being used in your favourite movies like The Matrix! We will gather information on the emails and subdomains of a particular website and do a whois lookup (looking up information on a website’s server OS, address, IP, etc.).

You will need a Linux distro from the Debian family, like Ubuntu, Linux Mint, Debian itself, or Kali Linux: basically anything that can install Dmitry using its package manager.

To install Dmitry (Deepmagic Information Gathering Tool), enter this command:

`sudo apt-get install dmitry`

and that’s it!

**Let’s do this!**

So open your terminal and type in `dmitry`, and you should see the following output:

This means you have successfully installed Dmitry! So let’s start the magic. In this article I will only show you three basic commands; the rest will be shown in future articles. Note that every command starts with the name of the program, then the parameter “-X” (where X is a letter that selects a certain function of the program), followed by a space and the target’s IP. Also note that we need the IP address of the website you want to look up; you can find the IP of any domain name by running a simple command in the terminal: `ping <sitename.com>`.

- The first command is `dmitry -i <some ip address>`. Here we replace the IP address with that of a website we want to look up, say facebook.com (its IP is 31.13.92.36), and you will see various information regarding country of origin, server status, IP addresses, subnet masks, etc.
- The second command is `dmitry -s <domain name>`, which is used to find subdomains of any domain/website (any website which has its own name followed by the main domain.com).
- The third command is `dmitry -e <domain name>`, which can return results on various email IDs associated with a domain. (Don’t try gmail.com, because it can return a lot of results!) You can try a server like epf.pl, which I believe is a Polish email server that my dad uses; once you do that, you will see various email IDs associated with it, and some email IDs of big companies that you can cross-check using a Google search.

Well guys, that was it for today; I will show you some more commands in upcoming tutorials. But please do not use this tool for malicious activities, and especially not the tools that I will show you in the future. They can land you in jail if you use them for malicious activities, and I am not responsible for how you use these tools!

The post Ethical Hacking using Linux-Information gathering using Dmitry appeared first on StepUp Analytics.

The post Types of Cluster Analysis and Techniques using R appeared first on StepUp Analytics.

- Definition
- Types
- Techniques to form cluster method

**Definition:**

- It groups similar data into the same group.
- The goal of this procedure is that the objects in a group are similar to one another and different from the objects in other groups.
- The greater the similarity within a group and the greater the difference between groups, the more distinct the clustering.
- Cluster analysis uncovers potential relationships and constructs a systematic structure in a large number of variables and observations.

Main objectives of clustering are:

- Intra-cluster distance is minimized.
- Inter-cluster distance is maximized.

**Types:**

- **Hierarchical clustering:** Also known as ‘nested clustering’, as it allows clusters to exist within bigger clusters, forming a tree.
- **Partition clustering:** Simply a division of the set of data objects into non-overlapping clusters such that each object is in exactly one subset.
- **Exclusive clustering:** Assigns each value to a single cluster.
- **Overlapping clustering:** Used to reflect the fact that an object can simultaneously belong to more than one group.
- **Fuzzy clustering:** Every object belongs to every cluster with a membership weight between 0 (it absolutely does not belong to the cluster) and 1 (it absolutely belongs to the cluster).
- **Complete clustering:** Performs a hierarchical clustering using a set of dissimilarities on the ‘n’ objects being clustered; it tends to find compact clusters of approximately equal diameter.

**Techniques to form cluster method:**

- K-means
- Agglomerative hierarchical clustering
- DBSCAN.

Here in this article, we will learn K-means clustering using R.

**K-means:**

K-means clustering is an **unsupervised learning algorithm** that tries to cluster data based on their similarity. **Unsupervised learning** means that there is no outcome to be predicted; the algorithm just tries to find patterns in the data. In k-means clustering, we have to specify the number of clusters we want the data to be grouped into. The algorithm randomly assigns each observation to a cluster and finds the centroid of each cluster. Then, the algorithm iterates through two steps:

- Reassign data points to the cluster whose centroid is closest.
- Calculate new centroid of each cluster.

These two steps are repeated until the within-cluster variation cannot be reduced any further. The within-cluster variation is calculated as the sum of the Euclidean distances between the data points and their respective cluster centroids.
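The two steps above can be sketched directly in base R. The data set and starting centroids below are made up for illustration; in practice you would just call R’s built-in `kmeans()`, as we do next.

```r
# Minimal sketch of the k-means loop: reassign points, then recompute centroids.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),  # 10 points near (0, 0)
           matrix(rnorm(20, mean = 5), ncol = 2))  # 10 points near (5, 5)
k <- 2
centroids <- x[c(1, 11), ]  # one starting centroid taken from each blob

repeat {
  # Step 1: reassign each point to the cluster with the closest centroid
  d <- sapply(1:k, function(j) colSums((t(x) - centroids[j, ])^2))
  cluster <- max.col(-d)  # index of the smallest squared distance per point
  # Step 2: recompute each centroid as the mean of its assigned points
  new_centroids <- t(sapply(1:k, function(j)
    colMeans(x[cluster == j, , drop = FALSE])))
  if (all(abs(new_centroids - centroids) < 1e-8)) break  # converged
  centroids <- new_centroids
}
cluster  # each point's final cluster assignment
```

On this well-separated toy data, the loop converges in a couple of iterations, with the first 10 points in one cluster and the last 10 in the other.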

The `iris` dataset contains data about sepal length, sepal width, petal length, and petal width of flowers of different species. Let us see what it looks like:

```r
library(datasets)
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
```

After a little bit of exploration, I found that `Petal.Length` and `Petal.Width` were similar among the same species but varied considerably between different species, as demonstrated below:

```r
library(ggplot2)  # load the graphical package
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
```

Here:

- `iris`: the name of the dataset.
- `Petal.Length, Petal.Width`: properties of the species.
- `color = Species`: different species will be shown in different colors.
- `geom_point()`: the output will be drawn as points.

Here is the graph:

In this plot, you can see that petal length and width are nearly the same within each species but differ between species.

Okay, now that we have seen the data, let us try to cluster it. Since the initial cluster assignments are random, let us set the seed to ensure reproducibility.

```r
set.seed(20)
irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)
irisCluster
# K-means clustering with 3 clusters of sizes 50, 52, 48
#
# Cluster means:
#   Petal.Length Petal.Width
# 1     1.462000    0.246000
# 2     4.269231    1.342308
# 3     5.595833    2.037500
#
# Clustering vector:
#   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#  [75] 2 2 2 3 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 3 3 3 3
# [112] 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3
# [149] 3 3
#
# Within cluster sum of squares by cluster:
# [1]  2.02200 13.05769 16.29167
#  (between_SS / total_SS =  94.3 %)
#
# Available components:
# [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
# [6] "betweenss"    "size"         "iter"         "ifault"
```

Since we know that there are 3 species involved, we ask the algorithm to group the data into 3 clusters, and since the starting assignments are random, we specify `nstart = 20`. This means that R will try 20 different random starting assignments and then select the one with the lowest within-cluster variation.
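As a rough check of why `nstart` helps, you can run `kmeans()` with a single random start and with 20 starts, then compare the total within-cluster variation; the multi-start run keeps the best (lowest `tot.withinss`) of its attempts. The exact figures will depend on the random draws.

```r
library(datasets)
set.seed(20)
single <- kmeans(iris[, 3:4], 3, nstart = 1)   # one random starting assignment
multi  <- kmeans(iris[, 3:4], 3, nstart = 20)  # best of 20 random starts
single$tot.withinss  # within-cluster variation from a single start
multi$tot.withinss   # typically no worse than any single start
```

On an easy data set like `iris` the two values often coincide, but on harder data a single unlucky start can land in a noticeably worse local optimum.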

We can see the cluster centroids, the clusters that each data point was assigned to, and the within cluster variation.

Let us compare the clusters with the species.

```r
table(irisCluster$cluster, iris$Species)
#     setosa versicolor virginica
#   1     50          0         0
#   2      0         48         4
#   3      0          2        46
```

As we can see, the data belonging to the `setosa` species got grouped into cluster 1, `versicolor` into cluster 2, and `virginica` into cluster 3. The algorithm wrongly classified two data points belonging to `versicolor` and four data points belonging to `virginica`.

We can also plot the data to see the clusters:

```r
# Note: R is case-sensitive, so the object must be referenced as irisCluster
irisCluster$cluster <- as.factor(irisCluster$cluster)
ggplot(iris, aes(Petal.Length, Petal.Width, color = irisCluster$cluster)) + geom_point()
```

Here is the plot:

That brings us to the end of the article. I hope you enjoyed it! If you have any questions or feedback, feel free to leave a comment.


The post Types of Cluster Analysis and Techniques using R appeared first on StepUp Analytics.
