
Wednesday, April 20, 2022

Data Science vs. Business Analytics


Key Differences Between Data Science and Business Analytics:

Here are some of the key differences between data science and business analytics.

1. Data science is the study of data using statistics, algorithms, and technology, whereas business analytics is the statistical study of business data.

2. Data science is a relatively recent development in analytics, whereas business analytics has existed since the late 19th century.

3. Data science requires substantial programming skills, whereas business analytics requires little programming.

4. Data science is a superset of business analytics. Therefore, anyone with data science skills can do business analytics, but not vice versa.

5. Data science, being a step beyond business analytics, is a luxury. Business analytics, by contrast, is a necessity for a company to understand how its business works and to gain insights.

6. The results of data science analyses are typically not used for day-to-day business decisions, whereas business analytics is essential for key management decisions.

7. Data science does not answer clear-cut questions; its questions are mostly general. Business analytics, by contrast, mainly answers very specific questions about finance and business.

8. Data science can answer the questions that business analytics answers, but not the other way around.

9. Data science uses both structured and unstructured data, while business analytics primarily uses structured data.

10. Data science has the potential to make a big leap, especially with the advent of machine learning and artificial intelligence, while business analytics is still advancing slowly.

11. Unlike business analysts, data scientists do not encounter much dirty data.

12. In contrast to business analytics, data science relies heavily on data availability.

13. The cost of investing in data science is high, whereas the cost of investing in business analytics is low.

14. Data science can keep up with today's data trends. Data is growing and branching into many data types, and data scientists have the skills needed to handle it, whereas business analysts do not.


Data Science and Business Analytics Comparison Table

The table below compares data science and business analytics. Each basis of comparison is followed by the entry for data science and the entry for business analytics.

Coining of the Term

Data Science: In 2008, DJ Patil and Jeff Hammerbacher, then at LinkedIn and Facebook respectively, coined the term "data scientist".

Business Analytics: In use since the late 1800s, when Frederick Winslow Taylor put it into practice.

Concept

Data Science: An interdisciplinary field spanning data inference, algorithm development, and data-driven systems.

Business Analytics: Statistical principles are used to derive insights from business data.

Application: Top 5 Industries

Data Science:

· Technology
· Financial
· Mix of fields
· Internet-based
· Academic

Business Analytics:

· Financial
· Technology
· Mix of fields
· CRM/Marketing
· Retail

Coding

Data Science: Coding is required. The field combines traditional analytics approaches with a solid understanding of computer science.

Business Analytics: Little coding is involved. The work is more statistically oriented.

Recommended Languages

Data Science: C/C++/C#, Haskell, Java, Julia, Matlab, Python, R, SAS, Scala, SQL

Business Analytics: C/C++/C#, Java, Matlab, Python, R, SAS, Scala, SQL

Statistics

Data Science: Statistics is used at the end of the analysis, after the algorithms have been built and coded.

Business Analytics: The entire analysis is based on statistical principles.

Work Challenges

Data Science:

· Business decision-makers do not use data science results.
· Inability to apply results to the company's decision-making process.
· Lack of clarity about the questions that must be answered with the data set provided.
· Data is unavailable or difficult to obtain.
· IT needs to be consulted.
· Notable lack of domain expert involvement.

Business Analytics:

· Data is unavailable or difficult to obtain.
· Dirty data.
· Concerns about privacy.
· Insufficient funds to purchase useful data sets from outside sources.
· Inability to apply results to the company's decision-making process.
· Lack of clarity about the questions that must be answered with the data set provided.
· Tools have limitations.
· IT needs to be consulted.

Data Needed

Data Science: Both structured and unstructured data.

Business Analytics: Predominantly structured data.

Future Trends

Data Science: Machine Learning and Artificial Intelligence

Business Analytics: Cognitive Analytics, Tax Analytics


Friday, September 24, 2021

Big Data Computing: Quiz Assignment-III Solutions (Week-3)

1. In Spark, a ________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.

A. Spark Streaming

B. FlatMap

C. Driver

D. Resilient Distributed Dataset (RDD)

Answer: D) Resilient Distributed Dataset (RDD)

Explanation: The Resilient Distributed Dataset (RDD) is Spark's basic data structure: a distributed, immutable collection of objects. Each RDD is divided into logical partitions that can be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala object, including user-defined classes. Formally, an RDD is a read-only, partitioned collection of records. RDDs can be created through deterministic operations on data in stable storage or on other RDDs. An RDD is a fault-tolerant collection of elements that can be operated on in parallel.


2. Given the following definition about the join transformation in Apache Spark:

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

The join operation joins two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

What is the value of joinrdd when the following code is run?

val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))

val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))

val joinrdd = rdd1.join(rdd2)

joinrdd.collect


A. Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)),

(m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))

B. Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)),

(m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))

C. Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)),

(m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))

D. None of the mentioned

Answer: C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)),

(m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))

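Explanation: join() is a transformation that returns an RDD containing all pairs of elements with matching keys in this RDD and the other RDD. Each pair of elements is returned as a (k, (v1, v2)) tuple, where (k, v1) is in this RDD and (k, v2) is in the other. The keys e and h each appear in only one of the two RDDs, so they produce no output, which rules out options A and B.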
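
The matching rule can be checked without a cluster. The following is a minimal sketch that uses plain Scala collections in place of RDDs; the object name JoinSketch and the helper innerJoin are hypothetical names chosen for illustration and are not part of the Spark API. It reuses the data from the question.

```scala
object JoinSketch {
  // Inner join over plain Scala collections (not Spark): for every key that
  // appears in both inputs, pair each left value with each right value.
  def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- left
      (k2, w) <- right
      if k == k2
    } yield (k, (v, w))

  def main(args: Array[String]): Unit = {
    // Same data as rdd1 and rdd2 in the question.
    val left  = Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54))
    val right = Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))

    val joined = innerJoin(left, right)
    joined.foreach(println)            // one line per joined pair
    println(s"${joined.size} pairs")   // 8 pairs
  }
}
```

The sketch yields eight pairs, four for key m and four for key s, and none for e or h. In Spark itself the order of the collected elements is not guaranteed, so only the set of pairs should be compared.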

 

3. Consider the following statements in the context of Spark:

Statement 1: Spark improves efficiency through in-memory computing primitives and general computation graphs.

Statement 2: Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.

A. Only statement 1 is true

B. Only statement 2 is true

C. Both statements are true

D. Both statements are false

Answer: C) Both statements are true

Explanation: Apache Spark is a fast, general-purpose cluster computing system. It offers high-level APIs in Java, Scala, and Python, as well as an optimized engine that supports general execution graphs. It also supports a variety of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. Spark comes with several sample programs and offers an interactive shell, available in Scala or Python, which is a powerful tool for interactive data analysis. Spark improves efficiency through in-memory computing primitives: data is kept in random access memory (RAM) rather than on slower disk drives and is processed in parallel, which makes it possible to detect patterns and analyze large amounts of data quickly. In-memory processing has become popular as the cost of memory has fallen, which makes it economical for applications.


4. True or False ?

Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable.

A. True

B. False

Answer: A) True

Explanation: Resilient Distributed Datasets (RDDs) are:

1. Immutable collections of objects spread across a cluster

2. Built through parallel transformations (map, filter, etc.)

3. Automatically rebuilt on failure

4. Controllable persistence (e.g. caching in RAM)
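
The "automatically rebuilt on failure" property comes from lineage: the dataset remembers how it was derived, so a lost partition can be recomputed from its source. The following is a minimal sketch of that idea in plain Scala, not Spark's implementation; MiniRdd, computePartition, and the cache map are hypothetical names introduced only for illustration.

```scala
object LineageSketch {
  // A hypothetical, much-simplified stand-in for an RDD: it keeps only the
  // source partitions and the function that derives its data (its lineage).
  final case class MiniRdd[A, B](source: Vector[Vector[A]], transform: A => B) {
    // Compute (or recompute) a single partition from its lineage.
    def computePartition(index: Int): Vector[B] = source(index).map(transform)

    // Compute every partition; on a real cluster these would run in parallel.
    def collect(): Vector[B] = source.indices.toVector.flatMap(computePartition)
  }

  def main(args: Array[String]): Unit = {
    val source  = Vector(Vector(1, 2), Vector(3, 4), Vector(5, 6)) // three partitions
    val doubled = MiniRdd(source, (x: Int) => x * 2)

    // Keep computed partitions in memory, as caching would.
    var cache: Map[Int, Vector[Int]] =
      source.indices.map(i => i -> doubled.computePartition(i)).toMap

    cache -= 1                                 // simulate losing partition 1
    val rebuilt = doubled.computePartition(1)  // rebuild only the lost partition
    cache += (1 -> rebuilt)

    println(rebuilt)           // Vector(6, 8)
    println(doubled.collect()) // Vector(2, 4, 6, 8, 10, 12)
  }
}
```

Because the source data and the transformation are both immutable, recomputing a partition always gives the same result, which is why immutability and fault tolerance go together here.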


5. Which of the following is not a NoSQL database?

A. HBase

B. Cassandra

C. SQL Server

D. None of the mentioned

Answer: C) SQL Server

Explanation: NoSQL, which stands for "not only SQL", is an alternative to traditional relational databases, in which data is stored in tables and the schema is carefully designed before the database is created. NoSQL databases are particularly useful for working with large amounts of distributed data. HBase and Cassandra are NoSQL databases, whereas SQL Server is a relational database.

 

6. True or False ?

Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.

A. True

B. False

Answer: A) True

Explanation: Spark's biggest claim about speed is that "it can run programs up to 100 times faster than Hadoop MapReduce in memory or 10 times faster on disk." Spark can make this claim because it processes data in the main memory of the worker nodes and avoids unnecessary disk I/O. Spark also lets tasks be chained at the application programming level without writing to disk at all, or with as few disk writes as possible.


7. _____________leverages Spark Core fast scheduling capability to perform streaming analytics.

A. MLlib

B. Spark Streaming

C. GraphX

D. RDDs

Answer: B) Spark Streaming

Explanation: Spark Streaming ingests data in mini-batches and performs RDD transformations on those mini-batches of data.
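
The mini-batch idea can be sketched in plain Scala, without the Spark Streaming API. The example below assumes a 1000 ms batch interval and a word count as the per-batch transformation; Event, toBatches, and wordCount are hypothetical names used only for illustration.

```scala
object MicroBatchSketch {
  // A hypothetical event: arrival time in milliseconds plus a payload.
  final case class Event(timestampMs: Long, word: String)

  // Assign each event to a batch by dividing its arrival time by the batch
  // interval, then return the batches in time order.
  def toBatches(events: Seq[Event], intervalMs: Long): Seq[(Long, Seq[Event])] =
    events.groupBy(_.timestampMs / intervalMs).toSeq.sortBy(_._1)

  // The per-batch transformation: a word count over one mini-batch.
  def wordCount(batch: Seq[Event]): Map[String, Int] =
    batch.groupBy(_.word).map { case (w, es) => w -> es.size }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event(100, "spark"), Event(450, "rdd"), Event(900, "spark"),
      Event(1200, "rdd"), Event(1999, "rdd"), Event(2500, "graphx")
    )
    // Each batch is processed independently with the same transformation.
    for ((batchId, batch) <- toBatches(events, intervalMs = 1000))
      println(s"batch $batchId: ${wordCount(batch)}")
  }
}
```

With this input the six events fall into three batches (three events, then two, then one), and the same word count runs on each batch in turn.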


8. _________ is a distributed graph processing framework on top of Spark.

A. MLlib

B. Spark Streaming

C. GraphX

D. All of the mentioned

Answer: C) GraphX

Explanation: GraphX is Apache Spark's API for graphs and graph-parallel computation. It is a distributed graph processing framework on top of Spark.


9. Point out the incorrect statement in the context of Cassandra:

A. It is a centralized key-value store

B. It was originally designed at Facebook

C. It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure

D. It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing

Answer: A) It is a centralized key-value store

Explanation: Cassandra is a distributed key-value store, not a centralized one.
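
Option D mentions a ring-based DHT. As a rough illustration of how a ring spreads keys across nodes with no central coordinator, here is a toy sketch in plain Scala. It is not Cassandra's partitioner and models no replication; the node names, the 0-99 token range, and the functions token, ownerOfToken, and ownerOf are all assumptions made for this example.

```scala
object RingSketch {
  // Hypothetical ring: each node owns one token on a 0-99 ring, sorted by token.
  val ring: Vector[(Int, String)] =
    Vector(10 -> "node-a", 40 -> "node-b", 75 -> "node-c")

  // Toy hash onto the ring; a real partitioner uses a much stronger hash.
  def token(key: String): Int = Math.floorMod(key.hashCode, 100)

  // A token is owned by the first node whose token is >= it,
  // wrapping around to the first node on the ring.
  def ownerOfToken(t: Int): String =
    ring.find { case (nodeToken, _) => nodeToken >= t }.getOrElse(ring.head)._2

  def ownerOf(key: String): String = ownerOfToken(token(key))

  def main(args: Array[String]): Unit = {
    println(ownerOfToken(5))   // node-a
    println(ownerOfToken(40))  // node-b
    println(ownerOfToken(90))  // node-a (wraps around the ring)
    println(ownerOf("user:42")) // some node on the ring, chosen by the hash
  }
}
```

Every node can evaluate the same rule locally, so any node can work out which node owns a key without asking a central server.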


10. Consider the following statements:

Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.

Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS (commercial off-the-shelf) machines.

A. Only statement 1 is true

B. Only statement 2 is true

C. Both statements are false

D. Both statements are true

Answer: C) Both statements are false

Explanation: The correct statements are:

Scale up = grow your cluster capacity by replacing with more powerful machines

Scale out = incrementally grow your cluster capacity by adding more COTS (commercial off-the-shelf) machines

Saturday, May 13, 2017

COMPUTER APTITUDE TEST FOR BCA(H)/B.SC.(IT)

1. Which of the following are components of the Central Processing Unit (CPU)?

A. Arithmetic logic unit, Mouse
B. Arithmetic logic unit, Control unit
C. Arithmetic logic unit, Integrated circuits
D. Control unit, Monitor

2. If a computer provides database services to other computers, what is it called?

A. Web server
B. Application server
C. Database server
D. FTP server

3. In which of the following forms is data stored in a computer?

A. Decimal
B. Binary
C. Hexadecimal
D. Octal

4. Who is known as the father of the Internet?

A. Charles Babbage
B. Vint Cerf
C. Dennis Ritchie
D. Martin Cooper

5. What is the name of India's first supercomputer?

A. SAGA-220
B. PARAM 8000
C. ENIAC
D. PARAM 6000

6. Which is the most common language used in web design?

A. C
B. C++
C. PHP
D. HTML

7. Which among the following is the odd one out?

A. CD/DVD
B. Floppy Disks
C. SD Disk
D. BIOS

8. Which program is run by the BIOS to check that hardware components are working?

A. DMOS
B. POST
C. CMOS
D. RIP

9. MPG is a file extension for which type of file?

A. Audio
B. Image
C. Video
D. Flash

10. 1 Megabyte is equal to

A. 1024 Bytes
B. 1024 Kilobytes
C. 1024 Gigabits
D. 1024 Bits

11. An IP version 4 (IPv4) address is in which format?

A. 4-bit
B. 8-bit
C. 16-bit
D. 32-bit

12. Who invented C++?

A. Steve Jobs
B. James Gosling
C. Bjarne Stroustrup
D. Dennis Ritchie

13. One nibble is equal to how many bits?

A. 4 bits
B. 8 bits
C. 12 bits
D. 16 bits

14. Which term is related to databases?

A. PHP
B. Java
C. Oracle
D. Assembly

15. Who invented Java?

A. Dennis Ritchie
B. James Gosling
C. Bjarne Stroustrup
D. Linus Torvalds

16. What is the full form of HTTP?

A. Hyper Transfer Text Protocol
B. Hyper Text Transfer Protocol
C. Hexagonal Text Transfer Protocol
D. Hexagonal Text Transfer Prototype

17. What is the name of a device that converts digital signals to analog signals?

A. Router
B. Switch
C. Modem
D. None of the above

18. C is what kind of language?

A. An assembly language
B. A third-generation high-level language
C. A machine language
D. Future language

19. Which of the following is a non-volatile memory?

A. RAM
B. LSI
C. VLSI
D. ROM

20. Which of the following is not an operating system?

A. Unix
B. Linux
C. Windows
D. Java

21. ++i is equivalent to

A. i = i + 2
B. i = i + 1
C. i = i + i
D. i = i - 1

22. What is the minimum number of stacks of size n required to implement a queue of size n?

A. One
B. Two
C. Three
D. Four

23. Recursive problems are implemented using

A. queues
B. stacks
C. linked lists
D. strings

24. Which of the following names does not relate to stacks?

A. FIFO lists
B. LIFO lists
C. Pile
D. Push-down lists

25. Which of the following data structures is a linear data structure?

A. Trees
B. Graphs
C. Array
D. None of the above

26. An attribute of one table that matches the primary key of another table is called a

A. foreign key
B. secondary key
C. candidate key
D. composite key

27. The ascending order of the data hierarchy is

A. bit->byte->record->field->file->database
B. bit->byte->field->record->file->database
C. byte->bit->field->record->file->database
D. byte->bit->field->file->record->database

28. A data dictionary is a special file that contains

A. the names of all fields in all files
B. the data types of all fields in all files
C. Both of the above
D. None of the above

29. The problem of thrashing is significantly affected by which of the following?

A. program size
B. program structure
C. primary storage
D. secondary storage

30. Which of the following needs a device driver?

A. Cache
B. Disk
C. Main Memory
D. Registers

31. Which of the following is not an advantage of multiprogramming?

A. increased throughput
B. shorter response time
C. ability to assign priorities to jobs
D. decreased system overload

32. Which of the following memory allocation schemes is subject to external fragmentation?

A. Segmentation
B. Swapping
C. Demand Paging
D. Multiple Contiguous Fixed Partition

33. Spooling is most beneficial where

A. Jobs are I/O bound
B. Jobs are CPU bound
C. Jobs are evenly divided as I/O bound and CPU bound
D. All of the above

34. In which of the following is a front-end processor usually used?

A. Virtual storage
B. Timesharing
C. Multiprogramming
D. Multithreading

35. Banker's algorithm for resource allocation deals with

A. deadlock prevention
B. deadlock avoidance
C. deadlock recovery
D. circular wait

36. Which scheduling policy is most suitable for a time-shared operating system?

A. Shortest job first
B. FCFS
C. LCFS
D. Round robin

37. Belady's anomaly occurs in

A. LIFO
B. FIFO
C. LRU
D. NRU

38. What is a page fault?

A. A spelling error in a page in memory
B. A reference to a page which is in another program
C. An access to a page not currently in memory
D. An event that always occurs whenever a page is accessed from memory

39. The break statement is used to

A. Quit a program
B. Quit the current iteration
C. Both of the above
D. None of the above

40. The continue statement is used

A. To continue to the next line of code
B. To stop the current iteration and begin the next iteration from the beginning
C. To handle run-time errors
D. None of the above

41. What will be the output of the following program?

void main()
{
    char test = 'S';
    printf("\n%c", test);
}

A. S
B. Error
C. Garbage value
D. None of the above

42. What will be the output of the following program?

main()
{
    int x, y = 10;
    x = y * NULL;
    printf("%d", x);
}

A. Error
B. 0
C. 10
D. Garbage value

43. What is the difference between calloc() and malloc()?

A. calloc() takes a single argument while malloc() needs two arguments
B. malloc() takes a single argument while calloc() needs two arguments
C. malloc() initializes the allocated memory to ZERO
D. calloc() initializes the allocated memory to NULL

44. The total number of keywords in C is

A. 30
B. 32
C. 48
D. 132

45. Which operator in C++ cannot be overloaded?

A. %
B. +
C. ::
D. -

46. Which operator has the highest priority?

A. ()
B. []
C. *
D. /

47. The difference between a structure and a union is

A. We can define functions within structures but not within a union
B. We can define functions within a union but not within a structure
C. The way memory is allocated
D. There is no difference

48. printf() belongs to which library of C?

A. stdlib.h
B. stdio.h
C. stdout.h
D. stdoutput.h

49. Members of a class have which default access?

A. private
B. public
C. protected
D. depends

50. A constructor is

A. A class automatically called whenever a new object of this class is created
B. A class automatically called whenever an object of this class is destroyed
C. A function automatically called whenever a new object of this class is created
D. A function automatically called whenever an object of this class is destroyed

51. Which of the following cannot be inherited from the base class?

A. Constructor
B. Friend
C. Both A and B cannot be inherited
D. Both A and B can be inherited

52. What is the value of sizeof(char)?

A. 1
B. 2
C. 4
D. 8

53. Which arithmetic operation can be performed on a pointer?

A. Multiplication
B. Division
C. Addition
D. None of the above

54. What is inheritance?

A. Inheritance allows one class to reuse the state and behavior of another class.
B. It deals with dangling pointers
C. It deals with void pointers
D. It is a type of class declaration

55. What is an abstract class?

A. A class whose objects cannot be created
B. A class whose objects can be created
C. Depends on the class
D. None of the above

56. What is polymorphism?

A. Ability to take more than one form
B. Ability to destroy destructor
C. Ability to create constructor
D. None of the above

57. The OSI reference model has how many layers?

A. 4
B. 5
C. 7
D. 3

58. Framing is done at which layer?

A. Data Link Layer
B. Physical Layer
C. Transport Layer
D. Application Layer

59. Which layer deals with flow control?

A. Session Layer
B. Network Layer
C. Transport Layer
D. Application Layer

60. A MAC address is how many bits long?

A. 24 bits
B. 32 bits
C. 48 bits
D. 128 bits
