The workshop on Advanced Python was conducted in two phases, on December 7 and December 9, 2015, by Nikhil Mishra and Ashwin Ramanathan. The first day covered topics such as how Python evolved into a mainstream language for data analytics, and text analytics in particular. The workshop started with basics like downloading and setting up Python, then progressed to examples from social media, natural language processing and other sectors, giving students context on how Python can be used and on its growing significance in industry. A demonstration of sentiment analysis using the Twitter APIs resonated well with students.
The second day of the workshop covered concepts such as regular expressions, pandas, and how to use Python to frame rules for text mining using Simple Grammar. Other topics, such as machine learning, were also touched upon. The two-day workshop was a great learning experience both for students beginning to learn Python and for those with previous experience.
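As a small illustration of the kind of rule-based text mining discussed (this is a generic sketch with made-up tweets, not the workshop's own material or dataset), Python's standard-library re module can express simple extraction and cleaning rules:

```python
import re

# Hypothetical tweets, purely for illustration
tweets = [
    "Loving the new #Python release! @dev_team did great work",
    "Data cleaning is tedious... #pandas helps a lot",
    "Slides at https://example.com #workshop",
]

# Rule 1: extract hashtags from each tweet
hashtags = [re.findall(r"#(\w+)", t) for t in tweets]

# Rule 2: strip URLs before further text mining
url_pattern = re.compile(r"https?://\S+")
cleaned = [url_pattern.sub("", t).strip() for t in tweets]

print(hashtags)  # [['Python'], ['pandas'], ['workshop']]
```

In practice, the extracted fields would typically be loaded into a pandas DataFrame for filtering and aggregation.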
The workshop on Apache Spark was conducted on Wednesday, December 2nd, by Vinay Vaddiparthi as a one-and-a-half-hour session. The agenda was to cover an overview, a brief history, and applications of Apache Spark, along with the MapReduce algorithm, RDDs, transformations and actions, and a use case for Apache Spark.
Apache Spark is an open-source cluster computing framework, originally developed in the AMPLab at the University of California, Berkeley. It is a fast, general-purpose computational engine for large-scale data processing, written in Scala, a functional programming language. There are many advantages to using Apache Spark over Hadoop's MapReduce paradigm: it can be ten to a hundred times faster than MapReduce when working in memory, it offers APIs in Scala, Python, Java and R, and it brings the processing to the data rather than the data to the processing.
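The MapReduce paradigm that Spark is compared against can be sketched in plain Python. This is a toy word count over made-up data, not a distributed implementation: a map phase emits (word, 1) pairs, a shuffle groups pairs by key, and a reduce phase sums the counts per word.

```python
from collections import defaultdict
from functools import reduce

# Toy input: each string stands in for one line of a distributed file
lines = ["spark is fast", "spark is general purpose", "mapreduce is batch"]

# Map phase: emit a (word, 1) pair for every word
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group the pairs by key (the word)
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: sum the counts for each word
counts = {word: reduce(lambda a, b: a + b, vals) for word, vals in groups.items()}

print(counts)  # e.g. 'is' appears 3 times, 'spark' twice
```

Hadoop writes intermediate results of each such phase to disk; Spark's speed advantage comes largely from keeping these intermediates in memory.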
Vinay then gave an overview of the Spark stack and contrasted Apache Spark with MapReduce. He explained the concept of Resilient Distributed Datasets, how they are created and used, and the roles of transformations and actions. He demonstrated the installation of Apache Spark and a use case in Python, and at the end of the workshop he provided links to MOOCs with learning and practice material for Apache Spark.
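The distinction between transformations and actions rests on lazy evaluation: transformations such as map and filter only record a recipe, and nothing is computed until an action such as collect forces the pipeline to run. A pure-Python analogy using generators (this is an illustration of the idea, not actual PySpark code) looks like this:

```python
# Pure-Python analogy for Spark's lazy evaluation (not PySpark itself).
# Generator expressions, like RDD transformations, build a pipeline
# without computing anything.

data = range(1, 6)  # stands in for an RDD created from a collection

# "Transformations": lazy, no work happens on these two lines
squared = (x * x for x in data)             # like rdd.map(lambda x: x * x)
evens = (x for x in squared if x % 2 == 0)  # like .filter(lambda x: x % 2 == 0)

# "Action": forces the whole pipeline to run, like rdd.collect()
result = list(evens)
print(result)  # [4, 16]
```

In Spark, this laziness lets the engine plan the whole chain of transformations before executing it, and the RDD lineage it records is what makes the datasets "resilient": lost partitions can be recomputed from the recipe.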
The Corporate Technologies info-session was held on December 1, 2015, and the experts spoke about the company, its culture and full-time opportunities at Corporate Technologies. The company currently has openings for Analysts and Consultants. It was an insightful session for learning about the technologies they work with, including Tableau, QlikView and Informatica. Their client base is mainly in the finance and pharmaceutical industries. Students learned how they can rise up the ladder from Analyst to Consultant at Corporate Technologies.