Showing posts with label Big Data. Show all posts

Tuesday, December 3, 2013

Too sexy for my job?

I found this article on being a Data Scientist: The sexiest Job no one has

It could be a case of 'the grass is greener' syndrome but this is the job I want and I think I'm on the way to getting it :) ... to me, it's an exciting prospect that 'the universe (is) one large data set' ... *big data* indeed ;)


Now here come the 'damned lies and statistics'... Gartner estimates that there will be 4.4 million IT jobs created to support data analysis in just the next two years... and about half of those will be outside the US... I wonder what the stats say for data scientist jobs being generated in Australia?

I wonder what qualifications and experience I will *really* need; maybe it's all that geeky stuff I keep hearing about...

- watch The Big Bang Theory religiously ... maths, science, history, unravelling the mystery ... (tick)
- can recite the script of The Matrix, 1, 2 and 3 (tick)
- think that Leela and Amy are hot (tick)
- yada, yada, yada (tick)

... oh, oh, what's this....

- PhD in advanced statistics

... darn, I was really close... well there's something to work on!

I just hope this 'wave' is real and not all sexy hype... cue LMFAO ;)

Friday, September 27, 2013

The value of 'a topic on a page'

Some time ago, I became interested in summarising a topic on a page.  The interest sprang from some really good posters I found that summarised all the important stuff regarding a topic e.g. Rational had a really good one on UML and another on RUP.

Around the same time, I found a template that Apple were using for creating 'posters' to summarise a topic on a page and the notion of 'poster sessions' where people would put their poster on a wall and people would meet in a room with multiple posters and read them to get a quick update on topics of interest.

Most recently, I have rekindled my interest in 'topic on a page' and find them a great way of communicating a lot of concepts quickly.  The digital equivalent seems to be a really good infographic or a really good Prezi presentation.

The only trouble is, a good poster is hard to produce.

I'm currently working on a 'Big Data on a page' poster and will post it back here when I have something to share.

In the meantime, any tips on how to create good posters, infographics and the like are most welcome :)

Monday, September 2, 2013

I want to be a Data Scientist

I saw a really interesting YouTube video today called [The Data Scientists Toolset] [1]
[1]: http://www.youtube.com/watch?v=FEdfhTLFXaA "YouTube video"
It is a video of a panel discussion from a conference called Data Scientist Summit (note to self: I have really missed the boat if the others are already having summits… back in 2012! ;) )
I admit, I did not know anyone on the panel but after listening to them talk I believe they must all be experts of some note.
Some of the key points for me were:

  • The 3 big things you need as a Data Scientist
  • Value from Big Data = having Big Analytics
  • Run experiments ‘at scale’
  • Room for Everyone - Hadoop, NoSQL and “new SQL”, and
  • The ‘Desert Island challenge’

I’ll cover each in a bit more detail below.

The 3 big things you need as a Data Scientist

According to the experts on the panel, there are 3 big things that Data Scientists need to have:

  1. Domain skills and expertise,
  2. Great modelling (read statistics) skills, and
  3. Tech literacy with the Big data (and other) tools and technologies required.

To me, this is a great list of reasons for good collaboration between the Business and IT. Business professionals ideally have good Domain skills and experience. Vice versa, IT professionals typically have technology literacy.

The 'middle ground', great modelling (statistics) skills, is the interesting one. Some people have these skills based on whatever they did at Uni and continued into their professional career.

It's more likely that Business professionals are going to have the right skills and experience, especially in business domains such as Economics, Finance, Science, Research, etc.

However, of the 3 required areas of skills/ experience, this is the most likely to 'fall between the cracks' i.e. no-one has them.

I think this is (perhaps) the reason that higher education qualifications being offered by Universities around the world are so 'heavy' in statistics and maths.

Value from Big Data = having Big Analytics

I think this was a great point!

I see a lot of excitement (almost hysteria) about Big Data and how **cool** it is to be able to parse the Petabytes of log files and other Big Data out there but … where is the value?

Big Data is often associated with the 3 'qualifying' V's - Volume, Variety and Velocity.

I think it is a good idea to add 2 more 'quantifying' V's to the list - Veracity and Value.

Veracity

Veracity to me examines the question of the 'validity' of the data source in terms of what people want to do with it.

One of the lessons I have learnt from just 'normal' Data warehouse and Business Intelligence/Analytics solutions is that just because there is data out there does not mean you should try to capture it and make use of it. You really need to ask yourself the question: is this data appropriate to my needs? Or, in a qualitative sense, how appropriate is the data? (Does it do part of the job?)

Value

The 'flip side' is: is there value in using this data? Does it help me tell the right story?
Big Data to me has a huge risk to be addressed - the GIGO (Garbage In, Garbage Out) principle means that people risk *Big Garbage*.

Run experiments ‘at scale’

Gone are the days of having to have small amounts of data to test your 'models' and to validate that they produce the 'right' results before trying them out on the 'real data' (usually Production or a copy of Production).

The panel stressed that Big Data tools and technologies allow you to operate 'at scale'.

Personally, I'm not sure about this one. I may not be a gun at statistics but I seem to remember that it does not take much data to provide a statistically valid model e.g. to predict the outcome of an election all you really need is a relatively small, but representative, sample from the population to have confidence in the results predicted… assuming that the rest of the population follow certain rules.

I think there's a difference between validating your models and running on full scale data.
Just because Big Data has 'resources to burn' I don't think people should lose sight of good modelling and testing.
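
To put some rough numbers on the election example above, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It rests on assumptions that are mine, not the panel's: a simple random (representative) sample, a population much larger than the sample, a 95% confidence level (z = 1.96) and the worst case of a 50/50 split. The function names are made up for illustration.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a 95% confidence level (z = 1.96) and a population far larger
    than the sample; proportion=0.5 is the worst (widest) case.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def required_sample_size(margin, proportion=0.5, z=1.96):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


if __name__ == "__main__":
    # The margin shrinks with the square root of the sample size,
    # so each extra decimal place of precision costs 100x more data.
    for n in (100, 1_000, 10_000, 1_000_000):
        points = margin_of_error(n) * 100
        print(f"n={n:>9,}: +/- {points:.2f} percentage points")

    print("Sample needed for +/- 3 points:", required_sample_size(0.03))
```

Under those assumptions, roughly a thousand respondents is enough for a margin of about 3 percentage points, and going from ten thousand to a million rows only tightens the estimate from about 1 point to about 0.1. That is the sense in which more data does not automatically mean a better model; it says nothing about whether the sample is representative in the first place.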

Room for everyone

I think it's 'reassuring' that Big Data is seen as a complementary technology and is best applied to suitable 'problems' (or classes of problem).

The panel made it clear they see a role for all of the data technologies: Big Data (e.g. Hadoop), NoSQL, and 'new SQL'.

One criterion they suggested for deciding which data technology was the best fit was whether the model of the data was 'to be discovered', partially agreed, or agreed (respectively).

Big Data technologies are typically associated with 'a model at use time' versus 'new SQL' where the modelling takes place first and then the data is poured in.
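
As a toy illustration of that contrast (my own sketch, not something shown by the panel), the snippet below answers the same question two ways. The first function keeps the raw records untouched and applies a model only when reading them; the second agrees a table structure first and then pours the data in. The event data, field names and function names are all hypothetical, and an in-memory SQLite database stands in for a 'model first' store.

```python
import json
import sqlite3

# Hypothetical raw events, one JSON document per line, with varying fields.
RAW_EVENTS = [
    '{"customer": "ann", "action": "login"}',
    '{"customer": "bob", "action": "purchase", "amount": 42.5}',
    '{"customer": "ann", "action": "purchase", "amount": 10.0, "coupon": "SPRING"}',
]


def total_spend_model_at_use_time(lines):
    """Keep raw lines as they are; decide which fields matter only when reading."""
    totals = {}
    for line in lines:
        event = json.loads(line)
        if event.get("action") == "purchase":
            name = event["customer"]
            totals[name] = totals.get(name, 0.0) + event.get("amount", 0.0)
    return totals


def total_spend_model_first(lines):
    """Agree the model up front (a table), then pour the data in."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    for line in lines:
        event = json.loads(line)
        if event.get("action") == "purchase":
            # Fields outside the agreed model (e.g. coupon) are dropped at load time.
            conn.execute(
                "INSERT INTO purchases (customer, amount) VALUES (?, ?)",
                (event["customer"], event["amount"]),
            )
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM purchases GROUP BY customer"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(total_spend_model_at_use_time(RAW_EVENTS))
    print(total_spend_model_first(RAW_EVENTS))
```

Both give the same totals here. The difference shows up when the question changes: the raw lines still carry the coupon field for a later analysis, whereas the table would need its model changed and the data reloaded.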

The ‘Desert Island challenge’

When it was time to wrap up, the moderator for the expert panel session posed a question: If you were (to be) stranded on a desert island, what tool or technology would you take with you … and only one!
Interestingly, **all** of the panel members named a programming language technology: Java, C++, Python, etc.

I guess this speaks to the 'roots' of the panelists and the fact that the Big Data tools and technology, while all useful in their own right, are not quite there yet to be able to dislodge the versatility and power provided by a programming language.

I hope this 'commentary' was of interest. I would encourage you to view the YouTube video for yourself. I am sure you will get different stuff out of it than I did.

More on Big Data to come in future Blogs.

Friday, July 19, 2013

This is Big! (Data) :)

Today in my inbox, I found an email with a link to this article on 'waiting for Big Data'

As a consultant for an Australian consulting and technology firm, I found that the article reflects a lot of my own thoughts. The key point for me is the one about the 'maturity of the market' at the moment, with the industry being likened to 'a collection of tools that are not really integrated and are very technical in nature'.

A friend asked me the other day what 'my company' was doing with regard to Big Data and my response was that we were currently partnering with technology partners, such as Microsoft, to help clients run a Proof of Concept (PoC) and also, where appropriate, helping them move these through to 'production'.

For me personally, though, it's still just an area of great interest and something I am educating myself on with a view to moving into this area over the next few years e.g. Big Data strategy, architecture, road-mapping.... it's a WIP but I am working on it :)

Some of the questions I am trying to answer for myself at the moment include...

  • is Big Data just a BI/DW thing?
  • is it a new capability or just incremental?
  • are existing IM tools and techniques (e.g. data modelling) useful? or does Big Data require new approaches? (as people seem to indicate)
  • are there patterns of requirements? and corresponding patterns of solution types?
  • what would a 'domain architecture' comprise? and how could it be leveraged for strategy, architecture and road-mapping purposes?

I'm sure others will arise but these are the main ones in my head at the moment.

I'd love to hear anyone's ideas on this topic... the lines are open, call now! ;)