On September 10th, Michael Jordan, a renowned statistician from Berkeley, did an Ask Me Anything on Reddit. Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of EECS and the Department of Statistics at the University of California, Berkeley, where he is affiliated with the AMP Lab and the Berkeley AI Research Lab. He received his Masters in Mathematics from Arizona State University, and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego. He was a professor at MIT from 1988 to 1998 before moving to Berkeley. His research interests bridge the computational, statistical, cognitive and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in distributed computing systems, natural language processing, signal processing and statistical genetics. He received the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. Below are questions and comments from the thread, along with his answers.

(Isn't it?) I have no idea what this means, or could possibly mean.

Which I certainly agree with, but I also note that when AI can do higher-order reasoning at a near-human level, many of those bullet points will fall like dominoes.

Of course, the "statistics community" was also never that well defined, and while ideas such as Kalman filters, HMMs and factor analysis originated outside of the "statistics community" narrowly defined, they were absorbed within statistics because they're clearly about inference. In general, "statistics" refers in part to an analysis style---a statistician is happy to analyze the performance of any system, e.g., a logic-based system, if it takes in data that can be considered random and outputs decisions that can be considered uncertain.

It's really the process of IA---intelligence augmentation---augmenting existing data to make it more efficient to work with and to gain insights from.

Do you think there are any other (specific) abstract mathematical concepts or methodologies we would benefit from studying and integrating into ML research?

E.g., (1) How can I build and serve models within a certain time budget so that I get answers with a desired level of accuracy, no matter how much data I have?

Indeed, I've spent much of my career trying out existing ideas from various mathematical fields in new contexts, and I continue to find that to be a very fruitful endeavor. I'm in it for the long run---three decades so far, and hopefully a few more.

In some of the deep learning work that I've seen recently, there's a different tack---one uses one's favorite neural network architecture, analyses some data and says "Look, it embodies those desired characterizations without having them built in". The word "deep" just means that to me---layering (and I hope that the language eventually evolves toward such drier words...).

What did I get wrong?

I also must take issue with your phrase "methods more squarely in the realm of machine learning". I'll resist the temptation to turn this thread into a LeBron vs. MJ debate.

I hope and expect to see more people developing architectures that use other kinds of modules and pipelines, not restricting themselves to layers of "neurons". With all due respect to neuroscience, one of the major scientific areas for the next several hundred years, I don't think that we're at the point where we understand very much at all about how thought arises in networks of neurons, and I still don't see neuroscience as a major generator of ideas on how to build inference and decision-making systems in detail.
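To make the "layering without neurons" point concrete, here is a minimal sketch (my own illustrative example, not from the AMA) of a layered, pipeline-oriented architecture built from non-neural modules: a linear projection, a smooth nonlinear feature map, and a linear classifier, stacked with scikit-learn's Pipeline. The dataset and all hyperparameters are arbitrary choices for illustration.

```python
# A minimal sketch (assumed example, not from the AMA): "layering" built from
# non-neural modules. Each Pipeline step is a module; stacking them gives a
# layered architecture without any "neurons". Dataset and settings are arbitrary.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

layered_model = Pipeline([
    ("project", PCA(n_components=20)),                        # layer 1: linear projection
    ("featurize", Nystroem(n_components=100, random_state=0)),# layer 2: smooth nonlinear features
    ("classify", LogisticRegression(max_iter=1000)),          # layer 3: linear classifier
])

layered_model.fit(X_train, y_train)
print("held-out accuracy:", layered_model.score(X_test, y_test))
```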
These are a few examples of what I think is the major meta-trend, which is the merger of statistical thinking and computational thinking. Notions like "parallel is good" and "layering is good" could well (and have) been developed entirely independently of thinking about brains.

"Artificial intelligence" is the mantra of the current era. The phrase is intoned by technologists, academicians, journalists and venture capitalists alike.

And in most cases you can just replace your "neural nets" with any of the dozens of other function approximation methodologies, and you won't lose anything, except that now it's not ML but a simple statistical model, and people would probably look at you funny if you try to give it a fancy acronym and publish it.

Layered architectures involving lots of linearity, some smooth nonlinearities, and stochastic gradient descent seem to be able to memorize huge numbers of patterns while interpolating smoothly (not oscillating) "between" the patterns; moreover, there seems to be an ability to discard irrelevant details, particularly if aided by weight-sharing in domains like vision where it's appropriate. There are also some of the advantages of ensembling. Overall an appealing mix.

Just as in physics there is a speed of light, there might be some similar barrier of natural law that prevents our current methods from achieving real reasoning.

I personally don't make the distinction between statistics and machine learning that your question seems predicated on.

Intellectually I think that NLP is fascinating, allowing us to focus on highly-structured inference problems, on issues that go to the core of "what is thought" but remain eminently practical, and on a technology that surely would make the world a better place. I could go on (and on), but I'll stop there for now...

What does the future hold for probabilistic graphical models?

Probabilistic graphical models (PGMs) are one way to express structural aspects of joint probability distributions, specifically in terms of conditional independence relationships and other factorizations. This last point is worth elaborating---there's no reason that one can't allow the nodes in graphical models to represent random sets, or random combinatorial structures, or general stochastic processes; factorizations can be just as useful in such settings as they are in the classical settings of random vectors. Note that latent Dirichlet allocation is a parametric Bayesian model in which the number of topics K is assumed known.
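To illustrate the parametric nature of LDA noted just above, here is a minimal sketch (my own example, not from the discussion) using scikit-learn, in which the analyst must commit to the number of topics K before fitting; the toy corpus is made up.

```python
# Minimal sketch (assumed example) of the parametric nature of LDA: the number
# of topics K must be chosen before fitting. Toy corpus and K=2 are arbitrary.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "graphical models factorize joint distributions",
    "conditional independence simplifies inference",
    "stochastic gradient descent trains layered networks",
    "weight sharing helps vision architectures",
]
counts = CountVectorizer().fit_transform(docs)

K = 2  # the parametric assumption: topic count fixed in advance
lda = LatentDirichletAllocation(n_components=K, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic proportions
print(doc_topics.shape)                 # -> (4, 2)
```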
Lastly, and on a less philosophical level, while I do think of neural networks as one important tool in the toolbox, I find myself surprisingly rarely going to that tool when I'm consulting out in industry. Think literally of a toolbox. We have hammers, screwdrivers, wrenches, etc., and big projects involve using each of them in appropriate (although often creative) ways.

Graphical models, a marriage between probability theory and graph theory, provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering—uncertainty and complexity. In particular, they play an increasingly important role in the design and analysis of machine learning algorithms. But beyond chains there are trees, and there is still much to do with trees.

(2) How can I get meaningful error bars or other measures of performance on all of the queries to my database?

There's a whole food chain of ideas from physics through civil engineering that allow one to design bridges, build them, give guarantees that they won't fall down under certain conditions, tune them to specific settings, etc., etc. I suspect that there are few people involved in this chain who don't make use of "theoretical concepts" and "engineering know-how". We have a similar challenge---how do we take core inferential ideas and turn them into engineering systems that can work under whatever requirements one has in mind (time, accuracy, cost, etc.), that reflect assumptions that are appropriate for the domain, that are clear on what inferences and what decisions are to be made (does one want causes, predictions, variable selection, model selection, ranking, A/B tests, etc., etc.), that allow interactions with humans (input of expert knowledge, visualization, personalization, privacy, ethical issues, etc., etc.), that scale, that are easy to use and are robust. This will be hard, and it's an ongoing problem to approximate.

You are a large neural-network algorithm with memory modules, the same as AI today.

(Another example of an ML field which benefited from such interdisciplinary crossover would be Hybrid MCMC, which is grounded in dynamical systems theory.) Thank you for taking the time out to do this AMA.

Wonder how someone like Hinton would respond to this.

I don't think that the "ML community" has developed many new inferential principles---or many new optimization principles---but I do think that the community has been exceedingly creative at taking existing ideas across many fields, and mixing and matching them to solve problems in emerging problem domains, and I think that the community has excelled at making creative use of new computing architectures.

Decision trees, nearest neighbor, logistic regression, kernels, PCA, canonical correlation, graphical models, K-means and discriminant analysis come to mind, and also many general methodological principles (e.g., method of moments, which is having a mini-renaissance, Bayesian inference methods of all kinds, M-estimation, bootstrap, cross-validation, EM, ROC, and of course stochastic gradient descent, whose pre-history goes back to the 50s and beyond), and many many theoretical tools (large deviations, concentration, empirical processes, Bernstein-von Mises, U-statistics, etc.).
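As a small concrete companion to the toolbox metaphor and the list of classical tools above, here is a sketch (illustrative choices throughout, not a recommendation from the AMA) that tries a few of those tools on one dataset and compares them by cross-validation.

```python
# Sketch of the toolbox view (assumed example): try several classical tools on
# one problem and compare them with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

toolbox = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "nearest neighbor":    KNeighborsClassifier(n_neighbors=5),
    "decision tree":       DecisionTreeClassifier(max_depth=4, random_state=0),
}

for name, tool in toolbox.items():
    scores = cross_val_score(tool, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```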
I'd also include B. Efron's "Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction" as a thought-provoking book.

I don't know what to call the overall field that I have in mind here (it's fine to use "data science" as a placeholder), but the main point is that most people I know who were trained in statistics or in machine learning implicitly understood themselves as working in this overall field; they don't say "I'm not interested in principles having to do with randomization in data collection, or with how to merge data, or with uncertainty in my predictions, or with evaluating models, or with visualization". The emergence of the "ML community" has (inter alia) helped to enlargen the scope of "applied statistical inference". It has begun to break down some barriers between engineering thinking (e.g., computer-systems thinking) and inferential thinking.

What are the most important high-level trends in machine learning research and industry applications these days?

Machine-Learning Maestro Michael Jordan on the Delusions of … These are his thoughts on deep learning.

OK, I guess that I have to say something about "deep learning". I'm also overall happy with the rebranding associated with the usage of the term "deep learning" instead of "neural networks".

That's the old-style neural network reasoning, where it was assumed that just because it was "neural" it embodied some kind of special sauce.

Indeed, it's unsupervised learning that has always been viewed as the Holy Grail; it's presumably what the brain excels at and what's really going to be needed to build real "brain-inspired computers".

Outside of quant finance and big tech, very few companies/industries can use machine learning properly.

Then I got into it, and once you get past the fluff like "intelligence" and "artificial neurons", "perceptrons", "fuzzy logic" and "learning" and whatever, it just comes down to fitting some approximation function to whatever objective function, based on the inputs and outputs you receive.

Remember back when people asserted that it was a "when" that the internet was going to change how every school worked, and end poverty? I dunno though... is it really a "when"?

Over the past 3 years we've seen some notable advancements in efficient approximate posterior inference for topic models and Bayesian nonparametrics, e.g. Hoffman 2011, Chong Wang 2011, Tamara Broderick's and your 2013 NIPS work, and your recent work with Paisley, Blei and Wang on extending stochastic inference to the nested Hierarchical Dirichlet Process. What do you see as the next frontier for applied nonparametrics?

Basically, I think that CRMs are to nonparametrics what exponential families are to parametrics (and I might note that I'm currently working on a paper with Tamara Broderick and Ashia Wilson that tries to bring that idea to life). Also, note that the adjective "completely" refers to a useful independence property, one that suggests yet-to-be-invented divide-and-conquer algorithms.
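For readers who haven't met completely random measures, here is the standard definition behind that "completely" (a paraphrase of textbook material, not a quote from the discussion):

```latex
% A random measure $\mu$ on a space $\Omega$ is a completely random measure (CRM)
% if its masses on disjoint sets are mutually independent: for any finite
% collection of disjoint measurable sets $A_1, \dots, A_n \subseteq \Omega$,
\[
  \mu(A_1), \; \mu(A_2), \; \dots, \; \mu(A_n) \quad \text{are independent random variables.}
\]
% Familiar examples include the Poisson, gamma, and beta processes. It is this
% independence across disjoint regions that suggests divide-and-conquer
% (parallelizable) posterior computations.
```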
If you got a billion dollars to spend on a huge research project that you get to lead, what would you like to do?

I'd use the billion dollars to build a NASA-size program focusing on natural language processing (NLP), in all of its glory (semantics, pragmatics, etc.). I'd do so in the context of clear concern with the usage of language (e.g., causal reasoning).

There has been an ML reading list of books on Hacker News for a while (https://news.ycombinator.com/item?id=1055042), where you recommend some books to start on ML. Do you still think this is the best set of books, and would you add any new ones?

That list was aimed at entering PhD students at Berkeley, who I assume are going to devote many decades of their lives to the field, and who want to get to the research frontier fairly quickly. In particular, I recommend A. Tsybakov's book "Introduction to Nonparametric Estimation" as a very readable source for the tools for obtaining lower bounds on estimators, and Y. Nesterov's very readable "Introductory Lectures on Convex Optimization" as a way to start to understand lower bounds in optimization.

(4) How do I visualize data, and in general how do I reduce my data and present my inferences so that humans can understand what's going on?

Sometimes I am a bit disillusioned by the current trend in ML of just throwing universal models and lots of computing power at every problem.

Stuff like AlphaZero shows, vividly, that very, very hard problems---problems that require inference and intuition and planning---can fall, brutally, to "simple input-output regression" if you can figure out how to set them up properly and generate enough data. But here I have some trouble distinguishing the real progress from the hype. Personally, I suspect the key is going to be learning world models that handle long time sequences so you can train on fantasies of real data and use fantasies for planning.

On the other hand, despite having limitations (a good thing!), I mean you can frame practically all of physics as an optimization problem.

We don't need mapredoop enforcer learners; we need people who can frame processes for ML. There are still many challenges to solve in this space, and a wide variety of them, many of which aren't even being considered, or worse, are being described as not even a challenge.

Do you expect more custom, problem-specific graphical models to outperform the ubiquitous, deep, layered, boringly similar neural networks in the future?

I think that mainly they simply haven't been tried.

I view them as basic components that will continue to grow in value as people start to build more complex, pipeline-oriented architectures. For example, I've worked recently with Alex Bouchard-Cote on evolutionary trees, where the entities propagating along the edges of the tree are strings of varying length (due to deletions and insertions), and one wants to infer the tree and the strings. I would view all of this as the proto-emergence of an engineering counterpart to the more purely theoretical investigations that have classically taken place within statistics and optimization.

In the course, you spend a good deal of time on the subject of completely random measures and the advantages of employing them in modelling.

Models that are able to continue to grow in complexity as data accrue seem very natural for our age, and if those models are well controlled so that they concentrate on parametric sub-models if those are adequate, what's not to like?
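A minimal sketch of the "complexity grows with the data" idea (my own toy illustration, not code from the discussion): simulate the Chinese restaurant process, the clustering induced by a Dirichlet process, and watch the number of occupied clusters grow with the sample size. The concentration parameter alpha = 2.0 is an arbitrary choice.

```python
# Toy sketch (assumed example) of complexity growing with data: the Chinese
# restaurant process, i.e. the clustering induced by a Dirichlet process.
# The number of occupied clusters keeps growing, roughly like alpha * log(n).
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers already in cluster k
    for _ in range(n_customers):
        # join an existing cluster with probability proportional to its size,
        # or start a new one with probability proportional to alpha
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)   # a brand-new cluster appears
        else:
            tables[k] += 1
    return tables

for n in (10, 100, 1000, 10000):
    clusters = chinese_restaurant_process(n, alpha=2.0)
    print(f"n = {n:5d}   clusters used = {len(clusters)}")
```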
Based on seeing the kinds of questions I've discussed above arising again and again over the years, I've concluded that statistics/ML needs a deeper engagement with people in CS systems and databases, not just with AI people, which has been the main kind of engagement going on in previous decades (and still remains the focus of "deep learning").

Just out of curiosity, what do you think makes AI incapable of reasoning, beyond computational power?

He's not saying "AI can't do reasoning".

You can keep your romantic idea of AI by realizing that what you're doing isn't AI at all :) It's just that the term has been redefined for marketing purposes.

Until we have general quantum computers that can simulate arbitrary scenarios (not even sure if that's possible), I don't see how you wouldn't rely on statistics, which forces you onto the common domain of function approximators on high-dimensional manifolds.

One thing that the field of Bayesian nonparametrics really needs is an accessible introduction that presents the math but keeps it gentle---such an introduction doesn't currently exist.

Moreover, not only do I think that you should eventually read all of these books (or some similar list that reflects your own view of foundations), but I think that you should read all of them three times---the first time you barely understand, the second time you start to get it, and the third time it all seems obvious.

But I personally think that the way to go is to put those formal characterizations into optimization functionals or Bayesian priors, and then develop procedures that explicitly try to optimize (or integrate) with respect to them. I'd invest in some of the human-intensive labeling processes that one sees in projects like FrameNet and (gasp) projects like Cyc.

A "statistical method" doesn't have to have any probabilities in it per se. I am an apologist for computational probability in machine learning because I believe that probability theory implements two key principles in deep and intriguing ways---namely through factorization and through averaging.
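To make "factorization and averaging" concrete, here is a small sketch (my own illustration, with an arbitrary random transition matrix): for a Markov chain, the factorization of the joint distribution lets the marginal of the last variable be computed by averaging out one variable at a time, rather than summing over every joint configuration.

```python
# Sketch (assumed example, arbitrary random parameters) of factorization and
# averaging: for a Markov chain p(x1,...,xn) = p(x1) * prod_t p(x_t | x_{t-1}),
# the marginal p(x_n) is obtained by averaging out one variable at a time with
# n-1 small matrix-vector products, instead of summing over all K**n configurations.
from itertools import product
import numpy as np

K, n = 3, 8                                   # K states per variable, chain length n
rng = np.random.default_rng(0)
p1 = rng.dirichlet(np.ones(K))                # p(x1)
T = rng.dirichlet(np.ones(K), size=K)         # T[i, j] = p(x_t = j | x_{t-1} = i)

# message passing: exploit the factorization
marginal = p1
for _ in range(n - 1):
    marginal = marginal @ T                   # averages out the previous variable

# brute force over all K**n joint configurations (feasible only because n is tiny)
brute = np.zeros(K)
for config in product(range(K), repeat=n):
    prob = p1[config[0]]
    for a, b in zip(config, config[1:]):
        prob *= T[a, b]
    brute[config[-1]] += prob

print("via factorization:", np.round(marginal, 6))
print("by brute force:   ", np.round(brute, 6))
```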
Would you mind explaining the history behind how you learned about variational inference?

When Breiman developed random forests, was he being a statistician or a machine learner? When my colleagues and I developed latent Dirichlet allocation, were we being statisticians or machine learners?

I do think that Bayesian nonparametrics has just as bright a future in statistics/ML as classical nonparametrics.

Along came backpropagation---clearly leaving behind the neurally-plausible constraint---and suddenly the systems became much more powerful. I'm glad that the work of my long-time friend Yann LeCun is being recognized, promoted and built upon.
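Since backpropagation comes up above, here is a minimal sketch of the idea itself (a toy example written for illustration; the architecture, data, and learning rate are arbitrary): a two-layer network trained by applying the chain rule layer by layer and taking gradient steps.

```python
# A minimal sketch of backpropagation (assumed example, nothing neurally
# plausible about it): a two-layer network fit to a toy regression problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X @ np.array([1.0, -2.0, 0.5]))[:, None]   # toy target

W1 = rng.normal(scale=0.5, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.05

for step in range(2000):
    # forward pass: linear map, smooth nonlinearity, linear map
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    loss = np.mean(err ** 2)

    # backward pass: apply the chain rule layer by layer
    g_pred = 2 * err / len(X)
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T
    g_pre = g_h * (1 - h ** 2)          # derivative of tanh
    g_W1 = X.T @ g_pre
    g_b1 = g_pre.sum(axis=0)

    # gradient descent update
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

print("final mean squared error:", round(float(loss), 4))
```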
Similarly, layered neural networks are just a plain good idea; they should be viewed as nonparametric function estimators, objects to be analyzed statistically.

There is still lots to explore in PGM land.

There isn't going to be one general tool that is dominant; each tool has its domain.