Saturday, March 24, 2012

Lecture 10: Security and Privacy Issues in Online Social Networks

Security in online social networks (OSNs) is becoming more and more important. Most social networking sites offer the basic features of online interaction, communication, and interest sharing, and individuals create online profiles that others can view. More and more businesses are also built on social networks. Security and privacy are therefore special challenges for social networks.

The three main security objectives of OSNs are privacy, integrity, and availability.

The privacy of OSNs encompasses user profile privacy, communication privacy, message confidentiality, and protection against information disclosure. In some scenarios, privacy calls for information to be private by default, but not all social networks do this well. At present, Facebook and RenRen handle default privacy better than QQ Zone: if you have a QQ account and open a QQ Zone, all your information is published publicly in QQ Zone by default.

As part of integrity, the user's identity and data must be protected against unauthorized modification and tampering. I think this is a big challenge, especially because many people do not log out after using a social network, which opens the door to various attacks.

There are different angles from which to understand availability. In OSNs, availability specifically has to include robustness against censorship and against the seizure or hijacking of accounts, and it has to cover message exchange as well.

Although some security problems have to be handled by the companies that run social networks, there are some relatively easy tips to protect yourself and your business. Be discreet and skeptical on social networks. Never type anything into a profile page, bulletin board, instant message, or other online form that would expose you to unwanted visitors or to the possibility of identity theft or malicious threats. Another important step is to check the privacy policies. And remember to log out of the website when you are not using it.

Reference:
http://www.focus.com/fyi/security-risks-social-networks/
http://www.crn.com/slide-shows/security/208401887/10-social-networking-security-trends-to-watch.htm?pgno=2

Lecture 9: Towards the Social Semantic Web

This lecture gives us some information about the evolution of the WWW and introduces the concepts of the social semantic web. The semantic web can be seen as an important enhancement of Web 2.0.

The semantic web is relatively new. It has been a major research initiative of the World Wide Web Consortium (W3C) since 2004. According to Wikipedia, by encouraging the inclusion of semantic content in web pages, the semantic web aims at converting the current web of unstructured documents into a "web of data" [2]. It builds on the W3C's Resource Description Framework (RDF) [1].

One of the benefits of the semantic web is that it describes data in a structured way, which makes it convenient for database operations. Furthermore, with some imagination, we can see that the semantic web and social networks together can enable the World Wide Web to achieve its full potential. By combining the two, data (and people) from different networks can interact and produce new knowledge. That is a great power.
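To make "describing data in a structured way" a bit more concrete, here is a minimal Python sketch. It only imitates the subject-predicate-object idea behind RDF with plain tuples; it is not real RDF syntax, and all names and predicates in it are made up for illustration.

```python
# Hypothetical mini "web of data": each fact is a (subject, predicate, object) triple.
# The people, predicates and organisation below are illustrative assumptions.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "ExampleUniversity"),
    ("carol", "worksAt", "ExampleUniversity"),
]


def match(pattern, facts):
    """Return all facts that fit the pattern; None in the pattern is a wildcard."""
    return [
        fact
        for fact in facts
        if all(wanted is None or wanted == value for wanted, value in zip(pattern, fact))
    ]


# Because every fact has the same shape, a question becomes a simple pattern:
# "who works at ExampleUniversity?"
colleagues = [subject for subject, _, _ in match((None, "worksAt", "ExampleUniversity"), triples)]
print(colleagues)
```

The point of the sketch is only that uniformly structured facts can be queried like a database, which is much harder with unstructured documents.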

I also tried a related website: http://arnetminer.org [3]
Arnetminer aims to provide comprehensive search and mining services for researcher social networks. Its timeline starts in 2006. It has a social graph for a Mark Zuckerberg. The information on this Mark Zuckerberg is incomplete, so I cannot confirm whether he is the founder of Facebook. If he is, his registration and low impact here are interesting.








Reference:
[1] "W3C Semantic Web Activity". World Wide Web Consortium (W3C). November 7, 2011. Retrieved November 26, 2011.
[2] http://en.wikipedia.org/wiki/Semantic_Web
[3] http://arnetminer.org

Lecture 8: Social Network Analysis (SNA) in Psychology Research

In addition to elaborating on SNA principles, this lecture gave us some SNA examples and introduced some research in psychology. SNA can reveal the most prolific and the most influential people, as well as those who are isolated or those who act as mediators between others. So it can be used to research cognitive style and similar topics.


Paper [1] refers to the theory of Small Groups as Complex Systems (hereafter SGCAS [4]). SGCAS comes from a social psychological heritage, and it limits its scope to small groups, i.e. groups with fewer than 20 members. Now, with the help of SNA, large-scale networks can be researched as well.


Here is an example: Analysis of characteristics of undergraduate students based on SNA [6].
The analysis is based on "interpersonal attraction" theory, in which all interpersonal attraction comes from interpersonal relations. In the research, the UCINET software was used to analyze the interactions. The boundary of the research is a virtual learning community of 33 members at a university: 5 administrators, 27 members and 1 teacher.
The graph of the network structure in the virtual learning community leads to several observations. There are two isolated nodes, node 23 and node 31, which suggests that they may need more help to join the community. Another observation is interesting: females seem more likely to become key members. The paper's explanation is that females like to gather and exchange information by nature. I think this explanation is open to debate. What the SNA results do provide is evidence of the affinity of the female members.




Reference:
[1] Alistair Sutcliffe, Analysing Social Computing Requirements with Small Group Theory.
[2] H. Beyer and K. Holtzblatt, Contextual Design: Defining Customer-Centered Systems. San Francisco: Morgan Kaufmann, 1998.
[3] J.E. McGrath, "Time, task and technology in work groups: The JEMCO workshop study". Small Group Research: special issue, vol.224, 1993, pp.283-421
[4] H. Arrow, J.E. McGrath and J.L. Berdahl, Small Groups as Complex Systems: Formation, Coordination, Development and Adaptation. Thousand Oaks CA: Sage, 2000.
[5] http://books.google.com.hk/books?id=_UhbhVvGeQQC&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false
[6] Gao Lei, Analysis of characteristics of undergraduate students Based on SNA, IEEE

Lecture 7: Social Network Analysis

Compared with Lecture 6, this lecture explains more about prestige and ranking algorithms.


The PageRank patent names Larry Page, one of the founders of Google, as its inventor (the patent itself is assigned to Stanford University).






The PageRank of our own website, as calculated by a third-party tool, is almost zero for now, because our website was built just several days ago and is still under construction. For comparison, the same tool gives Facebook a much higher result. With this tool we can see the popularity of a website to some extent, and of course Facebook is much more popular than our own website now. If we want to promote our website, there is still a lot to do.

The PageRank published by Google is not real time; it is updated about 4 times a year. This decision is said to be a balance between technical and commercial considerations.

In fact, there are many different algorithms for calculating page rank, and they take different factors into consideration. Several factors affect prestige:
1.  The number of reverse (inbound) links and the rank of the pages they come from
2.  Links to your website from pages with high-quality content
3.  Being added to search engine categories
4.  Being listed in an open directory
5.  Appearing on high-traffic, high-visibility, frequently updated websites
6.  For Google, documents in PDF format are emphasized, and having the Google Toolbar installed is said to help
7.  Keywords and meta tags in the domain name and the title
8.  The number of outbound links
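Factors 1 and 8 are the ones the basic PageRank idea captures directly. As a rough illustration, here is a minimal Python sketch of the textbook "random surfer" iteration. It is not the computation used by Google or by the third-party tool above; the damping factor 0.85 is the commonly quoted value, and the tiny link graph and page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to;
    every linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its rank evenly over its outbound links (factor 8).
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outbound links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: "hub" receives many inbound links, "new_site" receives none.
web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
    "new_site": ["hub"],
}
ranks = pagerank(web)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:<9}{value:.3f}")
```

In this toy graph the page that nobody links to ends up with only the base share, which is the same effect we saw for our freshly built website.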

Reference:
http://www.prchecker.info/check_page_rank.php
http://baike.baidu.com/view/1518.htm

Wednesday, March 14, 2012

Social Networking Case Study

To understand SNA quickly!



What is social network analysis (SNA)?
I think there are many different ways to understand what SNA is. I have summarized what I learned from the lectures and from websites as a brief introduction, to help readers grasp the whole picture of SNA quickly.
Definition:
SNA is the study of the pattern of interaction between actors of social networks. It refers to methods used to analyze social networks, social structures made up of individuals (or organizations) called "nodes", which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, common interest, financial exchange, dislike, sexual relationships, or relationships of beliefs, knowledge or prestige[1]. 
"Social network analysis is the mapping and measuring of relationships and flows between people, groups, organisations, computers or other information/knowledge processing entities." (Valdis Krebs, 2002). Social Network Analysis (SNA) is a method for visualizing our people and connection power, leading us to identify how we can best interact to share knowledge.[10]


Significance:
With SNA, we can have what others have and create more. This view is derived from the perspective of Sara Philpott of IBM [2].


Necessity:
Data about social lives shared between individuals has grown at a phenomenal rate since the birth of social networking sites in 1997 [2]. Because of the underlying social structure information it contains, people in various fields believe SNA may be a good way to learn more about what is happening and what will happen.


Tools related:
Socilyzer.
It is built for managers and consultants to conduct their own basic analyses [4,5].
SocNetV.
It lets you construct networks (mathematical graphs) with a few clicks on a virtual canvas, or load networks in various formats (GraphViz, GraphML, Adjacency, Pajek, UCINET, etc.) and modify them to suit your needs [6].
NodeXL.
It is a free, open-source template for Microsoft® Excel® 2007 and 2010 that makes it easy to explore network graphs [7].
Agna.
It is a platform-independent application designed for social network analysis, sociometry and sequential analysis [8].
Wikipedia also provides a list of nearly 70 SNA tools [9].

Application:
SNA is an important tool in many areas, such as business intelligence, advertising strategy, entrepreneurship, improving the performance of communication systems, the design of new mobile systems, human resource management, social science, and policy making.

Challenges:
From studying websites, I learned that there are also several challenges for SNA; four of them stand out.
Overlapping community analysis.
For convenience, many approaches simplify the model by assuming that communities are distinct. But sometimes this model is too simple, especially for the purposes of business intelligence.
Edge semantics.
In many models, the relation between two individuals is represented by a single edge with a single weight in the graph. This assumption, too, is sometimes too simple to help find the truth.
Modeling edge creation/maintenance cost.
Creating a link in an online social network is much cheaper than in the real world. Does this need to be considered? Yes, if we hope SNA will become more useful.
Cross-network analysis.
For business reasons, cross-network analysis is not easy to do. Without a solution for it, the results are one-sided to some extent [3].

Instruction:
Other related information about SNA is described in my blog post for Lecture 6: Social Network Analysis.
A case of using SNA to find the most influential node.
Consider the following social network formed by 5 students:


We can treat the sociograph as an undirected graph, with Alice, Bob, Carol, David and Eva as its five nodes. The links between them represent relationships.
We can represent the above network by a simple matrix, as follows.
Using a matrix to transform the sociograph into a formal representation of relations makes it possible to compute measures with algorithms. This matrix is called a sociomatrix (adjacency matrix).
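As a small illustration of the matrix form, here is a minimal Python sketch that builds the 0/1 matrix for the five students. The six ties are my reconstruction from what this post states further down (the two cliques, the David-Eva bridge, and L = 6 links), so treat the list as an assumption rather than a copy of the original figure.

```python
names = ["Alice", "Bob", "Carol", "David", "Eva"]

# Ties reconstructed from the cliques, the bridge and the link count in this post (assumption).
ties = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Alice", "David"),
    ("Bob", "David"),
    ("Carol", "David"),
    ("David", "Eva"),
]

index = {name: position for position, name in enumerate(names)}
matrix = [[0] * len(names) for _ in names]
for a, b in ties:
    # The graph is undirected, so every tie is recorded in both directions.
    matrix[index[a]][index[b]] = 1
    matrix[index[b]][index[a]] = 1

print(" " * 6, *[name[0] for name in names])
for name, row in zip(names, matrix):
    print(f"{name:<6}", *row)
```

The matrix is symmetric because the ties have no direction, and the sum of a row is the number of links of that student.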


Now we need some terms for SNA measurements. These terms are useful for quantitative analysis and for designing software that computes the statistics automatically.
Cutpoint: A node which, if deleted, disconnects the network. David is a cutpoint of the sociograph.
Bridge: A tie which, if deleted, disconnects the network. The link between David and Eva is a bridge.
Degree: The degree of a node is the number of links that are incident with it.
Density: The proportion of ties that exist out of all possible ties. In other words, the number of links L divided by the number of links in a complete graph with the same number of nodes g. The density of the above sociograph is:
                                                          2L/(g(g-1)) = 2*6/(5*4) = 0.6
Geodesic path: The shortest of all the paths between two nodes is called the geodesic path.
Geodesic distance: The length of the geodesic path between two nodes is called the geodesic distance. If no path exists between two nodes, the distance is infinite or undefined. In the above sociograph, every geodesic distance is 1 or 2.
Clique: A maximal set of nodes in which every node is connected to every other. E.g. {Alice, Bob, David} and {Alice, Carol, David} are cliques.
N-Clique: A set of nodes that are all within distance n of each other. E.g. {Alice, Bob, Carol, David, Eva} is a 2-Clique.
K-Plex: A set of n nodes in which every node has a tie to at least n-k others in the set. E.g. {Alice, Bob, Carol, David} is a 2-Plex.
Centrality: Identifies which nodes are at the 'center' of the network. In a social network, entities at the center can be very important, somewhat like VIPs in the real world. Three standard centrality measures are widely used: degree centrality, closeness centrality, and betweenness centrality.
Degree centrality: The number of other actors who are directly connected to the actor concerned. It signifies activity or popularity, and can be normalized as:
                                                          CD'(ni) = CD(ni)/(g-1)
Group degree centralization: Looks at the dispersion of centrality. A measure of the graph centralization is:
                                                          CD = Σi [CD(n*) - CD(ni)] / ((g-1)(g-2))
where CD(n*) is the largest value among all CD(ni) in the network. In this case, group degree centralization is 2/3.


Closeness centrality: Represents the mean of the geodesic distances between a particular node and all the other nodes connected with it. It can be understood as how long it takes for a message to spread inside the network from that node.
                                                     
Normalized closeness centrality:
                                                     
Group closeness centralization: Measures the overall level of closeness in a network, i.e. how large the sum of differences can actually be. The numerator can be calculated by:
                                                          Σi [CC(n*) - CC(ni)]
where CC(n*) is the largest value among all CC(ni) in the network. The denominator is the theoretical maximum of this sum. In this case, group closeness centralization is 17/90.
Betweenness centrality: The number of times a node connects pairs of other nodes that otherwise would not be able to reach one another. Betweenness centrality counts the number of shortest paths between j and k on which actor i lies. It is a measure of the potential for control, since an actor who is high in betweenness can act as a gatekeeper controlling the flow of resources (information, money, power, etc.) between the alters that he or she connects. The measure here is based on an undirected graph.

Normalized betweenness centrality:

Group betweenness centralization: Measures the overall level of betweenness in a network.

CB(n*) is the largest value among all CB(ni) in the network.
Or simplified by
In this case, group betweenness centralization is 5/48.
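To show how such measures can be computed automatically, here is a minimal Python sketch for the five-student network. The tie list is my reconstruction from the cliques, the bridge and the link count given above, and the closeness and betweenness formulas are the common textbook versions (normalized closeness as (g-1) divided by the sum of geodesic distances, betweenness as the share of shortest paths through a node), which I assume match the lecture. The sketch covers the per-node measures, the density and the group degree centralization only.

```python
from collections import deque
from itertools import combinations

# Ties reconstructed from the cliques, the bridge and the link count above (assumption).
graph = {
    "Alice": ["Bob", "Carol", "David"],
    "Bob": ["Alice", "David"],
    "Carol": ["Alice", "David"],
    "David": ["Alice", "Bob", "Carol", "Eva"],
    "Eva": ["David"],
}
g = len(graph)  # number of nodes


def bfs(source):
    """Return geodesic distances from source and the number of shortest paths to each node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                paths[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                # Every shortest path to node extends to a shortest path to neighbour.
                paths[neighbour] += paths[node]
    return dist, paths


info = {node: bfs(node) for node in graph}

degree = {node: len(neighbours) for node, neighbours in graph.items()}
links = sum(degree.values()) // 2
density = 2 * links / (g * (g - 1))
largest = max(degree.values())
group_degree = sum(largest - d for d in degree.values()) / ((g - 1) * (g - 2))

# Normalized closeness; this assumes the graph is connected.
closeness = {node: (g - 1) / sum(info[node][0].values()) for node in graph}


def betweenness(node):
    """Sum, over all pairs of other nodes, the share of their shortest paths through node."""
    total = 0.0
    others = [other for other in graph if other != node]
    for a, b in combinations(others, 2):
        dist_a, paths_a = info[a]
        dist_b, paths_b = info[b]
        if dist_a[node] + dist_b[node] == dist_a[b]:
            total += paths_a[node] * paths_b[node] / paths_a[b]
    return total


between = {node: betweenness(node) for node in graph}

for node in graph:
    print(f"{node:<6} degree={degree[node]} closeness={closeness[node]:.3f} "
          f"betweenness={between[node]:.1f}")
print("density:", density)
print("group degree centralization:", round(group_degree, 3))
```

With this tie list the density comes out as 0.6 and the group degree centralization as 2/3, the same values as in the text, and David has the highest value on all three per-node measures.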

Results:
It is easy to see that David is the most influential node. By degree centrality, David's value is the largest, and closeness centrality and betweenness centrality lead to the same judgement. These measurements give us different angles from which to view the social network.


In fact, the easiest way to find the most influential node is common sense or intuition: David simply has the most links connected to him. With such a simple method, one can reach the conclusion without any knowledge of SNA. But if we want to describe it more precisely, in a form that is convenient for a computer to process, the concepts of SNA are useful.


Based on the results obtained, there are several findings:
1) Different methods applied to the same social network may give the same result, but this is not enough to show that the results will always be consistent. The methods show the network from different angles, and the model here is simplified. Whether the results still agree in a much more complex model needs more research, and that research can give us a better understanding of 'All roads lead to Rome'.


2) Different social networks have different features. These features can be studied with different methods in different dimensions, so the choice of tools may be important for a particular purpose.


3) The most influential node may be a cutpoint. Such a node deserves more attention in the real world, because it may be a key resource or a VIP. Controlling these nodes may help control the entire network more effectively and rapidly, and help maintain the stability of the network.


4) SNA shows a strong ability to find the interaction patterns of social individuals. It provides a tool and a chance to do more complex research on the evolution of society and business. For example, SNA is now an important tool for business intelligence. This leads us to recognize that, with SNA, we can have what others have and create more.
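Finding 3 can be checked mechanically. Here is a minimal Python sketch that finds cutpoints and bridges by brute force: remove one node (or one tie) at a time and test whether the rest is still connected. The tie list is again my reconstruction of the five-student network, and the brute-force approach is chosen for readability on a tiny example, not for large networks.

```python
names = ["Alice", "Bob", "Carol", "David", "Eva"]

# Ties reconstructed from the cliques, the bridge and the link count in this post (assumption).
ties = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Alice", "David"),
    ("Bob", "David"),
    ("Carol", "David"),
    ("David", "Eva"),
]


def is_connected(nodes, edges):
    """Return True if every node can be reached from every other node."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        # Ties that touch a removed node are ignored.
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen == nodes


# A cutpoint is a node whose removal disconnects the remaining nodes.
cutpoints = [
    node for node in names
    if not is_connected([other for other in names if other != node], ties)
]

# A bridge is a tie whose removal disconnects the network.
bridges = [
    tie for tie in ties
    if not is_connected(names, [other for other in ties if other != tie])
]

print("cutpoints:", cutpoints)
print("bridges:", bridges)
```

For this network the only cutpoint is David and the only bridge is the David-Eva tie, so the most influential node and the cutpoint coincide here.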


Ref:
[1]http://en.wikipedia.org/wiki/Social_network_analysis
[2]http://www-935.ibm.com/services/ie/gbs/irishtelecom/pdf/social_network_analysis.pdf
[3]http://datamining.typepad.com/data_mining/2008/04/four-challenges.html
[4]http://socilyzer.com
[5]http://www.bioteams.com/2008/02/08/a_great_free.html
[6]http://socnetv.sourceforge.net/
[7]http://nodexl.codeplex.com/
[8]http://mac.softpedia.com/progDownload/AGNA-Download-47086.html
[9]http://en.wikipedia.org/wiki/Social_network_analysis_software
[10]http://www.kstoolkit.org/Social+Network+Analysis
[11]Lecture6,7,8

Thursday, March 1, 2012

Lecture 6: Social Network Analysis

Social network analysis views social relationships in terms of network theory, consisting of nodes and ties (also called edges, links, or connections). For example, this figure shows a social network of friendship ties based on Facebook data:



Several concepts of social network analysis were introduced in the lecture: degree, density, geodesic distance, closeness, betweenness, etc. These concepts are used to describe relationships between actors. For example, betweenness centrality is calculated from the number of times a node connects pairs of other nodes that otherwise would not be able to reach one another. It is a measure of the potential for control, since an actor who is high in betweenness can act as a gatekeeper controlling the flow of resources (information, money, power, etc.) between the alters that he or she connects. This figure is an example of a social network diagram. The node with the highest betweenness centrality is marked in yellow.



reference:
http://en.wikipedia.org/wiki/Social_network_analysis

Lecture 5: Social Multimedia Computing

Social multimedia hosting and sharing websites, such as Flickr, Facebook, YouTube, Picasa, ImageShack and Photobucket, are increasingly popular around the globe. Social multimedia is a hybrid of multimedia and social media [1].

Social multimedia computing is a cross-disciplinary research field. The theoretical underpinnings of social media computing include both computational and social sciences.[2]

A major trend in current studies on social multimedia is to use social media sites as a source of huge amounts of labeled data for solving large-scale computer science problems in computer vision, data mining and multimedia [1].

By considering the geographic information (especially the GPS location) of photos, we are able to monitor the spread and adoption of a product around the world over time, which can help the company (or its competitors) exploit the growing popularity in different regions for better planning and management of manufacturing, marketing and distribution[1].



This is a case study of the spread of the iPod. The following figure shows geo-tagged iPod images on Flickr distributed over the world during the years 2006 to 2009, drawn with the mapping package M_Map. Red points indicate new geo-tagged iPod images within the year, and blue points indicate those that appeared before that year. The pattern shows how the iPod spread around the world over time: it was originally popular in North America and Europe, and then spread to other continents, such as Asia, Africa, South America and Australia [1].
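The red/blue idea (new this year versus seen before) can be sketched in a few lines of Python. This is only my illustration, not the method of the paper: the records below are made up, and I assume each photo has already been mapped from its GPS position to a continent.

```python
from collections import defaultdict

# Hypothetical (year, continent) records, one per geo-tagged photo. Not real Flickr data.
photos = [
    (2006, "North America"),
    (2006, "Europe"),
    (2007, "Europe"),
    (2007, "Asia"),
    (2008, "Asia"),
    (2008, "South America"),
    (2009, "Africa"),
    (2009, "Asia"),
]


def first_appearance(records):
    """Map each region to the first year in which a tagged photo appears there."""
    first = {}
    for year, region in sorted(records):
        first.setdefault(region, year)
    return first


def new_regions_by_year(records):
    """Group regions by the year they first show up (the 'red points' of that year)."""
    by_year = defaultdict(list)
    for region, year in first_appearance(records).items():
        by_year[year].append(region)
    return {year: sorted(regions) for year, regions in sorted(by_year.items())}


spread = new_regions_by_year(photos)
for year, regions in spread.items():
    print(year, "new regions:", ", ".join(regions))
```

With real photo records, the same grouping would show in which order a product reaches different regions.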
 

 

By studying the papers on social multimedia computing, I learned more about its benefits for enterprises and individuals. It also provides a promising way to solve the problem of understanding the content of images.
I am very interested in this area. I hope I can do something to give individuals a better user experience.




reference:
 [1]http://www.cs.uiuc.edu/~hanj/pdf/mm10_xjin.pdf
 [2]http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5506093