Enhancing Clarity in Scientific Papers: Multivariate Volume Visualization - Prof. Prasun D, Study notes of Computer Science

Advice on improving the flow and clarity of scientific papers, with a focus on multivariate volume visualization (mvv) papers. It discusses common issues such as lack of justification, intra-paragraph flow problems, and the importance of connecting techniques. Examples are given to illustrate these problems and their solutions.

Crafting a Technical Argument

Prasun Dewan
University of North Carolina
Comp 911, Fall ’08

This is a quick survey of common writing problems in the survey papers submitted by previous students of this class. The advice given here should apply not only to survey papers but also to theses, which, like survey papers, have relatively lax page limits and are expected to be written for a relatively general audience. Some of it may not be relevant to conference papers, as they have stringent page limits, are written for an audience that is assumed to have expertise in the topic of the paper, and are expected to focus more on positioning new research with respect to previous work than on teaching the reader about the subject area. Nonetheless, most of it should apply to any technical paper or talk that tries to make an argument (rather than simply provide a series of facts).

The writing problems and their solutions are illustrated using excerpts from the papers. The evaluation of selected excerpts from a paper is not meant to be a judgment of the whole paper; in fact, often a paper demonstrated both the problem and its solution. Portions of the excerpts on which the reader should focus are underlined.

Create intra-para flow

The biggest problem with most papers is the lack of flow, both between successive sentences in a paragraph and between successive paragraphs in the same or different sections. The following example illustrates an intra-paragraph flow problem:

Excerpt 1

Abstract— As hardware improves and scientific data sets grow, the task of creating perceptually intuitive images becomes correspondingly more difficult.
Multivariate volume visualization (MVV) is a particular subset of the data visualization field that produces imagery for spatially-embedded data volumes with multiple scalar values at each volume coordinate. The fundamental difficulty with these massive data sets is that there is too much data to naively display at one time. Some form of data reduction is required. This paper presents a representative selection of modern MVV techniques that each attempt to solve this problem.

The flow issues here are pretty subtle and perhaps not even noticeable. However, by addressing them, we can demonstrate what good flow looks like in the extreme case. The problem is that the third sentence in the paragraph builds on the idea expressed in the first sentence, while the second sentence expresses an independent theme. The first sentence tells us, without justification, that visualizing large data sets is difficult:

As hardware improves and scientific data sets grow, the task of creating perceptually intuitive images becomes correspondingly more difficult.

The third sentence provides the justification:

The fundamental difficulty with these massive data sets is that there is too much data to naively display at one time.

The second sentence, on the other hand, simply describes the scope of the paper:

Multivariate volume visualization (MVV) is a particular subset of the data visualization field that produces imagery for spatially-embedded data volumes with multiple scalar values at each volume coordinate.

The third sentence, thus, should follow the first one. Perhaps a simpler way to rationalize this order is to say that the phrase “these massive data sets” should be in a sentence that follows the one in which the data sets were introduced. Let us swap the positions of the second and third sentences:

Abstract— As hardware improves and scientific data sets grow, the task of creating perceptually intuitive images becomes correspondingly more difficult.
The fundamental difficulty with these massive data sets is that there is too much data to naively display at one time. Multivariate volume visualization (MVV) is a particular subset of the data visualization field that produces imagery for spatially-embedded data volumes with multiple scalar values at each volume coordinate. Some form of data reduction is required. This paper presents a representative selection of modern MVV techniques that each attempt to solve this problem.

This modification actually aggravates the flow problem, as now the fourth sentence is out of place. Let us move it so it follows the second sentence:

Abstract— As hardware improves and scientific data sets grow, the task of creating perceptually intuitive images becomes correspondingly more difficult. The fundamental difficulty with these massive data sets is that there is too much data to naively display at one time. Some form of data reduction is required. Multivariate volume visualization (MVV) is a particular subset of the data visualization field that produces imagery for spatially-embedded data volumes with multiple scalar values at each volume coordinate. This paper presents a representative selection of modern MVV techniques that each attempt to solve this problem.

This is better but still not perfect. The last sentence contains the phrase “this problem,” which refers to the need for data reduction. Thus, the sentence before it should be the one that refers to data reduction. So we can move it too:

Abstract— As hardware improves and scientific data sets grow, the task of creating perceptually intuitive images becomes correspondingly more difficult. The fundamental difficulty with these massive data sets is that there is too much data to naively display at one time. Some form of data reduction is required. This paper presents a representative selection of modern MVV techniques that each attempt to solve this problem.
Multivariate volume visualization (MVV) is a particular subset of the data visualization field that produces imagery for spatially-embedded data volumes with multiple scalar values at each volume coordinate.

On the other hand, the first paragraph of section 1.2.1 is connected to the paragraph before it:

Broadly speaking, segmentation algorithms can be classified into three categories: pixel-based, edge-based and region-based methods.

1.2.1 Pixel-based Algorithms

Pixel-based methods directly work in the feature (histogram) domain, and spatial information isn’t taken into consideration.

The reason is that a description of pixel-based techniques is exactly what the reader expects next. It seems that the writer could have used this technique of enumerating the next level of topics to address the lack of connection between the sections on thresholding and finite mixture models:

1.2.1 Pixel-based Algorithms

Pixel-based methods directly work in the feature (histogram) domain, and spatial information isn’t taken into consideration. They can be classified as thresholding, finite mixture models, and K-Means.

Thresholding

Thresholding is the simplest approach among pixel-based methods. … These values may seem arbitrary, but the reasons for choosing them become much more apparent after inspection of the image’s histogram in figure 1.2, where clear intersections of the Gaussian distributions occur around intensities 20, 40, 75, and 140.

Finite Mixture Models (FMM)

Finite mixture models assume that the input image (or images) consists of a fixed number of distinct objects (or tissue types).

The added underlined sentence tells the reader to expect a description of finite mixture models after thresholding. Do we now have to explicitly connect the end of the description of the former to the start of the description of the latter? It is easy to argue that the answer is yes. The description of finite mixture models is long, and it is possible that the reader has forgotten that finite mixture models are the next item in the menu.
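The thresholding approach quoted above can be made concrete with a minimal sketch. The cut points come directly from the excerpt’s histogram discussion (intersections of the Gaussian distributions near intensities 20, 40, 75, and 140); the function name and the toy image are illustrative assumptions, not taken from the excerpted paper.

```python
import numpy as np

# Cut points mentioned in the excerpt: histogram intersections of the
# Gaussian distributions near intensities 20, 40, 75, and 140.
THRESHOLDS = [20, 40, 75, 140]

def threshold_segment(image, thresholds=THRESHOLDS):
    """Assign each pixel a class label based purely on its intensity.

    This is pixel-based in the excerpt's sense: only the intensity
    (histogram) matters, and spatial information is ignored.
    """
    image = np.asarray(image)
    # np.digitize returns, for each pixel, the index of the intensity
    # band (0..len(thresholds)) it falls into.
    return np.digitize(image, bins=thresholds)

# A toy 2x3 "image" with intensities straddling the cut points.
img = np.array([[10, 30, 50],
                [80, 150, 200]])
labels = threshold_segment(img)
# labels: [[0, 1, 2], [3, 4, 4]] -- five intensity classes
```

The sketch also makes the excerpt’s limitation visible: two distant pixels with the same intensity always receive the same label, which is one way to motivate the more sophisticated models that follow thresholding.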
More fundamentally, the reader has no idea why a model more complicated than thresholding is necessary, and thus may not be motivated to learn more. The best flow occurs in this part of the excerpt:

Pixel-based methods directly work in the feature (histogram) domain, and spatial information isn’t taken into consideration.

Thresholding

Thresholding is the simplest approach among pixel-based methods.

The first sentence of the thresholding section gives the reason why thresholding is interesting: the fact that it is the simplest method. A similar motivation is needed for finite mixture models. One way to provide it is, at the end of the description of thresholding, to give some problem with it that is addressed by finite mixture models. In general, it is possible to connect a series of alternative techniques by using one or more drawbacks of a technique to motivate the subsequent technique. This approach is illustrated in the following excerpt:

Excerpt 3

3.1 Direct Volume Rendering

…. Direct volume rendering and all of its successors use color transfer functions. These functions describe the mapping between data values and RGBA color values and are most commonly implemented as a simple lookup table. …

3.2 Isosurfaces

Even using modern volume texture compositing techniques, direct volume rendering is a computationally expensive process for large data sets. The isosurface is a volume visualization technique that is geometry-based, which plays directly into the primary task of graphics hardware: rendering triangles. Rather than specifying a range of values and their corresponding color and opacity values, isosurfaces instead require the user to specify only a particular data value of interest, known as an isovalue. …

The section on isosurfaces starts with a problem with direct volume rendering, the subject of the previous section, that is addressed by isosurfaces.
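Excerpt 3’s remark that color transfer functions “are most commonly implemented as a simple lookup table” can itself be sketched. The 256-entry table, the blue-to-red ramp, and the function names below are illustrative assumptions, not the excerpted paper’s actual design; real transfer functions are tuned per data set.

```python
import numpy as np

def build_transfer_lut():
    """A 256-entry RGBA lookup table: one color per 8-bit scalar value.

    The ramp here (blue -> red, opacity rising with the value) is an
    arbitrary illustrative choice.
    """
    v = np.arange(256) / 255.0
    return np.stack([v,              # R: rises with the data value
                     np.zeros(256),  # G: unused in this toy ramp
                     1.0 - v,        # B: falls with the data value
                     v],             # A: higher values are more opaque
                    axis=1)

def classify(volume, lut):
    """Map each scalar sample to RGBA by indexing the table --
    the 'simple lookup' the excerpt describes."""
    return lut[np.asarray(volume, dtype=np.uint8)]

lut = build_transfer_lut()
voxels = np.array([0, 128, 255], dtype=np.uint8)
rgba = classify(voxels, lut)  # shape (3, 4): one RGBA tuple per voxel
```

By contrast, an isosurface discards this per-value color mapping entirely: the user supplies a single isovalue, and only the geometry where the data crosses that value is rendered, which is exactly the algorithmic difference the excerpt uses to connect the two sections.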
Applying this problem-solution-problem connection approach is not easy, as it is rare for techniques to be placed linearly on some goodness scale. Typically they make different tradeoffs. To apply this approach, we may need to change the metric used to connect two techniques presented in succession. For example, after providing the description of isosurfaces, we can look at the metric of user effort to introduce a technique that requires less effort than isosurfaces.

Two techniques can be connected based not only on how well they work along some evaluation dimension but also on the difference between the algorithms they use. In general, we tend to understand alternative algorithms in relative rather than absolute terms. This algorithm-diff approach is used in Excerpt 3. We learn in the description of direct volume rendering that it uses color transfer functions:

Direct volume rendering and all of its successors use color transfer functions.

Then, in the description of isosurfaces, we learn that it replaces these functions with isovalues:

Rather than specifying a range of values and their corresponding color and opacity values, isosurfaces instead require the user to specify only a particular data value of interest, known as an isovalue.

The algorithm-diff connection approach is also illustrated in the segmentation example above, as techniques are classified based on whether they look at pixels or edges. Providing both evaluation-based and design-based connections is an important part of writing a good integrative survey.

Repeat only to remind

When a new technique is motivated by comparing it with the previously presented technique, the comparison can be presented at the end of the section describing the previous technique or at the start of the section discussing the new technique.
However, it should not be repeated, as in the underlined portions of the excerpt below:

Excerpt 4

In the next section, I will describe another voltage scaling technique [2] for self-timed systems which removes some of the limitations of [1]. Instead of using additional FIFO buffers as in [1], their technique uses handshake signals to detect the operating speed of the circuit and to decide whether voltage should be reduced or increased.

4 Adaptive Supply Voltage Scheme

This section describes a voltage scaling technique [2] for self-timed systems which uses handshake signals directly to detect circuit speed relative to the input data rate and uses this information to guide the voltage scaling. This technique uses a system in which data processing circuits is asynchronous, but data is being input from a synchronous environment.

Imagine giving a talk in which you gave the connection between two successive slides both at the end of the previous slide and at the start of the next slide - it would be odd and unnecessary. It is true that people can read papers in random rather than linear order. Nonetheless, you should assume they are being read in linear order.

Here is another example of repetition:

Excerpt 5

When these high energy state nuclei relax back to a lower energy state they emit radiation which can be detected by the NMR spectrometer. This flip-flopping between parallel and

Excerpt 7

Here a series of image composition techniques are compared based on how much geometrical information they use to place the composed images in the composition. The amount of geometry used is not connected to the goodness of a technique. For example, in a collage, using geometry serves no purpose. The geometric spectrum is simply a way for us to understand the relationships in the designs of these techniques.

As mentioned earlier, normally there are several evaluation dimensions. Moreover, typically there are several design dimensions.
Don’t be content with using only one of these dimensions to classify techniques. Give all the important dimensions you can identify. For example, as we saw earlier, the direct volume rendering and isosurface techniques can be compared not only on how confusing they are but also on how computationally expensive they are and how much user effort they require. By covering all the dimensions you can identify, you will help the readers better understand the techniques.

Explain “subtle” concepts explicitly

The dimensionalized space can be given at the start or end of the comparison. In either case, it is important to explicitly justify the placement of a technique along each dimension. It is possible that the description of a technique implies its placement in a spectrum. Even if it does, it is a good idea to explicitly justify the placement, as the readers would probably have to re-scan the descriptions of the techniques to understand their placement. If the classification is presented before the discussion of the techniques, then the discussion of each technique can justify its placement relative to the techniques presented before it. If it is done at the end, then, of course, the justification must also come at the end.

Neither of the two excerpts above labels the points in the evaluation/design dimension, simply giving the relative order of the techniques. To be more precise, if the points can be described by small names, go through the effort of labeling the points in a dimension, to both help the reader better understand the spectrum and provide an extra level of abstraction in the discussion.

The more general advice is to explain “subtle” concepts explicitly. To determine what is subtle, imagine that your audience is your best friend in your discipline but not in your area of research, rather than your advisor or some other expert in your area.
For example, if you are explaining some aspect of computer vision, imagine your audience is your best friend in computer science who knows nothing about computer vision. Authors typically under-explain, even when targeting readers familiar with the domain of the paper, for fear of insulting the reader. Reviewers typically complain about under- rather than over-explanation. It is fairly likely that when you explicitly explain some point, you yourself will gain a better understanding of it or realize that the point is not valid. For example, by justifying the placement of points in an evaluation/design dimension and labeling the points in it, you yourself will gain a better understanding of the relative differences among a series of techniques. There is a famous story of a professor being asked by a student to explain some relatively straightforward aspect of Einstein’s theory. After providing the explanation, he exclaimed that he finally understood the theory!

In general, it is very hard to know precisely how much to explain. That is why it is a good idea to practice your explanation on friends and colleagues before you commit yourself to a final talk or paper, and to take seriously reviewer comments about under-explanation.

Provide explicit definitions instead of buzzwords

Explaining subtle concepts explicitly implies that you should explicitly define all the terms you need to make your point. When you don’t define a term, it becomes a buzzword to the readers/audience, who may wonder if there really is a concept behind it. Let us take a concrete excerpt:

Excerpt 8

Figure 1. Simple User Personae

None of the terms used in the figure are defined in the paper from which the example was taken. Some of them, such as email, Unix and Windows, do not need explanation, as readers can be expected to be familiar with them and, more important, they are not really relevant to the point made by the paper.
On the other hand, the paper is about digital personae and accounts, and the figure shows that they are different concepts. The examples chosen in the figure do illustrate the difference. However, based on this information, it is not easy for the reader to come up with a definition of digital account and persona that is consistent with their use in the figure. Thus, the author really needs to take the effort of providing the definitions; otherwise they are simply vaguely understood buzzwords.

Even a mathematical term might be vaguely defined. Consider the following:

Excerpt 9

A list of peaks in a spectrum are identified, and listed as a (mass, abundance) tuple.

When asked to explicitly define a peak, the author said that it was in fact a rather ill-defined concept. Explicitly acknowledging this might actually make the discussion more rather than less confusing. Consider next the following excerpt:

Excerpt 10

A stopping term from the Mumford-Shah segmentation technique is adopted in this model and the entire curve evolution is conducted under the level set framework

Neither Mumford-Shah segmentation nor level sets are defined in the paper, but these appear to be well-defined terms. It seems that it is enough to fix the problem by simply referencing papers/books that do define them. Even if the paper is being written for an expert audience, it would be more insightful if it explained the aspects of level sets and the Mumford-Shah algorithm that are relevant to its conclusions. Ideally, a paper should be self-contained and not require the readers to look up references to understand its point. Put another way, a related-work section in a paper should not simply refer to previous work. It should typically also explain the insights in these works, putting them in some dimensionalized space. Often it is the quality of writing in this section that determines whether the paper is accepted.
In conferences such as SIGGRAPH, with small page limits, it is hard to adhere to this principle. But in most ACM and IEEE conferences there is enough space. Often the explanation of a concept is only a sentence or two, in which case space is not an issue. As implied above, it is important to explain only those aspects of a concept that are relevant to the conclusions of the paper. Here is an example that defines the terms it uses:

Excerpt 11

Excerpt 14

It is very difficult to understand the role and nature of the two definitions on their own. Fortunately, the paper later provides an example that addresses this issue. On the other hand, the reader may skim through the definitions if there is not enough illustration or motivation when they are presented. Therefore, rather than separating definitions and examples, it is preferable to mix them. Ideally, one or more running examples should be developed as motivation before the solution is presented, and then each aspect of the solution, such as a definition, should be incrementally illustrated using the examples. The more symbols and superscripts the solution uses, the more important it is to incrementally illustrate it. In good talks this approach is usual – typically an abstract concept is presented along with some concrete illustration of it. This is also a good approach for papers, unless a series of definitions is needed before any aspect of the example can be given.

Provide figure descriptions in-place in the paper narrative

Often the example is a figure, as in the excerpt above. The incremental illustration approach does not work if the example is described only in the figure description. Providing detailed figure descriptions rather than one-line figure captions is a good idea, as the busy reader can understand the work by simply looking at the figures. But, as mentioned before, you should focus more on those who are reading the paper in a linear fashion.
Thus, be sure to explain each aspect of a figure at exactly the place in the argument where you want the readers to pay attention to it. You can, of course, duplicate the description in a long caption for the skimming reader, provided you have space. The description of a figure should explicitly point out all interesting aspects of the figure. This is done in good talks, which don’t just present a figure and wait for the audience members to digest it. A good paper should follow the same rule and not rely simply on a paper reader having more time than a talk listener to deduce the point. This advice is a special case of the principle of being explicit. Here is an excerpt showing this rule being broken:

Excerpt 15

The paper gives us a terse explanation of the figure, never telling us explicitly the reason for the shown schedule, why Task T4 receives 0.75 (t2 – t1) allocation units, or why this allocation amount is interesting.

Use formal descriptions only if needed

Formal descriptions are particularly hard to follow. Therefore, use them only if necessary. Consider:

Excerpt 16

The excerpt tells us that a task is represented as a directed acyclic graph and then mentions that the nodes of the graph have deadlines and dependencies among them. However, it does not tell us explicitly that (a) a task is structured into subtasks, (b) each node represents a subtask, and (c) each edge connecting two nodes represents a dependency between the subtasks corresponding to the nodes. As a result, this “formal explanation” is not really complete – which is one of the goals of such an explanation. Worse, once the underlined informal description of the task is provided, the formal definition in terms of the various symbols is not necessary to make the point of the paper!

This advice of using formal descriptions only when necessary applies also to algorithms.
Consider:

Excerpt 17

The paper does not explain or refer to any of the steps in the algorithm, which, again, do not contribute to the explicit conclusions of the paper. In summary, if you present a formalism such as an equation, model, or algorithm, do something with it, such as deriving it or illustrating it with an example. There is no creative effort in simply presenting a formalism, especially one developed by someone else! This advice is a special case of the more general rule of not giving anything irrelevant to the focus of the paper.

Mention nothing irrelevant to the subject of the paper

The following excerpt shows how this rule can be broken even when formalism is not an issue. Consider:

Excerpt 18

…. In this paper, I survey the progress made on dynamic voltage scaling for asynchronous systems.

1.3 Self-Timed Systems

The self-timed circuits are a sub-class of asynchronous systems which are popularly used for voltage scaling. These systems use dual-rail code for data path and a four-phase handshake protocol for communication control. The dual-rail protocol encodes the request signal …

Here, there is no connection between the last paragraph of section 1.2 and the first paragraph of section 1.3. This is not just a flow problem. Dual-rail protocols are

We would like to thank Prof. Svetlana Lazebnik for the discussion on some parts of this paper.

We would like to thank Prof. Prasun Dewan for valuable suggestions and ideas …

Fortunately, the title of the section implicitly says that you are thanking the persons identified. So you can simply give the contribution:

7. ACKNOWLEDGMENTS

Prof. Svetlana Lazebnik discussed parts of this paper.

Prof. Prasun Dewan provided valuable suggestions …

However, sometimes you may indeed want to refer to yourself with a first-person pronoun, as the following excerpt from this paper shows:

Excerpt 23

I was once in a faculty-recruiting talk in which most of the terms used by the speaker were unfamiliar to me.
It is possible to refer to yourself as the “first author” and then use third-person pronouns to refer to yourself:

The author was once in a faculty-recruiting talk in which most of the terms used by the speaker were unfamiliar to him.

However, the rewrite sounds a bit awkward, at least to me. Both choices are, of course, acceptable.

Rewrite to avoid facing the “he/she” issue

It is also possible to rewrite to avoid addressing the he/she issue by replacing the single subject of a sentence with a collection of people. Consider the following text referring to a single reader:

In this excerpt, it is not really necessary to remind the reader that the radiation is emitted and detected by the NMR spectrometer, as it was only in the previous paragraph that he/she learnt this fact.

The second part of the sentence uses a singular third-person pronoun only because the first part refers to a single reader. If we instead refer to the collection of readers, we can use “they” as the third-person pronoun without being grammatically incorrect:

In this excerpt, it is not really necessary to remind the readers that the radiation is emitted and detected by the NMR spectrometer, as it was only in the previous paragraph that they learnt this fact.

Of course, it is not always possible to replace a single person with a group without changing the meaning of the paper.

Avoid use of “notice”

Many papers preface some point with the phrase “notice” or “note that.” This command is redundant – should every point you make not be noteworthy?
Here is an illustrating excerpt:

Excerpt 24

Notice that the match posterior probabilities, P(aj ~ bk | a, b, θ), give the probability that aj is aligned to bk using the pairwise probabilistic model specified by data a, b and parameters θ. The gap posterior probabilities, P(aj | a, b, θ) and P(bk | a, b, θ), give the probability that aj or bk aligns to a gap, using the pairwise probabilistic model specified by data a, b and parameters θ. The gap factor Gf is assigned a value greater than or equal to zero [11]. Note that Gf equal to zero gives the expected developer score, fD, and Gf equal to 0.5 gives the expected AMA score [11]. Note that fD is a sensitivity measure that is equivalent to the sum-of-pairs score and is the most widely used measure of multiple sequence alignment accuracy [11]. Using the global alignment score f(), the following two weight functions, WGfmaxstep and WGftgf, were developed to weight all pairs of columns when selecting the next pair of columns to merge [11]. In practice, WGfmaxstep was used to maximize sensitivity by setting Gf equal to 0, and WGftgf was used to adjust for better specificity and seems to optimize at Gf equal to 4 when both methods were compared to current sequence alignment methods [11].

The sentence makes sense, and is in fact more readable, without the preface!

Avoid redundant referencing

Often you need to refer to many aspects of some referenced work. This is illustrated in the excerpt above, in which reference [11] is mentioned many times. In such cases, it is sufficient to give the reference a single time, when introducing the work.

Avoid use of “clearly”

Papers often tend to use the preface “clearly” before making a point:

Excerpt 25

The loss of spectral resolution is clearly illustrated in Figure 4.
Figure 4: Binning of a 1H NMR frequency spectrum, where a) shows the NMR spectrum without bins, b) shows the same spectrum with bins of width 0.01 ppm, c) shows bins of width 0.05 ppm and d) shows bins of width 0.10 ppm.

Like “notice,” the preface “clearly” serves no useful purpose. If the point is clear to the authors but not the readers, it simply makes them feel inadequate. Often the point is not that clear, and the authors want to believe it is clear so that they can avoid writing text to justify it. This is probably the case in the excerpt above, as the author gave a fuller explanation of the figure in the talk. It is not uncommon for a point called “clear” to not even be true!

Do not mix tenses

When surveying a series of techniques, it is possible to refer to each of them in the past or present tense. However, do not mix tenses, as in the excerpt below:

Excerpt 26

The first suggested solution, where a pairwise alignment is augmented as to produce the most appropriate target spectrum, has the advantage of being able to be implemented using any pairwise alignment scheme. …. The second solution of multiple sequence annealing provided a solution that could be adjusted through the gap factor to balance the relation of sensitivity verses specificity of a final solution.