Friday, 16 August 2013

Verilog and SV Event Scheduler

A simulation time slot is divided into ordered regions to provide predictable interaction between design constructs. The Verilog event scheduler has four regions for each simulation time step, as shown in Fig 1.

Fig 1: The Active region is for executing process statements; the Inactive region is for executing process statements postponed with a "#0" procedural delay; the NBA region is for updating non-blocking assignments; the Monitor region is for executing $monitor and $strobe and for calling user routines registered for execution during this read-only region.
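The region ordering can be observed with a small sketch. The module name and message strings below are made up for illustration; the statements are deliberately written out of order to show that the region, not the textual position, decides which value of `a` each one observes.

```systemverilog
// Hypothetical demo module, not part of the original post.
module region_demo;
  reg a;

  initial begin
    a = 0;                                    // blocking assignment: takes effect immediately
    a <= 1;                                   // non-blocking: update is deferred to the NBA region
    $strobe ("monitor  region: a=%b", a);     // runs at the end of the time step, after the NBA update
    $display("active   region: a=%b", a);     // runs immediately, before the NBA update
    #0 $display("inactive region: a=%b", a);  // #0 postpones to the Inactive region, still before NBA
  end
endmodule
```

A simulator that follows the standard scheduling should print the Active line first with a=0, then the Inactive line with a=0, and finally the $strobe line with a=1.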

SystemVerilog adds regions to provide a predictable interaction between assertions, design code and testbench code.

Fig 2: The Preponed region is for sampling signal values before anything in the time slice changes their values; the additional Observed region is for assertion evaluation; the Re-Active and Re-Inactive regions are for executing assertion action blocks and testbench programs; the Postponed region is for system tasks that record signal values at the end of the time slice.

SV introduces new verification blocks:

— Program
To provide a clear separation between testbench and design, SV introduces the program block, which contains the full testbench environment. It is intended to reduce user-induced races. It executes in the Re-Active region.
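As a rough sketch of what executing in the Re-Active region means in practice, the program below waits on the same clock edge that the design uses to update q. All module, program and signal names are assumptions made for this illustration.

```systemverilog
// Hypothetical design: a single flip-flop.
module dut(input logic clk, input logic d, output logic q);
  always_ff @(posedge clk) q <= d;   // q is updated in the NBA region
endmodule

// Hypothetical testbench program.
program tb(input logic clk, input logic q);
  initial begin
    @(posedge clk);
    // Program code resumes in the Re-Active region, i.e. after the
    // design's non-blocking assignments for this time step have settled.
    $display("program sees q=%b", q);
    $finish;                          // explicit here; an implicit $finish would also occur
  end
endprogram

module top;
  logic clk = 0;
  logic d   = 1;
  logic q;

  always #5 clk = ~clk;

  dut u_dut(.clk(clk), .d(d), .q(q));
  tb  u_tb (.clk(clk), .q(q));
endmodule
```

Because the program resumes after the NBA update, it should report the freshly clocked value q=1 rather than racing with the flip-flop. The same read placed in a module would be subject to that race.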

— Final
The "final" block is used to print summary information to the log file at the end of simulation. It executes at the end of the simulation (after an explicit or implicit call to $finish) and cannot contain delays.
e.g.

program asic_with_ankit;
  int error, warning;
  initial begin
    // Main program activities...
  end
  final begin
    // Runs once at the end of simulation; a final block cannot consume time
    $display("Test is done with %0d errors and %0d warnings", error, warning);
  end
endprogram

— clocking blocks
A clocking block identifies clock signals and captures the timing and synchronization requirements of the blocks being modeled. It supports the following features:
– Input sampling
– Synchronous events
– Synchronous drives
e.g.

clocking cb @(posedge clk);
  default input #1step //default timing skew for inputs/outputs
          output #3;
  input dout;
  output reset, data;
  output negedge enable;
endclocking

 

Fig 3: Clocking skew example


Inputs are sampled and outputs are driven relative to the clock edge. The input skew specifies how long before the clock edge an input is sampled, and the output skew specifies how long after the clocking event an output is driven.
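A minimal sketch of how a testbench might use such a clocking block follows. The surrounding module, the clock generator and the one-line stand-in for a DUT are assumptions added for illustration; only the clocking block itself mirrors the example above.

```systemverilog
// Hypothetical wrapper module around the clocking block shown earlier.
module cb_demo;
  logic clk = 0;
  logic dout, reset, data, enable;

  always #5 clk = ~clk;

  // Stand-in for a DUT: registers data onto dout.
  always @(posedge clk) dout <= data;

  clocking cb @(posedge clk);
    default input #1step output #3;  // default skews for inputs and outputs
    input  dout;
    output reset, data;
    output negedge enable;           // overrides the default: driven on the falling edge
  endclocking

  initial begin
    @(cb);                 // synchronous event: wait for the clocking event
    cb.reset  <= 1'b0;     // synchronous drives: applied 3 time units after the edge
    cb.data   <= 1'b1;
    cb.enable <= 1'b1;     // applied at the next negedge of clk
    repeat (2) @(cb);
    // Input sampling: cb.dout holds the value sampled just before the clock edge.
    $display("sampled dout=%b at time %0t", cb.dout, $time);
    $finish;
  end
endmodule
```

The testbench never touches the raw signals directly; it reads and drives through cb, so the skews are applied in one place.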

Thursday, 8 August 2013

NVIDIA Sets Up New Tech Center Near Detroit

The new Nvidia Technology Center in Ann Arbor, Mich., will focus on working the IC design company's chips into automotive electronics. While Nvidia made a name for itself with its graphics technologies, the last few years have seen a shift at the company as it watches its Tegra system-on-chip (SoC) division grow. "Our new facility will help our growing team of Michigan-based engineers and executives work with automakers and suppliers to develop next-generation infotainment, navigation and driver assistance programs," Nvidia's Danny Shapiro said.

Even with such successes, the automotive industry is still a small part of Nvidia's overall business - but the new technology centre looks to shift that balance. Located in Michigan, a short distance from infamous car-centric Detroit, the centre will concentrate on building technologies specifically for the automotive industry.

Sunday, 4 August 2013

Panasonic with ReRAM mounted microcomputers

Panasonic Corporation today announced that it will start the world's first mass production of microcomputers with mounted ReRAM, a type of non-volatile memory, in August 2013. With this announcement, Panasonic becomes the first company to take ReRAM into the mass market with a product based on the technology. Dubbed the MN101LR series, the microcomputers are being produced from August at a rate of a million units per month using the company's newly developed 0.18 µm ReRAM modules.

Designed for embedded use, the systems are eight-bit microcomputers running at 10MHz with just a few kilobytes of ReRAM available to the user.

This development has the following features:

- The use of the newly developed 0.18 µm ReRAM and low-power processes in the microcomputers contributes to longer operating times for customers' products.
- High-speed, low-power byte-level rewriting can reduce the amount of externally attached EEPROM [3] previously required, thereby reducing the system cost.
- The ReRAM to be produced is based on the rewriting principle of a redox reaction in a metal oxide, which allows high-speed rewriting and high reliability, making it ideal for industrial applications.

Wednesday, 24 July 2013

Motorola touts the power of 8 cores in new phone SoC

Motorola Mobility has designed an eight-core system-on-a-chip device, the Motorola X8 Mobile Computing System, to be used in its Droid mobile handsets. The SoC features a dual-core application processor, four cores for graphics, one for "contextual computing" and the eighth for "natural language processing," according to the Google subsidiary. Motorola declined to identify the silicon foundry making the chip.

Wednesday, 17 July 2013

Guidelines for Successful SoC Verification in OVM/UVM

With the increasing adoption of OVM/UVM, there is a growing demand for guidelines and best practices to ensure successful SoC verification. The verification problems themselves have not changed, but the way the problems are approached and the way the solutions, i.e. verification environments, are structured depend heavily on the methodology. There are two key categories of SoC verification guidelines: process, enabled by tools, and methodology. The process guidelines are about what you need to do and in what order, while the methodology guidelines are about how to do it. This paper first describes the basic tenets of OVM/UVM and then summarizes key guidelines for maximizing the benefits of a state-of-the-art verification methodology such as OVM/UVM.

The BASIC TENETS of OVM/UVM

1. Functionality encapsulation

OVM [1] promotes composition and reuse by encapsulating functionality in a basic block called ovm_component. This basic block contains a run task, i.e. a task that can consume time and that acts as an execution thread responsible for implementing functionality as simulation progresses.

2. Transaction-Level Modeling (TLM)

OVM/UVM uses the TLM standard to describe communication between verification components in an OVM/UVM environment. Because OVM/UVM standardizes the way components are connected, components are interchangeable as long as they provide and require the same interfaces. One of the main advantages of using TLM is that it abstracts away pin and timing details. A transaction, the unit of information exchange between TLM components, encapsulates an abstract view of the stimulus that can be expanded by a lower-level component. One of the pitfalls that can undermine the value of TLM is adding excessive timing detail by generating transactions and delivering them on every clock cycle.

3. Using sequences for stimulus generation

The transactions need to be generated by an entity in the verification environment. Relying on a component to generate the transactions is limiting because it requires changing the component each time a different sequence of transactions is needed. Instead, OVM/UVM allows for flexibility by introducing ovm_sequence. An ovm_sequence is a wrapper object around a task called body(). It is very close to an OOP pattern called "functor", which wraps a function in an object to allow it to be passed as a parameter, but SystemVerilog does not support operator overloading [1]. When started, an ovm_sequence registers itself with an ovm_sequencer, which is an ovm_component that acts as the holder of different sequences and can connect to other ovm_components. The ovm_sequence and ovm_sequencer duo provides the flexibility of running different streams of transactions without having to change the component instantiation.
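As a minimal sketch of the idea, here is a sequence written against the UVM class library (the OVM equivalents use the ovm_ prefix). The transaction type bus_item and the sequence name write_burst_seq are hypothetical names chosen for this example.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical transaction type, used only for this sketch.
class bus_item extends uvm_sequence_item;
  rand bit [7:0] addr;
  rand bit [7:0] data;

  `uvm_object_utils(bus_item)

  function new(string name = "bus_item");
    super.new(name);
  endfunction
endclass

// The sequence is just an object wrapped around body().
class write_burst_seq extends uvm_sequence #(bus_item);
  `uvm_object_utils(write_burst_seq)

  function new(string name = "write_burst_seq");
    super.new(name);
  endfunction

  // Generates four randomized transactions and hands each to the sequencer.
  virtual task body();
    bus_item item;
    repeat (4) begin
      item = bus_item::type_id::create("item");
      start_item(item);                    // wait for the sequencer to grant access
      if (!item.randomize())
        `uvm_error("RANDFAIL", "bus_item randomization failed")
      finish_item(item);                   // send to the driver and wait for completion
    end
  endtask
endclass
```

A test would create the sequence and call its start() method with the handle of the target sequencer. Running a different stimulus stream then means starting a different sequence object, with no change to the driver or agent.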

4. Configurability

Configurability, an enabler of productivity and reuse, is a key element in OVM/UVM. In OVM/UVM, the user can change the behavior of an already instantiated component by three means: the configuration API, factory overrides and callbacks.
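The sketch below illustrates the first two of these means using UVM calls (uvm_config_db and a factory type override); callbacks are left out to keep it short. The classes base_driver, error_injecting_driver and cfg_test, and the max_retries field, are hypothetical names invented for this example.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical driver with one configurable knob.
class base_driver extends uvm_driver #(uvm_sequence_item);
  `uvm_component_utils(base_driver)
  int unsigned max_retries = 1;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    // Configuration API: pick up a value set from higher up, if there is one.
    void'(uvm_config_db #(int unsigned)::get(this, "", "max_retries", max_retries));
  endfunction
endclass

// Hypothetical variant that a test may want to substitute.
class error_injecting_driver extends base_driver;
  `uvm_component_utils(error_injecting_driver)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction
endclass

class cfg_test extends uvm_test;
  `uvm_component_utils(cfg_test)
  base_driver drv;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    // Configuration API: set a field on the child named "drv".
    uvm_config_db #(int unsigned)::set(this, "drv", "max_retries", 3);
    // Factory override: every base_driver requested from now on becomes
    // an error_injecting_driver, without editing the instantiating code.
    base_driver::type_id::set_type_override(error_injecting_driver::get_type());
    drv = base_driver::type_id::create("drv", this);
  endfunction
endclass
```

Both changes are made from the test, so the component that instantiates the driver stays untouched and reusable.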

5. Layering

Layering is a powerful concept in which every level takes care of the details at specific layers. OVM layering can be applied to components, which can be called hierarchy and composition, and to configuration and to stimulus. Typically there is a correspondence between layering of components and objects. Layering stimulus, on the other hand, can reduce the complexity of stimulus generation.

6. Emphasis on reuse (vertical and horizontal)

All the tenets mentioned above lead to another important goal which is reuse. Extensibility, configurability and layering facilitate reuse. Horizontal reuse refers to reusing Verification IPs (VIPs) across projects and vertical reuse describes the ability to use block-level VIPs in cluster and chip level verification environments.

PROCESS GUIDELINES

1. Ordering of development tasks

The natural process for developing an OVM/UVM verification environment is bottom-up. Blocks are first verified in block-level environments, and then the integration of the blocks into the SoC is verified in a chip-level testbench. Some refer to this as IP-centric methodology because the blocks are considered IPs [4]. The focus of block-level verification is to verify the blocks thoroughly, while chip-level verification focuses on the integration of the blocks and the application scenarios. A bottom-up verification approach has several benefits:

  • Localization of bugs: finding bugs easily
  • Easier to test all the block modes at the block-level
  • Confidence in the block-level allowing them to be reused in several projects.

In this section we describe the recommended ordering for development of verification environment elements. This ordering must be kept in mind when developing executable verification plans.

Table 1: Components Development Order

Interfaces
Agents
     Transaction
     Configuration
     Agent Skeleton
     Transactors
     Basic Sequences
Block-Level Subsystem
     Configuration
     Virtual Sequencer
     Initial Sequences/Tests
     Scoreboards & Protocol Checkers
     Coverage Model
     Constrained Random Sequences/Tests
Chip Level
     Integration of Subsystem Environments
     Chip-Level Sequences/Tests

It is worth noting the following:

  • Once transaction fields are defined and implemented, the agent skeleton can be automatically generated.
  • Transactors refer to drivers and monitors.
  • The reason for having the scoreboards & protocol checkers early on is to make sure that what has been developed is functioning.
  • The coverage model needs to come before the constrained random tests to guide test development and eliminate redundancy. This is a cornerstone of coverage-driven verification. The coverage model not only guides the test writing effort but also gives a metric for verification progress and closure.
  • Each block/subsystem/cluster verification environment, together with its tests, acts as a VIP for this block.

2. Use code and template generators

Whether you are relying on scripts or elaborate OVM template generators, these generators are key to increasing the productivity of verification engineers, reducing errors and increasing code conformity. Code generators are also used to generate register models from the specification, thus automating the creation of these models.

3. Qualify your VIPs

Qualify your VIP during development and before releasing them. First, several tools can conduct static checking on your VIP components for common errors and conformance to coding styles. They can also provide statistics about the size of your code, checks and covergroups.

Second, typically a simulator can provide statistics about memory consumption and performance bottlenecks of your VIP. Although SystemVerilog has automatic garbage collection, you can still have memory leaks because you keep a reference to dynamically allocated objects somewhere and forget about them.

Third, your VIPs should be robust to user mistakes, whether in connections or in usage. You need sanity checks that can flag a user error early.

Finally, peer review is still beneficial for pointing out issues that are missed in the other steps.

4. Incremental integration

As described in the introduction, OVM/UVM facilitates composition and layering. Several components/agents can form an environment and two or more environments can form a higher level environment. Incremental integration is important to reduce debugging time.

5. Better regression management and result analysis

The usual scripts that compile and run testcases fall short when running a complex OVM/UVM SoC verification environment. Typical requirements on run management are keeping track of seeds, log files of different tests and execution time, and the flexibility to run different groups of tests on a local machine or on a grid. Once a regression is run, we end up with data that needs to be processed to extract useful information, such as which tests passed or failed, common failure messages, which tests were more efficient and which seeds produced better coverage.

6. Communication and change management

Communication between verification engineers and specification owners should be captured in an issue tracking tool to avoid losing information over the course of the project. Verification engineers also need a mechanism for sharing what they learn with each other; wikis serve as good vehicles for knowledge sharing.

Change management is the other crucial element. By change management we are not only referring to code version management but the way the changes in RTL and block-level environments are handled in cluster or chip level environments.

METHODOLOGY GUIDELINES

1. CPU modeling

SoCs typically have one or more software-programmable components such as a microcontroller, microprocessor or DSP. Processor-driven verification refers to using either a functional model or the RTL model of the processor to verify the functionality of the SoC. This approach is useful for verifying firmware interactions and certain application scenarios. However, for thorough verification of subsystems/clusters this approach can be costly in terms of effort, complexity and simulation time. This paper proposes a two-level approach. For the verification of subsystems, use a pin-accurate and protocol-accurate Bus Functional Model (BFM); this enables rapid development of the verification environment and tests while giving the verification engineer flexibility in creating them. The BFM usually comes as a VIP for the specific bus standard that the processor connects to.

While the VIP usually models the standard interface faithfully, the processor might have extra side-band signals and interrupts. There are two approaches to this. In the first, the VIP models the side-band and interrupt controller behavior in a generic way through the use of configuration, transactions and sequences. In the second, these functionalities are modeled in separate agents for side-band signals and interrupts, which increases the development burden and requires synchronization between the different agents.

For the verification of firmware interaction, such as boot-loading or critical application scenarios, the RTL model or a full functional model can be used to guarantee that the firmware is validated against the hardware it is going to run on.

2. Environment Reuse

Environments should be self-contained: each should know only about its own components and global elements, and should communicate only through the configuration mechanism, TLM connections or global events such as a reset event. Following these rules, an environment at the block level can be reused at the chip level, making the chip-level environment an integration of block-level environments.

3. Sequence Reuse

It is important to write sequences with an eye on reusing them. In OVM/UVM, there are two types of sequences: sequences that send transactions and sequences that start other sequences on sequencers. The latter are called virtual sequences. Below is a further classification of sequences based on functionality:

  • Basic agent sequence: this sequence allows the user to control, from outside, the fields of the transaction that it sends. The basic agent sequence acts as an interface or API for randomizing or setting the transaction fields from a higher layer, which is usually the virtual sequence.
  • Register read/write sequences: these are sequences that write and read address-mapped registers in the DUT. Two important rules need to be considered: they should have an API that is independent of the bus protocol, and they should use the name of the register rather than its address. A register package can be used to look up the register address by name. For example, the OVM register package's built-in sequences [5] support this kind of abstraction. It is also expected that the UVM register package will support these rules. Abiding by these rules makes these sequences reusable and maintainable because there is no need to update the sequence each time a register address changes.
  • DUT configuration sequences: some verification engineers provide sequences that abstract the different configurations of the DUT into enum fields to ease the burden on the test writer. This way the test writer does not need to know which register to write and with what value. These sequences are still reusable at the chip level.
  • Virtual sequences on interfaces accessible at the chip level: these sequences are reusable from block level to chip level; some of them can be used to verify the integration into the full chip.
  • Virtual sequences on internal interfaces that are not visible at the chip level: special attention should be paid to sequences generating stimulus on interfaces that are no longer visible at the chip level.

Although the goals of block-level and chip-level testing differ, some virtual sequences from the block level can be reused at the chip level as integration tests. Interfaces that become internal at the chip level can usually be stimulated through some external interface. To make the last type of virtual sequence reusable at the chip level, it is better to plan ahead and abstract the data from the protocol. For example, in the SoC diagram of Figure 1, peripherals 1 through N are on a peripheral bus that might use a different protocol than the system bus. There are two approaches to making the sequences reusable:

Use functional abstraction by defining functions in the virtual sequence that can be overridden like:

write(register_name, value);

read(register_name, value);

Or rely on a layering technique like ovm_layering[3]. In this approach, a layering agent sits on top of a lower level agent and it forwards high-level transactions that can be translated by the low-level agent according to the bus standard. The high-level agent can be connected to a different low-level agent without any change to the high-level sequences.
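The first approach, functional abstraction, can be sketched in plain SystemVerilog classes as follows. All class names, register names and the printed messages are hypothetical; a real environment would start bus sequences inside write() and read() instead of printing.

```systemverilog
// Hypothetical base class: the scenario is written once against write()/read().
virtual class reg_seq_base;
  pure virtual task write(string register_name, int unsigned value);
  pure virtual task read (string register_name, output int unsigned value);

  // Reusable scenario, unaware of the bus protocol underneath.
  task body();
    int unsigned status;
    write("CTRL", 32'h1);
    read("STATUS", status);
    $display("STATUS = 0x%0h", status);
  endtask
endclass

// Block level: the peripheral bus is directly accessible.
class periph_bus_seq extends reg_seq_base;
  virtual task write(string register_name, int unsigned value);
    $display("[peripheral bus] write %s = 0x%0h", register_name, value);
  endtask
  virtual task read(string register_name, output int unsigned value);
    value = 0;  // placeholder for the data returned by the bus
    $display("[peripheral bus] read  %s", register_name);
  endtask
endclass

// Chip level: the same registers are reached through the system bus.
class system_bus_seq extends reg_seq_base;
  virtual task write(string register_name, int unsigned value);
    $display("[system bus] write %s = 0x%0h", register_name, value);
  endtask
  virtual task read(string register_name, output int unsigned value);
    value = 0;  // placeholder for the data returned by the bus
    $display("[system bus] read  %s", register_name);
  endtask
endclass

module abstraction_demo;
  initial begin
    reg_seq_base   seq;
    periph_bus_seq blk;
    system_bus_seq chip;
    blk  = new();
    chip = new();
    seq = blk;  seq.body();   // block-level run
    seq = chip; seq.body();   // chip-level run of the same scenario
  end
endmodule
```

The body() scenario is identical in both runs; only the overridden access tasks change when the interface becomes internal at the chip level.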

Figure 1: Typical SoC Block Diagram

4. Scoreboards

A critical component of self-checking testbenches is the scoreboard, which is responsible for checking data integrity from input to output. A scoreboard is a TLM component, so care should be taken that it is activated at the transaction level rather than on a cycle-by-cycle basis. In OVM/UVM, the scoreboard is usually connected to at least two analysis ports: one from the monitors on the input side(s) and the other from the monitors on the output side(s). Figure 2 depicts these connections. A scoreboard's operation can be summarized in the following equations:

Expected = TF(Input Transaction);
Compare(Actual , Expected);

TF : Transfer function representing the DUT functionality from inputs to outputs
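These two equations can be sketched as a small, library-free scoreboard. The packet class, the class and method names, and the assumed DUT behavior (the payload is incremented by one) are all hypothetical; in an OVM/UVM environment write_input() and write_output() would be the write() implementations behind the two analysis connections.

```systemverilog
// Hypothetical transaction type.
class packet;
  bit [7:0] data;
  function new(bit [7:0] data);
    this.data = data;
  endfunction
endclass

class simple_scoreboard;
  packet       expected_q[$];   // expected transactions, in order
  int unsigned errors;

  // TF: assumed for illustration only, the DUT adds one to the payload.
  function packet transfer_function(packet in_tr);
    packet out_tr = new(in_tr.data + 8'd1);
    return out_tr;
  endfunction

  // Expected = TF(Input Transaction)
  function void write_input(packet in_tr);
    expected_q.push_back(transfer_function(in_tr));
  endfunction

  // Compare(Actual, Expected)
  function void write_output(packet actual);
    packet expected;
    if (expected_q.size() == 0) begin
      errors++;
      $display("ERROR: unexpected output 0x%0h", actual.data);
      return;
    end
    expected = expected_q.pop_front();
    if (actual.data !== expected.data) begin
      errors++;
      $display("ERROR: expected 0x%0h, got 0x%0h", expected.data, actual.data);
    end
  endfunction
endclass

module scoreboard_demo;
  initial begin
    simple_scoreboard sb;
    packet in_p, out_p;
    sb    = new();
    in_p  = new(8'h10);
    out_p = new(8'h11);         // what the assumed DUT would produce
    sb.write_input(in_p);
    sb.write_output(out_p);
    $display("errors = %0d", sb.errors);
  end
endmodule
```

Both entry points are functions that run once per transaction, so the scoreboard does no work on clock cycles where nothing is observed.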

Sometimes the operation is described as predictor-comparator, where the predictor computes the next output (transfer function) and the comparator checks the actual output against the predicted one (compare function). Usually the transfer function is not static but can change depending on the configuration of the device. In an SoC, most peripherals have memory-mapped registers that are used for configuration and status. These devices are usually called memory-mapped peripherals and they pose two challenges:

  • DUT transfer function and data-flow might change based on the configuration
  • Status bits should be verified

The common solution to the first challenge is for the scoreboard to hold a handle to the memory-map model and to connect an analysis port from the configuration bus monitor to the scoreboard. On reception of a new transaction on this analysis port, the scoreboard updates the peripheral's registerfile model and then uses it to update the transfer function accordingly. This approach has one disadvantage: each peripheral scoreboard has to implement the same functionality and needs to connect to the configuration bus monitor. A better approach is for the registerfile updates to occur in a central component on the bus. To eliminate the need for connections to the bus monitor, the register package can have an analysis port on each registerfile model. Each scoreboard can connect to this registerfile model internally without the need for external connections. One of the requirements on the UVM register package is to have an update notification method [6].

The second challenge is status bit verification. Status bits are usually modeled in the register model, and the register model can act as a predictor of the value of the status bits. This requires that the scoreboard predicts changes to status bits and updates the register model; on register reads, the value read from the DUT is compared against the register model.

There are other aspects to consider when implementing the scoreboards:

  • Data flow analysis: data flow can change based on configuration, or data flow can come from several inputs towards the output.
  • Scoreboard connection technique: Scoreboards can be connected to monitors using one of two ways: through ovm_imps in the scoreboard or through ovm_exports and tlm_analysis_fifos: the latter requires a thread on each tlm_analysis_fifo to get transactions while the former executes in the context of the caller.
  • Threaded or thread-less: the scoreboard can have 0 or more threads depending on a number of factors such as the connection method, the complexity of synchronization and experience of the developer. As a general rule, verification engineers should avoid spawning unnecessary threads in the scoreboard.

At the SoC level, there are two approaches to organizing scoreboards: end-to-end and multi-step [2]. Figure 3 depicts the difference between the two. The multi-step approach has several advantages over the end-to-end approach:

  • It is a by-product of block-level to chip-level reuse.
  • The checking task is simpler since it is divided over several components, each concerned with a specific block.
  • It is easy to localize bugs at the block level since the violating block's scoreboard will flag the error.

Figure 2: Scoreboard Connection in OVM

Figure 3: End-to-End versus Multi-step Scoreboards

CONCLUSION

OVM/UVM is a powerful verification methodology. To maximize the value achieved by adopting OVM/UVM there is a need for guidelines. These guidelines are not only for the methodology deployment but also for the verification process. This paper tried to summarize some of the pitfalls and tradeoffs and provide guidelines for successful SoC verification. The set of guidelines in this paper can help you plan ahead your SoC verification environment, avoid pitfalls and increase productivity.

Thursday, 27 June 2013

Quantum-tunneling technique

Quantum-tunneling technique promises chips that won't overheat

Researchers at Michigan Technological University have employed room-temperature quantum tunneling to move electrons through boron nitride nanotubes. Semiconductor devices made with this technology would need less power than current transistors require, while also not generating waste heat or leaking electrical current, according to the research team.

Rather than relying on the predictable flow of electrons used in current circuits, the new approach depends on quantum tunneling. In this, electrons travel faster than light and appear to arrive at a new location before having left the old one, and pass straight through barriers that should be able to hold them back. This appears to be under the direction of a cat which is possibly dead and alive at the same time, but we might have gotten that bit wrong.

There is a lot of good which could come out of building such a computer circuit. For a start, the circuits are built by creating pathways for electrons to travel across a bed of nanotubes, and are not limited by any size restriction relevant to current manufacturing methods.

SystemVerilog Fork Disable "Gotchas"

This is a long post with a lot of SystemVerilog code. The purpose of this entry is to hopefully save you from beating your head against the wall trying to figure out some of the subtleties of SystemVerilog processes (basically, threads). Subtleties like these are commonly referred to in the industry as "Gotchas" which makes them sound so playful and fun, but they really aren't either.

I encourage you to run these examples with your simulator (if you have access to one) so that a) you can see the results first hand and better internalize what's going on, and b) you can tell me in the comments if this code works fine for you and I'll know I should go complain to my simulator vendor.

OK, I'll start with a warm-up that everyone who writes any Verilog or SystemVerilog at all should be aware of: tasks are static by default. If you do this:

module top;
  task do_stuff(int wait_time);
    #wait_time $display("waited %0d, then did stuff", wait_time);
  endtask

  initial begin
    fork
      do_stuff(10);
      do_stuff(5);
    join
  end
endmodule

both do_stuff calls share the single static wait_time variable, so the second call overwrites the first call's value, and you see this:

waited 5, then did stuff
waited 5, then did stuff

I suppose being static by default is a performance/memory-use optimization, but it's guaranteed to trip up programmers who started with different languages. The fix is to make the task "automatic" instead of static:

module top;
  task automatic do_stuff(int wait_time);
    #wait_time $display("waited %0d, then did stuff", wait_time);
  endtask

  initial begin
    fork
      do_stuff(10);
      do_stuff(5);
    join
  end
endmodule

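And now you get what you expected:

waited 5, then did stuff
waited 10, then did stuff

Now change the join to a join_any and print a message after the fork: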

module top;
  task automatic do_stuff(int wait_time);
    #wait_time $display("waited %0d, then did stuff", wait_time);
  endtask

  initial begin
    fork
      do_stuff(10);
      do_stuff(5);
    join_any
    $display("fork has been joined");
  end
endmodule

You'll get this output:

waited 5, then did stuff
fork has been joined
waited 10, then did stuff

That's fine, but that extra action from the slower do_stuff after the fork-join_any block has finished might not be what you wanted. You can name the fork block and disable it to take care of that, like so:

module top;
  task automatic do_stuff(int wait_time);
    #wait_time $display("waited %0d, then did stuff", wait_time);
  endtask

  initial begin
    fork : do_stuff_fork
      do_stuff(10);
      do_stuff(5);
    join_any
    $display("fork has been joined");
    disable do_stuff_fork;
  end
endmodule

Unless your simulator, like mine, "in the current release" will not disable sub-processes created by a fork-join_any statement. Bummer. It's OK, though, because SystemVerilog provides a disable fork statement that disables all active threads of a calling process (if that description doesn't already make you nervous, just wait). Simply do this:

module top;
  task automatic do_stuff(int wait_time);
    #wait_time $display("waited %0d, then did stuff", wait_time);
  endtask

  initial begin
    fork : do_stuff_fork
      do_stuff(10);
      do_stuff(5);
    join_any
    $display("fork has been joined");
    disable fork;
  end
endmodule

And you get:

waited 5, then did stuff
fork has been joined

Nothing wrong there. Now let's say you have a class that is monitoring a bus. Classes are cool because if you have two buses you can create two instances of your monitor class, one for each bus. We can expand our code example to approximate this scenario, like so:

class a_bus_monitor;
  int id;

  function new(int id_in);
    id = id_in;
  endfunction

  task automatic do_stuff(int wait_time);
    #wait_time $display("monitor %0d waited %0d, then did stuff", id, wait_time);
  endtask

  task monitor();
    fork : do_stuff_fork
      do_stuff(10 + id);
      do_stuff(5 + id);
    join_any
    $display("monitor %0d fork has been joined", id);
    disable do_stuff_fork;
  endtask
endclass

module top;
  a_bus_monitor abm1;
  a_bus_monitor abm2;
  initial begin
    abm1 = new(1);
    abm2 = new(2);
    fork
      abm2.monitor();
      abm1.monitor();
    join
    $display("main fork has been joined");
  end
endmodule

Note that I went back to disabling the fork by name instead of using the disable fork statement. This is to illustrate another gotcha. That disable call will disable both instances of the fork, monitor 1's instance and monitor 2's. You get this output:

monitor 1 waited 6, then did stuff
monitor 1 fork has been joined
monitor 2 fork has been joined
main fork has been joined

Because disabling by name is such a blunt instrument, poor monitor 2 never got a chance. Now, if you turn the disable into a disable fork, like so:

class a_bus_monitor;
  int id;

  function new(int id_in);
    id = id_in;
  endfunction

  task automatic do_stuff(int wait_time);
    #wait_time $display("monitor %0d waited %0d, then did stuff", id, wait_time);
  endtask

  task monitor();
    fork : do_stuff_fork
      do_stuff(10 + id);
      do_stuff(5 + id);
    join_any
    $display("monitor %0d fork has been joined", id);
    disable fork;
  endtask

endclass

module top;
  a_bus_monitor abm1;
  a_bus_monitor abm2;
  initial begin
    abm1 = new(1);
    abm2 = new(2);
    fork
      abm2.monitor();
      abm1.monitor();
    join
    $display("main fork has been joined");
  end
endmodule

You get what you expect:

monitor 1 waited 6, then did stuff
monitor 1 fork has been joined
monitor 2 waited 7, then did stuff
monitor 2 fork has been joined
main fork has been joined

It turns out that, like when you disable something by name, disable fork is a pretty blunt tool also. Remember my ominous parenthetical "just wait" above? Here it comes. Try adding another fork like this (look for the fork_something task call):

class a_bus_monitor;
  int id;

  function new(int id_in);
    id = id_in;
  endfunction

  function void fork_something();
    fork
      #300 $display("monitor %0d: you'll never see this", id);
    join_none
  endfunction

  task automatic do_stuff(int wait_time);
    #wait_time $display("monitor %0d waited %0d, then did stuff", id, wait_time);
  endtask

  task monitor();
    fork_something();
    fork : do_stuff_fork
      do_stuff(10 + id);
      do_stuff(5 + id);
    join_any
    $display("monitor %0d fork has been joined", id);
    disable fork;
  endtask

endclass

module top;
  a_bus_monitor abm1;
  a_bus_monitor abm2;

  initial begin
    abm1 = new(1);
    abm2 = new(2);
    fork
      abm2.monitor();
      abm1.monitor();
    join
    $display("main fork has been joined");
  end
endmodule

The output you get is:

monitor 1 waited 6, then did stuff
monitor 1 fork has been joined
monitor 2 waited 7, then did stuff
monitor 2 fork has been joined
main fork has been joined

Yup, fork_something's fork got disabled too. How do you disable only the processes inside the fork you want? You have to wrap your fork-join_any inside of a fork-join, of course. That makes sure that there aren't any other peers or child processes for disable fork to hit. Here's the zoomed in view of that (UPDATE: added missing begin...end for outer fork):

task monitor();
  fork_something();
  fork begin
    fork : do_stuff_fork
      do_stuff(10 + id);
      do_stuff(5 + id);
    join_any
    $display("monitor %0d fork has been joined", id);
    disable fork;
  end
  join
endtask

And now you get what you expect:

monitor 1 waited 6, then did stuff
monitor 1 fork has been joined
monitor 2 waited 7, then did stuff
monitor 2 fork has been joined
main fork has been joined
monitor 2: you'll never see this
monitor 1: you'll never see this

So, wrap your fork-join_any inside a fork-join or else it's, "Gotcha!!!" (I can almost picture the SystemVerilog language designers saying that out loud, with maniacal expressions on their faces).

But wait, I discovered something even weirder. Instead of making that wrapper fork, you can just move the fork_something() call after the disable fork call and then it doesn't get disabled (you actually see the "you'll never see this" message, try it). So, you might think that just reordering your fork and disable fork calls will fix your problem. It will, unless (I learned by sad experience) the monitor task is being repeatedly called inside a forever loop. Here's a simplification of the code that really inspired me to write this all up:

class a_bus_monitor;
  int id;

  function new(int id_in);
    id = id_in;
  endfunction

  function void fork_something();
    fork
      #30 $display("monitor %0d: you'll never see this", id);
    join_none
  endfunction

  task automatic do_stuff(int wait_time);
    #wait_time $display("monitor %0d waited %0d, then did stuff", id, wait_time);
  endtask // do_stuff

  task monitor_subtask();
    fork : do_stuff_fork
      do_stuff(10 + id);
      do_stuff(5 + id);
    join_any
    $display("monitor %0d fork has been joined", id);
    disable fork;
    fork_something();
  endtask

  task monitor();
    forever begin
      monitor_subtask();
    end
  endtask

endclass

module top;
  a_bus_monitor abm1;
  a_bus_monitor abm2;

  initial begin
    abm1 = new(1);
    abm2 = new(2);
    fork
      abm2.monitor();
      abm1.monitor();
    join_none
    $display("main fork has been joined");
    #60 $finish;
  end
endmodule

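The fork inside fork_something will get disabled before it can do its job, even though it comes after the disable fork statement: the forever loop calls monitor_subtask again, and the next iteration's disable fork kills the thread left over from the previous iteration before its 30 time units are up.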

My advice? Just always wrap any disable fork calls inside a fork-join.
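For completeness, here is a sketch of that advice applied to the forever-loop example. It is the same code with the fork-join_any and its disable fork moved inside a wrapper fork-join; the only other change is the message text in fork_something, which I reworded because the thread is now expected to survive.

```systemverilog
class a_bus_monitor;
  int id;

  function new(int id_in);
    id = id_in;
  endfunction

  function void fork_something();
    fork
      #30 $display("monitor %0d: background thread finished", id);
    join_none
  endfunction

  task automatic do_stuff(int wait_time);
    #wait_time $display("monitor %0d waited %0d, then did stuff", id, wait_time);
  endtask

  task monitor_subtask();
    // Wrapper process: disable fork below can only reach the children
    // of this begin...end block, i.e. the two do_stuff threads.
    fork begin
      fork
        do_stuff(10 + id);
        do_stuff(5 + id);
      join_any
      $display("monitor %0d fork has been joined", id);
      disable fork;
    end
    join
    fork_something();   // spawned outside the wrapper, so later iterations leave it alone
  endtask

  task monitor();
    forever begin
      monitor_subtask();
    end
  endtask
endclass

module top;
  a_bus_monitor abm1;
  a_bus_monitor abm2;

  initial begin
    abm1 = new(1);
    abm2 = new(2);
    fork
      abm2.monitor();
      abm1.monitor();
    join_none
    #60 $finish;
  end
endmodule
```

With the wrapper in place, the "background thread finished" messages should start showing up 30 time units after each loop iteration, because each disable fork is confined to the wrapper process that called it.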